ModelServe vllmserve-text Serve
The serve action deploys vLLM text-generation models as services on Kubernetes. A Task is created by calling run() on the Function; task and run parameters are passed through that call.
Overview
The vllmserve-text function kind specializes vLLM serving for text generation models.
Quick example
```python
import digitalhub as dh

function = dh.new_function(
    project="my-project",
    name="my-vllm-text-service",
    kind="vllmserve-text",
    url="hf://meta-llama/Meta-Llama-3-8B-Instruct",
)

run = function.run(
    action="serve",
    replicas=1,
)
```
Parameters
Function Parameters
Must be specified when creating the function.
| Name | Type | Description |
|---|---|---|
| project | str | Project name. Required only when creating from the library; otherwise MUST NOT be set. |
| name | str | Name that identifies the object. Required. |
| kind | str | Function kind. Must be vllmserve-text. Required. |
| uuid | str | Object ID in UUID4 format. |
| description | str | Description of the object. |
| labels | list[str] | List of labels. |
| embedded | bool | Whether the object should be embedded in the project. |
| model_name | str | Name of the model. |
| image | str | Docker image used to serve the model. |
| url | str | Model source URL. |
| adapters | list[dict] | List of adapters. |
Adapters
The adapters parameter is a list of dictionaries, one entry per adapter to serve alongside the base model.
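As a sketch, a function that also sets the optional descriptive and model parameters from the table above could be created as follows; every value is a placeholder.

```python
import digitalhub as dh

# All values below are illustrative placeholders.
function = dh.new_function(
    project="my-project",
    name="my-vllm-text-service",
    kind="vllmserve-text",
    description="Serve Llama 3 8B Instruct with vLLM",
    labels=["llm", "text-generation"],
    url="hf://meta-llama/Meta-Llama-3-8B-Instruct",
    model_name="llama-3-8b-instruct",
)
```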
Task Parameters
Can only be specified when calling function.run().
| Name | Type | Description |
|---|---|---|
| action | str | Task action. Required. Must be serve. |
| node_selector | list[dict] | Node selector. |
| volumes | list[dict] | List of volumes. |
| resources | dict | Resource limits/requests. |
| affinity | dict | Affinity configuration. |
| tolerations | list[dict] | Tolerations. |
| envs | list[dict] | Environment variables. |
| secrets | list[str] | List of secret names. |
| profile | str | Profile template. |
| replicas | int | Number of replicas. |
| service_type | str | Service type. |
| service_name | str | Service name. |
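For illustration, a run that also sets some of these task parameters might look like the sketch below; the envs entry assumes a Kubernetes-style name/value layout, and all values are placeholders.

```python
# Illustrative only: values are placeholders and the envs layout
# ({"name": ..., "value": ...}) is an assumption, not taken from this page.
run = function.run(
    action="serve",
    replicas=2,
    service_type="NodePort",
    secrets=["hf-token"],
    envs=[{"name": "LOG_LEVEL", "value": "INFO"}],
)
```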
Run Parameters
Can only be specified when calling function.run().
| Name | Type | Description |
|---|---|---|
| url | str | URL of the vLLM service. |
| args | list[str] | Extra arguments passed to the vLLM server. |
| enable_telemetry | bool | Enable or disable telemetry. |
| use_cpu_image | bool | Use a CPU image for serving. |
| storage_space | str | Storage space for model artifacts. |
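Run parameters are passed in the same run() call as task parameters. A minimal sketch, assuming args is forwarded verbatim to the vLLM server (the flags shown are standard vLLM options and purely illustrative):

```python
# The extra args are assumed to be appended to the vLLM server command line;
# --max-model-len and --dtype are standard vLLM flags used here as examples.
run = function.run(
    action="serve",
    replicas=1,
    args=["--max-model-len", "4096", "--dtype", "bfloat16"],
    enable_telemetry=False,
    use_cpu_image=False,
)
```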
Entity methods
Run methods
Once the run is created, you can access its attributes and methods through the run object.
invoke
Invoke the served model. The method defaults to "POST" if data or json is provided in kwargs, otherwise it defaults to "GET". The call returns a requests.Response object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Method of the request (e.g., "GET", "POST"). | 'POST' |
| url | str | URL to invoke. If specified, it must start with the service URL (http:// or https:// prefixes are required and stripped before comparison). | None |
| **kwargs | dict | Keyword arguments to pass to the request. | {} |
Returns:
| Type | Description |
|---|---|
| Response | Response from the request. |
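As an example, once the service is running you can call invoke and pass request keyword arguments straight through to requests. The service URL and the /v1/chat/completions path below are assumptions (the path matches the vLLM OpenAI-compatible server); adjust them to whatever your deployment actually exposes.

```python
# Hypothetical: SERVICE_URL stands in for the deployed service URL, and the
# /v1/chat/completions path assumes an OpenAI-compatible vLLM server.
SERVICE_URL = "http://my-vllm-text-service.my-project.svc.cluster.local"

response = run.invoke(
    method="POST",
    url=f"{SERVICE_URL}/v1/chat/completions",
    json={
        "model": "llama-3-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
)
print(response.status_code, response.json())
```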