ModelServe vllmserve-text Serve
The serve action deploys vLLM text-generation models as services on Kubernetes. A Task is created by calling run() on the Function; task and run parameters are passed through that call.
Overview
The vllmserve-text function kind specializes vLLM serving for text generation models.
Quick example
```python
import digitalhub as dh

function = dh.new_function(
    project="my-project",
    name="my-vllm-text-service",
    kind="vllmserve-text",
    url="hf://meta-llama/Meta-Llama-3-8B-Instruct",
)

run = function.run(
    action="serve",
    replicas=1,
)
```
Parameters
Function Parameters
Must be specified when creating the function.
| Name | Type | Description |
|---|---|---|
| project | str | Project name. Required only when creating from the library; otherwise MUST NOT be set. |
| name | str | Name that identifies the object. Required. |
| kind | str | Function kind. Must be vllmserve-text. Required. |
| uuid | str | Object ID in UUID4 format. |
| description | str | Description of the object. |
| labels | list[str] | List of labels. |
| embedded | bool | Whether the object should be embedded in the project. |
| model_name | str | Name of the model. |
| image | str | Docker image used to serve the model. |
| url | str | Model source URL. |
| adapters | list[dict] | List of adapters. |
Adapters
The adapters parameter is a list of dictionaries, one entry per adapter to serve alongside the base model.
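As a sketch, a function that also sets the optional descriptive and model parameters from the table above could be created as follows; every value is a placeholder.

```python
import digitalhub as dh

# All values below are illustrative placeholders.
function = dh.new_function(
    project="my-project",
    name="my-vllm-text-service",
    kind="vllmserve-text",
    description="Serve Llama 3 8B Instruct with vLLM",
    labels=["llm", "text-generation"],
    url="hf://meta-llama/Meta-Llama-3-8B-Instruct",
    model_name="llama-3-8b-instruct",
)
```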
Task Parameters
Can only be specified when calling function.run().
| Name | Type | Description |
|---|---|---|
| action | str | Task action. Required. Must be serve. |
| node_selector | list[dict] | Node selector. |
| volumes | list[dict] | List of volumes. |
| resources | dict | Resource limits/requests. |
| affinity | dict | Affinity configuration. |
| tolerations | list[dict] | Tolerations. |
| envs | list[dict] | Environment variables. |
| secrets | list[str] | List of secret names. |
| profile | str | Profile template. |
| replicas | int | Number of replicas. |
| service_type | str | Service type. |
| service_name | str | Service name. |
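For illustration, a run that also sets some of these task parameters might look like the sketch below; the envs entry assumes a Kubernetes-style name/value layout, and all values are placeholders.

```python
# Illustrative only: values are placeholders and the envs layout
# ({"name": ..., "value": ...}) is an assumption, not taken from this page.
run = function.run(
    action="serve",
    replicas=2,
    service_type="NodePort",
    secrets=["hf-token"],
    envs=[{"name": "LOG_LEVEL", "value": "INFO"}],
)
```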
Run Parameters
Can only be specified when calling function.run().
| Name | Type | Description |
|---|---|---|
| url | str | URL of the vLLM service. |
| args | list[str] | Extra arguments passed to the vLLM server. |
| enable_telemetry | bool | Enable or disable telemetry. |
| use_cpu_image | bool | Use a CPU image for serving. |
| storage_space | str | Storage space for model artifacts. |
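Run parameters are passed in the same run() call as task parameters. A minimal sketch, assuming args is forwarded verbatim to the vLLM server (the flags shown are standard vLLM options and purely illustrative):

```python
# The extra args are assumed to be appended to the vLLM server command line;
# --max-model-len and --dtype are standard vLLM flags used here as examples.
run = function.run(
    action="serve",
    replicas=1,
    args=["--max-model-len", "4096", "--dtype", "bfloat16"],
    enable_telemetry=False,
    use_cpu_image=False,
)
```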
Entity methods
Run methods
Once the run is created, you can access its attributes and methods through the run object.
invoke
Invoke the served model. The method defaults to "POST" if data or json is provided in kwargs, otherwise it defaults to "GET". The call returns a requests.Response object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Method of the request (e.g., "GET", "POST"). | 'POST' |
| url | str | URL to invoke. If specified, it must start with the service URL (http:// or https:// prefixes are required and stripped before comparison). | None |
| **kwargs | dict | Keyword arguments to pass to the request. | {} |
Returns:
| Type | Description |
|---|---|
| Response | Response from the request. |
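As an example, once the service is running you can call invoke and pass request keyword arguments straight through to requests. The service URL and the /v1/chat/completions path below are assumptions (the path matches the vLLM OpenAI-compatible server); adjust them to whatever your deployment actually exposes.

```python
# Hypothetical: SERVICE_URL stands in for the deployed service URL, and the
# /v1/chat/completions path assumes an OpenAI-compatible vLLM server.
SERVICE_URL = "http://my-vllm-text-service.my-project.svc.cluster.local"

response = run.invoke(
    method="POST",
    url=f"{SERVICE_URL}/v1/chat/completions",
    json={
        "model": "llama-3-8b-instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
)
print(response.status_code, response.json())
```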