ModelServe vllmserve-polling Serve

The serve action deploys vLLM models with polling support as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.

Overview

The vllmserve-polling function kind uses the same parameters as vllmserve, with a polling-based serving flow.

Quick example

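import digitalhub as dh

# Creating from the library requires a project name; "my-project" is a placeholder.
function = dh.new_function(
    project="my-project",
    name="my-vllm-polling-service",
    kind="vllmserve-polling",
    url="hf://mistralai/Mistral-7B-v0.1",
)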

run = function.run(
    action="serve",
    replicas=1,
)

Parameters

Function Parameters

Must be specified when creating the function.

Name Type Description
project str Project name. Required only when creating from the library; otherwise MUST NOT be set.
name str Name that identifies the object. Required.
kind str Function kind. Must be vllmserve-polling. Required.
uuid str Object ID in UUID4 format.
description str Description of the object.
labels list[str] List of labels.
embedded bool Whether the object should be embedded in the project.
model_name str Name of the model.
image str Docker image where to serve the model.
url str Model source URL.
adapters list[dict] List of adapters.

Adapters

Adapters is a list of dictionaries with the following keys:

adapters = [{
    "name": "adapter-name",
    "url": "adapter-url"
}]
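
Before creating the function, the adapter list can be checked locally. The sketch below assumes each entry needs the two keys shown above with non-empty string values. `validate_adapters` is a hypothetical helper, not part of the library, and the adapter name and URL are placeholders.

```python
def validate_adapters(adapters):
    """Return the adapters unchanged if every entry has string 'name' and 'url' keys.

    Hypothetical helper: the library may apply different or stricter checks.
    """
    required = ("name", "url")
    for index, adapter in enumerate(adapters):
        if not isinstance(adapter, dict):
            raise TypeError(f"adapter {index} must be a dict, got {type(adapter).__name__}")
        for key in required:
            value = adapter.get(key)
            if not isinstance(value, str) or not value:
                raise ValueError(f"adapter {index} needs a non-empty string for '{key}'")
    return adapters


adapters = validate_adapters([
    {"name": "my-lora", "url": "hf://my-org/my-lora-adapter"},  # placeholder values
])
```

The validated list can then be passed as the adapters function parameter.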

Task Parameters

Can only be specified when calling function.run().

Name Type Description
action str Task action. Required. Must be serve.
node_selector list[dict] Node selector.
volumes list[dict] List of volumes.
resources dict Resource limits/requests.
affinity dict Affinity configuration.
tolerations list[dict] Tolerations.
envs list[dict] Environment variables.
secrets list[str] List of secret names.
profile str Profile template.
replicas int Number of replicas.
service_type str Service type.
service_name str Service name.

Run Parameters

Can only be specified when calling function.run().

Name Type Description
url str URL of the vLLM service.
args list[str] Extra arguments passed to the vLLM server.
enable_telemetry bool Enable or disable telemetry.
use_cpu_image bool Use a CPU image for serving.
storage_space str Storage space for model artifacts.
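
Task and run parameters are both passed to function.run(), so it can help to check the keyword arguments against the tables above before submitting. This is a minimal sketch that assumes only the names and top-level types listed in this section. `check_run_kwargs` is a hypothetical helper, and the `--max-model-len` value is illustrative.

```python
# Top-level types taken from the Task and Run parameter tables in this section.
EXPECTED_TYPES = {
    "action": str, "node_selector": list, "volumes": list, "resources": dict,
    "affinity": dict, "tolerations": list, "envs": list, "secrets": list,
    "profile": str, "replicas": int, "service_type": str, "service_name": str,
    "url": str, "args": list, "enable_telemetry": bool, "use_cpu_image": bool,
    "storage_space": str,
}


def check_run_kwargs(**kwargs):
    """Hypothetical pre-flight check; returns kwargs if they match the tables."""
    if kwargs.get("action") != "serve":
        raise ValueError("action is required and must be 'serve'")
    for name, value in kwargs.items():
        expected = EXPECTED_TYPES.get(name)
        if expected is None:
            raise KeyError(f"unknown parameter: {name}")
        # bool is a subclass of int, so reject True/False where an int is expected.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise TypeError(f"{name} must be {expected.__name__}")
    return kwargs


run_kwargs = check_run_kwargs(
    action="serve",
    replicas=1,
    args=["--max-model-len", "4096"],  # illustrative vLLM server argument
    use_cpu_image=False,
)
# run = function.run(**run_kwargs)  # needs a function object created beforehand
```

The check covers top-level types only; the inner structure of values such as resources or envs is not validated here.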

Entity methods

Run methods

Once the run is created, you can access its attributes and methods through the run object.

invoke

Invokes the served model. The method defaults to "POST" if data or json is provided in kwargs; otherwise it defaults to "GET". It returns a requests.Response object.

Parameters:

Name Type Description Default
method str Method of the request (e.g., "GET", "POST"). 'POST'
url str URL to invoke. If specified, it must start with the service URL (http:// or https:// prefixes are required and stripped before comparison). None
**kwargs dict Keyword arguments to pass to the request. {}

Returns:

Type Description
Response Response from the request.
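
The method selection and URL check described for invoke can be sketched in plain Python. This is an illustration of the documented behavior, not the library's implementation: the helper names are hypothetical, the service URL is a placeholder, and the sketch assumes the service URL is held without a scheme.

```python
def resolve_method(method=None, **kwargs):
    """Pick the HTTP method as described above when none is given explicitly."""
    if method is not None:
        return method
    return "POST" if ("data" in kwargs or "json" in kwargs) else "GET"


def strip_scheme(url):
    """Remove a leading http:// or https://; raise if neither prefix is present."""
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            return url[len(prefix):]
    raise ValueError("url must start with http:// or https://")


def check_invoke_url(url, service_url):
    """Return url if, once its scheme is stripped, it starts with the service URL."""
    if not strip_scheme(url).startswith(service_url):
        raise ValueError("url must start with the service URL")
    return url


service_url = "my-vllm-polling-service:8000"  # placeholder, assumed scheme-less
target = check_invoke_url(f"http://{service_url}/v1/completions", service_url)
method = resolve_method(json={"prompt": "Hello"})  # a json body selects POST
```

With a running service, the corresponding call would look like `run.invoke(method=method, url=target, json={...})`. The `/v1/completions` path assumes the server exposes vLLM's OpenAI-compatible API.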