ModelServe kubeai-text Serve

The serve action deploys text generation models via KubeAI as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.

Overview

The kubeai-text function kind supports deploying text generation and processing models via KubeAI. It supports various features including text generation, text embedding, and can work with different engines like Ollama, VLLM, etc.

Quick example

function = dh.new_function(
    name="my-kubeai-text-service",
    kind="kubeai-text",
    url="hf://microsoft/DialoGPT-medium",
    features=["TextGeneration"],
    engine="VLLM"
)

run = function.run(
    action="serve",
    replicas=1
)

Parameters

Function Parameters

Must be specified when creating the function.

Name	Type	Description
project	str	Project name. Required only when creating from the library; otherwise MUST NOT be set.
name	str	Name that identifies the object. Required.
kind	str	Function kind. Must be `kubeai-text`. Required.
uuid	str	Object ID in UUID4 format.
description	str	Description of the object.
labels	list[str]	List of labels.
embedded	bool	Whether the object should be embedded in the project.
model_name	str	Name of the model.
image	str	Docker image where to serve the model.
url	str	Model url. Required.
adapters	list[str]	Adapters.
features	list[str]	Features. Required.
engine	KubeaiEngine	Engine. Required.

Adapters

Adapters is a list of dictionaries with the following keys:

adapters = [{
    "name": "adapter-name",
    "url": "adapter-url"
}]

Features

Features is a list of strings. It accepts the following values:

TextGeneration
TextEmbedding
SpeechToText

Engine

The engine is a KubeaiEngine object that represents the engine to use for the function. The engine can be one of the following:

OLlama
VLLM
FasterWhisper
Infinity

Model URL

The model url must follow the pattern:

regexp = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r"^pvc?://.*$"
    + r"|"
    + r"^s3?://.*$"
    + r"|"
    + r"^ollama?://.*$"
    + r"|"
    + r"^hf?://.*$"
)

Task Parameters

Can only be specified when calling function.run().

Shared Parameters

Name	Type	Description
action	str	Task action. Required. Must be `serve`
node_selector	list[dict]	Node selector.
volumes	list[dict]	List of volumes.
resources	dict	Resource limits/requests.
affinity	dict	Affinity configuration.
tolerations	list[dict]	Tolerations.
envs	list[dict]	Environment variables.
secrets	list[str]	List of secret names.
profile	str	Profile template.
replicas	int	Number of replicas.
service_type	str	Service type.
service_name	str	Service name.

Run Parameters

Can only be specified when calling function.run().

Run Function Kind-Specific Parameters

KubeAI Text

Name	Type	Description
env	dict	Environment variables.
args	list[str]	Arguments.
cache_profile	str	Cache profile.
files	list[KubeaiFile]	Files.
scaling	Scaling	Scaling parameters.
processors	int	Number of processors.

Files

Files is a list of dict with the following keys:

files = [
    {
        "path": "file-path"
        "content": "file-content"
    }
]

Scaling

Scaling is a Scaling object that represents the scaling parameters for the run. Its structure is as follows:

scaling = {
    "replicas": int,
    "min_replicas": int,
    "max_replicas": int,
    "autoscaling_disabled": bool,
    "target_request": int,
    "scale_down_delay_seconds": int,
    "load_balancing": {
        "strategy": str,  # "LeastLoad" or "PrefixHash"
        "prefix_hash": {
            "mean_load_factor": int,
            "replication": int,
            "prefix_char_length": int
        }
    }
}

Entity methods

Run methods

Once the run is created, you can access its attributes and methods through the run object.

`invoke`

Invoke served model. By default it exposes infer v2 endpoint.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name of the model.	required
`method`	`str`	Method of the request.	`'POST'`
`url`	`str`	URL of the request.	`None`
`**kwargs`	`dict`	Keyword arguments to pass to the request.	`{}`

Returns:

Type	Description
`Response`	Response from the request.