ModelServe kubeai-text Serve
The serve action deploys text generation models via KubeAI as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.
Overview
The kubeai-text function kind deploys text generation and text processing models via KubeAI. Supported features include text generation and text embedding, and the function can run on different engines such as Ollama and VLLM.
Quick example
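import digitalhub as dh

# Define the function: model source, features and serving engine
function = dh.new_function(
    name="my-kubeai-text-service",
    kind="kubeai-text",
    url="hf://microsoft/DialoGPT-medium",
    features=["TextGeneration"],
    engine="VLLM",
)

# Deploy the model as a service with a single replica
run = function.run(
    action="serve",
    replicas=1,
)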
Parameters
Function Parameters
Must be specified when creating the function.
| Name | Type | Description |
|---|---|---|
| project | str | Project name. Required only when creating from the library; otherwise MUST NOT be set. |
| name | str | Name that identifies the object. Required. |
| kind | str | Function kind. Must be kubeai-text. Required. |
| uuid | str | Object ID in UUID4 format. |
| description | str | Description of the object. |
| labels | list[str] | List of labels. |
| embedded | bool | Whether the object should be embedded in the project. |
| model_name | str | Name of the model. |
| image | str | Docker image where to serve the model. |
| url | str | Model url. Required. |
| adapters | list[str] | Adapters. |
| features | list[str] | Features. Required. |
| engine | KubeaiEngine | Engine. Required. |
Adapters
Adapters is a list of dictionaries with the following keys:
Features
Features is a list of strings. It accepts the following values:
- TextGeneration
- TextEmbedding
- SpeechToText
Engine
The engine is a KubeaiEngine object that represents the engine to use for the function. The engine can be one of the following:
- OLlama
- VLLM
- FasterWhisper
- Infinity
Model URL
The model url must follow the pattern:
regexp = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r"^pvc?://.*$"
    + r"|"
    + r"^s3?://.*$"
    + r"|"
    + r"^ollama?://.*$"
    + r"|"
    + r"^hf?://.*$"
)
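To check a URL before creating the function, the pattern can be compiled and tested locally. The following is a minimal sketch that copies the pattern as documented above; the helper name `is_valid_model_url` and the sample URLs are illustrative assumptions, not part of the library.

```python
import re

# Pattern copied from the documentation above.
MODEL_URL_PATTERN = re.compile(
    r"^(store://([^/]+)/model/huggingface/.*)"
    r"|"
    r"^pvc?://.*$"
    r"|"
    r"^s3?://.*$"
    r"|"
    r"^ollama?://.*$"
    r"|"
    r"^hf?://.*$"
)


def is_valid_model_url(url: str) -> bool:
    """Hypothetical helper: True if the URL matches the documented pattern."""
    return MODEL_URL_PATTERN.match(url) is not None


# Sample URLs (illustrative only)
for candidate in ("hf://microsoft/DialoGPT-medium", "https://example.com/model"):
    print(candidate, is_valid_model_url(candidate))
```

Note that, as written, each `?` makes the preceding character optional, so a scheme such as `s://` also matches this pattern.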
Task Parameters
Can only be specified when calling function.run().
Shared Parameters
| Name | Type | Description |
|---|---|---|
| action | str | Task action. Required. Must be serve. |
| node_selector | list[dict] | Node selector. |
| volumes | list[dict] | List of volumes. |
| resources | dict | Resource limits/requests. |
| affinity | dict | Affinity configuration. |
| tolerations | list[dict] | Tolerations. |
| envs | list[dict] | Environment variables. |
| secrets | list[str] | List of secret names. |
| profile | str | Profile template. |
| replicas | int | Number of replicas. |
| service_type | str | Service type. |
| service_name | str | Service name. |
Run Parameters
Can only be specified when calling function.run().
Run Function Kind-Specific Parameters
KubeAI Text
| Name | Type | Description |
|---|---|---|
| env | dict | Environment variables. |
| args | list[str] | Arguments. |
| cache_profile | str | Cache profile. |
| files | list[KubeaiFile] | Files. |
| scaling | Scaling | Scaling parameters. |
| processors | int | Number of processors. |
Files
Files is a list of dict with the following keys:
Scaling
Scaling is a Scaling object that represents the scaling parameters for the run. Its structure is as follows:
scaling = {
    "replicas": int,
    "min_replicas": int,
    "max_replicas": int,
    "autoscaling_disabled": bool,
    "target_request": int,
    "scale_down_delay_seconds": int,
    "load_balancing": {
        "strategy": str,  # "LeastLoad" or "PrefixHash"
        "prefix_hash": {
            "mean_load_factor": int,
            "replication": int,
            "prefix_char_length": int
        }
    }
}
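As an illustration of the structure above, the sketch below assembles a scaling dictionary with concrete values. The helper `build_scaling` is hypothetical, the `min_replicas <= max_replicas` check is a local sanity check rather than documented library behavior, and the sketch assumes that keys left out of the dictionary are optional.

```python
def build_scaling(min_replicas: int, max_replicas: int, target_request: int) -> dict:
    """Hypothetical helper: build a scaling dict from the documented keys."""
    # Local sanity check (an assumption, not a documented library rule).
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return {
        "min_replicas": min_replicas,
        "max_replicas": max_replicas,
        "autoscaling_disabled": False,
        "target_request": target_request,
        # "LeastLoad" is one of the two documented strategies.
        "load_balancing": {"strategy": "LeastLoad"},
    }


# Example values are illustrative only.
scaling = build_scaling(min_replicas=1, max_replicas=3, target_request=100)
print(scaling)
```

The resulting dictionary would then be passed as the `scaling` argument of `function.run()`, alongside `action="serve"`.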
Entity methods
Run methods
Once the run is created, you can access its attributes and methods through the run object.
invoke
Invoke the served model. By default, it exposes the infer v2 endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the model. | required |
| method | str | Method of the request. | 'POST' |
| url | str | URL of the request. | None |
| **kwargs | dict | Keyword arguments to pass to the request. | {} |
Returns:
| Type | Description |
|---|---|
| Response | Response from the request. |
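A call to `invoke` might look like the sketch below. The wrapper `invoke_model` is hypothetical, and the sketch assumes that extra keyword arguments are forwarded to an HTTP client that accepts a `json` argument; the payload format expected by the served model is not specified here, so it is left to the caller.

```python
from typing import Any


def invoke_model(run: Any, model_name: str, payload: dict) -> Any:
    """Hypothetical wrapper around run.invoke() using the documented parameters."""
    # method="POST" is the documented default, spelled out for clarity;
    # url is omitted so the documented default (None) applies.
    # Assumption: `json` is forwarded to the underlying HTTP request via **kwargs.
    return run.invoke(model_name=model_name, method="POST", json=payload)
```

With a run created by `function.run(action="serve")`, this would be called as `invoke_model(run, "my-model", {...})`, where the model name and payload are placeholders.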