kubeai-speech Serve

The serve action deploys speech processing models via KubeAI as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.

Overview

The kubeai-speech function kind deploys speech processing models via KubeAI. It provides speech-to-text functionality and can work with different speech processing engines.

Quick example

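import digitalhub as dh

function = dh.new_function(
    project="my-project",  # required when creating from the library (see Function Parameters)
    name="my-kubeai-speech-service",
    kind="kubeai-speech",
    url="hf://openai/whisper-tiny",
    adapters=[{"name": "whisper-adapter", "url": "hf://adapter-url"}],
)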

run = function.run(
    action="serve",
    replicas=1
)

Parameters

Function Parameters

Must be specified when creating the function.

Name	Type	Description
project	str	Project name. Required only when creating from the library; otherwise MUST NOT be set.
name	str	Name that identifies the object. Required.
kind	str	Function kind. Must be `kubeai-speech`. Required.
uuid	str	Object ID in UUID4 format.
description	str	Description of the object.
labels	list[str]	List of labels.
embedded	bool	Whether the object should be embedded in the project.
model_name	str	Name of the model.
image	str	Docker image where to serve the model.
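url	str	Model URL. Required. Must match the pattern described under Model URL.
adapters	list[dict]	List of adapters, each a dictionary with `name` and `url` keys.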

Model URL

The model URL must match the following pattern:

regexp = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r"^pvc?://.*$"
    + r"|"
    + r"^s3?://.*$"
    + r"|"
    + r"^ollama?://.*$"
    + r"|"
    + r"^hf?://.*$"
)
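As a quick check, the pattern above can be compiled with Python's standard `re` module to test a URL before creating the function. This is a minimal sketch: the helper name `is_valid_model_url` is hypothetical and not part of the library, and the pattern is copied verbatim from above.

```python
import re

# Pattern copied verbatim from the documentation above.
MODEL_URL_PATTERN = re.compile(
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r"^pvc?://.*$"
    + r"|"
    + r"^s3?://.*$"
    + r"|"
    + r"^ollama?://.*$"
    + r"|"
    + r"^hf?://.*$"
)


def is_valid_model_url(url: str) -> bool:
    """Return True if the URL matches the documented model URL pattern.

    Hypothetical helper for illustration; not part of the library.
    """
    return MODEL_URL_PATTERN.match(url) is not None


print(is_valid_model_url("hf://openai/whisper-tiny"))   # True
print(is_valid_model_url("https://example.com/model"))  # False
```

Note that `?` makes the preceding character optional, so the pattern as written also accepts prefixes such as `pv://` or `s://`.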

Adapters

Adapters is a list of dictionaries with the following keys:

adapters = [{
    "name": "adapter-name",
    "url": "adapter-url"
}]
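Since each adapter entry needs both keys, a small check before creating the function can catch a malformed entry early. The sketch below is only illustrative: `check_adapters` is a hypothetical helper, not a library function.

```python
def check_adapters(adapters: list[dict]) -> None:
    """Raise ValueError if any adapter entry lacks the 'name' or 'url' key.

    Hypothetical helper for illustration; not part of the library.
    """
    required = {"name", "url"}
    for index, adapter in enumerate(adapters):
        missing = required - set(adapter)
        if missing:
            raise ValueError(f"adapter {index} is missing keys: {sorted(missing)}")


adapters = [{"name": "whisper-adapter", "url": "hf://adapter-url"}]
check_adapters(adapters)  # passes silently when every entry is complete
```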

Task Parameters

Can only be specified when calling function.run().

Shared Parameters

Name	Type	Description
action	str	Task action. Required. Must be `serve`.
node_selector	list[dict]	Node selector.
volumes	list[dict]	List of volumes.
resources	dict	Resource limits/requests.
affinity	dict	Affinity configuration.
tolerations	list[dict]	Tolerations.
envs	list[dict]	Environment variables.
secrets	list[str]	List of secret names.
profile	str	Profile template.
replicas	int	Number of replicas.
service_type	str	Service type.
service_name	str	Service name.

Run Parameters

Can only be specified when calling function.run().

Run Function Kind-Specific Parameters

KubeAI Speech

Name	Type	Description
env	dict	Environment variables.
args	list[str]	Arguments.
cache_profile	str	Cache profile.
files	list[KubeaiFile]	Files.
scaling	Scaling	Scaling parameters.
processors	int	Number of processors.

Files

Files is a list of dictionaries with the following keys:

files = [
    {
        "path": "file-path",
        "content": "file-content"
    }
]
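When the content lives in local text files, the list can be assembled programmatically. The following is a sketch under assumptions: `build_files` is a hypothetical helper, and it assumes that `path` is the path the file should have in the served environment and that `content` is the file's text.

```python
from pathlib import Path


def build_files(sources: dict[str, str]) -> list[dict]:
    """Build the files list from a mapping of target path -> local text file.

    Hypothetical helper for illustration; not part of the library.
    Assumes 'content' holds the file's text, read here as UTF-8.
    """
    return [
        {"path": target, "content": Path(local).read_text(encoding="utf-8")}
        for target, local in sources.items()
    ]
```

For example, `build_files({"config/settings.yaml": "settings.yaml"})` would return a single entry whose `content` is the text of the local `settings.yaml` (both file names are placeholders).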

Scaling

Scaling is a Scaling object that holds the scaling parameters for the run. It has the following structure:

scaling = {
    "replicas": int,
    "min_replicas": int,
    "max_replicas": int,
    "autoscaling_disabled": bool,
    "target_request": int,
    "scale_down_delay_seconds": int,
    "load_balancing": {
        "strategy": str,  # "LeastLoad" or "PrefixHash"
        "prefix_hash": {
            "mean_load_factor": int,
            "replication": int,
            "prefix_char_length": int
        }
    }
}
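A filled-in instance of the structure above might look like the following. All values are illustrative assumptions rather than recommended or default settings, and `check_scaling` is a hypothetical helper that only verifies the two constraints visible in the schema (the strategy names and a sensible replica range).

```python
# Illustrative values only; tune them for your workload.
scaling = {
    "replicas": 1,
    "min_replicas": 1,
    "max_replicas": 3,
    "autoscaling_disabled": False,
    "target_request": 100,
    "scale_down_delay_seconds": 30,
    "load_balancing": {
        "strategy": "PrefixHash",
        "prefix_hash": {
            "mean_load_factor": 125,
            "replication": 256,
            "prefix_char_length": 100,
        },
    },
}


def check_scaling(scaling: dict) -> None:
    """Raise ValueError on an unknown strategy or an inverted replica range.

    Hypothetical helper for illustration; not part of the library.
    """
    strategy = scaling.get("load_balancing", {}).get("strategy")
    if strategy is not None and strategy not in ("LeastLoad", "PrefixHash"):
        raise ValueError(f"unknown load balancing strategy: {strategy!r}")
    low, high = scaling.get("min_replicas"), scaling.get("max_replicas")
    if low is not None and high is not None and low > high:
        raise ValueError("min_replicas must not exceed max_replicas")


check_scaling(scaling)  # passes silently for the example above
```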

Entity methods

Run methods

Once the run is created, you can access its attributes and methods through the run object.

`invoke`

Invokes the served model. By default, it exposes the infer v2 endpoint.

Parameters:

Name	Type	Description	Default
`model_name`	`str`	Name of the model.	required
`method`	`str`	Method of the request.	`'POST'`
`url`	`str`	URL of the request.	`None`
`**kwargs`	`dict`	Keyword arguments to pass to the request.	`{}`

Returns:

Type	Description
`Response`	Response from the request.