Skip to content

ModelServe kubeai-speech Serve

The serve action deploys speech processing models via KubeAI as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.

Overview

The kubeai-speech function kind supports deploying speech processing models via KubeAI. It supports speech-to-text functionality and can work with different engines for speech processing.

Quick example

function = dh.new_function(
    name="my-kubeai-speech-service",
    kind="kubeai-speech",
    url="hf://openai/whisper-tiny",
    adapters=[{"name": "whisper-adapter", "url": "hf://adapter-url"}]
)

run = function.run(
    action="serve",
    replicas=1
)

Parameters

Function Parameters

Must be specified when creating the function.

Name Type Description
project str Project name. Required only when creating from the library; otherwise MUST NOT be set.
name str Name that identifies the object. Required.
kind str Function kind. Must be kubeai-speech. Required.
uuid str Object ID in UUID4 format.
description str Description of the object.
labels list[str] List of labels.
embedded bool Whether the object should be embedded in the project.
model_name str Name of the model.
image str Docker image where to serve the model.
url str Model url. Required.
adapters list[str] Adapters.

Model URL

The model url must follow the pattern:

regexp = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r"^pvc?://.*$"
    + r"|"
    + r"^s3?://.*$"
    + r"|"
    + r"^ollama?://.*$"
    + r"|"
    + r"^hf?://.*$"
)

Adapters

Adapters is a list of dictionaries with the following keys:

adapters = [{
    "name": "adapter-name",
    "url": "adapter-url"
}]

Task Parameters

Can only be specified when calling function.run().

Shared Parameters

Name Type Description
action str Task action. Required. Must be serve
node_selector list[dict] Node selector.
volumes list[dict] List of volumes.
resources dict Resource limits/requests.
affinity dict Affinity configuration.
tolerations list[dict] Tolerations.
envs list[dict] Environment variables.
secrets list[str] List of secret names.
profile str Profile template.
replicas int Number of replicas.
service_type str Service type.
service_name str Service name.

Run Parameters

Can only be specified when calling function.run().

Run Function Kind-Specific Parameters

KubeAI Speech
Name Type Description
env dict Environment variables.
args list[str] Arguments.
cache_profile str Cache profile.
files list[KubeaiFile] Files.
scaling Scaling Scaling parameters.
processors int Number of processors.

Files

Files is a list of dict with the following keys:

files = [
    {
        "path": "file-path"
        "content": "file-content"
    }
]

Scaling

Scaling is a Scaling object that represents the scaling parameters for the run. Its structure is as follows:

scaling = {
    "replicas": int,
    "min_replicas": int,
    "max_replicas": int,
    "autoscaling_disabled": bool,
    "target_request": int,
    "scale_down_delay_seconds": int,
    "load_balancing": {
        "strategy": str,  # "LeastLoad" or "PrefixHash"
        "prefix_hash": {
            "mean_load_factor": int,
            "replication": int,
            "prefix_char_length": int
        }
    }
}

Entity methods

Run methods

Once the run is created, you can access its attributes and methods through the run object.

invoke

Invoke served model. By default it exposes infer v2 endpoint.

Parameters:

Name Type Description Default
model_name str

Name of the model.

required
method str

Method of the request.

'POST'
url str

URL of the request.

None
**kwargs dict

Keyword arguments to pass to the request.

{}

Returns:

Type Description
Response

Response from the request.