Speech to Text Serving Runtime
The Speech to Text serving runtime (kubeai-speech) exposes automated speech recognition as an OpenAI-compatible transcriptions API.
To do this, the runtime relies on the KubeAI operator to expose a model using the FasterWhisper engine. Serving is performed by KubeAI, as in the case of the KubeAI Text runtime.
The specification of the KubeAI speech runtime consists of:
- model URL (from S3 storage or from the HuggingFace catalog, e.g., hf://Systran/faster-whisper-medium.en)
- name of the model to expose
- optional base image for serving
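As a rough illustration of the three specification fields, the sketch below models them as a small Python data class with a basic sanity check. The class name, field names, and example model name are hypothetical and do not reflect the actual SDK API; the check only covers the two model sources named above (S3 and HuggingFace).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Only the two model sources named in this section (assumption: others are out of scope here).
ALLOWED_SCHEMES = {"s3", "hf"}


@dataclass
class SpeechRuntimeSpec:
    """Hypothetical container for the specification fields (not the SDK API)."""

    url: str                     # model URL, e.g. hf://Systran/faster-whisper-medium.en
    model_name: str              # name under which the model is exposed
    image: Optional[str] = None  # optional base image for serving

    def validate(self) -> None:
        """Raise ValueError if the URL scheme or the model name is not usable."""
        scheme = urlparse(self.url).scheme
        if scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported model URL scheme: {scheme!r}")
        if not self.model_name:
            raise ValueError("model_name must not be empty")


spec = SpeechRuntimeSpec(
    url="hf://Systran/faster-whisper-medium.en",
    model_name="whisper-medium-en",  # hypothetical name
)
spec.validate()
```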
The serve action deploys the model. A set of extra properties may also be configured, including:
- inference server-specific arguments
- load balancing strategy and properties
- scaling configuration (min/max/default replicas, scale delays and request targets)
- resource configuration (e.g., run profile), environment variables and secrets (e.g., a reference to HF_TOKEN if needed for accessing HuggingFace resources)
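Once a model is served, a client can send audio to the transcriptions endpoint. The sketch below builds such a request using only the Python standard library. It follows the OpenAI transcriptions convention of a multipart POST to /audio/transcriptions with the form fields model and file. The base URL, the model name, and the helper function name are assumptions for illustration; the actual service address depends on the deployment.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, model, filename, audio_bytes):
    """Build a multipart/form-data POST for an OpenAI-style transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field: the name under which the model is exposed
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n".encode(),
        # "file" form field: the raw audio content
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n".encode(),
        audio_bytes,
        b"\r\n",
        # closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Assumed base URL and model name; replace with the values of the actual deployment.
request = build_transcription_request(
    base_url="http://localhost:8000/openai/v1",
    model="whisper-medium-en",
    filename="sample.wav",
    audio_bytes=b"RIFF",  # placeholder bytes, not a real audio file
)
# To send it: urllib.request.urlopen(request).read()
```

The request is only constructed here, not sent, so the sketch can be inspected without a running service.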
Management with SDK
Check the SDK runtime documentation for more information.