# Managing Speech-to-Text Models with KubeAI Runtime
When the KubeAI operator is enabled, the platform supports the speech-to-text scenario through the KubeAI runtime, which relies on the KubeAI-supported FasterWhisper engine. For details about the specification, see the corresponding section of the Modelserve reference.
## Exposing Speech-to-Text Models
To expose the speech-to-text model, it is possible to use the Core UI or the Python SDK. To define the corresponding function, the following parameters should be specified:
- model name
- model URL. Currently the model can be loaded either from HuggingFace (`hf://` prefix) or from the S3 storage of the platform (`s3://` prefix).
To serve the speech-to-text model, the function should be run with the `serve` action, specifying additional parameters. In particular, it may be necessary to specify the HW profile to use, with the number of processors or a resource specification, as well as further parameters and arguments accepted by the KubeAI model specification (a sketch with some of these parameters follows the basic example below):

- `args`: command-line arguments to pass to the engine
- `env`: custom environment values (key-value pairs)
- `secrets`: project secrets whose values should be passed
- `files`: extra file specifications for the deployment
- `scaling`: scaling specification as per the KubeAI documentation
- `caching_profile`: cache profile as per the KubeAI documentation.
For example, to deploy a model from HuggingFace, the following procedure may be used:
```python
audio_function = project.new_function(
    "audio",
    kind="kubeai-speech",
    model_name="audiomodel",
    url="hf://Systran/faster-whisper-medium.en",
)
run = audio_function.run(action="serve")
```
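Building on this, some of the additional parameters listed above may be passed to the same `serve` action. The following is a hedged sketch: the `env` and `scaling` keyword names come from the list above, while the concrete values (and the scaling keys) are illustrative assumptions, not verified settings:

```python
# A hedged sketch: serving with additional KubeAI parameters.
# The concrete values below are assumptions for illustration only.
run = audio_function.run(
    action="serve",
    env={"LOG_LEVEL": "info"},                     # custom environment values
    scaling={"minReplicas": 1, "maxReplicas": 2},  # scaling spec as per KubeAI docs
)
```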
Once deployed, the model is available and it is possible to call the OpenAI-compatible API from within the platform (`/openai/v1/transcriptions` endpoint):
```python
from openai import OpenAI

# Default in-platform endpoint (see the note below)
KUBEAI_ENDPOINT = "kubeai"

client = OpenAI(base_url=f"http://{KUBEAI_ENDPOINT}/openai/v1", api_key="ignore")

# The served model is named <model_name>-<run_id> (see "Model name" below)
audio_file = open("kubeai.mp4", "rb")
transcription = client.audio.transcriptions.create(
    model="audiomodel-123zxc",
    file=audio_file,
)
print(transcription.text)
```
By default, the `KUBEAI_ENDPOINT` is `kubeai`.
### Model name
Please note how the model name is defined: it is composed of the name of the model as specified in the function and a generated value: `<model_name>-<run_id>`.
The name of the generated model, as well as the endpoint information, can be seen in the run specification (see the `openai` and `service` sections).
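As a hedged sketch, the generated model name could also be read back programmatically from the run instead of copying it by hand; the `refresh()` call and the dictionary keys below are assumptions for illustration, not a documented layout:

```python
# Hedged sketch: read the generated model name and endpoint info from the run.
# refresh() and the key names are assumptions based on the sections above.
run.refresh()
status = run.to_dict().get("status", {})
print(status.get("openai"))   # generated model name, e.g. "audiomodel-123zxc"
print(status.get("service"))  # endpoint information
```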