ModelServe Runtime
The ModelServe runtime enables deploying ML models as services on Kubernetes. It registers multiple Function kinds for different model formats and supports the serve action for model deployment.
- sklearnserve: Serve scikit-learn models
- mlflowserve: Serve MLflow models
- huggingfaceserve: Serve HuggingFace models
- kubeai-text: Serve text generation models via KubeAI
- kubeai-speech: Serve speech-to-text models via KubeAI
Prerequisites
Supported Python versions:
- Python ≥ 3.9, < 3.13
Required packages:
- digitalhub-runtime-modelserve
Install from PyPI:
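pip install digitalhub-runtime-modelserve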
Usage overview
To deploy ML models as services on the platform:
- Prepare your trained model in a supported format.
- Create a Function resource that references your model.
- Call function.run() to deploy the model as a service.
- Use the run's invoke() method to send inference requests (a minimal sketch follows below).
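The sketch below illustrates that workflow, assuming the digitalhub SDK. The project name, function name, and model path are illustrative placeholders, and the exact spec parameters depend on the Function kind and your setup.

import digitalhub as dh

# Hypothetical project; adjust the name to your environment.
project = dh.get_or_create_project("demo-project")

# Create a Function that references the trained model artifact.
# "sklearnserve" is one of the kinds listed above; the path is a placeholder.
func = project.new_function(
    name="serve-classifier",
    kind="sklearnserve",
    path="s3://my-bucket/models/model.pkl",
)

# Deploy the model as a service with the serve action.
run = func.run("serve")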
Service responsiveness
It may take some time for the service to become ready. Use run.refresh() and inspect run.status. When ready, the status will include a service attribute.
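As a rough sketch, you can poll the run until the service attribute appears; the polling interval here is arbitrary.

import time

# Refresh the run until its status exposes the service endpoint.
run.refresh()
while getattr(run.status, "service", None) is None:
    time.sleep(5)
    run.refresh()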
After the service is ready, call the inference endpoint with run.invoke(). By default the URL is taken from the run object; override it by passing an explicit url parameter if needed.
Note
If you set model_name in the function spec and run remotely, pass model_name to invoke() so the runtime can target the model with the MLServer V2 endpoint ("http://{url-from-k8s}/v2/models/{model_name}/infer").
# Example request following the MLServer V2 inference protocol.
data = [[1.0, 2.0, 3.0, 4.0]]  # example input: one row of four float features

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [len(data), len(data[0])],  # [rows, columns] of the input
            "datatype": "FP32",
            "data": data
        }
    ]
}

run.invoke(json=payload)
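When running remotely with model_name set in the function spec, pass it to invoke() as described in the note above; the model name here is an illustrative placeholder.

# Remote invocation targeting a named model via the MLServer V2 endpoint.
run.invoke(model_name="my-model", json=payload)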
See the how-to guides for detailed instructions on deploying different types of models. See Examples for code samples.