ModelServe Runtime
The ModelServe runtime enables deploying ML models as services on Kubernetes. It registers multiple Function kinds for different model formats and supports the serve action for model deployment.
- sklearnserve: Serve scikit-learn models
- mlflowserve: Serve MLflow models
- huggingfaceserve: Serve HuggingFace models
- kubeai-text: Serve text generation models via KubeAI
- kubeai-speech: Serve speech-to-text models via KubeAI
Prerequisites
Supported Python versions:
- Python ≥ 3.9, < 3.13
Required packages:
- digitalhub-runtime-modelserve
Install from PyPI:
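pip install digitalhub-runtime-modelserve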
Usage overview
To deploy ML models as services on the platform:
- Prepare your trained model in a supported format.
- Create a Function resource that references your model.
- Call function.run() to deploy the model as a service.
- Use the run's invoke() method to send inference requests (a minimal sketch follows below).
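The sketch below illustrates that workflow, assuming the digitalhub SDK. The project name, function name, and model path are illustrative placeholders, and the exact spec parameters depend on the Function kind and your setup.

import digitalhub as dh

# Hypothetical project; adjust the name to your environment.
project = dh.get_or_create_project("demo-project")

# Create a Function that references the trained model artifact.
# "sklearnserve" is one of the kinds listed above; the path is a placeholder.
func = project.new_function(
    name="serve-classifier",
    kind="sklearnserve",
    path="s3://my-bucket/models/model.pkl",
)

# Deploy the model as a service with the serve action.
run = func.run("serve")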
Service responsiveness
It may take some time for the service to become ready. Use run.refresh() and inspect run.status. When ready, the status will include a service attribute.
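As a rough sketch, you can poll the run until the service attribute appears; the polling interval here is arbitrary.

import time

# Refresh the run until its status exposes the service endpoint.
run.refresh()
while getattr(run.status, "service", None) is None:
    time.sleep(5)
    run.refresh()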
After the service is ready, call the inference endpoint with run.invoke(). By default the URL is taken from the run object; override it by passing an explicit url parameter if needed.
Note
If you set model_name in the function spec and run remotely, pass model_name to invoke() so the runtime can target the model with the MLServer V2 endpoint ("http://{url-from-k8s}/v2/models/{model_name}/infer").
# Example request following the MLServer V2 inference protocol.
data = [[1.0, 2.0, 3.0, 4.0]]  # example input: one row of four float features

payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [len(data), len(data[0])],  # [rows, columns] of the input
            "datatype": "FP32",
            "data": data
        }
    ]
}

run.invoke(json=payload)
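When running remotely with model_name set in the function spec, pass it to invoke() as described in the note above; the model name here is an illustrative placeholder.

# Remote invocation targeting a named model via the MLServer V2 endpoint.
run.invoke(model_name="my-model", json=payload)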
See the how-to guides for detailed instructions on deploying different types of models. See Examples for code samples.