Task
The ModelServe runtime supports a serve
task action to deploy ML models on Kubernetes or locally. A Task
is created by calling run()
on the Function; task parameters are passed through that call and may vary by action.
Parameters (Shared)
Name | Type | Description |
---|---|---|
action | str | Task action. One of: serve . Required. |
node_selector | list[dict] | Node selector. |
volumes | list[dict] | List of volumes. |
resources | dict | Resource limits/requests. |
affinity | dict | Affinity configuration. |
tolerations | list[dict] | Tolerations. |
envs | list[dict] | Environment variables. |
secrets | list[str] | List of secret names. |
profile | str | Profile template. |
replicas | int | Number of replicas. |
service_type | str | Service type. |
Function Kind-Specific Parameters
HuggingFace Serve
Name | Type | Description |
---|---|---|
huggingface_task | str | Huggingface task type. |
backend | str | Backend type. |
tokenizer_revision | str | Tokenizer revision. |
max_length | int | Huggingface max sequence length for the tokenizer. |
disable_lower_case | bool | Do not use lower case for the tokenizer. |
disable_special_tokens | bool | The sequences will not be encoded with the special tokens relative to their model. |
dtype | str | Data type to load the weights in. |
trust_remote_code | bool | Allow loading of models and tokenizers with custom code. |
tensor_input_names | list[str] | The tensor input names passed to the model. |
return_token_type_ids | bool | Return token type ids. |
return_probabilities | bool | Return all probabilities. |
disable_log_requests | bool | Disable log requests. |
max_log_len | int | Max number of prompt characters or prompt. |
Task Actions
Supported actions:
serve
— deploy a service
HuggingFace Task
You can specify the task type for the Huggingface model. The task type must be one of the following:
sequence_classification
token_classification
fill_mask
text_generation
text2text_generation
text_embedding
Backend
You can specify the backend type for the Huggingface model. The backend type must be one of the following:
AUTO
VLLM
HUGGINGFACE
Dtype
You can specify the data type to load the weights in. The data type must be one of the following:
AUTO
FLOAT32
FLOAT16
BFLOAT16
FLOAT
HALF