ModelServe huggingfaceserve Serve
The serve action deploys HuggingFace ML models as services on Kubernetes. A Task is created by calling run() on the Function; task parameters are passed through that call.
Overview
The huggingfaceserve function kind supports deploying HuggingFace models as REST API services. It supports various model formats and tasks including text generation, classification, and embedding.
Quick example
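import digitalhub as dh

function = dh.new_function(
    project="my-project",  # placeholder name; required when creating from the library (see Function Parameters)
    name="my-huggingface-service",
    kind="huggingfaceserve",
    path="s3://my-bucket/path-to-model/",  # trailing slash so the path matches the Model Path pattern
)
run = function.run(
    action="serve",
    replicas=1,
    huggingface_task="text_generation",
)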
Parameters
Function Parameters
Must be specified when creating the function.
| Name | Type | Description |
|---|---|---|
| project | str | Project name. Required only when creating from the library; otherwise MUST NOT be set. |
| name | str | Name that identifies the object. Required. |
| kind | str | Function kind. Must be huggingfaceserve. Required. |
| uuid | str | Object ID in UUID4 format. |
| description | str | Description of the object. |
| labels | list[str] | List of labels. |
| embedded | bool | Whether the object should be embedded in the project. |
| path | str | Path to the model files. Required. |
| model_name | str | Name of the model. |
| image | str | Docker image where to serve the model. |
Model Path
The model path must follow the pattern:
path_regex = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r".*/$"
    + r"|"
    + r".*\.zip$"
    + r"|"
    + r"^huggingface?://.*$"
    + r"|"
    + r"^hf?://.*$"
)
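To check a path before creating the function, the pattern can be exercised locally. The sketch below is illustrative only: it reproduces the pattern on the assumption that the second and third alternatives mean "ends with `/`" and "ends with `.zip`", and it assumes search semantics when applying it. The helper name `is_valid_model_path` and the sample paths are hypothetical.

```python
import re

# Pattern reproduced from this section; the "/" and "." before "zip" are
# treated as literal characters (assumption).
PATH_REGEX = (
    r"^(store://([^/]+)/model/huggingface/.*)"
    + r"|"
    + r".*/$"
    + r"|"
    + r".*\.zip$"
    + r"|"
    + r"^huggingface?://.*$"
    + r"|"
    + r"^hf?://.*$"
)


def is_valid_model_path(path: str) -> bool:
    """Return True if `path` matches one of the accepted forms."""
    # Assumption: the pattern is applied with search semantics.
    return re.search(PATH_REGEX, path) is not None


# Hypothetical sample paths, one per alternative plus one that is rejected.
samples = [
    "store://my-project/model/huggingface/my-model",
    "s3://my-bucket/path-to-model/",
    "s3://my-bucket/model.zip",
    "huggingface://org/model",
    "hf://org/model",
    "s3://my-bucket/path-to-model",  # no trailing slash, no .zip: rejected
]
for sample in samples:
    print(sample, is_valid_model_path(sample))
```

Under these assumptions, a plain bucket path is accepted only if it ends with `/` or `.zip`.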
Model Image
Model image must follow the pattern:
Task Parameters
Can only be specified when calling function.run().
| Name | Type | Description |
|---|---|---|
| action | str | Task action. Required. Must be serve. |
| node_selector | list[dict] | Node selector. |
| volumes | list[dict] | List of volumes. |
| resources | dict | Resource limits/requests. |
| affinity | dict | Affinity configuration. |
| tolerations | list[dict] | Tolerations. |
| envs | list[dict] | Environment variables. |
| secrets | list[str] | List of secret names. |
| profile | str | Profile template. |
| replicas | int | Number of replicas. |
| service_type | str | Service type. |
| service_name | str | Service name. |
| huggingface_task | str | Huggingface task type. |
| backend | str | Backend type. |
| tokenizer_revision | str | Tokenizer revision. |
| max_length | int | Huggingface max sequence length for the tokenizer. |
| disable_lower_case | bool | Do not use lower case for the tokenizer. |
| disable_special_tokens | bool | The sequences will not be encoded with the special tokens relative to their model. |
| dtype | str | Data type to load the weights in. |
| trust_remote_code | bool | Allow loading of models and tokenizers with custom code. |
| tensor_input_names | list[str] | The tensor input names passed to the model. |
| return_token_type_ids | bool | Return token type ids. |
| return_probabilities | bool | Return all probabilities. |
| disable_log_requests | bool | Disable log requests. |
| max_log_len | int | Max number of prompt characters or prompt ID numbers printed in the log. |
HuggingFace Task
You can specify the task type for the Huggingface model. The task type must be one of the following:
- sequence_classification
- token_classification
- fill_mask
- text_generation
- text2text_generation
- text_embedding
Backend
You can specify the backend type for the Huggingface model. The backend type must be one of the following:
- AUTO
- VLLM
- HUGGINGFACE
Dtype
You can specify the data type to load the weights in. The data type must be one of the following:
- AUTO
- FLOAT32
- FLOAT16
- BFLOAT16
- FLOAT
- HALF
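Because huggingface_task, backend, and dtype each accept only a fixed set of values, it can help to check them before calling function.run(). The following is a minimal sketch, not part of the library: the helper `build_serve_params` is hypothetical, and the allowed values are copied from the three lists above.

```python
from typing import Any, Optional

# Allowed values, copied from the HuggingFace Task, Backend, and Dtype sections.
HUGGINGFACE_TASKS = {
    "sequence_classification",
    "token_classification",
    "fill_mask",
    "text_generation",
    "text2text_generation",
    "text_embedding",
}
BACKENDS = {"AUTO", "VLLM", "HUGGINGFACE"}
DTYPES = {"AUTO", "FLOAT32", "FLOAT16", "BFLOAT16", "FLOAT", "HALF"}


def build_serve_params(
    huggingface_task: str,
    backend: Optional[str] = None,
    dtype: Optional[str] = None,
    **extra: Any,
) -> dict:
    """Validate the enumerated values and return keyword arguments for run()."""
    if huggingface_task not in HUGGINGFACE_TASKS:
        raise ValueError(f"Unsupported huggingface_task: {huggingface_task!r}")
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"Unsupported backend: {backend!r}")
    if dtype is not None and dtype not in DTYPES:
        raise ValueError(f"Unsupported dtype: {dtype!r}")

    params = {"action": "serve", "huggingface_task": huggingface_task, **extra}
    # Only include optional values the caller actually set, so no default
    # is assumed on behalf of the service.
    if backend is not None:
        params["backend"] = backend
    if dtype is not None:
        params["dtype"] = dtype
    return params


params = build_serve_params(
    "text_generation", backend="VLLM", dtype="BFLOAT16", replicas=1
)
# With a function object at hand, the result could be passed on as:
# run = function.run(**params)
```

Validating locally surfaces a typo such as "text-generation" immediately, before a run is created.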
Run Parameters
Can only be specified when calling function.run().
This action has no run-specific parameters.
Entity methods
Run methods
Once the run is created, you can access its attributes and methods through the run object.
invoke
Invokes the served model. By default, the request targets the v2 inference endpoint.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the model. | None |
| method | str | Method of the request. | 'POST' |
| url | str | URL of the request. | None |
| **kwargs | dict | Keyword arguments to pass to the request. | {} |
Returns:
| Type | Description |
|---|---|
| Response | Response from the request. |
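As an illustration of what a request to the v2 inference endpoint might carry, the sketch below builds a JSON body in the shape used by the Open Inference Protocol (v2). The helper `build_v2_infer_payload`, the input name "input-0", and the model name are hypothetical, and passing the body as `json=` assumes that the extra keyword arguments are forwarded to the underlying HTTP request.

```python
from typing import Any


def build_v2_infer_payload(
    texts: list[str], input_name: str = "input-0"
) -> dict[str, Any]:
    """Build a v2 inference request body with one BYTES input tensor."""
    return {
        "inputs": [
            {
                "name": input_name,      # hypothetical tensor name
                "shape": [len(texts)],   # one element per input text
                "datatype": "BYTES",     # text is sent as BYTES in the v2 protocol
                "data": list(texts),
            }
        ]
    }


payload = build_v2_infer_payload(["Hello, world!"])
print(payload)

# Hypothetical call, assuming `run` is the object returned by function.run()
# and that `json` is forwarded to the HTTP request:
# response = run.invoke(model_name="my-model", json=payload)
# print(response.json())
```

The expected body can differ by task and backend, so the payload shape should be checked against the deployed service before relying on it.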