ML scenario introduction

This scenario is based on an official MLRun tutorial; the corresponding notebook is available in your Jupyter instance at /tutorial/01-mlrun-basics.ipynb. We skip a number of its cells to keep things concise and to the point, while preserving the same functionality.

Run this notebook with the Python 3 (ipykernel) kernel. To switch, click the kernel name in the top bar and pick the desired kernel in the dialog that opens.

The resulting edited notebook, as well as the file for the function we will create, can be found under documentation/examples/ml in this documentation's repository.

We will prepare data, train a model and expose it as a service. Access Jupyter from your Coder instance and create a new notebook.

Set-up

Let's initialize our working environment. Import required libraries:

import mlrun
import os

Load environment variables for MLRun:

ENV_FILE = ".mlrun.env"
if os.path.exists(ENV_FILE):
    mlrun.set_env_from_file(ENV_FILE)
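
The exact contents of .mlrun.env depend on your deployment; at minimum it typically points the SDK at the MLRun API. A minimal sketch with placeholder values (not the values used in this setup):

# .mlrun.env -- placeholder values, replace with your deployment's settings
MLRUN_DBPATH=https://mlrun-api.example.com
# MLRUN_ARTIFACT_PATH=s3://my-bucket/artifacts  (optional default artifact location)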

Create an MLRun project:

PROJECT = "demo-ml"
project = mlrun.get_or_create_project(PROJECT, "./")
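
get_or_create_project loads the project if it already exists and creates it otherwise. As a quick sanity check, you can print its name:

print(project.name)  # demo-ml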

Generate data

Define the following function, which generates the dataset the model will be trained on. The %%writefile magic saves the cell to data-prep.py instead of executing it:

%%writefile data-prep.py

import pandas as pd
from sklearn.datasets import load_breast_cancer

import mlrun


# The outputs listed in the decorator are logged by MLRun as run artifacts/results
@mlrun.handler(outputs=["dataset", "label_column"])
def breast_cancer_generator():
    # Load the scikit-learn breast cancer dataset
    breast_cancer = load_breast_cancer()
    breast_cancer_dataset = pd.DataFrame(
        data=breast_cancer.data, columns=breast_cancer.feature_names
    )
    breast_cancer_labels = pd.DataFrame(data=breast_cancer.target, columns=["label"])
    # Combine features and labels into a single DataFrame
    breast_cancer_dataset = pd.concat(
        [breast_cancer_dataset, breast_cancer_labels], axis=1
    )

    return breast_cancer_dataset, "label"

Register it:

data_gen_fn = project.set_function(
    "data-prep.py",
    name="data-prep",
    kind="job",
    image="mlrun/mlrun",
    handler="breast_cancer_generator",
)
project.save()
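
If you want to confirm the registration, the function object can be fetched back from the project:

fn = project.get_function("data-prep")
print(fn.metadata.name, fn.spec.image)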

Run it locally:

gen_data_run = project.run_function("data-prep", local=True)
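
The local=True flag executes the handler inside the notebook process, which is convenient for quick iteration and debugging. Assuming your MLRun deployment can schedule jobs, the same call without the flag submits the function as a job on the cluster:

# gen_data_run = project.run_function("data-prep")  # runs as a cluster job instead of locally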

You can check the state of the execution with gen_data_run.state() and its outputs with gen_data_run.outputs. To preview a few records from the output artifact:

gen_data_run.artifact("dataset").as_df().head()

We will now proceed to training a model.