CRUD

The CRUD methods create, read, update and delete dataitems. You can call them in two ways: as functions of the SDK or as methods of a Project object. The syntax is the same in both cases, except that the Project methods do not take the project parameter and require every parameter to be passed as a keyword argument. In either case, first import the SDK and instantiate a Project object, which is the context in which you manage entities.

Example:

import digitalhub as dh

project = dh.get_or_create_project("my-project")

# Use CRUD method on project

dataitem = project.new_dataitem(name="my-dataitem",
                                kind="table",
                                path="path-to-some-data")

# Use CRUD method from SDK

dataitem = dh.new_dataitem(project="my-project",
                           name="my-dataitem",
                           kind="table",
                           path="path-to-some-data")

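A dataitem entity can be managed with the following methods.

Create:

  • new_dataitem
  • log_dataitem

Read:

  • get_dataitem
  • get_dataitem_versions
  • list_dataitems
  • import_dataitem

Update:

  • update_dataitem

Delete:

  • delete_dataitem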

Create

You can create a dataitem with the new_dataitem() or the log_dataitem() function. The kwargs parameters depend on the kind of the object and are described in the kinds section. They are the same for both functions.

New

This function creates a new entity and saves it into the backend.

new_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| project | str | Project name. | required |
| name | str | Object name. | required |
| kind | str | Kind of the object. | required |
| uuid | str | ID of the object. | None |
| description | str | Description of the object (human readable). | None |
| labels | list[str] | List of labels. | None |
| embedded | bool | Flag to determine if object spec must be embedded in project spec. | False |
| path | str | Object path on local file system or remote storage. It is also the destination path of the upload() method. | None |
| **kwargs | dict | Spec keyword arguments. | {} |

Returns:

| Type | Description |
| --- | --- |
| Dataitem | Object instance. |

Examples:

>>> obj = new_dataitem(project="my-project",
...                    name="my-dataitem",
...                    kind="dataitem",
...                    path="s3://my-bucket/my-key")

Log

This function creates a new entity in the backend and also uploads a local file to a dataitem store (e.g. S3).

log_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| project | str | Project name. | required |
| name | str | Object name. | required |
| kind | str | Kind of the object. | required |
| source | str | Dataitem location on the local file system. | None |
| data | Any | Dataframe to log. Alternative to source. | None |
| extension | str | Extension of the output dataframe. | None |
| path | str | Destination path of the dataitem. If not provided, it is generated. | None |
| **kwargs | dict | New dataitem spec parameters. | {} |

Returns:

| Type | Description |
| --- | --- |
| Dataitem | Object instance. |

Examples:

>>> obj = log_dataitem(project="my-project",
...                    name="my-dataitem",
...                    kind="table",
...                    data=df)

Read

To read dataitems you can use the get_dataitem(), get_dataitem_versions(), list_dataitems() or import_dataitem() functions.

Get

This function retrieves a single dataitem from the backend. You can identify the dataitem for get_dataitem() in two ways:

  • Pass the entity key as identifier. The key has the pattern store://<project-name>/<entity-type>/<entity-kind>/<entity-name>:<entity-id>.
  • Pass the entity name as identifier, the project name as project and the entity ID as entity_id. If you do not specify the entity ID, you get the latest version.
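
The key pattern above can be assembled and split with plain string handling. The following is a minimal sketch, not part of the SDK: build_key, parse_key and EntityKey are hypothetical helper names, and the literal values used for the entity type and kind are placeholders chosen only for illustration.

```python
from typing import NamedTuple

KEY_PREFIX = "store://"


class EntityKey(NamedTuple):
    """Parts of a key (hypothetical container, not an SDK class)."""
    project: str
    entity_type: str
    kind: str
    name: str
    entity_id: str


def build_key(project: str, entity_type: str, kind: str,
              name: str, entity_id: str) -> str:
    """Assemble store://<project>/<type>/<kind>/<name>:<id>."""
    return f"{KEY_PREFIX}{project}/{entity_type}/{kind}/{name}:{entity_id}"


def parse_key(key: str) -> EntityKey:
    """Split a key into its parts; raise ValueError if it does not match the pattern."""
    if not key.startswith(KEY_PREFIX):
        raise ValueError(f"not an entity key: {key!r}")
    parts = key[len(KEY_PREFIX):].split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments, got {len(parts)}: {key!r}")
    project, entity_type, kind, name_and_id = parts
    # The entity ID follows the last colon of the final segment.
    name, sep, entity_id = name_and_id.rpartition(":")
    if not sep or not name or not entity_id:
        raise ValueError(f"missing '<name>:<id>' segment: {key!r}")
    return EntityKey(project, entity_type, kind, name, entity_id)


# Placeholder values, for illustration only.
key = build_key("my-project", "dataitem", "table", "my-dataitem", "1234")
parts = parse_key(key)
```

A check like parse_key can be used to decide whether an identifier is a key or a plain entity name before calling get_dataitem().
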
get_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Entity key (store://...) or entity name. | required |
| project | str | Project name. | None |
| entity_id | str | Entity ID. | None |
| **kwargs | dict | Parameters to pass to the API call. | {} |

Returns:

| Type | Description |
| --- | --- |
| Dataitem | Object instance. |

Examples:

Using entity key:

>>> obj = get_dataitem("store://my-dataitem-key")

Using entity name:

>>> obj = get_dataitem("my-dataitem-name",
...                    project="my-project",
...                    entity_id="my-dataitem-id")

Get versions

This function returns all the versions of a dataitem from the backend.

get_dataitem_versions

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Entity key (store://...) or entity name. | required |
| project | str | Project name. | None |
| **kwargs | dict | Parameters to pass to the API call. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[Dataitem] | List of object instances. |

Examples:

Using entity key:

>>> objs = get_dataitem_versions("store://my-dataitem-key")

Using entity name:

>>> objs = get_dataitem_versions("my-dataitem-name",
...                              project="my-project")

List

This function returns the latest version of every dataitem in a project from the backend.

list_dataitems

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| project | str | Project name. | required |
| **kwargs | dict | Parameters to pass to the API call. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[Dataitem] | List of object instances. |

Examples:

>>> objs = list_dataitems(project="my-project")
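
The three read functions described so far differ in which versions they return: a single version (the latest one unless an ID is given), all versions of one dataitem, or the latest version of every dataitem in a project. The toy in-memory model below illustrates only these semantics. It is an assumption-laden sketch: ToyDataitemStore and its methods are hypothetical, it does not talk to the backend, and "latest" is simply taken to mean the most recently added version.

```python
from collections import defaultdict


class ToyDataitemStore:
    """In-memory stand-in for the backend (hypothetical, for illustration only)."""

    def __init__(self):
        # (project, name) -> list of versions, oldest first
        self._versions = defaultdict(list)

    def add(self, project, name, entity_id):
        """Register a new version of a dataitem."""
        self._versions[(project, name)].append(
            {"project": project, "name": name, "id": entity_id}
        )

    def get(self, name, project, entity_id=None):
        """Return one version: the latest if entity_id is None."""
        versions = self._versions.get((project, name), [])
        if not versions:
            raise KeyError(f"no dataitem {name!r} in project {project!r}")
        if entity_id is None:
            return versions[-1]
        for version in versions:
            if version["id"] == entity_id:
                return version
        raise KeyError(f"no version {entity_id!r} of dataitem {name!r}")

    def get_versions(self, name, project):
        """Return every version of one dataitem."""
        return list(self._versions.get((project, name), []))

    def list_latest(self, project):
        """Return the latest version of every dataitem in a project."""
        return [
            versions[-1]
            for (proj, _), versions in self._versions.items()
            if proj == project and versions
        ]


store = ToyDataitemStore()
store.add("my-project", "my-dataitem", "v1")
store.add("my-project", "my-dataitem", "v2")
store.add("my-project", "other-dataitem", "a1")

latest = store.get("my-dataitem", "my-project")            # version "v2"
first = store.get("my-dataitem", "my-project", "v1")       # version "v1"
all_versions = store.get_versions("my-dataitem", "my-project")
per_project = store.list_latest("my-project")
```
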

Import

This function loads the dataitem from a local YAML file descriptor.

import_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file | str | Path to YAML file. | required |

Returns:

| Type | Description |
| --- | --- |
| Dataitem | Object instance. |

Examples:

>>> obj = import_dataitem("my-dataitem.yaml")

Update

To update a dataitem you can use the update_dataitem() method.

update_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| entity | Dataitem | Object to update. | required |

Returns:

| Type | Description |
| --- | --- |
| Dataitem | Entity updated. |

Examples:

>>> obj = update_dataitem(obj)

Delete

To delete a dataitem you can use the delete_dataitem() method.

delete_dataitem

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| identifier | str | Entity key (store://...) or entity name. | required |
| project | str | Project name. | None |
| entity_id | str | Entity ID. | None |
| delete_all_versions | bool | Delete all versions of the named entity. If True, use entity name instead of entity key as identifier. | False |
| **kwargs | dict | Parameters to pass to the API call. | {} |

Returns:

| Type | Description |
| --- | --- |
| dict | Response from backend. |

Examples:

If delete_all_versions is False:

>>> obj = delete_dataitem("store://my-dataitem-key")

Otherwise:

>>> obj = delete_dataitem("my-dataitem-name",
...                       project="my-project",
...                       delete_all_versions=True)
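
The rule that delete_all_versions=True goes with an entity name rather than an entity key can be checked before the call is made. The helper below is a hypothetical pre-check, not SDK behaviour: delete_call_args is an invented name, and requiring project whenever the identifier is a name is an assumption drawn from the name-based examples in this page.

```python
def delete_call_args(identifier, project=None, entity_id=None,
                     delete_all_versions=False):
    """Validate and assemble keyword arguments for a delete call (hypothetical helper)."""
    is_key = identifier.startswith("store://")

    if delete_all_versions and is_key:
        # Documented rule: use the entity name, not the key, in this mode.
        raise ValueError("use the entity name when delete_all_versions is True")

    if is_key:
        # A key already encodes project, name and ID.
        return {"identifier": identifier}

    if project is None:
        # Assumption: a bare name needs the project to be resolved.
        raise ValueError("project is required when identifier is an entity name")

    args = {"identifier": identifier, "project": project}
    if delete_all_versions:
        args["delete_all_versions"] = True
    elif entity_id is not None:
        args["entity_id"] = entity_id
    return args


# Build the arguments first, then pass them on, e.g. delete_dataitem(**args).
args = delete_call_args("my-dataitem-name",
                        project="my-project",
                        delete_all_versions=True)
```
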