Dataitem kinds

At the moment, we support the following kinds:

  • table: represents a table

For each kind, the Dataitem object has its own subclass with its own spec and status attributes.

Table

The table kind indicates that the dataitem is a generic table. It is useful if you intend to manipulate the dataitem as a dataframe, since it provides methods to do so. By default, pandas is the dataframe framework used to represent a table.

Table spec parameters

Parameter Type Description Default
path str Path of the dataitem, can be a local path or a remote path, a single filepath or a directory/partition. required
schema TableSchema Frictionless table schema None

Table methods

The table kind has the following additional methods:

as_df

Read the dataitem file (csv or parquet) from spec.path as a DataFrame. Additional keyword arguments can be passed to this function; they are forwarded to the DataFrame reader function, such as pandas's read_csv or read_parquet.

Parameters:

Name Type Description Default
file_format str Format of the file to read. By default, it will be inferred from the extension of the file. None
engine str Dataframe framework, by default pandas. 'pandas'
**kwargs dict Keyword arguments passed to the read_df function. {}

Returns:

Type Description
Any DataFrame.
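
The extension-based inference of file_format can be pictured with a small standard-library sketch. The helper `infer_file_format` below is hypothetical and is not part of the digitalhub API; it only illustrates the idea that an explicit format takes precedence and the file extension is the fallback. The set of accepted formats is taken from the as_df description (csv or parquet).

```python
from pathlib import PurePosixPath
from typing import Optional

# Formats named in the as_df description above.
SUPPORTED_FORMATS = {"csv", "parquet"}


def infer_file_format(path: str, file_format: Optional[str] = None) -> str:
    """Hypothetical helper: pick the reader format for a dataitem path."""
    # An explicit file_format takes precedence; otherwise use the extension.
    fmt = file_format or PurePosixPath(path).suffix.lstrip(".")
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported or missing file format: {fmt!r}")
    return fmt
```

Under this assumption, a path without a recognizable extension (a directory or partition, for instance) would need file_format to be passed explicitly.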

write_df

Write a DataFrame as parquet/csv/table to the dataitem spec.path. Keyword arguments are passed to the DataFrame writer function, such as pandas's to_csv or to_parquet. Note that by default the index is dropped when writing the dataframe. To keep the index, pass index=True as a keyword argument. If the dataitem path uses the SQL scheme, the dataframe is written to the table specified in the path (sql://(/)/).

Parameters:

Name Type Description Default
df Any DataFrame to write. required
extension str Extension of the file (supported: parquet and csv). None
**kwargs dict Keyword arguments passed to the write_df function. {}

Returns:

Type Description
str Path to the written dataframe.

Examples:

>>> import digitalhub as dh
>>> import pandas as pd
>>>
>>> p = dh.get_project("my_project")
>>> df = pd.read_csv("data/my_data.csv")
>>> di = p.new_dataitem(
...     name="my_dataitem",
...     kind="table",
...     path="s3://my-bucket/my-data.parquet",
... )
>>> di.write_df(
...     df,
...     extension="parquet",
...     index=True,
... )
's3://my-bucket/my-data.parquet'
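
The "index is dropped by default" rule described for write_df can be read as a default keyword argument that the caller may override. The sketch below uses only the standard library; `resolve_writer_kwargs` is a hypothetical name chosen for illustration and does not reflect how digitalhub implements the behavior internally.

```python
from typing import Any, Dict


def resolve_writer_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: apply the 'index dropped by default' rule."""
    # Default described above: the index is not written.
    resolved: Dict[str, Any] = {"index": False}
    # Caller-supplied values win, e.g. index=True keeps the index.
    resolved.update(kwargs)
    return resolved
```

With this reading, calling the helper without arguments yields index=False, while passing index=True (as in the example above) overrides the default and leaves any other keyword arguments untouched.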