Dataitem kinds

At the moment, we support the following kinds:

table: represents a table

Each kind has its own Dataitem subclass, with kind-specific spec and status attributes.

Table

The table kind indicates that the dataitem is a generic table. It is useful if you intend to manipulate the dataitem as a dataframe, and it provides methods to do so. By default, pandas is the dataframe framework used to represent a table.

Table spec parameters

Parameter	Type	Description	Default
`path`	str	Path of the dataitem. Can be a local or remote path, pointing to a single file or to a directory/partition.	required
`schema`	TableSchema	Frictionless table schema	`None`

Table methods

The table kind has the following additional methods:

`as_df`

Read the dataitem file (csv or parquet) from spec.path as a DataFrame. Additional keyword arguments are passed to the DataFrame reader function, such as pandas's read_csv or read_parquet.

Parameters:

Name	Type	Description	Default
`file_format`	`str`	Format of the file to read. By default, it will be inferred from the extension of the file.	`None`
`engine`	`str`	Dataframe framework, by default pandas.	`'pandas'`
`**kwargs`	`dict`	Keyword arguments passed to the read_df function.	`{}`

Returns:

Type	Description
`Any`	DataFrame.
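
The extension-based inference described for `file_format` can be pictured with a small sketch. The helper `infer_file_format` and the `SUPPORTED_FORMATS` set below are hypothetical names used only for illustration, not part of the digitalhub API, and the actual implementation may differ.

```python
from pathlib import PurePosixPath

# Hypothetical: the formats that as_df is documented to read.
SUPPORTED_FORMATS = {"csv", "parquet"}


def infer_file_format(path, file_format=None):
    """Return the explicit format if given, else the one implied by the path extension."""
    if file_format is None:
        # "s3://my-bucket/my-data.parquet" -> "parquet"
        file_format = PurePosixPath(path).suffix.lstrip(".")
    file_format = file_format.lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported or missing file format: {file_format!r}")
    return file_format


# Format inferred from the extension.
inferred = infer_file_format("s3://my-bucket/my-data.parquet")
# Format given explicitly because the path has no extension.
explicit = infer_file_format("data/export", file_format="csv")
```

In this sketch, a path without an extension (for example a directory) cannot be resolved on its own, so the format has to be passed explicitly.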

`write_df`

Write a DataFrame as parquet/csv/table into the dataitem spec.path. Keyword arguments are passed to the DataFrame writer function, such as pandas's to_csv or to_parquet. Note that by default the index is dropped when writing the dataframe. To keep the index, pass index=True as a keyword argument. If the dataitem path uses the SQL scheme, the dataframe is written to the table specified in the path (sql://(/)/).

Parameters:

Name	Type	Description	Default
`df`	`Any`	DataFrame to write.	required
`extension`	`str`	Extension of the file (parquet and csv are supported).	`None`
`**kwargs`	`dict`	Keyword arguments passed to the write_df function.	`{}`

Returns:

Type	Description
`str`	Path to the written dataframe.

Examples:

>>> import digitalhub as dh
>>> import pandas as pd
>>>
>>> p = dh.get_project("my_project")
>>> df = pd.read_csv("data/my_data.csv")
>>> di = p.new_dataitem(
...     name="my_dataitem",
...     kind="table",
...     path="s3://my-bucket/my-data.parquet",
... )
>>> di.write_df(
...     df,
...     extension="parquet",
...     index=True,
... )
's3://my-bucket/my-data.parquet'
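
The effect of the `index` keyword can be seen with plain pandas, independently of digitalhub. This is a minimal sketch that assumes `write_df` forwards `index` to the pandas writer as described above; the DataFrame contents are made up for illustration.

```python
import pandas as pd

# Illustrative data with a meaningful, non-default index.
df = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])

# With no target path, to_csv returns the CSV text instead of writing a file.
without_index = df.to_csv(index=False)  # index labels are dropped
with_index = df.to_csv(index=True)      # index is written as the first column

print(without_index.splitlines())
print(with_index.splitlines())
```

When the index carries information (such as row labels or timestamps), dropping it loses that column on a round trip, which is the case where passing index=True matters.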