Dataitem kinds
At the moment, we support the following kinds:
table: represents a table
For each different kind, the Dataitem object has its own subclass with different spec and status attributes.
Table
The table kind indicates that the dataitem is a generic table. It's useful if you intend to manipulate the dataitem as a dataframe, since it provides methods to do so. By default, a table is represented as a pandas dataframe.
Table spec parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| path | str | Path of the dataitem, can be a local path or a remote path, a single filepath or a directory/partition. | required |
| schema | TableSchema | Frictionless table schema | None |
Table methods
The table kind has the following additional methods:
as_df
Read the dataitem file (csv or parquet) from spec.path as a DataFrame. Additional keyword arguments can be passed to this function; they are forwarded to the DataFrame reader function, such as pandas's read_csv or read_parquet.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_format | str | Format of the file to read. By default, it will be inferred from the extension of the file. | None |
| engine | str | Dataframe framework, by default pandas. | 'pandas' |
| **kwargs | dict | Keyword arguments passed to the read_df function. | {} |
Returns:
| Type | Description |
|---|---|
| Any | DataFrame. |
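The extension-based inference described for file_format can be pictured with a small sketch. This is not the library's implementation: infer_file_format and SUPPORTED_FORMATS are hypothetical names introduced here for illustration, and the supported set (csv, parquet) is taken from the as_df description above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Formats mentioned in the as_df description (assumption: no others are handled).
SUPPORTED_FORMATS = {"csv", "parquet"}


def infer_file_format(path: str, file_format: Optional[str] = None) -> str:
    """Return the format to read, inferring it from the path when not given.

    Hypothetical helper, not part of the digitalhub API.
    """
    if file_format is None:
        # Strip any scheme/bucket (e.g. s3://my-bucket/...) and keep the file suffix.
        suffix = PurePosixPath(urlparse(path).path).suffix
        file_format = suffix.lstrip(".")
    file_format = file_format.lower()
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported or unknown file format: {file_format!r}")
    return file_format


remote_format = infer_file_format("s3://my-bucket/my-data.parquet")
local_format = infer_file_format("data/my_data.csv")
explicit_format = infer_file_format("data/export", file_format="csv")
```

An explicit file_format takes precedence over the extension, which is useful when the path has no extension or a misleading one.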
write_df
Write a DataFrame as parquet/csv/table into the dataitem spec.path.
Keyword arguments will be passed to the DataFrame writer function, such as
pandas's to_csv or to_parquet.
Note that by default the index is dropped when writing the dataframe. To
keep the index, you can pass index=True as a keyword argument.
If the dataitem path is a SQL scheme, the dataframe will be written to the
table specified in the path (sql://
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | Any | DataFrame to write. | required |
| extension | str | Extension of the file (supported parquet and csv). | None |
| **kwargs | dict | Keyword arguments passed to the write_df function. | {} |
Returns:
| Type | Description |
|---|---|
| str | Path to the written dataframe. |
Examples:
>>> import digitalhub as dh
>>> import pandas as pd
>>>
>>> p = dh.get_project("my_project")
>>> df = pd.read_csv("data/my_data.csv")
>>> di = p.new_dataitem(
... name="my_dataitem",
... kind="table",
... path="s3://my-bucket/my-data.parquet",
... )
>>> di.write_df(
... df,
... extension="parquet",
... index=True,
... )
's3://my-bucket/my-data.parquet'
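
The effect of the index keyword mentioned above can be seen with pandas alone. The sketch below is only an illustration of the documented default (index dropped unless index=True is passed): to_csv_text is a hypothetical helper, not a digitalhub function, and how write_df applies the default internally is an assumption.

```python
import io

import pandas as pd

# Small frame with a meaningful index so the difference is visible.
df = pd.DataFrame({"value": [10, 20]}, index=["a", "b"])


def to_csv_text(frame: pd.DataFrame, **kwargs) -> str:
    """Serialize a DataFrame to CSV text, dropping the index unless asked not to.

    Hypothetical helper mirroring the documented write_df default.
    """
    # Only set index=False when the caller did not choose a value.
    kwargs.setdefault("index", False)
    buffer = io.StringIO()
    frame.to_csv(buffer, **kwargs)
    return buffer.getvalue()


without_index = to_csv_text(df)           # index column is omitted
with_index = to_csv_text(df, index=True)  # index column is written first
```

If the index carries information, such as timestamps or keys, pass index=True or move it into a regular column with reset_index before writing.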