Entity and methods
Dataitem
Bases: MaterialEntity
A class representing a dataitem.
Source code in digitalhub_data/entities/dataitem/entity/_base.py
DataitemDataitem
Bases: Dataitem
Dataitem dataitem.
Source code in digitalhub_data/entities/dataitem/entity/dataitem.py
DataitemIceberg
Bases: Dataitem
Iceberg dataitem.
Source code in digitalhub_data/entities/dataitem/entity/iceberg.py
DataitemTable
Bases: Dataitem
Table dataitem.
Source code in digitalhub_data/entities/dataitem/entity/table.py
as_df(file_format=None, engine=None, clean_tmp_path=True, **kwargs)
Read the dataitem file (csv or parquet) at spec.path as a DataFrame. If the dataitem is not local, it is downloaded to a temporary folder named tmp_dir in the project context folder. If clean_tmp_path is True, the temporary folder is deleted after the method returns. Additional keyword arguments are passed to the DataFrame reader function, such as pandas's read_csv or read_parquet.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_format | str | Format of the file. (Supported csv and parquet). | None |
engine | str | Dataframe framework, by default pandas. | None |
clean_tmp_path | bool | If True, the temporary folder will be deleted. | True |
**kwargs | dict | Keyword arguments passed to the read_df function. | {} |
Returns:
Type | Description |
---|---|
Any | DataFrame. |
Source code in digitalhub_data/entities/dataitem/entity/table.py
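The download, read, and clean-up flow described for as_df can be sketched with the standard library alone. This is not the library's implementation: read_rows_via_tmp is a hypothetical helper, a local file copy stands in for the remote download, and csv.DictReader stands in for the DataFrame reader. It only illustrates how clean_tmp_path and the forwarded keyword arguments behave.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def read_rows_via_tmp(source: Path, clean_tmp_path: bool = True, **kwargs):
    """Copy `source` into a temporary folder, read it, optionally clean up.

    Hypothetical helper: the copy stands in for a remote download and
    csv.DictReader stands in for the DataFrame reader function.
    Returns the parsed rows and the temporary folder that was used.
    """
    tmp_dir = Path(tempfile.mkdtemp(prefix="tmp_dir_"))
    try:
        local_copy = tmp_dir / source.name
        shutil.copy(source, local_copy)  # stand-in for the download step
        with local_copy.open(newline="") as handle:
            # Extra keyword arguments are forwarded to the reader unchanged.
            rows = list(csv.DictReader(handle, **kwargs))
    finally:
        # Mirror clean_tmp_path: delete the temporary folder when requested.
        if clean_tmp_path:
            shutil.rmtree(tmp_dir, ignore_errors=True)
    return rows, tmp_dir
```

Calling read_rows_via_tmp(path, delimiter=";") forwards delimiter to the reader, in the same way that extra keyword arguments given to as_df are described as reaching pandas's read_csv or read_parquet.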
write_df(df, extension=None, **kwargs)
Write a DataFrame as parquet/csv/table to the dataitem's spec.path. Keyword arguments are passed to the DataFrame writer function, such as pandas's to_csv or to_parquet.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | Any | DataFrame to write. | required |
extension | str | Extension of the file. | None |
**kwargs | dict | Keyword arguments passed to the write_df function. | {} |
Returns:
Type | Description |
---|---|
str | Path to the written dataframe. |
Source code in digitalhub_data/entities/dataitem/entity/table.py
dataitem_from_dict(obj)
Create a new object from dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obj | dict | Dictionary to create object from. | required |
Returns:
Type | Description |
---|---|
Dataitem | Object instance. |
Source code in digitalhub_data/entities/dataitem/builder.py
dataitem_from_parameters(project, name, kind, uuid=None, description=None, labels=None, embedded=True, path=None, **kwargs)
Create a new object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project | str | Project name. | required |
name | str | Object name. | required |
kind | str | Kind of the object. | required |
uuid | str | ID of the object (UUID4, e.g. 40f25c4b-d26b-4221-b048-9527aff291e2). | None |
description | str | Description of the object (human readable). | None |
labels | list[str] | List of labels. | None |
embedded | bool | Flag to determine if object spec must be embedded in project spec. | True |
path | str | Object path on local file system or remote storage. It is also the destination path of upload() method. | None |
**kwargs | dict | Spec keyword arguments. | {} |
Returns:
Type | Description |
---|---|
Dataitem | Object instance. |
Source code in digitalhub_data/entities/dataitem/builder.py
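Since the uuid parameter expects a UUID4 string, it can help to generate or check the value before building the object. The helpers below use only Python's standard uuid module; is_uuid4 and new_uuid4 are hypothetical names and are not part of digitalhub_data, and whether the library performs a similar check internally is not stated here.

```python
import uuid


def is_uuid4(value) -> bool:
    """Return True if `value` parses as a version 4 UUID."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # Not a string, or not a valid UUID representation.
        return False
    return parsed.version == 4


def new_uuid4() -> str:
    """Generate a fresh random UUID4 in its canonical string form."""
    return str(uuid.uuid4())
```

For example, is_uuid4("40f25c4b-d26b-4221-b048-9527aff291e2") accepts the ID shown in the parameter table, while a string such as "not-a-uuid" is rejected.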