Table Data
There are multiple readers available in Aiviro for processing ‘.xlsx’, ‘.csv’ files or table-like data from clipboard. Choose the one that best fits your needs.
Excel Actions
Note
It’s required to install reader from Optional dependencies section.
- class aiviro.actions.documents.tables.ExcelAutoReader(file_path: Path | str, expected_header: list[str], sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False)
Automatically reads the loaded Excel file and extracts tables from it, based on the argument expected_header. If an empty file is provided, an empty list is returned. Supports reading the same file extensions as openpyxl package.
- Parameters:
file_path – Path to the Excel file
expected_header – Header which should be found in the table
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with the same mapping_key
- Example:
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> # read multiple tables from the same file >>> excel_auto_r = ExcelAutoReader( ... file_path="path/to/file.xlsx", ... expected_header=["Name", "Identifier", "Amount", "km"], ... sheet="sheet1", ... mapping_key="Identifier", ... ) >>> detected_tables = excel_auto_r()
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> # when reading an empty xlsx file, an empty list is returned >>> excel_auto_r = ExcelAutoReader( ... file_path="path/to/empty-file.xlsx", ... expected_header=["Name", "Identifier", "Amount", "km"], ... sheet="sheet2", ... ) >>> tables = excel_auto_r() >>> print(tables) # will print []
- class aiviro.actions.documents.tables.ExcelTableReader(file_path: Path | str, header_row: int, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None, mapping_key: str | int | tuple[str | int, ...] | None = None, skip_same_mapping_keys: bool = False, header_fields: list[str] | None = None, table_name: str = '')
Returns extracted table from the required file’s sheet.
- Parameters:
file_path – Path to the Excel file
header_row – Row that contains header, starting from 1.
end_row – The last row to read, if None read until end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
mapping_key – Key used for unique identification of the loaded rows, if None row_index is used
skip_same_mapping_keys – Ignore rows with same mapping_key
header_fields – Custom names for header that can be used for data identification
table_name – Name of the table
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)() >>> for row in table: ... # process row
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> excel_table_r = ExcelTableReader( ... file_path="path/to/file.xlsx", ... header_row=1, ... end_row=23, ... start_col=5, ... sheet="Super Sheet Name", ... mapping_key=("item_name", "item_category"), ... ) >>> t1 = excel_table_r() >>> r1 = t1[("item1", "small")] # returns row by specified mapping_key >>> r1_amount = r1["amount"] # returns row's value from column "amount" ... >>> another_excel_table_r = ExcelTableReader( ... file_path="path/to/file.xlsx", ... header_row=5, ... mapping_key="a", # map based on values from the column "a" ... header_fields=["a", "b", "c", "d"], # custom header names ... ) >>> t2 = excel_table_r() >>> r2 = t2["name"]["c"]
- class aiviro.actions.documents.tables.ExcelLineReader(file_path: Path | str, start_row: int | None = None, end_row: int | None = None, start_col: int | None = None, end_col: int | None = None, sheet: str | int | None = None)
Reads provided sheet, line by line. If the provided file is empty or the coordinates are set incorrectly, it returns an iterator with empty TableRow objects’ values.
- Parameters:
file_path – Path to the Excel file
start_row – The first row to read, if None read from the first row. Starting from 1.
end_row – The last row to read, if None read until the end of the file
start_col – The first column to read, if None read from the first column. Starting from 1.
end_col – The last column to read, if None read until the end of the file
sheet – Sheet to use for reading. If not provided, it uses the first sheet created in the workbook, or the last sheet which was explicitly set as active
- Example:
>>> from aiviro.actions.documents.tables import ExcelLineReader >>> excel_line_r = ExcelLineReader( ... file_path="path/to/file.xlsx", ... start_row=5, ... start_col=1, ... end_col=8, ... sheet=3, ... ) >>> for row in excel_line_r(): ... # process row
>>> from aiviro.actions.documents.tables import ExcelLineReader >>> # iterator can be transformed into the list >>> excel_line_r = ExcelLineReader( ... file_path="path/to/some-file.xlsx", ... start_row=3, ... end_row=8, ... start_col=1, ... end_col=5, ... sheet="my-sheet", ... ) >>> rows = list(excel_line_r()) >>> print(len(rows)) # print number of rows >>> # or process rows here
- class aiviro.actions.documents.tables.ExcelGetSheetNames(file_path: Path | str)
Simple action that returns names of all sheets in the provided Excel file.
- Example:
>>> from aiviro.actions.documents.tables import ExcelGetSheetNames >>> sheet_names = ExcelGetSheetNames(file_path="path/to/file.xlsx")() >>> print(sheet_names[0]))
CSV Actions
- class aiviro.actions.documents.tables.csv_reader.CSVTableReader(file_path: Path | str, expected_header: list[str], delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)
Reads the provided CSV file and extracts tables from it, based on the argument expected_header, which is expected to be in the first row.
- Parameters:
file_path – Path to the CSV file
expected_header – Header that csv file should have in its first row
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function
- Raises:
FileNotFoundError – If the file does not exist.
TableHeaderNotProvided – If the expected_header is empty.
- Example:
>>> from aiviro.actions.documents import CSVTableReader >>> # gets data from the csv file, where the first row is taken as a header >>> table = CSVTableReader(file_path="path/to/your/file.csv")() >>> # print the table extracted from the csv file >>> print(table)
>>> from aiviro.actions.documents import CSVTableReader >>> table = CSVTableReader( ... file_path="path/to/file.csv", expected_header=["Name", "Amount"], delimiter=";", encoding="utf-16" ... )() >>> # print data from the third row of column 'Amount' >>> print(table[2]["Amount"])
- class aiviro.actions.documents.tables.csv_reader.CSVLineReader(file_path: Path | str, delimiter: str = ',', encoding: str = 'utf-8', **kwargs: Any)
Yields rows from the provided CSV file, line by line.
- Parameters:
file_path – Path to the CSV file
delimiter – Delimiter used in the CSV file (by default “,”)
encoding – Encoding used in the CSV file (by default “utf-8”)
kwargs – Additional keyword arguments passed to the csv.reader function
- Raises:
FileNotFoundError – If the file does not exist.
>>> from aiviro.actions.documents import CSVLineReader >>> res = CSVLineReader(file_path="path/to/your/file.csv")() >>> # process each row one by one >>> for row in res: >>> # process row
>>> from aiviro.actions.documents import CSVLineReader >>> # can be processed as a list >>> res = list(CSVLineReader( ... file_path="path/to/file.csv", ... delimiter="-", ... skipinitialspace=True, # any other kwargs for csv.reader can be provided here ... )()) >>> # prints out values from the third row in a list format >>> print(res[2].values)
String Actions
- class aiviro.actions.documents.tables.string_reader.StringTableReader(string_table: str, line_delimiter: str = '\r\n', column_delimiter: str = '\t', include_header: bool = True)
Reads provided string and parses it into a Table object. It strips every cell of the string, therefor it removes all leading and trailing whitespaces. It validates that each row has the same number of columns.
- Parameters:
string_table – String to be parsed
line_delimiter – Delimiter used to split the string into lines. When processing Windows data, it is recommended to use “rn” as the line delimiter.
column_delimiter – Delimiter used to split the string into columns.
include_header – Whether the first line of the string should be used as the header. If not provided, the header will be generated in an Excel-style, e.g., A, B, C, …, Z, AA, AB, …
- Raises:
ValueError – If the number of columns in each row is not the same.
- Example:
>>> import aiviro >>> from aiviro.actions.documents.tables import StringTableReader >>> >>> r = aiviro.create_rdp_robot_v2() >>> string_table = r.get_clipboard_text() # get data from clipboard >>> table = StringTableReader(string_table=string_table)() >>> for i, row in table: ... print(row["A"]) # print value from column A ... print(row[1]) # print value from the second column
Data Schemas
- pydantic model aiviro.actions.documents.tables.schemas.TableHeader
- pydantic model aiviro.actions.documents.tables.schemas.TableRow
- Example:
>>> from aiviro.actions.documents.tables import ExcelAutoReader >>> excel_auto_r = ExcelAutoReader(file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"]) >>> table = excel_auto_r() >>> row_value = table[0]["b"]
- field header: TableHeader | None [Required]
- pydantic model aiviro.actions.documents.tables.schemas.Table
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader( ... file_path="path/to/file.xlsx", header_row=1, header_fields=["a", "b", "c"] ... )() >>> for map_key, row in table: ... # iterates through all rows >>> row = table[0] # access row at index 0
- field header: TableHeader [Required]
- dump_by_columns(use_normalized_keys: bool = True) dict[str, list[Any]]
- Parameters:
use_normalized_keys – if True, the normalized keys are used. Otherwise, the keys are the same as in the header.
- Returns:
dictionary where each key represents a column name and its value is a list of values from that column.
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=3)() >>> res = table.dump_by_columns() >>> print(res) ... # { ... # "columnA": [10, 11], # columnA with row 1 and 2 values ... # "columnB": [20, 21], # columnB with row 1 and 2 values ... # } ... ... # print out value of the 'columnB' from the second row >>> print(res["columnB"][1]) # prints 21
- dump_by_rows(use_normalized_keys: bool = True) list[dict[str, Any]]
The first dictionary in the list represents the row with index 0.
- Parameters:
use_normalized_keys – If True, the normalized keys are used. Otherwise, the keys are the same as in the header.
- Returns:
list of dictionaries where each dictionary represents a row with column name as its key.
- Example:
>>> from aiviro.actions.documents.tables import ExcelTableReader >>> table = ExcelTableReader(file_path="path/to/file.xlsx", header_row=1)() >>> result = table.dump_by_rows() >>> print(result) ... # [ ... # {"columnA": 10, "columnB": 20, "columnC": 30}, # row 1 ... # {"columnA": 11, "columnB": 21, "columnC": 31}, # row 2 ... # ] ... ... # print out value from the second row and second column ('columnB') >>> print(result[1]["columnB"]) # prints 21