Text
Process Text
The ProcessText action allows you to process unstructured text using Large Language Models (LLMs) according to provided instructions.
The output is a JSON-like string or object that can be validated against a provided Pydantic model.
- class aiviro.actions.documents.ProcessText(instructions: str, output_model: type[BaseModel] | dict, text: str, examples: Sequence[tuple[str, BaseModel]] | None = None, images: list[ndarray] | None = None)
Process text using Large Language Model (LLM) according to the provided instructions.
This action takes unstructured text and a set of instructions that tell an LLM how to process it. The output is a structured object that follows the provided Pydantic output model.
- Parameters:
instructions – Detailed instructions for the LLM to process the text
output_model – Pydantic model to use for the output
text – Text to process
examples – Optional sequence of (input text, expected output) pairs that show the LLM what the result should look like
images – Optional list of images to be processed using the LLM
- Returns:
ProcessTextResponse object containing the JSON-formatted response from the LLM.
- Example:
>>> from aiviro.actions.documents import ProcessText
>>> from pydantic import BaseModel
>>>
>>> class Person(BaseModel):
...     name: str
...     age: int
...
>>> robot = ...  # e.g.: create_desktop_robot()
>>> process_text = ProcessText(
...     instructions="Extract name and age from the text.",
...     output_model=Person,
...     text="John is 30 years old.",
... )
>>> result = process_text(robot=robot)
>>>
>>> # convert response to the Person object
>>> person = Person.model_validate(result.response_dict)
>>> # access the extracted data
>>> print(person.name)
>>> print(person.age)
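The examples parameter is typed as a sequence of (str, BaseModel) tuples, so each entry pairs an input text with the output the LLM is expected to produce for it. The sketch below only builds such a list; the sample sentences are made up for illustration, and the ProcessText call is left as a comment because it needs a running robot.

```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    age: int


# Each example pairs an input text with the expected structured output.
# The sentences below are illustrative, not taken from the library docs.
examples: list[tuple[str, Person]] = [
    ("Alice turned 25 last week.", Person(name="Alice", age=25)),
    ("Our new colleague Bob, aged 41, joins on Monday.", Person(name="Bob", age=41)),
]

# The list would then be passed alongside the other arguments, e.g.:
# ProcessText(
#     instructions="Extract name and age from the text.",
#     output_model=Person,
#     text="John is 30 years old.",
#     examples=examples,
# )
```

Using instances of the same model for the example outputs as for output_model keeps the examples consistent with the structure the action is asked to return.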
Extract Structured Text from Image
The ImageExtractStructuredText action runs OCR on an image and returns layout-preserving text,
turning horizontal gaps into spaces and keeping line breaks intact.
- class aiviro.actions.documents.ImageExtractStructuredText(image: ndarray)
Run OCR on an image and return layout-preserving text.
Accepts a raw np.ndarray only; screen capture and cropping are the responsibility of the caller (e.g. the Editor). OCR boxes are converted to per-character positions and laid out so that horizontal gaps become spaces and line breaks are preserved.
- Parameters:
image – Image to recognize, as an H x W x 3 np.ndarray.
- Returns:
Layout-preserving text reconstructed from the OCR output. Empty string when OCR finds no text.
- Example:
>>> import aiviro
>>> from aiviro.actions.documents import ImageExtractStructuredText
>>>
>>> robot = aiviro.create_static_robot()
>>> img = ...  # np.ndarray, e.g. a cropped region of a screenshot
>>> text = ImageExtractStructuredText(image=img)(robot)
>>> print(text)
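To make the idea of layout-preserving text more concrete, here is a minimal sketch of how OCR boxes could be placed on a character grid so that horizontal gaps turn into spaces and vertical gaps into line breaks. It is not the library's implementation: the OcrBox type, the layout_text helper, and the fixed char_width and line_height values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OcrBox:
    """Hypothetical OCR result: recognized text plus its pixel position."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def layout_text(boxes: list[OcrBox], char_width: int = 10, line_height: int = 20) -> str:
    """Place each box on a character grid so that gaps become spaces."""
    if not boxes:
        return ""  # mirrors the documented behaviour when no text is found

    # Group boxes into text rows by their vertical position.
    rows: dict[int, list[OcrBox]] = {}
    for box in boxes:
        rows.setdefault(box.y // line_height, []).append(box)

    lines = []
    # Walk every row index in range so empty rows stay as blank lines.
    for row in range(min(rows), max(rows) + 1):
        line = ""
        for box in sorted(rows.get(row, []), key=lambda b: b.x):
            column = box.x // char_width
            # Keep at least one space between neighbouring boxes.
            if line and column <= len(line):
                column = len(line) + 1
            line = line.ljust(column) + box.text
        lines.append(line)
    return "\n".join(lines)


boxes = [
    OcrBox("Invoice", x=0, y=0),
    OcrBox("No. 42", x=200, y=0),
    OcrBox("Total:", x=0, y=40),
    OcrBox("99.00", x=200, y=40),
]
print(layout_text(boxes))
```

In this sketch the two right-hand boxes start at the same column, so the printed text keeps them vertically aligned, which is the property that makes such output convenient to pass on as plain text.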
Data Schemas
- pydantic model aiviro.core.utils.api_client.schemas.process_text.ProcessTextResponse
Response from the ProcessText action.
Note
Always use result.response_dict to access the parsed data, then validate it with your Pydantic model using YourModel.model_validate(result.response_dict).
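Because the data comes from an LLM, validation can fail when a field is missing or has the wrong type. The following sketch shows one way to guard the model_validate call. It assumes Pydantic v2; the parse_person helper is hypothetical, and SimpleNamespace objects stand in for real ProcessTextResponse instances so the snippet runs without a robot.

```python
from types import SimpleNamespace
from typing import Optional

from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


def parse_person(result) -> Optional[Person]:
    """Return a validated Person, or None when the data does not fit the model."""
    try:
        return Person.model_validate(result.response_dict)
    except ValidationError as err:
        print(f"LLM output failed validation with {err.error_count()} error(s)")
        return None


# Stand-ins for ProcessTextResponse objects (made-up data, not real LLM output).
good = SimpleNamespace(response_dict={"name": "John", "age": 30})
bad = SimpleNamespace(response_dict={"name": "John", "age": "thirty"})

print(parse_person(good))
print(parse_person(bad))
```

Returning None is only one possible policy; depending on the workflow, re-raising the error or retrying the action may be more appropriate.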