Text

Process Text

The ProcessText action processes unstructured text with a Large Language Model (LLM) according to the instructions you provide. The result holds a JSON-like string, also available as a parsed dictionary, that can be validated against your Pydantic model.

class aiviro.actions.documents.ProcessText(instructions: str, output_model: type[BaseModel] | dict, text: str, examples: Sequence[tuple[str, BaseModel]] | None = None, images: list[ndarray] | None = None)

Process text using a Large Language Model (LLM) according to the provided instructions.

This action takes unstructured text and a set of instructions telling the LLM how to process it. The output is structured according to the provided output model, a Pydantic model.

Parameters:
  • instructions – Detailed instructions telling the LLM how to process the text

  • output_model – Pydantic model class describing the structure of the output (the signature also accepts a dict)

  • text – Text to process

  • examples – Optional sequence of (input text, expected output model instance) pairs shown to the LLM as examples

  • images – Optional list of images (np.ndarray) to be processed by the LLM

Returns:

A ProcessTextResponse object containing the JSON-formatted response from the LLM.

Example:

>>> from aiviro.actions.documents import ProcessText
>>> from pydantic import BaseModel
>>>
>>> class Person(BaseModel):
...     name: str
...     age: int
...
>>> robot = ...  # e.g.: create_desktop_robot()
>>> process_text = ProcessText(
...     instructions="Extract name and age from the text.",
...     output_model=Person,
...     text="John is 30 years old.",
... )
>>> result = process_text(robot=robot)
>>>
>>> # convert response to the Person object
>>> person = Person.model_validate(result.response_dict)
>>> # access the extracted data
>>> print(person.name)
>>> print(person.age)

Extract Structured Text from Image

The ImageExtractStructuredText action runs OCR on an image and returns layout-preserving text, turning horizontal gaps into spaces and keeping line breaks intact.

class aiviro.actions.documents.ImageExtractStructuredText(image: ndarray)

Run OCR on an image and return layout-preserving text.

Accepts a raw np.ndarray only — screen capture and cropping are the responsibility of the caller (e.g. the Editor). OCR boxes are converted to per-character positions and laid out so that horizontal gaps become spaces and line breaks are preserved.

Parameters:

image – Image to recognize, as an H x W x 3 np.ndarray.

Returns:

Layout-preserving text reconstructed from the OCR output. Empty string when OCR finds no text.

Example:

>>> import aiviro
>>> from aiviro.actions.documents import ImageExtractStructuredText
>>>
>>> robot = aiviro.create_static_robot()
>>> img = ...  # np.ndarray, e.g. a cropped region of a screenshot
>>> text = ImageExtractStructuredText(image=img)(robot)
>>> print(text)
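To make the layout step more concrete, here is a minimal, self-contained sketch of how OCR boxes could be turned into layout-preserving text. It is not aiviro's implementation: the OcrBox type and the layout_text function are hypothetical, and the sketch assumes that each box carries its recognized text plus a pixel bounding box, that characters inside a box are evenly spaced, and that one text column corresponds to the median character width.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class OcrBox:
    """Hypothetical OCR result: recognized text and its bounding box in pixels."""
    text: str
    x: float
    y: float
    width: float
    height: float


def layout_text(boxes: list[OcrBox]) -> str:
    """Rebuild layout-preserving text from OCR boxes (illustrative sketch only)."""
    boxes = [b for b in boxes if b.text]
    if not boxes:
        return ""  # no text found -> empty string, as the action documents

    # Assumption: one text column equals the median per-character width.
    char_w = median(b.width / len(b.text) for b in boxes)
    line_h = median(b.height for b in boxes)
    left = min(b.x for b in boxes)  # indentation is measured from the leftmost box

    # Group boxes into lines: a box joins the current line if its vertical
    # centre is within half a typical line height of the line's first box.
    lines: list[list[OcrBox]] = []
    for box in sorted(boxes, key=lambda b: b.y + b.height / 2):
        centre = box.y + box.height / 2
        if lines:
            ref = lines[-1][0]
            if abs(centre - (ref.y + ref.height / 2)) <= line_h / 2:
                lines[-1].append(box)
                continue
        lines.append([box])

    out_lines = []
    for line in lines:
        row: list[str] = []
        for box in sorted(line, key=lambda b: b.x):
            step = box.width / len(box.text)  # per-character advance inside the box
            for i, ch in enumerate(box.text):
                col = round((box.x + i * step - left) / char_w)
                col = max(col, len(row))  # never overwrite earlier characters
                row.extend(" " * (col - len(row)))  # horizontal gap -> spaces
                row.append(ch)
        out_lines.append("".join(row).rstrip())
    return "\n".join(out_lines)  # one output line per detected text line


if __name__ == "__main__":
    demo = [OcrBox("Name:", 0, 0, 50, 10), OcrBox("John", 100, 0, 40, 10)]
    print(layout_text(demo))  # Name:     John
```

Boxes whose vertical centres lie within half a typical line height are treated as one line; the real action may group lines and size columns differently.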

Data Schemas

pydantic model aiviro.core.utils.api_client.schemas.process_text.ProcessTextResponse

Response from the ProcessText action.

Note

Always use result.response_dict to access the parsed data, then validate it with your Pydantic model using YourModel.model_validate(result.response_dict).

field model_usage: LLMUsage | None = None

Optional usage statistics of the LLM call.

field response: str [Required]

Response from the LLM as a JSON-like string.

property response_dict: dict

The response parsed into a dictionary.
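The relationship between response and response_dict can be illustrated with a small stand-in class. This is a hypothetical sketch, not the actual ProcessTextResponse: ResponseSketch is an invented name, and the sketch assumes that response_dict behaves roughly like json.loads(response) and expects a JSON object at the top level. The real property may parse the JSON-like string more leniently.

```python
import json
from dataclasses import dataclass


@dataclass
class ResponseSketch:
    """Hypothetical stand-in mirroring the documented `response` field."""
    response: str  # JSON-like string returned by the LLM

    @property
    def response_dict(self) -> dict:
        # Assumption: plain json.loads; the real property may be more lenient.
        data = json.loads(self.response)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object at the top level")
        return data


result = ResponseSketch('{"name": "John", "age": 30}')
print(result.response_dict)  # {'name': 'John', 'age': 30}
```

The resulting dictionary is what the note above recommends passing to YourModel.model_validate(result.response_dict).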