ciberlabreport.postprocesing.cleaner module

Regex-based postprocessing utilities for generated report JSON payloads.

TextCleaner reads a regex configuration file stored alongside the application configuration. It serializes a report to JSON, applies the listed patterns in order (to normalize tokens, fix formatting, or redact values), and parses the resulting string back into a dictionary that can be passed to the PDF generator.
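The ordered-substitution half of that cycle can be sketched as follows. This is a minimal illustration, not TextCleaner's code: the config file format (a JSON list of {"pattern": ..., "replacement": ...} objects) and the helper names load_patterns and apply_patterns are hypothetical assumptions, since the real file layout is not documented here.

```python
import json
import re
from pathlib import Path


def load_patterns(config_path: Path) -> list[tuple[re.Pattern[str], str]]:
    """Hypothetical loader: reads a JSON list of {"pattern", "replacement"}
    objects (assumed format) and compiles each pattern once, keeping file order."""
    entries = json.loads(config_path.read_text(encoding="utf-8"))
    return [(re.compile(e["pattern"]), e["replacement"]) for e in entries]


def apply_patterns(text: str, patterns: list[tuple[re.Pattern[str], str]]) -> str:
    """Apply every substitution in list order; each step sees the previous step's output."""
    for regex, replacement in patterns:
        text = regex.sub(replacement, text)
    return text


if __name__ == "__main__":
    rules = [
        (re.compile(r"\s{2,}"), " "),  # collapse runs of whitespace
        (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "[REDACTED_IP]"),  # redact IPv4-like tokens
    ]
    print(apply_patterns("host   10.0.0.5  is up", rules))
```

Because each rule operates on the output of the previous one, reordering the configuration can change the result: rewriting `a` to `b` and then `b` to `c` turns `a` into `c`, while the reverse order yields `b`.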

class ciberlabreport.postprocesing.cleaner.TextCleaner(config_path: Path, mode: str, tmp_path: Path, families_list: list | None = None)

Bases: object

Applies configurable regex substitutions over structured report content.

clean(text: dict | Any) → dict | Any

Executes the full cleaning pipeline if the provided object is a dict.

Parameters:

text (dict | Any) – Report payload returned by the LLM.

Returns:

The cleaned dict, or the input unchanged if it is not a dict.

Return type:

dict | Any

Raises:

ValueError – If regex substitutions lead to invalid JSON content.
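The documented contract of clean() (pass non-dicts through, serialize, substitute, parse back, raise ValueError on broken JSON) can be approximated by the following sketch. The function name clean_payload and the explicit patterns argument are hypothetical; in TextCleaner the patterns come from the configuration file, and whether it serializes with ensure_ascii=False is an assumption.

```python
import json
import re
from typing import Any


def clean_payload(payload: Any, patterns: list[tuple[re.Pattern[str], str]]) -> Any:
    """Hypothetical approximation of the documented clean() contract."""
    # Non-dict inputs (e.g. a raw string returned by the LLM) are returned untouched.
    if not isinstance(payload, dict):
        return payload
    # Serialize; ensure_ascii=False (an assumption) keeps non-ASCII text literal so patterns can match it.
    text = json.dumps(payload, ensure_ascii=False)
    for regex, replacement in patterns:
        text = regex.sub(replacement, text)
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # A substitution removed or altered a quote, brace, or comma the JSON syntax needed.
        raise ValueError(f"Regex substitutions produced invalid JSON: {exc}") from exc


if __name__ == "__main__":
    rules = [(re.compile(r"CVE-(\d{4})-(\d+)"), r"CVE \1-\2")]
    print(clean_payload({"finding": "See CVE-2021-44228"}, rules))
    print(clean_payload("not a dict", rules))
```

Since the patterns run over the serialized string rather than individual values, they can also match keys and JSON punctuation; an overly broad pattern (for example one that strips `"`) is exactly what surfaces as the ValueError above.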