ciberlabreport.postprocesing.cleaner module
Regex-based postprocessing utilities for generated report JSON payloads.
TextCleaner consumes a regex configuration file that lives alongside the application configuration, serializes a report to JSON, and applies the listed patterns in order to normalize tokens, fix formatting, or redact values. The resulting string is parsed back into a dictionary that can be fed to the PDF generator.
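The serialize, substitute, parse round trip described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the `apply_patterns` helper and the `PATTERNS` list are hypothetical, and the real format of the regex configuration file is not shown in this page.

```python
import json
import re

# Hypothetical pattern list: (regex, replacement) pairs, applied in order.
# The real patterns come from the regex configuration file.
PATTERNS = [
    (r"[ \t]{2,}", " "),                                # collapse runs of spaces/tabs
    (r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "[REDACTED_IP]"),  # redact IPv4-looking values
]


def apply_patterns(report: dict, patterns) -> dict:
    """Serialize the report, run each substitution in order, parse it back."""
    text = json.dumps(report, ensure_ascii=False)
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)
    return json.loads(text)


report = {"summary": "Host  10.0.0.5   beaconed twice", "family": "emotet"}
cleaned = apply_patterns(report, PATTERNS)
print(cleaned["summary"])
```

Because the substitutions run over the serialized string rather than over individual fields, a pattern can touch keys and JSON punctuation as well as values, which is why the result has to be parsed again before it is used.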
- class ciberlabreport.postprocesing.cleaner.TextCleaner(config_path: Path, mode: str, tmp_path: Path, families_list: list | None = None)
Bases: object
Applies configurable regex substitutions over structured report content.
- clean(text: dict | Any) → dict | Any
Executes the full cleaning pipeline if the provided object is a dict.
- Parameters:
text (dict | Any) – Report payload returned by the LLM.
- Returns:
Cleaned dict or the untouched input when not a dict.
- Return type:
dict | Any
- Raises:
ValueError – If regex substitutions lead to invalid JSON content.
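The contract of `clean` (non-dict inputs pass through untouched, and a `ValueError` is raised when substitutions break the JSON) might look roughly like the sketch below. `clean_sketch` is a hypothetical stand-in written for illustration; it takes the patterns as an argument instead of reading them from `config_path`, and it is not the `TextCleaner` source.

```python
import json
import re
from typing import Any


def clean_sketch(payload: Any, patterns: list[tuple[str, str]]) -> Any:
    """Hypothetical stand-in for TextCleaner.clean, for illustration only."""
    # Anything that is not a dict is returned unchanged.
    if not isinstance(payload, dict):
        return payload

    text = json.dumps(payload)
    for pattern, replacement in patterns:
        text = re.sub(pattern, replacement, text)

    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # A pattern that removes quotes or braces leaves unparseable text.
        raise ValueError(f"regex substitutions produced invalid JSON: {exc}") from exc


# Non-dict input passes through untouched.
print(clean_sketch("plain string", [(r"a", "b")]))

# A pattern that strips double quotes corrupts the serialized payload.
try:
    clean_sketch({"title": "report"}, [(r'"', "")])
except ValueError as err:
    print("rejected:", type(err).__name__)
```

Callers that pass LLM output straight into `clean` may want to catch `ValueError` and fall back to the uncleaned payload, since one overly broad pattern is enough to invalidate the whole document.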