ciberlabreport.preprocesing.utils module
Utility helpers shared by preprocessing components.
The helpers in this module are small, side-effect-free routines meant to keep preprocessing lean and testable. They handle text trimming, list compaction, and dict cleanup so reducers can focus on domain logic.
- Functions:
trim_text: Normalizes text length without altering non-string types.
compact_list: Restricts iterable outputs to a safe number of items.
top_from_counter: Converts Counter results into serializable dicts.
drop_empty: Removes falsy values from mappings.
first_or_value: Returns the first element of a list or the value itself.
- ciberlabreport.preprocesing.utils.compact_list(value: Any, limit: int) → list[Any]
Converts iterable inputs into short lists bounded by limit.
- Parameters:
value (Any) – Iterable to compact.
limit (int) – Maximum number of items to keep.
- Returns:
Truncated list, or an empty list when the input is not iterable.
- Return type:
list[Any]
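The behavior described for compact_list can be sketched as follows. This is a minimal illustration, not the module's actual source: the name compact_list_sketch is hypothetical, and the handling of strings (iterated character by character here) and of negative limits (treated as zero here) are assumptions the documentation does not settle.

```python
from itertools import islice
from typing import Any


def compact_list_sketch(value: Any, limit: int) -> list[Any]:
    """Hypothetical stand-in for compact_list, written from its description."""
    try:
        iterator = iter(value)
    except TypeError:
        # Non-iterable input: the documentation says an empty list is returned.
        return []
    # Assumption: a negative limit is clamped to zero, since islice rejects it.
    return list(islice(iterator, max(limit, 0)))


print(compact_list_sketch([1, 2, 3, 4], 2))
print(compact_list_sketch(42, 3))
```

Using islice means generators and other lazy iterables are consumed only up to the limit rather than fully materialized first.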
- ciberlabreport.preprocesing.utils.drop_empty(mapping: Mapping[str, Any]) → dict[str, Any]
Removes falsy values so the remaining mapping is JSON-friendly.
- Parameters:
mapping (Mapping[str, Any]) – Original dictionary.
- Returns:
Copy without None, empty strings, or empty containers.
- Return type:
dict[str, Any]
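A possible reading of drop_empty, sketched under stated assumptions. The summary says "falsy values" while the return description lists only None, empty strings, and empty containers; this sketch follows the narrower list, so 0 and False are kept. A plain truthiness test would drop them too, and the documentation does not say which the real helper does. The name drop_empty_sketch is hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def drop_empty_sketch(mapping: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for drop_empty, written from its description."""

    def is_empty(item: Any) -> bool:
        if item is None:
            return True
        # Assumption: "empty containers" means these built-in sized types.
        if isinstance(item, (str, bytes, list, tuple, set, frozenset, dict)):
            return len(item) == 0
        return False

    # Build a new dict so the input mapping is left untouched.
    return {key: item for key, item in mapping.items() if not is_empty(item)}


print(drop_empty_sketch({"host": "srv1", "tags": [], "note": "", "port": 0}))
```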
- ciberlabreport.preprocesing.utils.first_or_value(value: Any) → Any
Returns the first element of a list or the value itself.
- Parameters:
value (Any) – List or scalar value.
- Returns:
First list entry when possible, otherwise the original value.
- Return type:
Any
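The description of first_or_value suggests something like the sketch below. The name first_or_value_sketch is hypothetical, and returning an empty list unchanged is an assumption about what "when possible" means; the real helper may behave differently for that case or for tuples.

```python
from typing import Any


def first_or_value_sketch(value: Any) -> Any:
    """Hypothetical stand-in for first_or_value, written from its description."""
    # Only non-empty lists are unwrapped; everything else passes through.
    if isinstance(value, list) and value:
        return value[0]
    return value


print(first_or_value_sketch(["10.0.0.1", "10.0.0.2"]))
print(first_or_value_sketch("10.0.0.1"))
```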
- ciberlabreport.preprocesing.utils.top_from_counter(counter: Counter[str], limit: int) → list[dict[str, Any]]
Serializes the most common entries from a collections.Counter.
- Parameters:
counter (Counter[str]) – Counter instance populated with counts.
limit (int) – Number of top entries to include.
- Returns:
List of {"value": item, "count": count} dicts.
- Return type:
list[dict[str, Any]]
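The output shape above maps naturally onto Counter.most_common from the standard library. The sketch below assumes that is how the entries are selected; the name top_from_counter_sketch is hypothetical and the real implementation may differ.

```python
from collections import Counter
from typing import Any


def top_from_counter_sketch(counter: Counter, limit: int) -> list[dict[str, Any]]:
    """Hypothetical stand-in for top_from_counter, written from its description."""
    # most_common(limit) yields (item, count) pairs, highest count first.
    return [
        {"value": item, "count": count}
        for item, count in counter.most_common(limit)
    ]


ports = Counter(["443", "80", "443", "22", "443", "80"])
print(top_from_counter_sketch(ports, 2))
```

The result contains only strings, integers, lists, and dicts, so it can be passed directly to json.dumps.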
- ciberlabreport.preprocesing.utils.trim_text(value: Any, limit: int = 160) → Any
Trims long strings to limit characters while keeping other types intact.
- Parameters:
value (Any) – Value to normalize.
limit (int, optional) – Maximum length for string values. Defaults to 160.
- Returns:
Trimmed string or the original value if no changes were needed.
- Return type:
Any
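A minimal sketch of the trimming behavior described for trim_text. The name trim_text_sketch is hypothetical, and a plain slice is an assumption: the documentation does not say whether the real helper appends an ellipsis or strips whitespace when it truncates.

```python
from typing import Any


def trim_text_sketch(value: Any, limit: int = 160) -> Any:
    """Hypothetical stand-in for trim_text, written from its description."""
    # Only strings longer than the limit are changed.
    if isinstance(value, str) and len(value) > limit:
        return value[:limit]
    # Short strings and non-string values are returned unchanged.
    return value


print(trim_text_sketch("x" * 500, limit=10))
print(trim_text_sketch(1234))
```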