ciberlabreport.preprocesing.utils module

Utility helpers shared by preprocessing components.

The helpers in this module are small, side-effect-free routines intended to keep preprocessing lean and testable. They handle text trimming, list compaction, and dict cleanup so reducers can focus on domain logic.

Functions:

  • trim_text: Normalizes text length without altering non-string types.

  • compact_list: Restricts iterable outputs to a safe number of items.

  • top_from_counter: Converts Counter results into serializable dicts.

  • drop_empty: Removes falsy values from mappings.

  • first_or_value: Returns a list’s first element or the value itself.

ciberlabreport.preprocesing.utils.compact_list(value: Any, limit: int) → list[Any]

Converts iterable inputs into short lists bounded by limit.

Parameters:
  • value (Any) – Iterable to compact.

  • limit (int) – Maximum number of items to keep.

Returns:

Truncated list, or an empty list when the input is not iterable.

Return type:

list[Any]
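
The behavior described above can be illustrated with a minimal sketch. This is not the module's actual source: it assumes that iterability is detected with iter(), that a negative limit is treated as zero, and that strings are handled like any other iterable.

```python
from itertools import islice
from typing import Any


def compact_list(value: Any, limit: int) -> list[Any]:
    """Illustrative sketch only; the real implementation may differ."""
    try:
        iterator = iter(value)
    except TypeError:
        # Non-iterable input (for example an int or None) yields an empty list.
        return []
    # islice stops after `limit` items, so generators are not fully consumed.
    # Assumption: a negative limit is treated as zero.
    return list(islice(iterator, max(limit, 0)))


print(compact_list(["a", "b", "c", "d"], 2))  # ['a', 'b']
print(compact_list(range(10), 3))             # [0, 1, 2]
print(compact_list(None, 3))                  # []
```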

ciberlabreport.preprocesing.utils.drop_empty(mapping: Mapping[str, Any]) → dict[str, Any]

Removes falsy values so the remaining mapping is JSON-friendly.

Parameters:

mapping (Mapping[str, Any]) – Original dictionary.

Returns:

Copy without None, empty strings, or empty containers.

Return type:

dict[str, Any]
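
A possible reading of this entry is sketched below. The summary says "falsy", while the Returns text lists only None, empty strings, and empty containers. The sketch follows the narrower list and therefore keeps 0 and False, which is an assumption; a plain truthiness filter would drop those as well. The helper _is_empty and the sample record are hypothetical names used only for illustration.

```python
from collections.abc import Mapping
from typing import Any

# Types whose zero-length instances count as "empty" in this sketch.
_EMPTY_TYPES = (str, bytes, list, tuple, dict, set, frozenset)


def _is_empty(value: Any) -> bool:
    """Hypothetical helper: True for None, empty strings, empty containers."""
    if value is None:
        return True
    return isinstance(value, _EMPTY_TYPES) and len(value) == 0


def drop_empty(mapping: Mapping[str, Any]) -> dict[str, Any]:
    """Illustrative sketch only; returns a new dict, the input is untouched."""
    return {key: value for key, value in mapping.items() if not _is_empty(value)}


record = {"host": "srv-01", "tags": [], "note": "", "owner": None, "retries": 0}
print(drop_empty(record))  # {'host': 'srv-01', 'retries': 0}
```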

ciberlabreport.preprocesing.utils.first_or_value(value: Any) → Any

Returns the first element of a list or the value itself.

Parameters:

value (Any) – List or scalar value.

Returns:

First list entry when possible, otherwise the original value.

Return type:

Any
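
As a minimal sketch of the documented contract, and not the module's actual source: the version below assumes that only list instances are unwrapped (tuples and other sequences pass through unchanged) and that an empty list is returned as-is because it has no first entry.

```python
from typing import Any


def first_or_value(value: Any) -> Any:
    """Illustrative sketch only; the real implementation may differ."""
    # Assumption: only non-empty lists are unwrapped.
    if isinstance(value, list) and value:
        return value[0]
    return value


print(first_or_value(["10.0.0.1", "10.0.0.2"]))  # 10.0.0.1
print(first_or_value("10.0.0.1"))                # 10.0.0.1
print(first_or_value([]))                        # []
```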

ciberlabreport.preprocesing.utils.top_from_counter(counter: Counter[str], limit: int) → list[dict[str, Any]]

Serializes the most common entries from a collections.Counter.

Parameters:
  • counter (Counter[str]) – Counter instance populated with counts.

  • limit (int) – Number of top entries to include.

Returns:

List of {"value": item, "count": count} dicts.

Return type:

list[dict[str, Any]]
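
The output shape described above can be sketched with the standard library's Counter.most_common. Whether the module uses that method internally is an assumption; the sample data is made up for illustration.

```python
import json
from collections import Counter
from typing import Any


def top_from_counter(counter: Counter[str], limit: int) -> list[dict[str, Any]]:
    """Illustrative sketch only; the real implementation may differ."""
    # most_common(limit) returns (item, count) pairs, highest count first.
    return [
        {"value": item, "count": count}
        for item, count in counter.most_common(limit)
    ]


counts = Counter(["ssh", "http", "ssh", "dns", "ssh", "http"])
top = top_from_counter(counts, 2)
print(top)              # [{'value': 'ssh', 'count': 3}, {'value': 'http', 'count': 2}]
print(json.dumps(top))  # plain dicts, so the result serializes directly
```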

ciberlabreport.preprocesing.utils.trim_text(value: Any, limit: int = 160) → Any

Trims long strings to limit characters while keeping other types intact.

Parameters:
  • value (Any) – Value to normalize.

  • limit (int, optional) – Maximum length for string values. Defaults to 160.

Returns:

Trimmed string or the original value if no changes were needed.

Return type:

Any
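
A minimal sketch of this behavior follows. It is not the module's actual source and assumes plain truncation by slicing: no ellipsis is appended and surrounding whitespace is not stripped, neither of which the entry above specifies.

```python
from typing import Any


def trim_text(value: Any, limit: int = 160) -> Any:
    """Illustrative sketch only; the real implementation may differ."""
    # Only strings longer than `limit` are changed; everything else passes through.
    if isinstance(value, str) and len(value) > limit:
        return value[:limit]
    return value


print(trim_text("abcdefghij", limit=4))  # abcd
print(trim_text("short"))                # short
print(trim_text(12345, limit=2))         # 12345
print(len(trim_text("x" * 500)))         # 160
```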