ciberlabreport.llm.openai module

Module wrapping the OpenAI Chat Completions API for report generation.

This module provides the OpenAIWrapper class, which coordinates prompt loading, schema validation, synchronous/asynchronous clients, and PDF-ready payload extraction.

Classes:
OpenAIWrapper: High-level helper to call OpenAI, enforce schemas, and convert responses into data structures expected by the PDF builder.

class ciberlabreport.llm.openai.OpenAIWrapper(openai_api_key: str, schemas_path: Path, prompts_path: Path, model: str = 'gpt-5', temperature: float = 0.0, max_completion_tokens: int = 25000)

Bases: object

Convenience wrapper that handles prompts, schemas, and OpenAI clients.

build_user_images_for_pdf(images: list[tuple], response_images: dict) → list

Builds the list needed to append the images to the PDF file.

Parameters:
  • images (list[tuple]) – List of (pathlib.Path, base64 representation) tuples, one per image.

  • response_images (dict) – LLM output for the images.

Returns:

List with the data the PDF needs for each image: id, path, and caption.

Return type:

list
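As a rough illustration of the pairing this method performs, the standalone sketch below joins each image path with the caption the LLM produced for it. The shape of response_images (an "images" list of {"id", "caption"} entries whose ids are 1-based positions in images) is a hypothetical assumption, not the documented format.

```python
from pathlib import Path
from typing import Any


def build_user_images_for_pdf(
    images: list[tuple[Path, str]], response_images: dict[str, Any]
) -> list[dict[str, Any]]:
    """Pair each image path with its LLM caption (standalone sketch, not the real method)."""
    # Hypothetical response shape: {"images": [{"id": 1, "caption": "..."}, ...]},
    # where "id" is the 1-based position of the image in `images`.
    captions = {
        item["id"]: item.get("caption", "")
        for item in response_images.get("images", [])
    }
    rows = []
    for idx, (path, _b64) in enumerate(images, start=1):
        # Images the LLM did not describe get an empty caption instead of being dropped.
        rows.append({"id": idx, "path": str(path), "caption": captions.get(idx, "")})
    return rows
```

Keeping uncaptioned images with an empty caption is a design choice of this sketch; the real method may filter them out instead.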

build_user_images_to_call(images: list[tuple]) → HumanMessage

Creates the HumanMessage used to call the LLM with images.

Parameters:

images (list[tuple]) – List of (pathlib.Path, base64 representation) tuples, one per image.

Returns:

Well-formed message to pass to the LLM call.

Return type:

HumanMessage
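A minimal sketch of the message content such a method could build, assuming HumanMessage is LangChain's langchain_core.messages.HumanMessage and that images are sent as OpenAI-style image_url parts carrying base64 data URLs. The function name build_image_content and the instruction text are hypothetical; the sketch only builds the content list so it runs without LangChain installed.

```python
import mimetypes
from pathlib import Path
from typing import Any


def build_image_content(
    images: list[tuple[Path, str]], instruction: str
) -> list[dict[str, Any]]:
    """Build multimodal content parts: one text part followed by one part per image."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": instruction}]
    for path, b64 in images:
        # Guess the MIME type from the file suffix; fall back to PNG if it is unknown.
        mime = mimetypes.guess_type(path.name)[0] or "image/png"
        parts.append(
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
        )
    return parts
```

With LangChain available, the result would then be wrapped as HumanMessage(content=build_image_content(images, "Describe each screenshot.")).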

call(system: SystemMessage, user: HumanMessage, json_schema: dict[str, Any]) → tuple[dict[str, Any], dict[str, Any]]

Executes a structured OpenAI chat completion enforcing a JSON schema.

Parameters:
  • system (SystemMessage) – System message describing the system role.

  • user (HumanMessage) – Prompt content generated from the reduced report data.

  • json_schema (dict[str, Any]) – JSON schema fed to the response_format API.

Returns:

A tuple of the raw response (converted to a serializable dict when possible) and the usage statistics measured for the call.

Return type:

tuple[dict[str, Any], dict[str, Any]]

Raises:

RuntimeError – If the OpenAI client call fails for any reason.
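The sketch below shows, under stated assumptions, the two pieces of plumbing this method implies: wrapping json_schema in the Chat Completions Structured Outputs envelope (response_format with "type": "json_schema"), and pulling token counts from a serialized completion's "usage" object. The helper names and the default schema name "report" are hypothetical; with "strict": True, OpenAI requires the schema to list every property as required and set additionalProperties to false.

```python
from typing import Any


def build_response_format(
    json_schema: dict[str, Any], name: str = "report"
) -> dict[str, Any]:
    """Wrap a JSON schema in the `response_format` envelope used for Structured Outputs."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": json_schema, "strict": True},
    }


def extract_usage(response: dict[str, Any]) -> dict[str, Any]:
    """Read token counts from a serialized chat completion; missing fields default to 0."""
    usage = response.get("usage") or {}
    return {
        "model": response.get("model"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```

The response_format dict would be passed as client.chat.completions.create(..., response_format=build_response_format(schema)), and extract_usage applied to the completion after converting it to a dict (for example with model_dump()).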

call_init_text_with_retry(system_init_text: SystemMessage, response_report: dict[str, Any], init_text_schema: dict[str, Any], max_attempts: int = 3) → tuple[dict[str, Any], list[dict[str, Any]]]

Calls the initial-text generation, retrying automatically (up to max_attempts times) when the output violates the length or schema constraints.

create_stats(stats: list) → dict

Builds a dictionary with the tokens used and the money spent on each LLM call.

Parameters:

stats (list) – List with the data collected from each LLM callback.

Returns:

Well-formed statistics dictionary.

Return type:

dict

decide_ransomware(response_images: dict, vt_file_data: dict) → dict

Decides the ransomware verdict using image evidence and VirusTotal (VT) data.

Parameters:
  • response_images (dict) – LLM output for image analysis.

  • vt_file_data (dict) – Normalized VirusTotal data for the file.

Returns:

Verdict payload containing decision, confidence, source, and evidence.

Return type:

dict
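To illustrate how two evidence sources might be merged into one verdict payload with decision, confidence, source, and evidence, here is a purely hypothetical rule set: keyword matching on image captions plus a "ransom" substring check on VT tags. The input shapes, keywords, and confidence labels are all assumptions and do not describe the module's actual logic.

```python
from typing import Any

# Hypothetical keywords; the real method may use entirely different rules.
RANSOM_KEYWORDS = ("ransom", "encrypt", "decrypt", "bitcoin")


def decide_ransomware(
    response_images: dict[str, Any], vt_file_data: dict[str, Any]
) -> dict[str, Any]:
    """Combine image captions and VT tags into a single verdict (illustrative sketch)."""
    # Assumed shapes: {"images": [{"caption": "..."}]} and {"tags": ["ransomware", ...]}.
    image_hits = [
        img["caption"]
        for img in response_images.get("images", [])
        if any(k in img.get("caption", "").lower() for k in RANSOM_KEYWORDS)
    ]
    vt_hits = [t for t in vt_file_data.get("tags", []) if "ransom" in str(t).lower()]

    # Agreement between both sources yields the strongest verdict.
    if image_hits and vt_hits:
        decision, confidence, source = True, "high", "images+vt"
    elif image_hits:
        decision, confidence, source = True, "medium", "images"
    elif vt_hits:
        decision, confidence, source = True, "medium", "vt"
    else:
        decision, confidence, source = False, "low", "none"
    return {
        "decision": decision,
        "confidence": confidence,
        "source": source,
        "evidence": image_hits + vt_hits,
    }
```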

generate_report(reduced: dict[str, Any], images: list[tuple], vt_file_data: dict[str, Any], malware_types: dict[str], families_list: list[str]) → tuple[dict[str, Any], Any, list[dict[str, str]]]

Calls the LLM to transform reduced report data into a PDF-ready payload.

Parameters:
  • reduced (dict[str, Any]) – Trimmed version of the generated report.

  • images (list[tuple]) – Path and base64 representation of images.

  • vt_file_data (dict[str, Any]) – VirusTotal data for the file.

  • malware_types (dict[str]) – Possible malware types stored in config.

  • families_list (list[str]) – Possible malware family names.

Returns:

A tuple containing the statistics, the structured content to render as PDF, the structured content for the images in the PDF, and the initial text to insert in the webapp for the sample.

Return type:

tuple[dict, dict[str, Any], list[dict[str, Any]], dict[str, Any]]

get_file(filename: str, variables: dict = None) → str | dict

Load a .txt or .json file from the appropriate base directory.

  • For .txt files:
    • Reads the file as a string.

    • Applies str.format(**variables) if variables are provided.

  • For .json files:
    • Parses and returns the JSON as a Python dict.

Parameters:
  • filename (str) – Name of the file to load. Must end in .txt or .json.

  • variables (Mapping[str, Any] | None, optional) – A dictionary of variables to format into .txt templates. Ignored for .json files.

Returns:

The loaded file content: a formatted string for .txt files or a dictionary for .json files.

Return type:

str | dict

Raises:

ValueError – If the file extension is unsupported, the file is not found, template variables are missing, or JSON decoding fails.