ciberlabreport.llm.openai module

Module wrapping OpenAI chat/completions API for report generation.

This module provides the OpenAIWrapper class, which coordinates prompt loading, schema validation, synchronous/asynchronous clients, and PDF-ready payload extraction.

Classes:

OpenAIWrapper: High-level helper to call OpenAI, enforce schemas, and: convert responses into data structures expected by the PDF builder.

class ciberlabreport.llm.openai.OpenAIWrapper(openai_api_key: str, schemas_path: Path, prompts_path: Path, model: str = 'gpt-5', temperature: int = 0.0, max_completion_tokens: int = 25000)

Bases: object

Convenience wrapper that handles prompts, schemas, and OpenAI clients.

build_user_images_for_pdf(images: list[tuple], response_images: dict) → list

Generate the necesary list to append the images in the PDF file.

Parameters:

images (list[tuple]) – List of pathlib.Path object and base64 representation of each shot.
response_images (dict) – LLM output for the images.

Returns:

List containing necesary data to PDF for each file: id, path and caption.

Return type:

list

build_user_images_to_call(images: list[tuple]) → HumanMessage

Creates the HumanMessage to make the LLM call with Images.

Parameters:: images (list[tuple]) – List of pathlib.Path object and base64 representation of each shot.
Returns:: Well formed object to pass to the LLM call.
Return type:: HumanMessage

call(system: SystemMessage, user: HumanMessage, json_schema: dict[str, Any]) → tuple[dict[str, Any], dict[str, Any]]

Executes a structured OpenAI chat completion enforcing a JSON schema.

Parameters:

system (SystemMessage) – System message describing the system role.
user (HumanMessage) – Prompt content generated from the reduced report data.
json_schema (dict[str, Any]) – JSON schema fed to the response_format API.

Returns:

Raw response converted to a serializable dict when possible and measurable statistics of the call.

Return type:

tuple[dict[str, Any], dict[str, Any]]

Raises:

RuntimeError – If the OpenAI client call fails for any reason.

call_init_text_with_retry(system_init_text: SystemMessage, response_report: dict[str, Any], init_text_schema: dict[str, Any], max_attempts: int = 3) → tuple[dict[str, Any], list[dict[str, Any]]]: Call init text generation with automatic retries on length/schema violations.

create_stats(stats: list) → dict

Generates a dictionary with tokens and money spent for each LLM call.

Parameters:: stats (list) – List with each LLM callback.
Returns:: Well formed stats.
Return type:: dict

decide_ransomware(response_images: dict, vt_file_data: dict) → dict

Decide ransomware verdict using image evidence and VT data.

Parameters:

response_images (dict) – LLM output for image analysis.
vt_file_data (dict) – Normalized VirusTotal data for the file.

Returns:

Verdict payload containing decision, confidence, source, and evidence.

Return type:

dict

generate_report(reduced: dict[str, Any], images: list[tuple], vt_file_data: dict[str, Any], malware_types: dict[str], families_list: list[str]) → tuple[dict[str, Any], Any, list[dict[str, str]]]

Calls the LLM to transform reduced report data into a PDF-ready payload.

Parameters:

reduced (dict[str, Any]) – Trimmed report generated.
images (list[tuple]) – Path and base64 representation of images.
vt_file_data (dict[str, Any]) – VirusTotal data for the file.
malware_types (dict[str]) – Possible malware types stored in config.
families_list (list[str]) – Possible families names

Returns:

Set containing the statistics,: the structured content to render as PDF, the structured content for images in PDF and the initial text to insert in the webapp related with the sample.

Return type:

tuple[dict, dict[str, Any], list[dict[str, Any]], dict[str, Any]]

get_file(filename: str, variables: dict = None) → str | dict

Load a .txt or .json file from the appropriate base directory.

For .txt files:
- Reads the file as a string.
- Applies str.format(**variables) if variables are provided.
For .json files:
- Parses and returns the JSON as a Python dict.

Parameters:

filename (str) – Name of the file to load. Must end in .txt or .json.
variables (Mapping[str, Any] | None, optional) – A dictionary of variables to format into .txt templates. Ignored for .json files.

Returns:

The loaded file content: a formatted string for .txt files or a dictionary for .json files.

Return type:

str | dict

Raises:

ValueError – If the file extension is unsupported, the file is not found, template variables are missing, or JSON decoding fails.