ciberlabreport.llm.openai module
Module wrapping OpenAI chat/completions API for report generation.
This module provides the OpenAIWrapper class, which coordinates prompt
loading, schema validation, synchronous/asynchronous clients, and PDF-ready
payload extraction.
- Classes:
- OpenAIWrapper: High-level helper to call OpenAI, enforce schemas, and
convert responses into data structures expected by the PDF builder.
- class ciberlabreport.llm.openai.OpenAIWrapper(openai_api_key: str, schemas_path: Path, prompts_path: Path, model: str = 'gpt-5', temperature: int = 0.0, max_completion_tokens: int = 25000)
Bases:
object
Convenience wrapper that handles prompts, schemas, and OpenAI clients.
- build_user_images_for_pdf(images: list[tuple], response_images: dict) list
Generates the list needed to add the images to the PDF file.
- Parameters:
images (list[tuple]) – List of (pathlib.Path, base64 string) tuples, one per shot.
response_images (dict) – LLM output for the images.
- Returns:
List containing the data the PDF needs for each file: id, path, and caption.
- Return type:
list
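As a rough illustration of the pairing described above, the sketch below joins each image path with a caption taken from the LLM output. The function name `pair_images_with_captions`, the `"images"` key, and the `"id"`/`"caption"` fields inside `response_images` are assumptions made for this example; the real layout of the LLM response is not documented here.

```python
from pathlib import Path
from typing import Any


def pair_images_with_captions(
    images: list[tuple[Path, str]],
    response_images: dict[str, Any],
) -> list[dict[str, Any]]:
    """Return one {id, path, caption} entry per image (hypothetical layout)."""
    # Assumption: the LLM output lists captions under "images", keyed by a 1-based id.
    captions = {
        item["id"]: item.get("caption", "")
        for item in response_images.get("images", [])
    }
    entries = []
    for index, (path, _b64) in enumerate(images, start=1):
        # Images without a matching caption get an empty string.
        entries.append({"id": index, "path": str(path), "caption": captions.get(index, "")})
    return entries
```

The base64 string is ignored in this sketch because only the path is needed to place the file in the PDF.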
- build_user_images_to_call(images: list[tuple]) HumanMessage
Creates the HumanMessage used to call the LLM with images.
- Parameters:
images (list[tuple]) – List of (pathlib.Path, base64 string) tuples, one per shot.
- Returns:
Well-formed object to pass to the LLM call.
- Return type:
HumanMessage
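A minimal sketch of how the message content for an image call might be assembled, using the data-URL content parts commonly accepted by OpenAI-compatible chat models. The function name `build_image_content`, the instruction text, and the MIME-type guess from the file suffix are assumptions for illustration, not the wrapper's actual behavior.

```python
from pathlib import Path
from typing import Any


def build_image_content(images: list[tuple[Path, str]]) -> list[dict[str, Any]]:
    """Build one text part followed by one image part per (path, base64) pair."""
    parts: list[dict[str, Any]] = [
        {"type": "text", "text": f"Analyze these {len(images)} images."}
    ]
    for path, b64 in images:
        # Assumption: JPEG for .jpg/.jpeg suffixes, PNG for everything else.
        mime = "image/jpeg" if path.suffix.lower() in {".jpg", ".jpeg"} else "image/png"
        parts.append(
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
        )
    return parts
```

Assuming HumanMessage here is LangChain's message class, the resulting list could be passed as `HumanMessage(content=parts)`.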
- call(system: SystemMessage, user: HumanMessage, json_schema: dict[str, Any]) tuple[dict[str, Any], dict[str, Any]]
Executes a structured OpenAI chat completion enforcing a JSON schema.
- Parameters:
system (SystemMessage) – System message describing the system role.
user (HumanMessage) – Prompt content generated from the reduced report data.
json_schema (dict[str, Any]) – JSON schema fed to the response_format API.
- Returns:
Raw response, converted to a serializable dict when possible, together with measurable statistics of the call.
- Return type:
tuple[dict[str, Any], dict[str, Any]]
- Raises:
RuntimeError – If the OpenAI client call fails for any reason.
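The two behaviors documented for this method, passing a JSON schema through response_format and surfacing any client failure as RuntimeError, can be sketched as follows. The helper names `build_response_format` and `safe_call`, the `strict` flag, and the injected `send` callable (standing in for the real client) are assumptions made so the example runs without network access.

```python
from typing import Any, Callable


def build_response_format(name: str, json_schema: dict[str, Any]) -> dict[str, Any]:
    """Wrap a JSON schema in the structure used for schema-constrained output."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": json_schema, "strict": True},
    }


def safe_call(
    send: Callable[[dict[str, Any]], dict[str, Any]],
    name: str,
    json_schema: dict[str, Any],
) -> dict[str, Any]:
    """Invoke `send` with the response format; re-raise any failure as RuntimeError."""
    try:
        return send(build_response_format(name, json_schema))
    except Exception as exc:
        # Chain the original exception so the root cause stays visible in tracebacks.
        raise RuntimeError(f"OpenAI call failed: {exc}") from exc
```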
- call_init_text_with_retry(system_init_text: SystemMessage, response_report: dict[str, Any], init_text_schema: dict[str, Any], max_attempts: int = 3) tuple[dict[str, Any], list[dict[str, Any]]]
Call init text generation with automatic retries on length/schema violations.
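A retry loop of this kind could look roughly like the sketch below. The `generate` and `validate` callables are hypothetical stand-ins for the LLM call and the length/schema checks, and returning the last result after the final attempt is an assumption; the documented method may behave differently when attempts are exhausted.

```python
from typing import Any, Callable


def call_with_retry(
    generate: Callable[[int], dict[str, Any]],
    validate: Callable[[dict[str, Any]], list[str]],
    max_attempts: int = 3,
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Call `generate` until `validate` reports no violations or attempts run out."""
    attempts_log: list[dict[str, Any]] = []
    result: dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        result = generate(attempt)
        errors = validate(result)
        # Keep one record per attempt so callers can inspect what went wrong.
        attempts_log.append({"attempt": attempt, "errors": errors})
        if not errors:
            break
    return result, attempts_log
```

Passing the attempt number to `generate` lets the caller tighten the prompt on later attempts, for example by restating the length limit.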
- create_stats(stats: list) dict
Generates a dictionary with the tokens and money spent on each LLM call.
- Parameters:
stats (list) – List with the callback of each LLM call.
- Returns:
Well-formed stats.
- Return type:
dict
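As an illustration of this kind of aggregation, the sketch below collects per-call token counts and cost plus running totals. The function name `summarize_stats` and the entry keys `prompt_tokens`, `completion_tokens`, and `total_cost` are assumptions; the actual callback objects and the shape of the returned dictionary are not specified here.

```python
from typing import Any


def summarize_stats(stats: list[dict[str, Any]]) -> dict[str, Any]:
    """Return per-call token and cost figures together with overall totals."""
    summary: dict[str, Any] = {"calls": [], "total_tokens": 0, "total_cost": 0.0}
    for index, entry in enumerate(stats, start=1):
        # Missing keys count as zero so partial entries do not raise.
        tokens = entry.get("prompt_tokens", 0) + entry.get("completion_tokens", 0)
        cost = float(entry.get("total_cost", 0.0))
        summary["calls"].append({"call": index, "tokens": tokens, "cost": cost})
        summary["total_tokens"] += tokens
        summary["total_cost"] += cost
    return summary
```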
- decide_ransomware(response_images: dict, vt_file_data: dict) dict
Decide ransomware verdict using image evidence and VT data.
- Parameters:
response_images (dict) – LLM output for image analysis.
vt_file_data (dict) – Normalized VirusTotal data for the file.
- Returns:
Verdict payload containing decision, confidence, source, and evidence.
- Return type:
dict
- generate_report(reduced: dict[str, Any], images: list[tuple], vt_file_data: dict[str, Any], malware_types: dict[str], families_list: list[str]) tuple[dict[str, Any], Any, list[dict[str, str]]]
Calls the LLM to transform reduced report data into a PDF-ready payload.
- Parameters:
reduced (dict[str, Any]) – Reduced (trimmed) report data.
images (list[tuple]) – Path and base64 representation of images.
vt_file_data (dict[str, Any]) – VirusTotal data for the file.
malware_types (dict[str]) – Possible malware types stored in config.
families_list (list[str]) – Possible family names.
- Returns:
Tuple containing the statistics, the structured content to render as PDF, the structured content for the images in the PDF, and the initial text about the sample to insert in the webapp.
- Return type:
tuple[dict, dict[str, Any], list[dict[str, Any]], dict[str, Any]]
- get_file(filename: str, variables: dict = None) str | dict
Load a .txt or .json file from the appropriate base directory.
- For .txt files:
Reads the file as a string.
Applies str.format(**variables) if variables are provided.
- For .json files:
Parses and returns the JSON as a Python dict.
- Parameters:
filename (str) – Name of the file to load. Must end in .txt or .json.
variables (Mapping[str, Any] | None, optional) – A dictionary of variables to format into .txt templates. Ignored for .json files.
- Returns:
The loaded file content: a formatted string for .txt files or a dictionary for .json files.
- Return type:
str | dict
- Raises:
ValueError – If the file extension is unsupported, the file is not found, template variables are missing, or JSON decoding fails.
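The loading rules above can be sketched as a standalone function. The name `load_template` and the explicit `base_dir` argument are assumptions for this example; the documented method selects the base directory itself.

```python
import json
from pathlib import Path
from typing import Any, Mapping, Optional, Union


def load_template(
    base_dir: Path,
    filename: str,
    variables: Optional[Mapping[str, Any]] = None,
) -> Union[str, dict]:
    """Load a .txt template (optionally formatted) or a parsed .json file."""
    path = base_dir / filename
    suffix = path.suffix.lower()
    if suffix not in {".txt", ".json"}:
        raise ValueError(f"Unsupported file extension: {filename}")
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise ValueError(f"File not found: {path}") from exc
    if suffix == ".json":
        # Variables are ignored for JSON files, as documented.
        try:
            return json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON in {path}") from exc
    if variables:
        try:
            return text.format(**variables)
        except KeyError as exc:
            raise ValueError(f"Missing template variable: {exc}") from exc
    return text
```

Mapping every failure to ValueError matches the single exception type listed under Raises, so callers need only one except clause.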