ciberlabreport.preprocesing.virustotal module

VirusTotal preprocessing helpers and normalization routines.

This module wraps VirusTotal API access, normalizes raw responses into a compact structure, and derives ransomware indicators from multiple signals. It also defines dataclasses used to represent the normalized data payload.

Environment variables:

MODE (str) – When set to “DEV”, raw and normalized outputs are stored on disk.
TMP_PATH (str, optional) – Base directory used to store debug JSON files.
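As a rough illustration of how these two variables might be read, the sketch below uses a hypothetical helper, `read_debug_settings`. The fallback to the system temporary directory when TMP_PATH is unset is an assumption; the module does not document its default.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Tuple


def read_debug_settings(environ: Mapping[str, str] = os.environ) -> Tuple[bool, Path]:
    """Return (dev_mode, tmp_path) from an environment-style mapping.

    Hypothetical helper, not part of the documented module.
    """
    # Debug storage is enabled only when MODE is exactly "DEV".
    dev_mode = environ.get("MODE", "") == "DEV"
    # Assumption: fall back to the system temp directory when TMP_PATH is unset.
    tmp_path = Path(environ.get("TMP_PATH") or tempfile.gettempdir())
    return dev_mode, tmp_path
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real process environment.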

class ciberlabreport.preprocesing.virustotal.VTConfig(search_url: str, high_threshold: int, medium_threshold: int, ransom_pattern: Pattern, evasion_keywords: Set[str])

Bases: object

Modifiable configuration used by the wrapper.

evasion_keywords: Set[str]

high_threshold: int

medium_threshold: int

ransom_pattern: Pattern

search_url: str
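The fields above suggest how the two thresholds could be used to grade a verdict. The following is a minimal sketch, assuming the thresholds are compared against the number of malicious engines. The `confidence_level` helper, the "high"/"medium"/"low" labels, and every value in `example_config` (including the placeholder URL) are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Pattern, Set


@dataclass
class VTConfig:
    """Mirror of the documented fields, re-declared here so the sketch runs alone."""
    search_url: str
    high_threshold: int
    medium_threshold: int
    ransom_pattern: Pattern
    evasion_keywords: Set[str]


def confidence_level(malicious_engines: int, config: VTConfig) -> str:
    """Grade a detection count against the thresholds (assumed semantics)."""
    if malicious_engines >= config.high_threshold:
        return "high"
    if malicious_engines >= config.medium_threshold:
        return "medium"
    return "low"


# Illustrative values only; none of them come from the real configuration file.
example_config = VTConfig(
    search_url="https://example.invalid/files/",
    high_threshold=10,
    medium_threshold=3,
    ransom_pattern=re.compile(r"ransom", re.IGNORECASE),
    evasion_keywords={"anti-vm", "anti-debug"},
)
```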

class ciberlabreport.preprocesing.virustotal.VTFinalData(sha256: str, verdict: VTVerdict, threat_profile: VTThreatProfile, sandbox_signals: VTSandboxSignals, yara_summary: VTYaraSummary, is_ransomware: bool, ransomware_evidence: List[str] = <factory>)

Bases: object

Normalized VirusTotal data payload returned by the wrapper.

is_ransomware: bool

ransomware_evidence: List[str]

sandbox_signals: VTSandboxSignals

sha256: str

threat_profile: VTThreatProfile

verdict: VTVerdict

yara_summary: VTYaraSummary

class ciberlabreport.preprocesing.virustotal.VTSandboxSignals(malicious_sandboxes: List[str] = <factory>, malware_names: List[str] = <factory>, evasion_indicators: List[str] = <factory>)

Bases: object

Sandbox-based indicators extracted from VT verdicts and tags.

evasion_indicators: List[str]

malicious_sandboxes: List[str]

malware_names: List[str]

class ciberlabreport.preprocesing.virustotal.VTThreatProfile(family: str | None, family_confidence: str, categories: List[str] = <factory>, suggested_label: str | None = None)

Bases: object

Threat family and category profile derived from VT metadata.

categories: List[str]

family: str | None

family_confidence: str

suggested_label: str | None = None

class ciberlabreport.preprocesing.virustotal.VTVerdict(is_malicious: bool, malicious_engines: int, suspicious_engines: int, confidence_level: str)

Bases: object

Summary of VirusTotal engine verdicts for a file hash.

confidence_level: str

is_malicious: bool

malicious_engines: int

suspicious_engines: int

class ciberlabreport.preprocesing.virustotal.VTYaraSummary(highlights: List[str] = <factory>)

Bases: object

YARA rule match highlights from VT crowdsourced results.

highlights: List[str]
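The dataclasses documented above compose into one nested payload. The sketch below re-declares them from the documented signatures and shows how `dataclasses.asdict` could flatten the result into a plain dict. The field values are made up for illustration, and the use of `asdict` is an assumption about how the wrapper serializes the payload.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VTVerdict:
    is_malicious: bool
    malicious_engines: int
    suspicious_engines: int
    confidence_level: str


@dataclass
class VTThreatProfile:
    family: Optional[str]
    family_confidence: str
    categories: List[str] = field(default_factory=list)
    suggested_label: Optional[str] = None


@dataclass
class VTSandboxSignals:
    malicious_sandboxes: List[str] = field(default_factory=list)
    malware_names: List[str] = field(default_factory=list)
    evasion_indicators: List[str] = field(default_factory=list)


@dataclass
class VTYaraSummary:
    highlights: List[str] = field(default_factory=list)


@dataclass
class VTFinalData:
    sha256: str
    verdict: VTVerdict
    threat_profile: VTThreatProfile
    sandbox_signals: VTSandboxSignals
    yara_summary: VTYaraSummary
    is_ransomware: bool
    ransomware_evidence: List[str] = field(default_factory=list)


# Illustrative values only.
payload = VTFinalData(
    sha256="0" * 64,
    verdict=VTVerdict(True, 41, 2, "high"),
    threat_profile=VTThreatProfile(family=None, family_confidence="low"),
    sandbox_signals=VTSandboxSignals(),
    yara_summary=VTYaraSummary(),
    is_ransomware=False,
)

# asdict() recurses into nested dataclasses and yields a JSON-friendly dict.
payload_dict = asdict(payload)
```

Using `default_factory=list` gives each instance its own list, which matches the `<factory>` defaults shown in the signatures.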

class ciberlabreport.preprocesing.virustotal.VirusTotalWrapper(config_path: Path, api_keys: list, mode: str, tmp_path: Path)

Bases: object

Client wrapper to query VirusTotal and normalize responses.

call(filehash: str) → dict

Fetch VirusTotal data for a given file hash.

Parameters: filehash (str) – SHA256 hash to query in VirusTotal.
Returns: Raw VirusTotal API response (or empty dict on failure).
Return type: dict
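The "empty dict on failure" contract can be sketched with the standard library alone. This example assumes a VirusTotal API v3 style request, where the key is sent in the `x-apikey` header and the hash is appended to the search URL. The function name `fetch_vt_report`, the injectable `opener`, the placeholder base URL, and the fake openers are all hypothetical and exist only so the sketch runs offline.

```python
import io
import json
import urllib.error
import urllib.request
from typing import Callable


def fetch_vt_report(filehash: str, api_key: str, search_url: str,
                    opener: Callable = urllib.request.urlopen) -> dict:
    """Return the decoded JSON response, or {} on any network or parse error."""
    request = urllib.request.Request(search_url + filehash,
                                     headers={"x-apikey": api_key})
    try:
        with opener(request, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTPError is a URLError; JSON and decoding errors are ValueErrors.
        return {}


def fake_ok(request, timeout):
    # Stand-in for a successful HTTP response body.
    return io.BytesIO(b'{"data": {"id": "abc"}}')


def fake_down(request, timeout):
    # Stand-in for an unreachable server.
    raise urllib.error.URLError("network unreachable")


BASE = "https://example.invalid/files/"
ok_result = fetch_vt_report("abc", "KEY", BASE, opener=fake_ok)
failed_result = fetch_vt_report("abc", "KEY", BASE, opener=fake_down)
```

The constructor accepts a list of API keys, so the real wrapper may rotate keys between requests; that behavior is not shown here.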

check_store_result(out: dict, outname: str) → None

Optionally store debug JSON data when running in DEV mode.

Parameters:

out (dict) – Data to serialize to disk.
outname (str) – Filename appended to the TMP_PATH directory.
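A minimal sketch of the DEV-mode guard described above, written as a free function so it runs on its own. The name `store_debug_json`, the explicit `mode` and `tmp_path` arguments, and the choice to create the directory on demand are assumptions.

```python
import json
from pathlib import Path


def store_debug_json(out: dict, outname: str, mode: str, tmp_path: Path) -> None:
    """Write `out` as JSON under `tmp_path` only when mode is "DEV"."""
    if mode != "DEV":
        return  # Outside DEV mode nothing is written.
    # Assumption: the target directory is created if it does not exist yet.
    tmp_path.mkdir(parents=True, exist_ok=True)
    target = tmp_path / outname
    target.write_text(json.dumps(out, indent=2), encoding="utf-8")
```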

get_data(raw: dict) → dict

Fetch and normalize VirusTotal data for an input report payload.

Parameters: raw (dict) – Raw input JSON containing target file metadata.
Returns: Normalized VirusTotal data, or an empty dict if any step fails.
Return type: dict
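One plausible reading of this method is a three-step pipeline: read the hash, fetch the report, normalize it, and return an empty dict as soon as any step yields nothing. The sketch below takes the three steps as callables so it stays self-contained. The name `get_data_sketch` and the exact set of caught exceptions are assumptions.

```python
from typing import Callable


def get_data_sketch(raw: dict,
                    read_hash: Callable[[dict], str],
                    fetch: Callable[[str], dict],
                    normalize: Callable[[str, dict], dict]) -> dict:
    """Chain hash extraction, fetch, and normalization; {} on any failure."""
    filehash = read_hash(raw)
    if not filehash:
        return {}  # No hash in the input payload.
    vt_raw = fetch(filehash)
    if not vt_raw:
        return {}  # Lookup failed or returned nothing.
    try:
        return normalize(filehash, vt_raw)
    except (KeyError, TypeError, ValueError):
        return {}  # Unexpected response shape.
```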

normalize(filehash: str, vt_raw: dict) → dict

Normalize a VirusTotal response into the internal data model.

This method extracts verdict counts, threat profiles, sandbox signals, YARA highlights, and ransomware evidence into a structured dict.

Parameters:

filehash (str) – SHA256 hash tied to the VT response.
vt_raw (dict) – Raw VirusTotal API response data.

Returns:

Normalized VirusTotal data as a plain dictionary.

Return type:

dict
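To make the extraction step concrete, the sketch below pulls verdict counts and ransomware evidence out of a response shaped like a VirusTotal API v3 file object (`data.attributes.last_analysis_stats`, `popular_threat_classification`, `crowdsourced_yara_results`). The function name `summarize_vt`, the regular expression, the `label:`/`category:`/`yara:` evidence prefixes, and the rule that any malicious engine marks the file as malicious are assumptions, and the output covers only part of the documented payload.

```python
import re

# Hypothetical pattern; the real one comes from VTConfig.ransom_pattern.
RANSOM_PATTERN = re.compile(r"ransom", re.IGNORECASE)


def summarize_vt(filehash: str, vt_raw: dict) -> dict:
    """Extract engine counts and ransomware evidence from a v3-style response."""
    attrs = vt_raw.get("data", {}).get("attributes", {})
    stats = attrs.get("last_analysis_stats", {})
    classification = attrs.get("popular_threat_classification", {})

    label = classification.get("suggested_threat_label")
    categories = [c.get("value", "")
                  for c in classification.get("popular_threat_category", [])]
    yara_rules = [r.get("rule_name", "")
                  for r in attrs.get("crowdsourced_yara_results", [])]

    # Collect evidence from several independent signals.
    evidence = []
    if label and RANSOM_PATTERN.search(label):
        evidence.append(f"label:{label}")
    evidence += [f"category:{c}" for c in categories if RANSOM_PATTERN.search(c)]
    evidence += [f"yara:{r}" for r in yara_rules if RANSOM_PATTERN.search(r)]

    malicious = stats.get("malicious", 0)
    return {
        "sha256": filehash,
        "is_malicious": malicious > 0,  # Assumed rule, for illustration.
        "malicious_engines": malicious,
        "suspicious_engines": stats.get("suspicious", 0),
        "is_ransomware": bool(evidence),
        "ransomware_evidence": evidence,
    }


sample = {"data": {"attributes": {
    "last_analysis_stats": {"malicious": 60, "suspicious": 1},
    "popular_threat_classification": {
        "suggested_threat_label": "ransomware.wannacry/wanna",
        "popular_threat_category": [{"value": "ransomware", "count": 30},
                                    {"value": "trojan", "count": 12}],
    },
    "crowdsourced_yara_results": [{"rule_name": "WannaCry_Ransomware"}],
}}}
result = summarize_vt("abc", sample)
```

Keeping each piece of evidence tagged with its source makes it easier to see afterwards which signal triggered the ransomware flag.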

read_hash_from_raw(raw: dict) → str

Extract the SHA256 hash from the input report payload.

Parameters: raw (dict) – Original JSON input with a nested target/file section.
Returns: SHA256 hash string if present; otherwise an empty string.
Return type: str
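A defensive lookup of this kind can be sketched as follows. The key path `target` → `file` → `sha256` is inferred from the description "nested target/file section" and is an assumption, as is the function name `read_hash_sketch`.

```python
def read_hash_sketch(raw: dict) -> str:
    """Walk raw["target"]["file"]["sha256"]; return "" if any level is missing."""
    node = raw
    for key in ("target", "file", "sha256"):  # Assumed key path.
        if not isinstance(node, dict):
            return ""
        node = node.get(key)
    # Only a string counts as a valid hash value.
    return node if isinstance(node, str) else ""
```

Walking the path one level at a time avoids exceptions when an intermediate section is absent or is not a dict.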