Threat-intelligence corpus

OPTIC / OPTICLab

A security reporting pipeline that turns vendor research into a normalized, source-backed threat-intelligence corpus and public dashboard.

Product Surface

OPTICLab public dashboard
OPTICLab public release surface with corpus metrics and downloadable snapshot context.
OPTIC architecture diagram
Vendor report ingestion, hybrid extraction, normalization, PostgreSQL corpus, and public release layer.

Problem

Threat reports contain useful actor, malware, infrastructure, technique, and relationship data, but analysts often have to read long narrative reports manually and reconstruct structure themselves. OPTIC makes the extracted structure queryable while preserving links back to source text.

Solution

The pipeline ingests reporting from sources such as Mandiant, Google security resources, Cisco Talos, CrowdStrike, Microsoft, Sophos, and Unit 42. It combines deterministic extraction, LLM-assisted inference, and vendor-specific normalization, and stores the results in PostgreSQL models for articles, extraction JSON, entities, aliases, mentions, relationships, and technique facts.
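To make the entity, alias, mention, and relationship models more concrete, here is a minimal in-memory sketch in Python. It is not the project's actual PostgreSQL schema: every class name, field name, and the resolve helper are assumptions made for illustration, and the alias data is just a well-known example of vendors naming the same actor differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A normalized entity (hypothetical shape, not the real schema)."""
    entity_id: int
    kind: str                 # e.g. "actor", "malware", "infrastructure"
    canonical_name: str
    aliases: set[str] = field(default_factory=set)


@dataclass
class Mention:
    """One occurrence of an entity in an article, with its source quote."""
    article_id: int
    entity_id: int
    quote: Optional[str] = None


@dataclass
class Relationship:
    """A directed link between two entities, tied to the reporting article."""
    source_id: int
    target_id: int
    relation: str             # e.g. "uses", "targets"
    article_id: int


def resolve(name: str, entities: list[Entity]) -> Optional[Entity]:
    """Map a vendor-specific name to its entity, ignoring case."""
    key = name.casefold()
    for entity in entities:
        names = {entity.canonical_name, *entity.aliases}
        if key in {n.casefold() for n in names}:
            return entity
    return None


# Illustrative data: different vendors use different names for one actor.
apt29 = Entity(1, "actor", "APT29", {"Cozy Bear", "Midnight Blizzard"})
hit = resolve("cozy bear", [apt29])
print(hit.canonical_name if hit else None)  # prints APT29
```

Keeping aliases attached to a single canonical entity is what lets mentions from different vendors roll up into one record instead of several near-duplicates.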

Data Flow

OPTIC extraction flow diagram
Reports move from collection into quoted extraction records, normalized entities, and analyst-facing dashboard metrics.

Design Decisions

Source Traceability

Source quotes and extraction JSON are retained so that each structured record can be traced back to the text it came from and challenged if it looks wrong.
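One way to picture this kind of traceability check is a function that confirms a stored quote really appears in the article text. The sketch below is an assumption about how such a check could work, not the pipeline's actual code; the mention dictionary shape and the function names are hypothetical.

```python
import re


def normalize_ws(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def is_traceable(mention: dict, article_text: str) -> bool:
    """Return True if the mention carries a quote found in the article text."""
    quote = mention.get("quote")
    if not quote:
        return False
    return normalize_ws(quote) in normalize_ws(article_text)


# Illustrative article text and mentions (hypothetical records).
article = "The group deployed a custom loader\nafter gaining initial access."
good = {"entity": "custom loader", "quote": "deployed a custom loader after gaining"}
bad = {"entity": "custom loader", "quote": "deployed ransomware"}

print(is_traceable(good, article))  # prints True
print(is_traceable(bad, article))   # prints False
```

A record that fails this check is a candidate for review, since its quote cannot be located in the source it claims to come from.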

Archive Versus Corpus

Public-facing copy separates the broader archive from the analyst-usable operational corpus, so that headline numbers do not overstate how much of the archive is ready for analysis.

Hybrid Extraction

Heuristics handle repeatable patterns while model-assisted inference helps with ambiguous narrative context.
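The split between heuristics and model-assisted inference can be sketched as a two-pass extractor. This is an illustration under stated assumptions, not OPTIC's implementation: the regular expressions, the record shape, and the infer callback (a stand-in for whatever model call the pipeline makes) are all hypothetical.

```python
import re
from typing import Callable

# Deterministic patterns for indicators with a fixed, repeatable format.
PATTERNS = {
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_deterministic(text: str) -> list[dict]:
    """First pass: pull out values that match a known pattern."""
    records = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            records.append(
                {"kind": kind, "value": match.group(0), "method": "heuristic"}
            )
    return records


def extract_hybrid(text: str, infer: Callable[[str], list[dict]]) -> list[dict]:
    """Second pass: add model-inferred records not already found by patterns."""
    records = extract_deterministic(text)
    seen = {(r["kind"], r["value"]) for r in records}
    for record in infer(text):
        key = (record["kind"], record["value"])
        if key not in seen:
            records.append({**record, "method": "model"})
            seen.add(key)
    return records


def fake_infer(text: str) -> list[dict]:
    """Stub standing in for a model call; returns one fixed actor record."""
    return [{"kind": "actor", "value": "Example Group"}]


sample = "Example Group exploited CVE-2021-44228 and called back to 192.0.2.10."
for record in extract_hybrid(sample, fake_infer):
    print(record["method"], record["kind"], record["value"])
```

Tagging each record with the method that produced it keeps heuristic and model-derived facts distinguishable later, which matters when a reviewer decides how much to trust a given record.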

Snapshot Release

The public dashboard and downloadable database snapshot make the corpus inspectable outside the local development environment.

Verified Corpus Signals

The dashboard documentation distinguishes 571 normalized archive articles from 402 analyst-usable operational corpus reports. It also documents 18,319 mentions, 18,220 source-quoted mentions, 7,243 searchable IOCs, and 1,113 relationships for the current operational projection.
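The ratios implied by these counts can be worked out directly. The short calculation below assumes the operational reports are a subset of the archive articles and that source-quoted mentions are a subset of all mentions; the constant and function names are illustrative.

```python
ARCHIVE_ARTICLES = 571
OPERATIONAL_REPORTS = 402
MENTIONS = 18_319
QUOTED_MENTIONS = 18_220


def share(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(share(OPERATIONAL_REPORTS, ARCHIVE_ARTICLES))  # prints 70.4
print(share(QUOTED_MENTIONS, MENTIONS))              # prints 99.5
print(MENTIONS - QUOTED_MENTIONS)                    # prints 99
```

Under those assumptions, roughly 70% of archived articles are in the operational corpus, and all but 99 mentions carry a source quote.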