For capital markets teams that run on documents
Finally get 100% quality on the pockets of dark data you actually care about.
Open-source Airlock proves the raw document never went into the model, and extraction tournaments turn each validated win into compounding quality across future extractions.
Your data is trapped inside your documents.
The numbers you need — holdings, rents, footnotes, terms — are locked inside thousands of PDFs and filings.
Asking an LLM to read one file gives you an answer. Asking it to read five thousand gives you a bill and no way to verify the output.
If the raw document has to go into the model, the conversation can end before it starts. Verbal assurances are not enough for capital markets workflows.
New filings arrive. The team re-does the same work. Nothing compounds.
Four steps from documents to verified data.
Start with the field, workflow, and document family that matter most — holdings, footnotes, rate components, lease terms.
Multiple extraction strategies run across the document set. The strategy that proves it reaches 100% quality on that pocket becomes the champion.
Open-source Airlock proves the document never crosses the model boundary. The model sees bounded telemetry, not raw source files.
Each validated champion raises quality and prevents backsliding, then expands to more filings and layouts over time. Next quarter starts ahead instead of back at zero.
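To make the tournament idea concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the product's implementation: the names `score`, `run_tournament`, `validated`, and `target` are hypothetical, and "quality" is assumed here to mean exact-match accuracy against a hand-validated sample.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical shape: a strategy maps a document's text to extracted fields.
Strategy = Callable[[str], Dict[str, str]]
# Hypothetical shape: (document text, expected fields) pairs validated by a human.
Validated = List[Tuple[str, Dict[str, str]]]


def score(strategy: Strategy, validated: Validated) -> float:
    """Fraction of validated documents the strategy extracts exactly right."""
    if not validated:
        return 0.0
    hits = sum(1 for doc, expected in validated if strategy(doc) == expected)
    return hits / len(validated)


def run_tournament(
    strategies: Dict[str, Strategy],
    validated: Validated,
    incumbent: Optional[str] = None,
    target: float = 1.0,
) -> Optional[str]:
    """Return the champion's name, or the incumbent if nobody beats it."""
    scores = {name: score(fn, validated) for name, fn in strategies.items()}
    if not scores:
        return incumbent
    best = max(scores, key=scores.get)
    # No strategy proved the target quality: keep whatever we already had.
    if scores[best] < target:
        return incumbent
    # No backsliding: a challenger must strictly beat the current champion.
    if incumbent in scores and scores[best] <= scores[incumbent]:
        return incumbent
    return best
```

In this sketch, a new filing layout would mean adding its validated examples to `validated` and re-running with last quarter's champion passed as `incumbent`, so the starting point is the best known strategy rather than zero.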
One real example: BDC holdings extraction.
5,320 SEC 10-Q filings, Schedule of Investments. Costs vary by document type — this is one tournament we actually ran.
Open-source Airlock proves the raw document never went into the model.
The document stays inside a deterministic harness. The model sees bounded telemetry, not raw source files. Airlock is open source, so you can inspect exactly what did and did not cross. Then every extracted row carries its own evidence trail.
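As a rough illustration of that boundary, the Python sketch below shows one way a deterministic harness could keep the raw text local, hand a model only bounded summary fields, and attach an evidence pointer to every row. This is not Airlock's actual API or design: `Evidence`, `Row`, `extract_rows`, `bounded_telemetry`, and `verify_row` are hypothetical names, and the regex extractor stands in for whatever strategy won the tournament.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Evidence:
    doc_sha256: str  # which document the value came from
    start: int       # character offsets of the value in that document
    end: int


@dataclass(frozen=True)
class Row:
    field: str
    value: str
    evidence: Evidence


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def extract_rows(doc_text: str, field: str, pattern: str) -> List[Row]:
    """Deterministic extraction: the first regex group becomes the value."""
    digest = sha256_text(doc_text)
    return [
        Row(field, m.group(1), Evidence(digest, m.start(1), m.end(1)))
        for m in re.finditer(pattern, doc_text)
    ]


def bounded_telemetry(rows: List[Row], doc_text: str) -> Dict[str, object]:
    """What a model would see in this sketch: counts and field names only."""
    return {
        "doc_chars": len(doc_text),
        "row_count": len(rows),
        "fields": sorted({r.field for r in rows}),
    }


def verify_row(row: Row, doc_text: str) -> bool:
    """Check a row against the source: same document, same span, same value."""
    ev = row.evidence
    return sha256_text(doc_text) == ev.doc_sha256 and doc_text[ev.start:ev.end] == row.value
```

The design choice being illustrated is that the telemetry has a fixed, inspectable shape with no document text in it, while each row can be checked against the source by anyone who holds the original file.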
Get early access.
We are starting with a small set of real document pools and expanding from there.
No spam. We reach out when your pool type is live.