AI systems do not consume contracts, statements, policies, correspondence, and case files directly.
They consume whatever the ingestion pipeline produces.
If that pipeline flattens structure, loses tables, detaches annotations, removes page evidence, or breaks source traceability, the model is reasoning over damaged material.
AUD 9,500 to 12,500 · 1 week
This review is for organisations using documents in AI or retrieval workflows where structure matters.
A document can look well organised to a human reader and lose that organisation the moment it enters a pipeline.
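A PDF is often a set of instructions for drawing text at positions on a page, not a semantic document: headings, table cells, and reading order are implied by layout rather than recorded. RTF files can encode tables, footnotes, and annotations that disappear when the file is read as plain text.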
Once structure is lost, later AI stages are left trying to infer what the ingestion layer has already destroyed.
Lumen & Lever maintains Sourcetrace, a local-first document-structure layer used to inspect how document meaning survives extraction.
Sourcetrace RTF is powered by rtfstruct, a free open-source parser for reading RTF as structure, not just text.
Sourcetrace PDF is powered by pdfstruct, a source-available commercial parser for traceable PDF extraction.
The review's conclusions remain tool-agnostic. If an existing tool suits the document set better, the recommendation will say so.
If document structure is unreliable at ingestion, retrieval and model reasoning inherit the damage.
Start the intake