Sheet 01 · What it preserves

Most pipelines flatten the document and hope the meaning survived.

A document is pulled in, turned into a run of text, chunked, embedded, and treated as if its structure did not carry meaning. Often the structure was the meaning. Sourcetrace keeps it.

Layout and reading order

Columns, headers, footnotes and the order a person would actually read them, kept rather than collapsed into one stream of text.

Tables as tables

Rows keep their relationship to columns. A figure stays attached to the period and the line item it belongs to, instead of becoming a loose number in a paragraph.

Annotations in context

Comments, mark-ups and notes stay attached to the clause or cell they refer to, not detached into a separate list with no anchor.

Page evidence

Every extracted element carries a reference back to the document and page it came from. That reference is what makes traceability possible downstream.
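
As an illustration only, an element that carries its page reference might be modelled like the sketch below. The names PageRef, Element and cite, and the sample file name, are hypothetical and are not Sourcetrace's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRef:
    """Hypothetical pointer back to where an element was read from."""
    document: str  # source file name
    page: int      # 1-based page number


@dataclass(frozen=True)
class Element:
    """Hypothetical extracted element that never travels without its source."""
    kind: str      # e.g. "paragraph", "table_cell", "annotation"
    text: str
    source: PageRef


def cite(element: Element) -> str:
    """Format a human-readable citation from the preserved reference."""
    return f"{element.source.document}, p. {element.source.page}"


# Sample data invented for the example.
clause = Element(
    kind="paragraph",
    text="The tenant shall give 90 days' notice.",
    source=PageRef(document="lease.pdf", page=12),
)
print(cite(clause))  # lease.pdf, p. 12
```

Because the reference is a required field, an element cannot be constructed without one. The citation is available downstream only because it was recorded at read time.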

Confidence and diagnostics

Where a read is uncertain, Sourcetrace exposes the uncertainty and the diagnostics behind it, rather than presenting an inferred result as a fact.
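
A minimal sketch of what exposing uncertainty could look like, assuming a confidence score between 0 and 1 and a list of diagnostic notes. The Reading type, the needs_review helper and the 0.9 threshold are all hypothetical choices made for this example, not Sourcetrace's interface.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """Hypothetical extracted value with its uncertainty kept alongside it."""
    value: str
    confidence: float  # assumed scale: 0.0 (no trust) to 1.0 (certain)
    diagnostics: list[str] = field(default_factory=list)


def needs_review(reading: Reading, threshold: float = 0.9) -> bool:
    """Flag a reading when confidence is low or any diagnostic was raised.

    The 0.9 threshold is an arbitrary example value.
    """
    return reading.confidence < threshold or bool(reading.diagnostics)


# Sample data invented for the example.
clean = Reading(value="1,250,000", confidence=0.99)
smudged = Reading(
    value="1,250,000",
    confidence=0.62,
    diagnostics=["low-contrast scan", "ambiguous digit"],
)

for reading in (clean, smudged):
    status = "review" if needs_review(reading) else "accept"
    print(reading.value, status, reading.diagnostics)
```

Both readings hold the same value. Only the confidence and diagnostics distinguish the one that can be relied on from the one that needs a second look.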

Sheet 02 · Why it matters

Traceability is built at ingestion, or not at all.

If the page reference is thrown away when the document is read, no amount of work later puts it back. The system can cite a source only if the source was preserved on the way in.

That is why Sourcetrace sits at the front of every build. The five commitments on the engineering standard, traceable above all, depend on what the engine kept.

Sourcetrace runs local-first, inside your environment or on private Australian hosting. Your documents are not sent to a third-party service to be read.

In: Contracts, leases, correspondence, project records, PDF and RTF bundles.
Through: Structure preserved: layout, tables, annotations, page evidence, diagnostics.
Out: A traceable document the system can reason over and still cite by page.

Sheet 03 · Engineering provenance

The lineage is in the open.

The structure-preserving approach behind Sourcetrace started as rtfstruct, an open-source parser that reads Rich Text Format as a structured document tree rather than as plain text. It is public, so the engineering thinking is open to inspection.

rtfstruct is published under Apache-2.0. See the source at github.com/keny369/rtfstruct, and the background in why we built rtfstruct.

Sheet 04 · Next step

Put the engine to work on your documents.

01 · Scoping call

Thirty minutes. You describe the documents and where you suspect leakage. We tell you which lane fits and whether a Proof Week is worth your money. Sometimes it is not, and we say so.

02 · Proof Week

We sign your NDA, take a sample of documents, and build. Five days later you watch the system run on your own record.

03 · Your call

Production build, or a findings register and a handshake. No retainer, no lock-in, no deck.

Book a scoping call

Or write direct: hello@lumenandlever.com · Melbourne, Australia
