Sourcetrace PDF is powered by pdfstruct, a source-available PDF extraction layer that converts born-digital PDFs into traceable, layout-aware JSON and Markdown.
It is designed for private AI pipelines where page evidence, text position, reading order, tables, annotations, metadata, and diagnostics matter before retrieval or model reasoning.
Powered by pdfstruct · Source-available · Free local evaluation · Commercial licence required for production use
PDF is not a semantic document format.
A PDF page is typically a sequence of drawing operations that place glyphs, lines, and images at coordinates. Tables, headings, columns, and reading order may not exist explicitly inside the file.
Sourcetrace PDF therefore exposes confidence, evidence, and diagnostics rather than presenting recovered layout as certain.
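To make the point about positioned drawing operations concrete, here is a toy illustration. It is not pdfstruct code and not a real PDF parser: it reads a hand-written, simplified content stream in which each `BT ... ET` block places one string at absolute coordinates (origin at bottom-left, y increasing upward). The stream draws the right column first, so stream order is not reading order, and a naive top-to-bottom sort interleaves the two columns. The helper names and the `column_split` threshold are hypothetical; a real extractor would have to infer column boundaries, which is exactly where a confidence signal is useful.

```python
import re
from typing import NamedTuple


class TextRun(NamedTuple):
    x: float
    y: float
    text: str


# Hypothetical, simplified content stream for a two-column page.
# Each BT...ET block sets a font, moves to (x, y) with Td, and shows a string with Tj.
# Note that the right column is drawn before the left column.
CONTENT_STREAM = """
BT /F1 12 Tf 320 700 Td (Right column, line 1) Tj ET
BT /F1 12 Tf 72 700 Td (Left column, line 1) Tj ET
BT /F1 12 Tf 72 686 Td (Left column, line 2) Tj ET
BT /F1 12 Tf 320 686 Td (Right column, line 2) Tj ET
"""

# Toy pattern: only handles one "x y Td (text) Tj" per BT...ET block on a single line.
RUN_PATTERN = re.compile(
    r"BT\s.*?\s(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td\s+\((.*?)\)\s*Tj\s+ET"
)


def extract_runs(stream: str) -> list[TextRun]:
    """Return text runs in the order the stream draws them."""
    return [TextRun(float(x), float(y), t) for x, y, t in RUN_PATTERN.findall(stream)]


def naive_top_to_bottom(runs: list[TextRun]) -> list[TextRun]:
    """Sort by line position only: top line first, then left to right."""
    return sorted(runs, key=lambda r: (-r.y, r.x))


def column_first(runs: list[TextRun], column_split: float) -> list[TextRun]:
    """Read the whole left column, then the right column (split is hand-picked here)."""
    return sorted(runs, key=lambda r: (r.x >= column_split, -r.y, r.x))


if __name__ == "__main__":
    runs = extract_runs(CONTENT_STREAM)
    print("stream order: ", [r.text for r in runs])
    print("naive order:  ", [r.text for r in naive_top_to_bottom(runs)])
    print("column order: ", [r.text for r in column_first(runs, column_split=200)])
```

Only the last ordering matches how a person reads the page, and it depends on a threshold the file itself never states.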
Sourcetrace PDF is intended to be source-available.
Recommended licence model: Business Source License 1.1, converting to Apache-2.0 on the defined change date.
Sourcetrace PDF is intended to run where the documents already live.
A hosted API is not the default deployment model, because sensitive document workflows often require local control, private processing, and clear custody of source material.
Commercial options include a professional licence, a team licence, an embedded/OEM licence, a Document Structure Review, a custom extractor pack, and, where the problem extends beyond extraction, a Structural AI Architecture Sprint.