Your LLM can finally read tables, math, and data.
tdoc turns any PDF, DOCX, or HTML into a .tdoc — a typed, queryable,
signable document file. Every number carries a declared type. Every file carries an
Ed25519 signature. Every archive is byte-deterministic — same input, same bytes out,
always. Ship in a single REST call.
What you get
Every node typed
Cells declare data-type="pvalue", "measure", or "currency". Your downstream code never guesses.
Query with AQL
"Find every p-value < 0.05 in this 300-page paper" — one line, no LLM needed.
Signed & verifiable
Ed25519 over canonical bytes. Tamper once, verification fails forever.
Byte-deterministic
Same input → same output, guaranteed. Hash it, sign it, audit it.
Accessibility built in
Alt-text coverage, equation descriptions, and reading order, not an afterthought.
Open format
Run the SDK offline. Your data never needs to stay on our servers.
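As a rough illustration of why typed cells and deterministic bytes matter, here is a minimal Python sketch. The cell records, the `significant_pvalues` and `canonical_digest` helpers, and the use of sorted-key JSON with SHA-256 are all assumptions made for this example; the actual .tdoc node layout and canonical form may differ.

```python
import hashlib
import json

# Hypothetical cell records. Only the data-type names come from the
# feature list above; the dictionary layout is an assumption.
cells = [
    {"data-type": "pvalue", "value": 0.003},
    {"data-type": "currency", "value": 1200.0},
    {"data-type": "pvalue", "value": 0.2},
    {"data-type": "pvalue", "value": 0.04},
]


def significant_pvalues(cells, threshold=0.05):
    """Return p-values below the threshold, relying on the declared type."""
    return [
        c["value"]
        for c in cells
        if c["data-type"] == "pvalue" and c["value"] < threshold
    ]


def canonical_digest(doc):
    """Hash a document after serializing it in one fixed, canonical way.

    Sorted keys and fixed separators stand in for a canonical form here,
    so equal content always yields equal bytes and an equal hash.
    """
    data = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


print(significant_pvalues(cells))
print(canonical_digest({"cells": cells}))
```

Because the type is declared on each cell, the filter is a plain comparison rather than a guess about what a string means. Because the serialization is fixed, a digest (or a signature over the same bytes) can be recomputed and compared later.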
One call to structured data
curl -X POST https://api.tdoc.xyz/v1/structure \
  -H "Authorization: Bearer $TDOC_KEY" \
  -F "file=@paper.pdf" \
  -F "title=My Paper"
The call returns an AXON document tree. Use /v1/query to run AQL, /v1/sign to add an
Ed25519 signature, and /v1/verify to validate a third-party archive. The full OpenAPI spec is at
/docs.
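For teams calling the API from Python rather than curl, the same request might be assembled with only the standard library, as sketched below. The endpoint, the bearer header, and the `file` and `title` form fields mirror the curl example; the `build_structure_request` helper and the `application/octet-stream` part type are assumptions for illustration. Nothing is assumed about the response beyond what the text above states.

```python
import urllib.request
import uuid

API_URL = "https://api.tdoc.xyz/v1/structure"


def build_structure_request(file_bytes, filename, title, api_key):
    """Build (but do not send) a multipart POST equivalent to the curl call."""
    boundary = uuid.uuid4().hex
    # Text field: title
    title_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="title"\r\n\r\n'
        f"{title}\r\n"
    ).encode("utf-8")
    # File field: header first, raw bytes appended afterwards
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = title_part + file_header + file_bytes + closing
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Usage sketch (performs a network call, so it is left commented out):
# import os
# with open("paper.pdf", "rb") as fh:
#     req = build_structure_request(
#         fh.read(), "paper.pdf", "My Paper", os.environ["TDOC_KEY"]
#     )
# with urllib.request.urlopen(req) as resp:
#     raw = resp.read()  # the serialized document tree
```

Building the request separately from sending it keeps the upload logic easy to inspect and test without network access. Check the OpenAPI spec at /docs for the authoritative field names and response schema.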
Pricing
Start free. No credit card. Upgrade when your project does.
Free
- 100 documents / month
- All parsing features
- AQL queries included
- Community support
Pro
- 2,000 documents / month
- Email support
- No hard overage — upgrade anytime
- Signing: Team and above
Team
- 10,000 documents / month
- Ed25519 signing + verification
- Priority email support (24h SLA)
- Overage: $0.02 / doc
Scale
- 100,000 documents / month
- Self-hosted option
- Slack support channel
- Overage: $0.01 / doc
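To estimate an overage charge, the arithmetic can be sketched in a few lines of Python. The quotas and per-document rates are taken from the Team and Scale tiers above; the `overage_charge` helper and the assumption that overage applies only to documents beyond the monthly quota are illustrative, not a billing specification.

```python
from decimal import Decimal

# Monthly quota and per-document overage rate for the tiers that list one.
TIERS = {
    "Team": {"quota": 10_000, "overage": Decimal("0.02")},
    "Scale": {"quota": 100_000, "overage": Decimal("0.01")},
}


def overage_charge(tier, docs_processed):
    """Return the overage charge in dollars for one month (assumed model)."""
    plan = TIERS[tier]
    extra_docs = max(0, docs_processed - plan["quota"])
    return extra_docs * plan["overage"]


# 12,500 documents on Team: 2,500 over quota at $0.02 each.
print(overage_charge("Team", 12_500))
```

Under these assumptions, 12,500 documents on Team works out to 2,500 × $0.02 = $50.00 in overage, on top of the plan price.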
Why now
Every AI team is losing 20–40% of document quality to broken PDF ingestion. Tables become strings. Equations become gibberish. P-values lose their type. tdoc gives LLMs the same structured view a database has — so your agents can cite a number, not just see it.