tdoc.

About · Format · Rates · Source
hello@tdoc.xyz   

Documents that read themselves.

A typed, queryable, signable document format for the age of language-model agents — with an open specification and a single-file reference implementation.

Fig. 1 The same sentence, passed through a PDF parser (left) and a tdoc parser (right).

“p = 0.003” is four characters to a parser and a probability to tdoc.

Left — what your LLM sees today

"p = 0.003 at the 10th minute"
"95% CI [0.41, 0.78]"
"$12,400"
"142.7 mg/dL"

Four strings. The agent must guess whether 0.003 is a probability, a footnote marker, or part of a larger number. At scale, the guesses drift — and the citations rot.

Right — what the same agent sees with a .tdoc file

<measure type="pvalue"
         value="0.003">p = 0.003</measure>
<measure type="ci"
         value="[0.41, 0.78]">95% CI</measure>
<cell    type="currency"
         ticker="USD">$12,400</cell>
<cell    type="measure"
         unit="mg/dL">142.7</cell>

Four declarations. The types pvalue, ci, currency, measure are not inferred at read-time — they were authored with the file. The agent cites, rather than guesses.

A letter  ·  To the reader, from the author  ·  Vol. 1

On the problem of the unwitnessed number.

A decade of language models has hidden a more primitive crisis: our documents have never been honest about what they contain. A clinical trial reports a p-value of 0.003 — but the file does not know this, and neither does the reader, until the reader happens to have trained on enough biostatistics to guess.

Every pipeline we have built to repair this has been a pipeline of guesses. We teach large models to read tables. We fine-tune extractors on clinical corpora. We prompt Claude to “please identify the p-value in this paragraph.” The models oblige and, often enough, they are right. But they are always guessing, and occasionally they are confidently wrong — and in regulated domains, confident wrongness is not a failure mode so much as a lawsuit.

tdoc proposes a different contract. Before the agent opens the document, the document already knows: this number is a p-value; that one is a concentration in milligrams per decilitre; this span is a citation; this block is signed by a public key I can verify in a hundred milliseconds. The types are declarations, not inferences. The signature is mathematical, not cultural. The bytes are identical across encodes — hashable, diffable, auditable without ceremony.

We did not invent the desire for this. Scientific publishing has wanted it for thirty years and approximated it with XML schemas that nobody reads. Financial disclosure has wanted it for twenty years and approximated it with XBRL, which can be rendered only by machines. What is new is the reader: the autonomous agent that cannot be bored, cannot read between the lines, and will not improvise a unit that is missing. For that reader, a typed document is the difference between a useful citation and a hallucination.

The spec is open. The reference implementation is a single file of Python. A `.tdoc` archive carries every declaration, every signature, every deterministic byte — ready for agents that must cite, verify, and be accountable.

— Rishi Arun Shivhare, author of AXON 1.0.

Every cell, declared.

AXON separates what a document says from how it renders. The content tree is typed. A pvalue is a probability. A measure carries its units. A currency knows its ticker. The renderer is a separate profile, itself hashable and signable, so the same content can be laid out five different ways without ever lying about the data.

<section id="results">
  <heading level="2">Outcome</heading>
  <paragraph>
    <measure type="pvalue"
             value="0.003">p = 0.003</measure>
    at the 10th minute, with a
    <measure type="ci"
             value="[0.41, 0.78]">
      95 % confidence interval
    </measure>.
  </paragraph>

  <table id="t1" summary="arm-by-arm">
    <cell type="currency"
          ticker="USD">$12,400</cell>
    <cell type="measure"
          unit="mg/dL">142.7</cell>
  </table>
</section>

Rates.

Monthly · USD · cancel any time

tdoc subscription rates
Free 100 documents a month. All parsing features, all query operators, community support, forever. No credit card at signup. $0/mo
Pro 2,000 documents a month. Email support; soft upgrade path — no hard overage block, you simply get a note when you cross. $29/mo
Team 10,000 documents a month. Ed25519 signing and verification. Twenty-four hour email response. Overage billed at $0.02 per document, with a monthly ceiling. $99/mo
Scale 100,000 documents a month. Self-hosted option. Shared Slack channel with the author. Overage at $0.01 per document. Annual contracts available. $499/mo

Enterprise & invoice billing by arrangement — write to hello@tdoc.xyz. We reserve the right to decline customers whose intended use would embarrass us. Self-hosting is always free and always will be: tdoc is Apache 2.0.

“The difference between a good citation and a hallucination is a declared type.