cvfile.org

One file. Three audiences. Real semantic search.

.cv is the open file format for resumes. One file packs a designed PDF, a clean Markdown copy, a self-contained HTML rendering, and pre-computed BGE-M3 vectors inside a single PDF/A-3u that opens everywhere on day one.

Create your .cv in the browser | Install the CLI | Read the spec

Pick the path that matches your role

I'm writing my resume.

Fill a form. Download a .cv file. Upload it to Workday, Greenhouse, Lever, LinkedIn, your site. One file, every audience.

I'm building a RAG / AI tool.

Use the LangChain, LlamaIndex, or Haystack loader. Or the 200-line cvfile-cv-detector sniffer. Skip OCR, skip re-embedding.

I run an ATS or job board.

Use the server middleware (Express, Fastify, Hono, FastAPI, Django, Go net/http). Content negotiation makes the wrapper invisible to existing crawlers.

Why .cv exists

Today a person sends three different artifacts to three different audiences: a polished PDF to recruiters, a Markdown copy to ATS systems, a web bio for their site. They drift out of sync within weeks. AI tools re-embed the same documents over and over.

.cv replaces all three artifacts with one file and one source of truth that opens in any PDF reader on day one. Bots that ask for text/markdown get the Markdown back. RAG pipelines that recognize the format read the pre-computed BGE-M3 vectors directly instead of re-embedding.

How is this different from JSON Resume or Europass?

Format              | Visual PDF     | Clean ATS text | Pre-computed vectors | Opens in any PDF reader
.cv                 | yes            | yes            | yes                  | yes
JSON Resume / FRESH | no (data only) | yes            | no                   | no
Europass XML        | no (data only) | yes            | no                   | no
HR-XML / HR Open    | no (data only) | yes            | no                   | no
Plain PDF           | yes            | no (OCR)       | no                   | yes

What ships today

Hello, .cv

# Build a .cv from an existing PDF, Markdown copy, and HTML rendering
cv pack \
  --pdf resume.pdf \
  --md resume.md \
  --html resume.html \
  --lang en \
  -o resume.cv

# Read it back any way you like
cv extract resume.cv --format md      # markdown stream
cv inspect resume.cv --json           # XMP + payload metadata
cv validate resume.cv --strict        # PDF/A-3u + cv-strict gate
cv search  resume.cv "kubernetes"     # semantic search via BGE-M3

The killer property

A producer ships one .cv file. The HTTP middleware makes the wrapper invisible to consumers:

Consumer Accept header                  | Sees
text/html, */* (LLM crawlers, browsers) | Extracted resume.html
text/markdown (newer agents, our SDK)   | Extracted resume.md
application/pdf                         | The visual PDF
No Accept header                        | The visual PDF (renders in built-in viewer)

The bot never needs to know .cv exists. The format is producer-side convenience; consumer-side it is invisible.
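The negotiation table above can be sketched in a few lines of Python. This is a minimal illustration and not the shipped middleware: the PAYLOADS mapping and the function names parse_accept and choose_payload are hypothetical, and a real server would extract the payloads from the .cv file instead of returning file names.

```python
from typing import List, Optional, Tuple

# Hypothetical payload names, mirroring the table above.
PAYLOADS = {
    "text/html": "resume.html",
    "text/markdown": "resume.md",
    "application/pdf": "resume.pdf",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    items = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        items.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(items, key=lambda item: item[1], reverse=True)


def choose_payload(accept: Optional[str]) -> str:
    """Pick the payload for an Accept header, falling back to the visual PDF."""
    if not accept:
        return PAYLOADS["application/pdf"]
    for media_type, q in parse_accept(accept):
        if q > 0 and media_type in PAYLOADS:
            return PAYLOADS[media_type]
    return PAYLOADS["application/pdf"]
```

In this sketch a bare */* or an unknown type falls through to the PDF, which matches the "No Accept header" row; how the real middleware treats wildcards is not stated here.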

Frequently asked questions

What is the .cv file format?

.cv is an open file format for resumes. It is a valid PDF/A-3u file that bundles a designed PDF, a clean Markdown copy, a self-contained HTML rendering, and optional pre-computed BGE-M3 embeddings inside one file. Any PDF reader opens it visually; ATS systems and AI agents read the embedded Markdown directly.

How is .cv different from JSON Resume, FRESH, or Europass?

Those are pure data schemas (JSON or XML) that drop the visual artifact and require the consumer to render it. .cv keeps the designer-controlled PDF intact and carries the JSON/Markdown/HTML alongside it as PDF Associated Files (ISO 32000-2 §14.13). A recruiter still opens a polished PDF; an ATS or LLM still reads clean text; both come from the same file.

Do I need a special viewer to open a .cv file?

No. A .cv is a valid PDF. Preview, Adobe Reader, Chrome, and every other PDF reader shipped in the last fifteen years open it normally and show the visual layer. The additional payloads (Markdown, HTML, embeddings) are discoverable via the standard PDF Associated Files mechanism.

Why are pre-computed embeddings inside the file?

So that any third-party RAG pipeline indexing the file (LangChain, LlamaIndex, Haystack, custom vector DB) can skip the embedding API call entirely. Default model is BAAI BGE-M3 (MIT licensed, multilingual, 1024-dim, free). Producers may also ship vectors in proprietary spaces (OpenAI text-embedding-3-large, Voyage-3, Gemini-text-004) when they target a specific downstream stack.
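To make the saving concrete, here is a minimal sketch of what a pipeline can do once the document vectors are already in hand: rank stored chunks by cosine similarity against a query vector. The function names and the toy 3-dimensional vectors are illustrative assumptions (real BGE-M3 vectors have 1024 dimensions), and the query itself would still need to be embedded with the same model as the stored vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[float, str]]:
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy stand-ins for vectors read from a .cv file (hypothetical data).
stored = [
    ("Designed marketing brochures", [0.0, 0.2, 0.9]),
    ("Ran Kubernetes clusters in production", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0]  # would come from the same embedding model

best_score, best_text = rank_chunks(query, stored)[0]
```

No embedding API is called for the stored chunks; only the query side needs a model.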

What MIME type does a .cv file use?

application/vnd.cv+pdf, with IANA registration pending under the RFC 6838 vendor tree and its structured syntax suffix rules (+pdf). Until IANA approves, servers can safely emit application/pdf alongside a Link header advertising the .cv alternates. A 200-line reference sniffer (cvfile-cv-detector, available in Python, Go, and TypeScript) detects .cv wrapping inside any application/pdf bytes.
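As a rough illustration of the interim approach, the sketch below builds response headers that keep Content-Type at application/pdf and advertise alternates in a Link header using RFC 8288 syntax. The helper name link_header, the example URL, and the choice to point every alternate at the same negotiated URL are assumptions for illustration; the exact header the real middleware emits is not specified here.

```python
from typing import Dict, List


def link_header(url: str, media_types: List[str]) -> str:
    """Format a Link header value with one rel="alternate" entry per media type."""
    return ", ".join(
        f'<{url}>; rel="alternate"; type="{media_type}"'
        for media_type in media_types
    )


def interim_headers(url: str) -> Dict[str, str]:
    """Headers for serving a .cv as application/pdf (hypothetical helper)."""
    return {
        "Content-Type": "application/pdf",
        "Link": link_header(url, ["text/markdown", "text/html"]),
        # Tells caches that the response body depends on the Accept header.
        "Vary": "Accept",
    }


headers = interim_headers("https://example.com/resume.cv")
```

The Vary: Accept header is included because the same URL returns different representations, which shared caches need to know about.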

Are the format and tooling free?

Yes. The spec is CC-BY-4.0. The CLI, the three SDKs (JavaScript, Python, Go), the web component, the server middleware, and the RAG integrations (langchain-cvfile, llama-index-readers-cvfile, cvfile-haystack) are Apache-2.0. There is no vendor lock-in. A planned cvfile.org/cloud paid tier exists for hosted convenience but is non-essential.

Can a job seeker create a .cv file?

Yes, in two ways. (1) Use the in-browser builder at cvfile.org/create: fill a form and download a .cv with PDF + Markdown payloads ready. (2) Install the CLI (brew install cvfile/tap/cv) and run cv pack with your existing PDF + Markdown.

How can an ATS or LLM read the inside of a .cv file?

Several options. (a) Use one of the published reference SDKs (npm: @cvfile/sdk, PyPI: cvfile, Go: github.com/cvfile/cv/sdks/go). (b) Use one of the RAG integrations (langchain-cvfile, llama-index-readers-cvfile, cvfile-haystack). (c) Use the 200-line cvfile-cv-detector reference sniffer that depends only on the PDF parser the host already trusts. (d) Send an Accept: text/markdown header to a server running the @cvfile/server middleware; you receive the Markdown payload directly.
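Option (d) needs no SDK at all. The sketch below uses only the Python standard library to request the Markdown representation; the URL and the function names are placeholders, and it assumes the server answers with a text/markdown Content-Type when negotiation succeeds.

```python
import urllib.request


def markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the Markdown payload."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def fetch_markdown(url: str, timeout: float = 10.0) -> str:
    """Fetch the Markdown payload, or raise if the server returned another type."""
    with urllib.request.urlopen(markdown_request(url), timeout=timeout) as resp:
        content_type = resp.headers.get_content_type()
        if content_type != "text/markdown":
            # The server did not negotiate; the body is probably the PDF itself.
            raise ValueError(f"expected text/markdown, got {content_type}")
        return resp.read().decode("utf-8")


# Placeholder URL; nothing is sent until fetch_markdown() is called.
request = markdown_request("https://example.com/resume.cv")
```

Checking the response Content-Type matters because a server without the middleware will return the PDF bytes for the same URL.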