One file. Three audiences. Real semantic search.
.cv is the open file format for resumes. One file packs a
designed PDF, a clean Markdown copy, a self-contained HTML rendering, and
pre-computed BGE-M3 vectors inside a single PDF/A-3u that opens
everywhere on day one.
Pick the path that matches your role
Fill a form. Download a .cv file. Upload it to Workday,
Greenhouse, Lever, LinkedIn, your site. One file, every audience.
Use the LangChain, LlamaIndex, or Haystack loader. Or the 200-line
cvfile-cv-detector sniffer. Skip OCR, skip re-embedding.
Use the server middleware (Express, Fastify, Hono, FastAPI, Django,
Go net/http). Content negotiation makes the wrapper
invisible to existing crawlers.
Why .cv exists
Today a person sends three different artifacts to three different audiences: a polished PDF to recruiters, a Markdown copy to ATS systems, a web bio for their site. They drift out of sync within weeks. AI tools re-embed the same documents over and over.
.cv replaces all three with one file and one source of truth that
opens in any PDF reader on day one. Bots that ask for text/markdown
get the Markdown back. RAG pipelines that recognize the format read
pre-computed BGE-M3 vectors directly instead of re-embedding.
How is this different from JSON Resume or Europass?
| Format | Visual PDF | Clean ATS text | Pre-computed vectors | Opens in any PDF reader |
|---|---|---|---|---|
| .cv | yes | yes | yes | yes |
| JSON Resume / FRESH | no (data only) | yes | no | no |
| Europass XML | no (data only) | yes | no | no |
| HR-XML / HR Open | no (data only) | yes | no | no |
| Plain PDF | yes | no (OCR) | no | yes |
What ships today
- Stable spec at cv-1.0.
- Reference SDKs in JavaScript, Python, Go.
- Single-binary CLI (cv extract / inspect / validate / search).
- <cv-embed> web component (Lit, ~10 KB shell, lazy PDF.js worker).
- Server middleware for Express, Fastify, Hono, FastAPI, Flask, Django, Go net/http.
- Optional BGE-M3 embedding generation (@cvfile/embed, cvfile[embed]).
- LangChain, LlamaIndex, and Haystack document loaders.
- 200-line reference sniffer (cvfile-cv-detector) for crawler vendors.
Hello, .cv
# Build a .cv from a markdown CV
cv pack \
--pdf resume.pdf \
--md resume.md \
--html resume.html \
--lang en \
-o resume.cv
# Read it back any way you like
cv extract resume.cv --format md # markdown stream
cv inspect resume.cv --json # XMP + payload metadata
cv validate resume.cv --strict # PDF/A-3u + cv-strict gate
cv search resume.cv "kubernetes" # semantic search via BGE-M3
The killer property
A producer ships one .cv file. The HTTP middleware
makes the wrapper invisible to consumers:
| Consumer Accept header | Sees |
|---|---|
| text/html, */* (LLM crawlers, browsers) | Extracted resume.html |
| text/markdown (newer agents, our SDK) | Extracted resume.md |
| application/pdf | The visual PDF |
| No Accept header | The visual PDF (renders in built-in viewer) |
The bot never needs to know .cv exists. The format is producer-side
convenience; consumer-side it is invisible.
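The negotiation in the table above can be sketched in a few lines of Python. This is a minimal illustration, not the middleware's actual code: the names `parse_accept`, `pick_payload`, and `PAYLOAD_BY_TYPE` are hypothetical, mapping `*/*` to HTML is one reading of the table's first row, and falling back to the PDF for unrecognized types is an assumption.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(entries, key=lambda e: e[1], reverse=True)


# Mapping taken from the table; "*/*" -> "html" is this sketch's assumption.
PAYLOAD_BY_TYPE = {
    "text/markdown": "md",
    "text/html": "html",
    "*/*": "html",
    "application/pdf": "pdf",
}


def pick_payload(accept_header):
    """Return "md", "html", or "pdf" for a given Accept header value."""
    if not accept_header or not accept_header.strip():
        return "pdf"  # no Accept header: serve the visual PDF
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in PAYLOAD_BY_TYPE:
            return PAYLOAD_BY_TYPE[media_type]
    return "pdf"  # assumed fallback for types the table does not list


print(pick_payload("text/html, */*"))
print(pick_payload("text/markdown"))
print(pick_payload(None))
```

A real server would also send `Vary: Accept` on these responses so that shared caches do not hand the Markdown payload to a client that asked for the PDF.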
Frequently asked questions
What is the .cv file format?
.cv is an open file format for resumes. It is a valid PDF/A-3u file that bundles a designed PDF, a clean Markdown copy, a self-contained HTML rendering, and optional pre-computed BGE-M3 embeddings inside one file. Any PDF reader opens it visually; ATS systems and AI agents read the embedded Markdown directly.
How is .cv different from JSON Resume, FRESH, or Europass?
Those are pure data schemas (JSON or XML) that drop the visual artifact and require the consumer to render it. .cv keeps the designer-controlled PDF intact and carries the JSON/Markdown/HTML alongside it as PDF Associated Files (ISO 32000-2 §14.13). A recruiter still opens a polished PDF; an ATS or LLM still reads clean text; both come from the same file.
Do I need a special viewer to open a .cv file?
No. A .cv is a valid PDF. Preview, Adobe Reader, Chrome, and every other PDF reader shipped in the last fifteen years open it normally and show the visual layer. The additional payloads (Markdown, HTML, embeddings) are discoverable via the standard PDF Associated Files mechanism.
Why are pre-computed embeddings inside the file?
So that any third-party RAG pipeline indexing the file (LangChain, LlamaIndex, Haystack, custom vector DB) can skip the embedding API call entirely. Default model is BAAI BGE-M3 (MIT licensed, multilingual, 1024-dim, free). Producers may also ship vectors in proprietary spaces (OpenAI text-embedding-3-large, Voyage-3, Gemini-text-004) when they target a specific downstream stack.
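To make the saving concrete, here is a minimal sketch of how a pipeline might rank stored chunks once it has read the vectors out of the file. The chunk texts, the toy 3-dimensional vectors (standing in for 1024-dimensional BGE-M3 vectors), and the function names are all illustrative assumptions; how vectors are laid out inside a .cv is not shown here. The query itself still has to be embedded with the same model as the stored vectors; only the document-side embedding call is skipped.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec, chunks, top_k=3):
    """Rank (text, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


# Hypothetical pre-computed vectors, as if already read from a .cv file.
stored = [
    ("Ran Kubernetes clusters in production", [0.9, 0.1, 0.0]),
    ("Designed print brochures", [0.0, 0.2, 0.9]),
    ("Wrote Helm charts and CI pipelines", [0.7, 0.6, 0.1]),
]

# Hypothetical query vector; in practice this comes from the same model.
query = [1.0, 0.2, 0.0]

results = rank_chunks(query, stored, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```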
What MIME type does a .cv file use?
application/vnd.cv+pdf, registered (pending) with IANA in the vendor tree per RFC 6838, with a +pdf structured syntax suffix (RFC 6838 §4.2.8). Until IANA approves, servers can safely emit application/pdf alongside a Link header advertising the .cv alternates. A 200-line reference sniffer (cvfile-cv-detector, available in Python, Go, and TypeScript) detects .cv wrapping inside any application/pdf bytes.
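As an illustration of the interim approach, the sketch below builds response headers that serve the file as application/pdf while advertising the other representations in a Link header using the standard `rel="alternate"` relation with a `type` hint (RFC 8288). The function name, the URL, and the choice to point each alternate at the same URL (relying on content negotiation) are assumptions of this sketch; the exact relations the .cv spec prescribes are not stated here.

```python
def build_headers(resource_url):
    """Build headers for serving a .cv as application/pdf with alternates."""
    # Hypothetical layout: the same URL serves every representation and the
    # Accept header selects between them.
    alternates = [
        (resource_url, "text/markdown"),
        (resource_url, "text/html"),
    ]
    link = ", ".join(
        f'<{url}>; rel="alternate"; type="{mime}"' for url, mime in alternates
    )
    return {
        "Content-Type": "application/pdf",
        "Link": link,
        # Tell caches that the response body depends on the Accept header.
        "Vary": "Accept",
    }


headers = build_headers("https://example.com/resume.cv")
for name, value in headers.items():
    print(f"{name}: {value}")
```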
Are the format and tooling free?
Yes. The spec is CC-BY-4.0. The CLI, the three SDKs (JavaScript, Python, Go), the web component, the server middleware, and the RAG integrations (langchain-cvfile, llama-index-readers-cvfile, cvfile-haystack) are Apache-2.0. There is no vendor lock-in. A planned cvfile.org/cloud paid tier exists for hosted convenience but is non-essential.
Can a job seeker create a .cv file?
Yes, by either of two paths. (1) Use the in-browser builder at cvfile.org/create: fill a form, then download a .cv with PDF + Markdown payloads ready. (2) Install the CLI (brew install cvfile/tap/cv) and run cv pack with your existing PDF + Markdown.
How can an ATS or LLM read the inside of a .cv file?
Several options. (a) Use one of the published reference SDKs (npm: @cvfile/sdk, PyPI: cvfile, Go: github.com/cvfile/cv/sdks/go). (b) Use one of the RAG integrations (langchain-cvfile, llama-index-readers-cvfile, cvfile-haystack). (c) Use the 200-line cvfile-cv-detector reference sniffer that depends only on the PDF parser the host already trusts. (d) Send an Accept: text/markdown header to a server running the @cvfile/server middleware; you receive the Markdown payload directly.
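Option (d) needs no SDK at all. A minimal sketch using only the Python standard library follows; the URL is a placeholder, the helper names are hypothetical, and decoding the body as UTF-8 is an assumption about the server's output.

```python
import urllib.request


def markdown_request(url):
    """Build a GET request that asks for the Markdown representation."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def fetch_markdown(url, timeout=10):
    """Fetch the Markdown payload, failing if the server sends something else."""
    with urllib.request.urlopen(markdown_request(url), timeout=timeout) as resp:
        content_type = resp.headers.get_content_type()
        body = resp.read()
    if content_type != "text/markdown":
        # A server without the middleware would likely return the PDF instead.
        raise ValueError(f"expected text/markdown, got {content_type}")
    return body.decode("utf-8")  # assumed encoding


# Placeholder URL; nothing is sent until fetch_markdown() is called.
req = markdown_request("https://example.com/resume.cv")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Checking the returned Content-Type guards against silently treating PDF bytes as text when the host does not run the middleware.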