.cv Open Resume File Format

The .cv format, version 1.0.

Source of truth: spec/cv-1.0.md. Licensed under CC BY 4.0. Frozen and versioned; changes ship as a new spec number.

.cv File Format — Specification, version 1.0

Status: Stable. Within this MAJOR (1.x) consumers MUST ignore unknown fields and continue rendering. Breaking changes require a new MAJOR.

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, MAY, REQUIRED, RECOMMENDED, and OPTIONAL in this document are to be interpreted as described in BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all capitals.

1. Introduction

A .cv file bundles three coordinated representations of a single document, plus optional pre-computed embeddings:

  • A designed PDF — what humans see.
  • A clean Markdown copy — what bots, ATS systems, and LLMs read.
  • A self-contained HTML rendering — what web pages embed.
  • Optional embeddings — pre-computed vectors over the markdown so retrieval pipelines can skip an embedding pass.

A .cv file IS a valid PDF. Existing PDF readers open it without modification. The other representations are carried inside the PDF as PDF/A-3 Associated Files (/AF).

1.1 Goals

  • One source of truth for a document, with three audience-specific representations always in sync.
  • Zero-install human fallback: the file opens in any PDF reader on day one.
  • Bot-ready: the markdown is trivial to extract for indexing, ATS parsing, and LLM context.
  • Web-embeddable: the HTML is self-contained and ready to drop into a <cv-embed> web component.
  • AI-ready: optional pre-computed embeddings let RAG pipelines index the file without re-embedding.
  • Forward-compatible: unknown fields are ignored within the same major version.

1.2 Non-goals (in 1.0)

  • Encryption (revisit in 1.1).
  • Digital signatures (revisit in 1.x via PAdES).
  • Multi-document containers (one document per .cv).

1.3 Conformance levels

  • cv-strict: passes veraPDF PDF/A-3u conformance and satisfies every MUST in this spec.
  • cv-lenient: is a valid PDF, carries the cv:version XMP marker, and contains at least one valid content payload. Useful for environments where producing PDF/A-3u is impractical.

Implementations MUST state which level they produce and SHOULD support reading both.

2. Terminology

  • Producer — software that writes a .cv file.
  • Consumer — software that reads a .cv file.
  • Container — the wrapping PDF.
  • Payload — a file embedded in the container via /AF.
  • Primary payload — the payload that consumers should treat as the canonical text representation when no other preference applies. Identified by cv:primaryPayload.
  • Alternate payload — a payload carrying the same content as the primary in another representation or language.
  • Supplement payload — a payload that adds material not present in the primary (cover letter, portfolio link list, etc.).

3. Container

3.1 PDF version

A .cv file MUST be a valid PDF 1.7 or PDF 2.0 file.

3.2 PDF/A-3u conformance (cv-strict)

A cv-strict file MUST conform to PDF/A-3u (ISO 19005-3, conformance level U). PDF/A-3 permits arbitrary embedded files, which PDF/A-1 forbids and PDF/A-2 restricts to PDF/A-conformant files, and level U requires every glyph to map to Unicode, which makes text extraction reliable.

Practical implication: every font used in the visual PDF MUST be fully embedded (ISO 19005-3 § 6.2.11.4.1). Producers using standard 14 PDF base fonts by name (Helvetica, Times, Courier, Symbol, ZapfDingbats) will fail this requirement. Embedding is performed by the input-PDF generator, not by .cv packers — most modern producers (LaTeX/XeLaTeX, browser print-to-PDF, Word/LibreOffice export, headless Chromium) embed fonts by default.

3.3 Output intent

A cv-strict file SHOULD include an embedded ICC profile (typically sRGB IEC61966-2.1) referenced from a /OutputIntent.

3.4 Forbidden constructs

A .cv file MUST NOT contain:

  • PDF JavaScript actions (/JS or /JavaScript).
  • /Launch actions.
  • /ImportData actions.
  • /SubmitForm actions targeting any URI other than mailto:.
  • An /Encrypt dictionary (no encryption in 1.0).
  • External stream references (/F filespecs pointing to files outside the container).

Validators MUST reject files containing any of these.
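
As a rough illustration of the rejection rule, the sketch below scans raw bytes for the forbidden name tokens. It is only a first-pass heuristic under stated assumptions: the function name find_forbidden_names is hypothetical, /SubmitForm is left out because mailto: targets are permitted, and a raw scan misses names stored in compressed object streams or written with #XX escapes. A conformant validator would walk the parsed object graph instead.

```python
import re
from typing import List

# Name tokens from the list above. /SubmitForm needs its target URI inspected,
# so it is deliberately not part of this simple scan.
FORBIDDEN_NAMES = (b"/JS", b"/JavaScript", b"/Launch", b"/ImportData", b"/Encrypt")

# A PDF name ends at whitespace or a delimiter, so /JS must not match /JSON.
_NAME_END = rb"(?=[\s\x00()<>\[\]{}/%]|$)"


def find_forbidden_names(pdf_bytes: bytes) -> List[str]:
    """Return the forbidden name tokens that occur in the raw bytes."""
    hits = []
    for name in FORBIDDEN_NAMES:
        if re.search(re.escape(name) + _NAME_END, pdf_bytes):
            hits.append(name.decode("ascii"))
    return hits
```

An empty result does not prove the file is clean; a non-empty result is a cheap reason to reject before full parsing.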

4. Embedded payloads

Payloads are carried as PDF Associated Files via the /AF entry on the document catalog (see ISO 32000-2 §14.13).

4.1 Filespec dictionary requirements

Each /Filespec dictionary representing a .cv payload MUST set the following entries:

Entry             Type                     Requirement
/Type             name                     /Filespec
/F                string                   filename in PDFDocEncoding
/UF               string (UTF-16BE BOM)    filename in Unicode
/EF               dict                     embedded file dict with /F pointing to the stream
/Desc             string                   human-readable description
/AFRelationship   name                     one of /Alternative, /Data, /Supplement
/Subtype          name                     MIME type as a name (e.g. /text#2Fmarkdown)

The embedded file stream MUST include /Params << /ModDate (...) /Size N /CheckSum <md5> >> per the PDF spec.
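
Three of these entries involve a small encoding step, sketched below for illustration. The helper names (mime_to_pdf_name, uf_string, embedded_file_params) are hypothetical and not part of any .cv SDK; the escaping rule is the general PDF name syntax, where delimiters, "#", and bytes outside printable ASCII are written as #XX.

```python
import hashlib

# Bytes that cannot appear literally in a PDF name: delimiters and '#'.
_ESCAPED = b"()<>[]{}/%#"


def mime_to_pdf_name(mime: str) -> str:
    """Encode a MIME type as a PDF name, e.g. for the /Subtype entry."""
    out = ["/"]
    for byte in mime.encode("utf-8"):
        if 33 <= byte <= 126 and byte not in _ESCAPED:
            out.append(chr(byte))
        else:
            out.append(f"#{byte:02X}")
    return "".join(out)


def uf_string(filename: str) -> bytes:
    """Encode a filename for /UF as UTF-16BE with a leading byte order mark."""
    return b"\xfe\xff" + filename.encode("utf-16-be")


def embedded_file_params(payload: bytes) -> dict:
    """Compute the /Size and /CheckSum (MD5, hex) values for /Params."""
    return {"Size": len(payload), "CheckSum": hashlib.md5(payload).hexdigest()}
```

The /ModDate entry is omitted here because its value depends on the producer's clock and date formatting.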

4.2 Required content

A .cv file MUST carry at least one of:

  • resume.md: text/markdown; charset=UTF-8, /AFRelationship /Alternative. RECOMMENDED as primary payload.
  • resume.html: text/html; charset=UTF-8, /AFRelationship /Alternative. The HTML SHOULD be self-contained (inline CSS, no external <script>/<link>).

4.3 Optional content

  • resume.json: application/json, /AFRelationship /Alternative. SHOULD follow JSON Resume v1.0.0 schema.
  • embeddings.cbor: application/vnd.cv.embeddings+cbor, /AFRelationship /Data. See §5.
  • attachments/<name>: any MIME type, /AFRelationship /Supplement.

4.4 Filename rules

Filenames MUST be POSIX-portable (matching [A-Za-z0-9._/-]+) and SHOULD be lowercase. Slashes denote logical grouping (e.g. attachments/portfolio.pdf); they have no required filesystem meaning.

For internationalised content, language-tagged filenames take the form resume.<bcp47>.md (e.g. resume.fr.md, resume.zh-Hant.md). The unsuffixed resume.md represents cv:primaryLanguage.
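
The two filename rules can be checked with a pair of regular expressions. This is a minimal sketch: is_portable_name and language_tag are hypothetical helper names, and the language pattern is only a loose shape check, not a full BCP 47 validator.

```python
import re
from typing import Optional

# The character set given in the filename rule.
_PORTABLE = re.compile(r"[A-Za-z0-9._/-]+")

# resume.<tag>.md, where <tag> is one or more hyphen-separated alphanumeric subtags.
_LANG_TAGGED = re.compile(r"resume\.([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)\.md")


def is_portable_name(name: str) -> bool:
    """True if the whole filename uses only the allowed characters."""
    return _PORTABLE.fullmatch(name) is not None


def language_tag(name: str) -> Optional[str]:
    """Return the language tag of resume.<tag>.md, or None for other names."""
    match = _LANG_TAGGED.fullmatch(name)
    return match.group(1) if match else None
```

For the unsuffixed resume.md the function returns None, and the caller falls back to cv:primaryLanguage.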

4.5 Ordering (informative)

Producers SHOULD order the /AF array as: primary payload, alternates by language, embeddings, supplements. This eases byte-identical round-trip testing across producers.
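
One way a producer might implement this ordering is a sort key, sketched below. The Payload class and its fields are hypothetical, and the sketch assumes that every /Data payload belongs in the embeddings slot and that ties are broken by language and then by filename.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Payload:
    name: str
    relationship: str  # "Alternative", "Data" or "Supplement"
    language: str = ""


def af_sort_key(payload: Payload, primary_name: str) -> Tuple[int, str, str]:
    """Rank: primary, then alternates, then embeddings, then supplements."""
    if payload.name == primary_name:
        rank = 0
    elif payload.relationship == "Alternative":
        rank = 1
    elif payload.relationship == "Data":
        rank = 2
    else:
        rank = 3
    return (rank, payload.language, payload.name)


def order_payloads(payloads: List[Payload], primary_name: str) -> List[Payload]:
    """Return the payloads in the recommended /AF order."""
    return sorted(payloads, key=lambda p: af_sort_key(p, primary_name))
```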

5. Embeddings payload

The embeddings.cbor payload carries one or more pre-computed vector representations of the markdown.

5.1 Schema

The CBOR document MUST be a map with the following keys:

embeddings = {
  format-version: uint,                 ; this spec section version, current = 1
  spaces: [+ space],                    ; one or more embedding spaces
}

space = {
  model:           tstr,                ; model identifier (e.g. "BAAI/bge-m3")
  model-revision:  tstr,                ; pinned revision (e.g. HuggingFace commit sha)
  dimension:       uint,
  metric:          "cosine" / "dot" / "euclidean",
  normalized:      bool,
  chunking:        "document" / "section" / "paragraph",
  chunks:          [+ chunk],
}

chunk = {
  id:           tstr,                   ; stable id within the file (e.g. "experience")
  text-offset:  uint,                   ; byte offset into resume.md
  text-length:  uint,                   ; byte length within resume.md
  vector:       bstr,                   ; little-endian float32 array, length = dimension * 4
}

vector is a binary string of dimension * 4 bytes containing little-endian IEEE 754 float32 values.
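
To make the byte layout concrete, the sketch below packs and unpacks a vector and slices a chunk's text out of resume.md. The function names are hypothetical, and CBOR decoding itself is left out because it needs a third-party library.

```python
import struct
from typing import List, Sequence


def pack_vector(values: Sequence[float]) -> bytes:
    """Serialise floats as consecutive little-endian IEEE 754 float32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_vector(raw: bytes, dimension: int) -> List[float]:
    """Decode a vector byte string, checking its length against dimension * 4."""
    if len(raw) != dimension * 4:
        raise ValueError(f"expected {dimension * 4} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{dimension}f", raw))


def chunk_text(markdown: bytes, text_offset: int, text_length: int) -> str:
    """Return the text a chunk covers; offsets count bytes, not characters."""
    return markdown[text_offset:text_offset + text_length].decode("utf-8")
```

Because text-offset and text-length are byte counts, a range that splits a multi-byte UTF-8 character makes the decode step raise UnicodeDecodeError, which a consumer may treat as a malformed chunk.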

5.2 Recommended baseline model

Producers using a single embedding space SHOULD use BAAI/bge-m3 (MIT, multilingual, 1024-dim, cosine, normalised). This permits cross-.cv comparison without an explicit agreement between producer and consumer.

5.3 Multiple spaces

Producers MAY include vectors in additional embedding spaces (e.g. text-embedding-3-large, voyage-3, text-embedding-004). Each space is independent; consumers select the space they trust. Producers SHOULD NOT ship more than four spaces in a single file.

5.4 Trust and verification

Consumers MAY recompute vectors from the matching model + revision and compare to the shipped vectors as a tampering check. The spec does not require this.
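
A consumer that chooses to run this check will rarely get bit-identical vectors, since floating-point results vary across hardware and runtimes. The sketch below therefore compares by cosine similarity; the function names and the 0.999 threshold are assumptions for illustration, not values defined by this spec.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def vectors_agree(shipped: Sequence[float], recomputed: Sequence[float],
                  min_similarity: float = 0.999) -> bool:
    """True if the shipped vector is close enough to the recomputed one."""
    if len(shipped) != len(recomputed):
        return False
    return cosine_similarity(shipped, recomputed) >= min_similarity
```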

6. Metadata (XMP)

6.1 Namespace

The XMP namespace for this format is:

URI:    http://ns.cvfile.org/cv/1.0/
prefix: cv

6.2 Required properties

Property             Type          Notes
cv:version           text          Spec version, e.g. "0.1" or "1.0"
cv:created           xs:dateTime   ISO 8601 UTC, e.g. "2026-05-10T12:00:00Z"
cv:primaryLanguage   text          BCP-47 tag, e.g. "en", "fr-CA"
cv:primaryPayload    text          Filename of the canonical text payload, e.g. "resume.md"

6.3 Recommended properties

Property        Type                        Notes
cv:modified     xs:dateTime                 Last modification
cv:generator    text                        Producer string, e.g. "@cvfile/sdk/0.1.0"
cv:alternates   Text (JSON-encoded array)   One entry per alternate payload: { payload, language, mimeType }
cv:integrity    Text (JSON-encoded array)   One entry per payload: { payload, algorithm, digest }
cv:embeddings   Text (JSON-encoded array)   One entry per embedding space: { model, dimension, metric, chunks }

The last three properties (cv:alternates, cv:integrity, and cv:embeddings) are simple XMP Text values whose content is a JSON-encoded array of objects (UTF-8, XML-escaped), declared with pdfaProperty:valueType="Text" in the PDF/A extension schema. They are not rdf:Bag containers. Consumers parse the element text as JSON.
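
The round trip for such a property might look like the sketch below: the producer serialises the array to JSON and XML-escapes it, and the consumer lets the XML parser undo the escaping before decoding the JSON. The helper names are hypothetical, and the bare ET.fromstring call stands in for whatever hardened parser an implementation uses to meet §7.4.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CV_NS = "http://ns.cvfile.org/cv/1.0/"


def json_property_xml(local_name: str, entries: list) -> str:
    """Serialise entries as XML-escaped JSON inside a cv: element."""
    text = escape(json.dumps(entries, ensure_ascii=False))
    return f"<cv:{local_name}>{text}</cv:{local_name}>"


def read_json_property(description_xml: str, local_name: str) -> list:
    """Parse a cv: property's element text as JSON; [] if it is absent."""
    root = ET.fromstring(description_xml)
    element = root.find(f".//{{{CV_NS}}}{local_name}")
    if element is None or not element.text:
        return []
    return json.loads(element.text)
```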

6.4 Integrity

When cv:integrity is present, digests are computed over the unwrapped payload bytes (not the PDF EmbeddedFile stream). Algorithm names follow the IANA "Hash Function Textual Names" registry. sha-256 is the recommended algorithm.

Validators MUST verify integrity entries when present and MUST fail validation on mismatch.
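
A validator's digest check could be sketched as follows. Two points are assumptions rather than spec text: the digest is taken to be lowercase hexadecimal (this section does not fix an encoding), and only three SHA-2 names from the IANA registry are mapped. The function name verify_integrity is hypothetical.

```python
import hashlib
import hmac

# IANA textual names mapped to hashlib constructor names (partial, for illustration).
_HASHLIB_NAMES = {"sha-256": "sha256", "sha-384": "sha384", "sha-512": "sha512"}


def verify_integrity(entry: dict, payload: bytes) -> bool:
    """Check one cv:integrity entry against the unwrapped payload bytes."""
    algorithm = _HASHLIB_NAMES.get(entry["algorithm"].lower())
    if algorithm is None:
        raise ValueError(f"unsupported algorithm: {entry['algorithm']}")
    actual = hashlib.new(algorithm, payload).hexdigest()
    # Assumption: the digest field holds hex text; compare in constant time.
    return hmac.compare_digest(actual, entry["digest"].lower())
```

The payload argument is the decoded file content, not the compressed stream bytes stored in the PDF.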

7. Security

7.1 Threat model

Untrusted .cv files arrive via email, web download, or chat. The attacker's goal is code execution, data exfiltration, or lateral movement when a recruiter or ATS opens the file.

7.2 Validator requirements

A conformant validator MUST reject files that contain any construct listed in §3.4. A conformant validator MUST verify integrity digests when present and reject on mismatch.

7.3 Renderer requirements

Viewers MUST:

  • Render embedded HTML inside a sandboxed iframe whose sandbox attribute includes neither allow-scripts nor allow-same-origin.
  • Disable raw HTML when rendering Markdown by default. Producers may opt in to relaxed rendering with explicit sanitisation.
  • Strip javascript: URIs from links.
  • Cap decompressed payload size; default 16 MiB per payload.
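
Two of these checks are easy to get subtly wrong, so a minimal sketch follows. It assumes payload streams use zlib/Flate compression only, and the names inflate_capped and is_javascript_uri are hypothetical. The decompressor is asked for at most one byte more than the cap, so an oversized payload is detected without inflating all of it.

```python
import re
import zlib

MAX_PAYLOAD_BYTES = 16 * 1024 * 1024  # default cap of 16 MiB per payload

# ASCII control characters and spaces, which may be used to disguise a scheme.
_CONTROL_OR_SPACE = re.compile(r"[\x00-\x20]+")


def inflate_capped(compressed: bytes, limit: int = MAX_PAYLOAD_BYTES) -> bytes:
    """Decompress a zlib stream, refusing output larger than limit bytes."""
    inflater = zlib.decompressobj()
    data = inflater.decompress(compressed, limit + 1)
    if len(data) > limit:
        raise ValueError("payload exceeds the decompressed size cap")
    if not inflater.eof:
        raise ValueError("truncated compressed stream")
    return data


def is_javascript_uri(href: str) -> bool:
    """True if href resolves to the javascript: scheme once disguises are removed."""
    return _CONTROL_OR_SPACE.sub("", href).lower().startswith("javascript:")
```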

7.4 XML parsing

XMP and any other XML inside the file MUST be parsed without DTD resolution and without external entity resolution.
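
One conservative way to meet this requirement is to refuse any document that declares a DOCTYPE at all, which is stricter than the rule above and is an assumption of this sketch. The function name parse_xml_no_dtd is hypothetical; the handler is a standard hook on Python's expat parser.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


def parse_xml_no_dtd(xml_bytes: bytes) -> ET.Element:
    """Parse XML, rejecting any document that carries a DOCTYPE declaration."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise ValueError("DOCTYPE declarations are not allowed")

    # First pass: expat calls the handler as soon as it sees <!DOCTYPE ...>,
    # before any entity declared in the DTD could be expanded.
    checker = xml.parsers.expat.ParserCreate()
    checker.StartDoctypeDeclHandler = on_doctype
    checker.Parse(xml_bytes, True)

    # Second pass: build the tree from input known to have no DTD.
    return ET.fromstring(xml_bytes)
```

Parsing twice costs some time on large packets; XMP packets are usually small enough for this not to matter.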

8. Versioning and compatibility

8.1 Version format

cv:version follows MAJOR.MINOR. Pre-stable releases use 0.MINOR.

8.2 Same-major compatibility

Within the same MAJOR:

  • Producers MAY add new XMP properties under the cv: namespace.
  • Producers MAY add new /AF entries with new AFRelationship values or new payload MIME types.
  • Consumers MUST ignore unknown XMP properties and continue rendering.
  • Consumers MUST ignore /AF entries with unrecognised AFRelationship for rendering, but MUST return them in extraction APIs.

8.3 Cross-major behaviour

When a consumer encounters a cv:version with a higher MAJOR than it knows:

  • It SHOULD still render the visual PDF (the file remains a valid PDF).
  • It MUST surface a "newer format version" warning.
  • It MUST NOT silently drop unknown payloads from extraction APIs.
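
The version check behind these rules can be sketched as below, assuming a consumer that implements MAJOR 1. The names parse_version and newer_major_warning and the warning wording are hypothetical; the spec only requires that some "newer format version" warning is surfaced.

```python
import re
from typing import Optional, Tuple

# MAJOR.MINOR, both non-negative integers without leading zeros.
_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_version(text: str) -> Tuple[int, int]:
    """Split a cv:version value into (major, minor)."""
    match = _VERSION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def newer_major_warning(cv_version: str, supported_major: int = 1) -> Optional[str]:
    """Return a warning when the file's MAJOR is newer than the consumer's."""
    major, _minor = parse_version(cv_version)
    if major > supported_major:
        return (f"newer format version {cv_version}: showing the PDF and "
                f"returning all payloads without interpreting them")
    return None
```

A higher MINOR within the same MAJOR yields no warning, which matches the same-major rule that unknown fields are ignored.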

8.4 Deprecation

Properties may be deprecated in a MINOR release but MUST NOT be removed within the same MAJOR. Removal requires a new MAJOR.

9. IANA considerations

Upon registration, the media type for .cv files will be:

  • Type: application
  • Subtype: vnd.cv+pdf
  • Required parameters: none
  • Optional parameters: none
  • Encoding considerations: binary
  • Security considerations: see §7
  • Interoperability considerations: any PDF reader can open the file. .cv-aware tools additionally read the /AF payloads.

The filename extension is .cv.

10. References

10.1 Normative

  • ISO 32000-2:2020, Document management — Portable document format — PDF 2.0
  • ISO 19005-3:2012, Document management — Electronic document file format for long-term preservation — Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)
  • ISO 16684-1:2019, Graphic technology — Extensible metadata platform (XMP)
  • RFC 2119, RFC 8174, RFC 5646 (BCP 47), RFC 6838 (media-type registration)

10.2 Informative

11. Appendix A — Worked example

A minimal cv-strict file contains:

  • A 1-page sRGB PDF/A-3u of a CV.
  • An /AF array with one entry: resume.md.
  • An XMP packet containing:
    <rdf:Description xmlns:cv="http://ns.cvfile.org/cv/1.0/">
      <cv:version>1.0</cv:version>
      <cv:created>2026-05-10T12:00:00Z</cv:created>
      <cv:primaryLanguage>en</cv:primaryLanguage>
      <cv:primaryPayload>resume.md</cv:primaryPayload>
      <cv:generator>@cvfile/sdk/0.1.0</cv:generator>
    </rdf:Description>
    

A consumer reads the catalog's /AF array, locates the resume.md filespec, decompresses the embedded file stream, and returns the Markdown.

12. Change log

  • 1.0 (this document) — first stable release. Frozen to enable IANA registration of application/vnd.cv+pdf and downstream tooling lock-in. No normative changes from 0.1; status promoted from pre-stable to stable.
  • 0.1 — initial pre-stable draft used to gather interop feedback during the v0.x series.