DeepSeek OCR
A picture is worth a thousand words. Discover how DeepSeek-OCR uses the visual modality to compress long text by roughly 10x while preserving most of its semantic meaning.
DeepSeek-OCR: Contexts Optical Compression Technology
Why use images for text? Because an image can carry more information per token than a character sequence. Visual representations encode spatial relationships, font hierarchies, and layout semantics that would take thousands of extra tokens to describe in text, which lets DeepSeek-OCR maintain high accuracy with very low token usage. In doing so, DeepSeek-OCR rethinks how language models consume document content.
Figure (interactive): Pure Text vs. Visual Tokens, with the Optimal Zone highlighted.
DeepSeek-OCR achieves its compression by treating documents as images rather than character sequences. This optical approach preserves structural information such as tables, headers, and formatting that traditional text tokenizers discard, while using roughly 10x fewer tokens. The compression pipeline is trained end-to-end so that the visual tokens retain as much semantic fidelity as possible.
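The "10x" figure is a ratio between two token counts, which a few lines of Python make concrete. This is a minimal sketch: the function name `compression_ratio` and the example counts (1,000 text tokens, 100 vision tokens) are hypothetical and chosen only to illustrate the arithmetic, not measurements from DeepSeek-OCR.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands in for."""
    if text_tokens < 0:
        raise ValueError("text_tokens must be non-negative")
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


# Hypothetical page: 1,000 text tokens represented by 100 vision tokens.
ratio = compression_ratio(text_tokens=1000, vision_tokens=100)
print(f"{ratio:.1f}x")  # prints "10.0x"
```

A ratio near 10 corresponds to the compression level the page describes; higher ratios save more tokens but leave less room for each vision token to carry the text faithfully.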
DeepSeek-OCR DeepEncoder Architecture
To achieve both high-resolution input and low-token output, DeepSeek designed a serial architecture for DeepSeek-OCR called DeepEncoder. This three-stage pipeline processes documents at full resolution while aggressively compressing the output token count, balancing visual fidelity with computational efficiency.
Input: high-resolution image
1. SAM Encoder
2. Conv Compressor
3. CLIP Encoder
Output: latent tokens
The DeepSeek-OCR architecture is deliberately modular: SAM handles perception, the Compressor handles efficiency, and CLIP handles understanding. This separation of concerns allows each component to be optimized independently, and the entire DeepSeek-OCR pipeline can be fine-tuned end-to-end for specific document types like invoices, academic papers, or handwritten notes.
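To see how the three stages trade resolution for token count, the sketch below traces the number of tokens after each stage. The constants are assumptions that do not appear on this page: a 16-pixel patch size for the SAM stage, a 16x token reduction in the Conv Compressor, and a CLIP stage that keeps the token count unchanged. The names `StageTrace` and `trace_deepencoder` are hypothetical.

```python
from dataclasses import dataclass

PATCH_SIZE = 16          # assumed: pixels per patch side in the SAM stage
COMPRESSION_FACTOR = 16  # assumed: token reduction in the Conv Compressor


@dataclass
class StageTrace:
    stage: str
    tokens: int


def trace_deepencoder(side_px: int) -> list[StageTrace]:
    """Trace token counts through the three stages for a square image."""
    if side_px <= 0 or side_px % PATCH_SIZE != 0:
        raise ValueError("side_px must be a positive multiple of PATCH_SIZE")
    patch_tokens = (side_px // PATCH_SIZE) ** 2
    if patch_tokens % COMPRESSION_FACTOR != 0:
        raise ValueError("patch count must divide evenly by COMPRESSION_FACTOR")
    compressed = patch_tokens // COMPRESSION_FACTOR
    return [
        StageTrace("SAM Encoder", patch_tokens),    # perception at full resolution
        StageTrace("Conv Compressor", compressed),  # efficiency: fewer tokens
        StageTrace("CLIP Encoder", compressed),     # understanding: count assumed unchanged
    ]


for step in trace_deepencoder(1024):
    print(f"{step.stage}: {step.tokens} tokens")
```

Under these assumptions a 1024 x 1024 input produces 4096 patch tokens that shrink to 256 before the final stage, which is why the expensive global-understanding stage only ever sees the compressed sequence.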
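Adaptive Resolution
DeepSeek-OCR supports multiple resolution modes, from the ultra-fast Tiny mode to the tiled Gundam mode for oversized newspaper pages, so it can adapt flexibly to different scenarios.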
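The resolution modes can be thought of as a lookup from mode name to input size, with the tiled mode adding one global view on top of several local tiles. The sketch below is illustrative only: the per-mode resolutions, the tile and global-view sizes, the 16-pixel patch size, and the 16x compression factor are all assumptions not stated on this page, and the names `MODE_SIDE_PX`, `mode_tokens`, and `gundam_tokens` are hypothetical.

```python
# Assumed square input sizes per mode, in pixels (not stated on this page).
MODE_SIDE_PX = {"tiny": 512, "small": 640, "base": 1024, "large": 1280}

PATCH_SIZE = 16          # assumed patch size
COMPRESSION_FACTOR = 16  # assumed token reduction before the final stage


def tokens_for_side(side_px: int) -> int:
    """Vision tokens for one square view under the assumptions above."""
    return (side_px // PATCH_SIZE) ** 2 // COMPRESSION_FACTOR


def mode_tokens(mode: str) -> int:
    """Vision tokens for a single-view mode such as 'tiny' or 'base'."""
    if mode not in MODE_SIDE_PX:
        raise KeyError(f"unknown mode: {mode}")
    return tokens_for_side(MODE_SIDE_PX[mode])


def gundam_tokens(n_tiles: int, tile_side: int = 640, global_side: int = 1024) -> int:
    """Vision tokens for a tiled mode: n local tiles plus one global view."""
    if n_tiles < 1:
        raise ValueError("n_tiles must be at least 1")
    return n_tiles * tokens_for_side(tile_side) + tokens_for_side(global_side)


for name in MODE_SIDE_PX:
    print(name, mode_tokens(name))
print("gundam (4 tiles)", gundam_tokens(4))
```

The design point is that token cost scales with the chosen mode: a small page can use a cheap single view, while a very large page pays for extra tiles only when it needs them.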
