Why PDF files get large
PDF file size is driven by a few main factors. A fully embedded font can add 50–200 KB per font face even if only a handful of characters are actually used, whereas a properly subsetted font carries only the glyphs that appear in the document. High-resolution images are usually the single biggest contributor: a full-resolution JPEG from a modern camera, embedded without downsampling or recompression, can easily add 5–10 MB per page. Document metadata, revision history, embedded thumbnails, and XML form data all add further overhead.
Knowing what is making your PDF large determines which compression strategy will actually help.
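As a rough first check, the raw bytes of a file can be scanned for markers of embedded images and font programs. The sketch below is a heuristic rather than a real PDF parser: the function name rough_pdf_profile is made up for illustration, and font entries stored inside compressed object streams will not be counted.

```python
import re

# Image XObjects are streams, so their dictionaries are visible in the raw bytes.
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")
# /FontFile, /FontFile2 and /FontFile3 reference embedded font programs.
FONT_MARKER = re.compile(rb"/FontFile[23]?\b")


def rough_pdf_profile(data: bytes) -> dict:
    """Count image and font-program markers in raw PDF bytes (heuristic only)."""
    return {
        "size_bytes": len(data),
        "image_objects": len(IMAGE_MARKER.findall(data)),
        "font_programs": len(FONT_MARKER.findall(data)),
    }


# Usage (hypothetical file name):
# with open("report.pdf", "rb") as f:
#     print(rough_pdf_profile(f.read()))
```

A file with many image objects is probably a candidate for rasterized compression, while a small office export with few or none is more likely to benefit from metadata cleanup.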
Method 1: metadata cleanup (best for office documents)
Office applications — Word, Excel, PowerPoint, Google Docs — export PDFs with extensive embedded metadata: author name, company, creation and modification timestamps, software version strings, revision history, and document properties. A typical Word export carries 50–200 KB of metadata overhead that is invisible to readers.
Metadata cleanup strips these internal properties and rewrites the file in a more compact form, packing objects into compressed object streams and storing the cross-reference table as a cross-reference stream. The resulting file is functionally identical (same layout, same fonts, same images), just with the overhead removed. Typical savings: 1–15% for office documents, near zero for already-clean PDFs.
Use this method when: you want to clean up a Word, PowerPoint, or Google Docs export before sharing. Do not use this for image-heavy scanned documents where the actual image data is what makes the file large.
Method 2: rasterized compression (best for scans and image PDFs)
Rasterized compression renders each PDF page to a canvas at a reduced scale, re-encodes it as JPEG at your chosen quality level, and rebuilds a new PDF containing those JPEG images. This directly addresses the main size driver — embedded image data.
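To get a feel for what "reduced scale" means in pixels, here is a small arithmetic sketch. It assumes the scale is expressed in pixels per PDF point (1 point = 1/72 inch), so a scale of 1.0 corresponds to 72 DPI; the function names are illustrative and not taken from any particular tool.

```python
def raster_size(width_pt: float, height_pt: float, scale: float) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at `scale` pixels per PDF point."""
    return round(width_pt * scale), round(height_pt * scale)


def effective_dpi(scale: float) -> float:
    """A PDF point is 1/72 inch, so the render scale maps directly to DPI."""
    return 72 * scale


def uncompressed_rgb_bytes(width_px: int, height_px: int) -> int:
    """Size of the rendered page in memory before JPEG encoding (3 bytes per pixel)."""
    return width_px * height_px * 3


# US Letter is 612 x 792 points.
width_px, height_px = raster_size(612, 792, scale=1.5)
print(width_px, height_px, effective_dpi(1.5), uncompressed_rgb_bytes(width_px, height_px))
```

Halving the scale cuts the pixel count to a quarter, which is why the render scale often matters as much as the JPEG quality setting.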
A scanned 20-page contract that is 40 MB can be reduced to 4–8 MB with rasterized compression at 70% JPEG quality. A design portfolio that is 80 MB can become 10–20 MB. The trade-off is that text in the output PDF is no longer selectable or searchable — every page becomes a JPEG image. This is acceptable for archiving, sharing, or printing, but not for a PDF where the recipient needs to copy text or where the text must be machine-readable.
Use this method when: the PDF is a scan, contains many embedded photos, or is an image-heavy design document. Choose your quality level based on the use case: 50–60% for thumbnail previews, 70% for email and sharing (smaller file), 80–85% for internal reference copies.
Choosing the right quality level
The quality slider in the rasterized mode controls JPEG compression. At 90%, the output is nearly indistinguishable from the original: file savings are modest but image quality is excellent. At 70%, most photographic content looks very good and the file is significantly smaller. Below 60%, visible compression artifacts start appearing on fine text, sharp edges, and high-contrast areas.
For a multi-page scanned document with text on white background, 75–80% is typically the sweet spot: text remains clear and readable, and the file size is much smaller than the original. For color-rich design documents, 80–85% preserves gradient smoothness and avoids posterization.
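The recommendations in this guide can be collected into a small lookup table. This is only a sketch: the use-case keys and the choice of the midpoint as a starting value are assumptions made for illustration, and the ranges are the ones quoted in the text.

```python
# JPEG quality ranges (percent) quoted in this guide, keyed by use case.
QUALITY_RANGES = {
    "thumbnail_preview": (50, 60),
    "email_sharing": (70, 70),
    "scanned_text": (75, 80),
    "internal_reference": (80, 85),
    "color_design": (80, 85),
}


def suggested_quality(use_case: str) -> int:
    """Return the midpoint of the recommended range as a starting point."""
    low, high = QUALITY_RANGES[use_case]
    return (low + high) // 2


for name in QUALITY_RANGES:
    print(name, suggested_quality(name))
```

Treat the result as a first guess and check the output at 100% zoom before sending, since fine text shows artifacts sooner than photos do.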
When to use other approaches
Browser-based compression has limits. If you need to compress a 500 MB PDF, browser memory constraints may cause it to fail — in that case, split the PDF first with PDF Splitter, compress each section, then merge with PDF Merger.
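When splitting a large file, it helps to plan the page ranges up front. A minimal sketch, assuming a fixed number of pages per section (the function name and the chunk size are illustrative, not settings of PDF Splitter):

```python
def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return inclusive, 1-based (first_page, last_page) ranges covering the document."""
    if total_pages < 1 or pages_per_chunk < 1:
        raise ValueError("total_pages and pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_chunk)
    ]


# A 230-page document split into sections of at most 100 pages.
print(chunk_page_ranges(230, 100))
```

Compress each range separately, then merge the results in the same order so the page sequence is preserved.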
For PDFs where searchability must be preserved but image content still needs compression, the right tool is Adobe Acrobat's "Reduce File Size" (which downsamples and recompresses embedded images without rasterizing the text layer) or Ghostscript on the command line. These desktop and command-line tools give more control over exactly which objects get compressed and how.
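For the Ghostscript route, a typical invocation uses the pdfwrite device with one of its built-in quality presets. The sketch below only builds the argument list; the helper name is made up, the file names are placeholders, and the executable is assumed to be installed as gs (on Windows it is usually gswin64c).

```python
# Presets accepted by Ghostscript's -dPDFSETTINGS option.
PRESETS = {"/screen", "/ebook", "/printer", "/prepress", "/default"}


def ghostscript_args(src: str, dst: str, preset: str = "/ebook") -> list[str]:
    """Build a Ghostscript command that rewrites `src` to `dst` with image downsampling."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",
        "-sDEVICE=pdfwrite",        # write a new PDF, keeping text as text
        f"-dPDFSETTINGS={preset}",  # /screen is smallest, /prepress is largest
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


print(" ".join(ghostscript_args("input.pdf", "output.pdf")))

# To actually run it (requires Ghostscript to be installed):
# import subprocess
# subprocess.run(ghostscript_args("input.pdf", "output.pdf"), check=True)
```

The /ebook preset is a reasonable middle ground for sharing; /screen shrinks files further at the cost of visibly softer images.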
For sharing very large PDFs online, consider also whether a cloud link is more appropriate than an attachment: a Google Drive or Dropbox link is not subject to email attachment size limits and avoids bounce issues entirely.