005 · Guide · December 2025

Scanned PDFs, rescued.

~4 minute read

Why you can't copy text out of them, why they look fuzzy when you zoom, and the five-minute fix.

Someone emails you a PDF. You open it. You try to copy a sentence. Your cursor moves but nothing highlights. You zoom in to read something and the letters dissolve into pixels. The file is 40 MB for eight pages.

What happened: the PDF isn't a document. It's a stack of photographs of a document. Here's why that's different, and what you can do about it.

Two kinds of PDF

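The PDF format can hold either real text (characters the computer can read, which gives you a text-based PDF) or pictures of pages (what a scanner or phone camera produces, which gives you an image-based PDF).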

Both are valid PDFs. They look similar on a quick glance. But they behave completely differently the moment you try to do anything with them.

Why this matters

An image-based PDF is, as far as the computer is concerned, a picture. You can't search it. You can't copy from it. If you convert it to DOCX with a naïve tool, you get a DOCX containing the picture, not editable text. Screen readers and other accessibility tools can't read it.

How to tell which one you have

Open it and try to select a word with your cursor. If text highlights, you have a text-based PDF. If your cursor just sweeps over the page without selecting anything, you have an image-based PDF.
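If you have a folder full of PDFs, the same check can be automated. What follows is a minimal sketch, not a definitive test: it assumes the third-party pypdf library is installed, and the function names and the 20-character threshold are made up for illustration.

```python
from typing import List


def has_text_layer(page_texts: List[str], min_chars: int = 20) -> bool:
    """Return True if any page yields a meaningful amount of extractable text.

    min_chars is an assumed threshold that ignores stray characters
    such as a lone page number.
    """
    return any(len((text or "").strip()) >= min_chars for text in page_texts)


def extract_page_texts(pdf_path: str) -> List[str]:
    """Pull whatever text each page exposes (empty strings for pure images)."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        texts = extract_page_texts(sys.argv[1])
        kind = "text-based" if has_text_layer(texts) else "image-based"
        print(f"{sys.argv[1]}: {kind}")
```

This mirrors the cursor test: if nothing can be extracted, nothing can be selected either. A scan that has already been through OCR will usually report as text-based, because OCR tools often add a hidden text layer.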

The fix: OCR

Optical character recognition (OCR) reads the pixels and extracts the text. After OCR, you have actual characters you can copy, search, and edit.

In Formatly, drop the PDF on the home page and pick OCR (Extract Text). You'll get back a .txt file containing the readable text. For a proper Word document, open the .txt in Word or Google Docs and format from there.
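For the curious, here is roughly what an OCR step looks like in code. This is a sketch using the open-source Tesseract engine, not a description of how Formatly works internally. It assumes the pdf2image and pytesseract packages (plus the Poppler and Tesseract programs they wrap) are installed; the function names are hypothetical.

```python
from typing import Callable, Iterable


def _render_pages(pdf_path: str) -> Iterable:
    # Assumption: pdf2image and Poppler are installed.
    from pdf2image import convert_from_path

    # Turn each PDF page into an image at 300 DPI.
    return convert_from_path(pdf_path, dpi=300)


def _ocr_image(image) -> str:
    # Assumption: pytesseract and the Tesseract binary are installed.
    import pytesseract

    return pytesseract.image_to_string(image)


def ocr_pdf_to_text(
    pdf_path: str,
    render: Callable[[str], Iterable] = _render_pages,
    ocr: Callable[[object], str] = _ocr_image,
) -> str:
    """Render each page to an image, OCR it, and join pages with blank lines."""
    pages = [ocr(image).strip() for image in render(pdf_path)]
    return "\n\n".join(pages)
```

Calling `ocr_pdf_to_text("scan.pdf")` returns the recognized text as one string, which you could then write to a .txt file. The `render` and `ocr` parameters exist only so either step can be swapped out or tested without the real tools.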

Tips for better scans

If you're the one doing the scanning, a few things help OCR enormously:

Why the file is so big

Each page is a photo, often at high resolution, sometimes in full color. Eight photos easily add up to 40 MB. A text-based PDF of the same content might be 200 KB.
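The arithmetic behind that number is easy to sketch. The figures below are assumptions chosen for illustration, not measurements of any particular file: US Letter pages, scanned at 300 DPI in 24-bit color, with image compression of about 5:1.

```python
def scanned_pdf_size_mb(
    pages: int,
    width_in: float = 8.5,           # assumed page width in inches (US Letter)
    height_in: float = 11.0,         # assumed page height in inches
    dpi: int = 300,                  # assumed scan resolution
    bytes_per_pixel: int = 3,        # 24-bit color; use 1 for grayscale
    compression_ratio: float = 5.0,  # assumed image compression, raw:stored
) -> float:
    """Estimate the size of an image-based PDF in megabytes (10^6 bytes)."""
    pixels_per_page = round(width_in * dpi) * round(height_in * dpi)
    raw_bytes_per_page = pixels_per_page * bytes_per_pixel
    return pages * raw_bytes_per_page / compression_ratio / 1_000_000


if __name__ == "__main__":
    print(f"{scanned_pdf_size_mb(8):.1f} MB")  # prints "40.4 MB"
```

Under those assumptions each page is about 8.4 million pixels, or roughly 25 MB uncompressed, so eight compressed pages land near 40 MB. Scanning in grayscale instead of color cuts the estimate to about a third.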

If you need a smaller file, convert the scan's individual pages to JPG with moderate compression, or re-export after OCR as a text-based PDF. Either way, the size typically drops by an order of magnitude or more.
