OCR, explained in ten minutes

You have a PDF. You try to copy a sentence. Nothing highlights. That PDF is most likely a picture: somebody scanned a page and saved the scan inside a PDF wrapper. The computer sees pixels, not letters.

OCR (optical character recognition) is the thing that reads the pixels and gives you letters back.

What OCR actually does

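A classic OCR engine slices an image into lines, lines into words, words into glyphs, then matches each glyph to a character. Many newer engines hand whole lines or words to a neural network instead of matching glyph by glyph, but the slicing idea still applies. The better engines also handle layout (columns, headers, tables), so the output looks vaguely like the input instead of a single mashed paragraph.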
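
To make the "slice an image into lines" step concrete, here is a minimal sketch of one common approach, a horizontal projection profile: any run of pixel rows that contains ink is treated as one text line. It assumes the page is already reduced to a grid of 0 (background) and 1 (ink). The function name find_text_lines and the tiny sample page are made up for illustration; this is not how Formatly or any particular engine does it.

```python
def find_text_lines(image):
    """Return (top, bottom) row ranges that contain ink.

    `image` is a list of rows; each row is a list of 0 (background) or 1 (ink).
    """
    lines = []
    start = None  # first row of the text line currently being scanned
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # a blank row ends the line
            start = None
    if start is not None:                  # the last line touches the bottom edge
        lines.append((start, len(image) - 1))
    return lines


# Hypothetical 6-row page with two "lines" of ink separated by a blank row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 2), (4, 4)]
```

Running the same scan over columns instead of rows, inside one detected line, is a rough way to split that line into words and glyphs. It also shows why crooked scans hurt: once the rows are tilted, the blank gaps between lines stop lining up.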

Modern OCR is extremely good at printed English. It's good at most other printed languages. It's passable at clean handwriting. It's terrible at messy handwriting, stylized fonts, and text on busy photographic backgrounds.

When you need it

Scanned PDFs where you can't select text.
Photos of receipts, whiteboards, slides, or menus.
Screenshots where you need the text but not the image.
Old documents that were digitized without an OCR pass.

When it won't help (much)

Handwriting that even your mother can't read.
Very small text (less than ~10 pixels tall after scanning).
Severely skewed, faded, or low-contrast scans — de-skew and increase contrast first.
Formulas, equations, or heavy diagrams — those need specialized tools.
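
For the "increase contrast first" advice, a rough sketch of what a basic contrast stretch does: it rescales grayscale values so the darkest pixel becomes 0 and the lightest becomes 255. The function name stretch_contrast and the sample values are illustrative assumptions, and the input is assumed to be a non-empty grid of 0 to 255 values. Real image editors offer this as "auto contrast" or "levels" and do it more carefully.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values to span the full 0..255 range.

    `pixels` is a non-empty list of rows of integers in 0..255.
    """
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # A completely flat image has no contrast to stretch; return a copy.
        return [row[:] for row in pixels]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in pixels]


# A hypothetical faded scan: everything sits in a narrow gray band.
faded = [[100, 120], [140, 150]]
print(stretch_contrast(faded))  # [[0, 102], [204, 255]]
```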

How to run OCR in Formatly

Drop your image or PDF on the home page.
In the target format dropdown, pick OCR (Extract Text).
Convert, then download the resulting .txt.

Tips for better results

Higher DPI beats everything. If you're scanning, use 300 DPI or more. Phone photos should be taken from reasonably close, in good light.
Straight beats crooked. Rotate to align rows horizontally before uploading. Some OCR engines de-skew, some don't.
Clean beats dirty. Crop to the text. Remove coffee stains where possible.
Flat beats curved. A page photographed from a bound book will have characters distorted near the spine. Press it flat or use a scanner.
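
Why DPI matters so much comes down to simple arithmetic: a point is 1/72 of an inch, so the pixel height of printed text is its point size divided by 72, times the scan DPI. The sketch below works that out for 8-point fine print; the function name text_height_px is made up for illustration. Keep in mind that the nominal font size is an upper bound, since the actual letters are smaller than the point size suggests.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def text_height_px(point_size, dpi):
    """Nominal height in pixels of text at `point_size` scanned at `dpi`."""
    return point_size / POINTS_PER_INCH * dpi


# 8-point fine print at three scan resolutions.
for dpi in (72, 150, 300):
    print(dpi, round(text_height_px(8, dpi), 1))
# 72 8.0
# 150 16.7
# 300 33.3
```

At 72 DPI that fine print falls under the roughly 10 pixel floor mentioned earlier, while at 300 DPI it has pixels to spare.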

What you get back

Formatly outputs a plain .txt file. No formatting is preserved — just the text, in reading order. If you need structure, paste it into a Word doc or Google Doc and re-format from there.
