011 · Guide · May 2026

Screenshot to text.

~3 minute read

Most people don't realize how easy this is until they try it. Then they wonder how they spent years retyping things.

You snapped the screenshot. Now you need the words out of it. To paste into your editor, or quote in an email, or save the address from a confirmation page. Whatever the reason, retyping it character by character is for masochists.

OCR pulls the text out of the pixels and gives you back a .txt. It works on screenshots better than on almost anything else, because a screenshot leaves the engine very little to guess at. The text was rendered, not photographed. No shadows. No skew. No JPEG artifacts. Sharp pixels on a flat background in a predictable font. An OCR engine's idea of a good day.

The how-to is short.

  1. Open formatly.app.
  2. Drop the screenshot in.
  3. Pick OCR (Extract Text) from the dropdown.
  4. Hit Convert. A .txt appears under the box with the words.

Under the hood we're calling Google Cloud Vision, one of the strongest general-purpose OCR engines you can use without buying enterprise document tooling. For the kinds of screenshots people actually take, the output is paste-ready. For weirder inputs, less so.
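For the curious, here is a rough sketch of what a Cloud Vision OCR request and response look like over its REST interface. It is not our production code: the choice of DOCUMENT_TEXT_DETECTION and the helper names build_ocr_request and extract_text are assumptions made for illustration, and the sketch stops short of sending anything over the network.

```python
import base64

# Public REST endpoint for Cloud Vision image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image OCR request (hypothetical helper)."""
    return {
        "requests": [
            {
                # Image bytes travel as base64 text inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # Assumption: dense-text mode. TEXT_DETECTION is the other OCR feature type.
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the recognised text from a Vision response, or '' if there is none."""
    first = (response.get("responses") or [{}])[0]
    return first.get("fullTextAnnotation", {}).get("text", "")
```

You would POST the dict from build_ocr_request as JSON to VISION_ENDPOINT with your own credentials, then hand the parsed reply to extract_text. What comes back is the plain text that ends up in the .txt.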

What it gets right

The boring everyday stuff is essentially solved. A screenshot of a Notion doc comes out clean. So does a Slack thread, a Hacker News comment, a code snippet pulled from VS Code. Yes, indentation survives. Yes, the angle brackets and curly braces come through correctly. Burned-in subtitles from a YouTube clip mostly work. Confirmation emails, error messages, button text, menu items: all easy. If you can read it, Vision can usually read it too.

Where it falls down

The biggest landmine is dark mode. In our experience Vision is more reliable with dark text on light backgrounds, and "white text on near-black" trips it up more often than you'd expect. If you screenshot a dark-mode UI and half the words are missing from the output, invert the colors first. Almost any image editor can do it in a step or two. Re-upload the inverted version and it usually works.
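If you'd rather script the inversion, the idea fits in a few lines. This is a minimal sketch that treats an image as rows of (r, g, b) tuples. The brightness cutoff of 96 and the function names are assumptions for illustration, and a real script would load and save the pixels with an image library.

```python
def mean_luminance(pixels):
    """Average perceived brightness (0-255) over rows of (r, g, b) tuples."""
    total = 0.0
    count = 0
    for row in pixels:
        for r, g, b in row:
            # Standard luma weights: green contributes most to perceived brightness.
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    return total / count if count else 0.0


def invert(pixels):
    """Flip every channel so light text on dark becomes dark text on light."""
    return [[(255 - r, 255 - g, 255 - b) for r, g, b in row] for row in pixels]


def prepare_for_ocr(pixels, dark_threshold=96):
    """Invert only when the image looks like a dark-mode capture.

    dark_threshold is a guess, not a tuned value.
    """
    if mean_luminance(pixels) < dark_threshold:
        return invert(pixels)
    return pixels
```

Light-mode captures pass through untouched, so it is safe to run on everything before uploading.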

Stylized type is the second one. Logos, magazine display headlines, anything where the type designer got clever — accuracy drops fast. Handwriting too, though Vision tries; expect to clean up the result by hand.

The third is text on top of busy backgrounds. Burned-in subtitles over a moving video frame, captions on a photo, lyrics over album art. The engine treats the background's edges as noise that might also be characters and the output gets garbled. Crop tighter or boost contrast before uploading.
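"Boost contrast" can be as simple as a linear stretch. The sketch below assumes the image has already been converted to grayscale values from 0 to 255, and stretch_contrast is a name made up for this example. It helps most when the text is washed out, and it won't rescue text over a truly chaotic background.

```python
def stretch_contrast(gray):
    """Linearly rescale grayscale rows so the darkest pixel maps to 0 and the brightest to 255."""
    flat = [v for row in gray for v in row]
    if not flat:
        return []
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A flat image has no contrast to stretch; return a copy unchanged.
        return [row[:] for row in gray]
    return [[round((v - lo) * 255 / (hi - lo)) for v in row] for row in gray]
```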

How to get cleaner output

Crop tightly. The less stuff that isn't the text you want, the better. Most screenshot tools let you crop while you capture (Cmd+Shift+4 on Mac, Win+Shift+S on Windows, the volume-and-power gymnastics on iOS). Use them.

Capture at native resolution. A retina screenshot at 2× or 3× the visible pixel size gives the engine more detail to chew on. Don't zoom in on your phone before screenshotting — the pixels you "added" by zooming aren't real ones, just stretched copies of the originals.
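To see why zooming adds nothing, here is a toy nearest-neighbour upscale. It is an illustration only: real zoom usually uses smoother interpolation, which blends neighbouring pixels but still invents no new detail.

```python
def upscale_nearest(pixels, factor):
    """Nearest-neighbour upscale: each source pixel becomes a factor x factor block."""
    out = []
    for row in pixels:
        # Repeat each pixel horizontally...
        wide = [p for p in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        out.extend(list(wide) for _ in range(factor))
    return out
```

A 2×2 image scaled by 2 becomes 4×4, but it still contains exactly the same four values. The engine gets more pixels and no more information.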

Don't mix content. If half the screenshot is text and half is a photograph, expect the photo's edges to read as garbled characters. Either crop the photo out or run two passes and stitch the text manually.

The mobile workflow

This is where the trick gets really useful. You see something on screen — a flight number, a recipe ingredient list, a transit map, an OTP from a banking app — and you want it as text without retyping. Open formatly.app in mobile Safari or Chrome, tap the box, pick the screenshot from your camera roll, choose OCR. Twenty seconds. No app to install, nothing to log into.

Privacy

Screenshots leak. They contain banking app overlays you didn't notice, half a private DM, a coworker's email address in the corner of a Zoom call. Worth knowing what we do with them.

The file's deleted from our servers an hour after upload. Source and output both. After that the URLs 404. We can't see your files either — there's no admin UI for it, and the storage bucket is locked down to the converter's service account.
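The one-hour rule boils down to a timestamp comparison. The sketch below shows the idea only. It is not our actual cleanup job, and the names is_expired and expired_keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# One hour, per the retention rule described above.
RETENTION = timedelta(hours=1)


def is_expired(uploaded_at, now=None):
    """True once a file has been stored for the full retention window."""
    now = now or datetime.now(timezone.utc)
    return now - uploaded_at >= RETENTION


def expired_keys(uploads, now=None):
    """uploads maps a storage key to its upload time; return the keys due for deletion."""
    return sorted(key for key, ts in uploads.items() if is_expired(ts, now))
```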

That said: redact locally before uploading anything truly sensitive. Passwords, card numbers, API keys. A black rectangle from any image editor takes ten seconds and means it never leaves your device.

Why not Apple Live Text?

Fair question. iOS, macOS, and Android all ship built-in OCR now — long-press text in a photo, copy it. For one-off captures it's perfect: a phone number, an address, a single word. Use the built-in tool.

For longer passages it gets fiddly fast. The selection handles never quite land where you want them. There's no batch. The output isn't a file you can save, share, or pipe into anything. And nothing crosses ecosystems: text you lift from a screenshot on your iPhone only carries over to another Apple device.

If you want a text file, want it on any device, or have more than one screenshot to process, the in-browser route is faster.

Three things that'll catch you

OCR follows the screenshot's visual layout. If the text wrapped at funny points in the original (narrow column, wrapped tooltip, hyphenated breaks), the output text breaks at the same points. A find/replace on newlines flattens it.
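If you do this often, the find/replace is easy to script. A minimal sketch, with unwrap as a name made up for this example: single newlines become spaces, blank lines survive as paragraph breaks, and words hyphenated across a line break get rejoined.

```python
import re


def unwrap(text: str) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines stay as paragraph breaks."""
    text = text.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line end: "exam-\nple" -> "example".
    # Caveat: this also merges genuine compounds ("well-\nknown" -> "wellknown").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace a lone newline (one with no newline on either side) with a space.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text.strip()
```

Don't run it on code snippets, where the line breaks matter.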

Number-letter confusion still happens. "1" vs "l" vs "I". "0" vs "O". If your screenshot contains identifiers, codes, or anything you need to be exact, proofread before pasting it somewhere that matters.
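A script can't tell you which reading is right, but it can point at the spots worth a second look. This sketch uses a made-up heuristic: it flags any whitespace-separated token that mixes letters and digits and contains one of the look-alike characters. Misreads inside ordinary words slip past it.

```python
# Characters OCR engines commonly confuse with one another.
AMBIGUOUS = set("1lI0O")


def flag_ambiguous(text):
    """Return (token, positions) pairs for code-like tokens containing look-alike characters."""
    flagged = []
    for token in text.split():
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        hits = [i for i, c in enumerate(token) if c in AMBIGUOUS]
        # Only tokens mixing letters and digits are treated as identifiers.
        if has_digit and has_alpha and hits:
            flagged.append((token, hits))
    return flagged
```

Running it on "Order AB10O7 ships Monday" flags AB10O7 at positions 2, 3, and 4, which are exactly the characters to check against the screenshot.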

Mixed-language screenshots can come back with words missing in whichever language Vision wasn't expecting. English is the default; most Latin-script European languages work without setup. CJK and right-to-left scripts are on the roadmap.
