How to Clean Up Text and Data Without Pasting It Into a Website
The usual way to fix messy text is to google "remove duplicate lines online", land on an ad-covered site, and paste your data into a form on someone else's server. That works until the data is a client list, an API response with tokens in it, or anything covered by a confidentiality agreement. This guide covers the common cleanup jobs and how to do them locally.
Why "Online Tool" Is the Wrong Default
When you paste into a web form, the contents typically travel to a server you know nothing about. Most of these sites are honest, some are not, and none of them owe you an audit trail. If the text contains emails, keys, internal hostnames, or customer records, the safe assumption is that it should not leave your machine. The fix is to use tools that run entirely on your own machine with no backend: a local browser extension, a devtools snippet, or a command-line one-liner.
1. Fixing Whitespace, Quotes, and Line Endings
Text copied from Word, PDFs, or Slack arrives with curly quotes, mixed CRLF/LF line endings, and runs of spaces. On the command line:
tr -s ' ' < input.txt # collapse repeated spaces
sed 's/[“”]/"/g' input.txt # straighten curly double quotes (needs a UTF-8 locale)
tr -d '\r' < input.txt # strip carriage returns so CRLF becomes LF
That works if you live in a terminal. If you do not, or the text is already on your clipboard, a local utility with a "normalize whitespace" and "smart quotes to straight" button does the same job in one click without a round trip through a file.
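If Python is closer to hand than sed, the same three fixes fit in one small function. This is a minimal sketch using only the standard library; the name normalize_text and the choice to also straighten single quotes and drop trailing whitespace are assumptions for illustration, not a description of any particular tool.

```python
import re

# Typographic quotes mapped to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
})


def normalize_text(text: str) -> str:
    """Straighten quotes, unify line endings to LF, collapse space runs."""
    text = text.translate(QUOTE_MAP)
    # Convert CRLF and lone CR to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs to one space and drop trailing whitespace.
    lines = [re.sub(r"[ \t]+", " ", line).rstrip() for line in text.split("\n")]
    return "\n".join(lines)
```

Note that collapsing tabs and spaces will also flatten indentation, so this suits prose and lists better than source code.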
2. Sorting and Deduplicating Lines
The classic job: a column of emails or SKUs with duplicates and random ordering. Terminal version:
sort -u input.txt > output.txt
The catch is case sensitivity ("Alice@" and "alice@" survive sort -u as two lines) and surrounding whitespace. A tool with explicit case-sensitive and trim toggles handles the real-world version of this job better than remembering sort -fu flags.
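To make the case and whitespace toggles concrete, here is a rough standard-library sketch. The function name dedupe_lines and its three flags are hypothetical; it keeps the first spelling it sees of each line and skips blank lines.

```python
def dedupe_lines(text: str, ignore_case: bool = True,
                 trim: bool = True, sort: bool = True) -> list[str]:
    """Remove duplicate lines, optionally ignoring case and edge whitespace."""
    seen = set()
    result = []
    for line in text.splitlines():
        if trim:
            line = line.strip()
        if not line:
            continue  # skip blank lines
        # casefold() is a more aggressive lower() intended for comparisons.
        key = line.casefold() if ignore_case else line
        if key not in seen:
            seen.add(key)
            result.append(line)  # keep the first spelling encountered
    if sort:
        result.sort(key=str.casefold)
    return result
```

With the defaults, "Alice@x.com" and " alice@x.com " count as the same entry, which is usually what you want for an email list.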
3. Formatting and Validating JSON
Prettifying JSON is one line if you have Python:
python3 -m json.tool response.json
Validation with a useful error location is less consistent. json.tool and jq do print a line and column, but some tools only tell you that parsing failed, and the reported position is where the parser gave up, which can be well past the actual mistake. Look for a validator that reports the exact line and column of the first error. For JSON that is almost valid (trailing commas, single quotes, comments pasted from a JS config), a repair function saves you from fixing it by hand.
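Python's standard json module exposes the error position on the exception it raises, so a local validator that reports line and column is only a few lines. A minimal sketch follows; the helper name validate_json and its return shape are assumptions for illustration.

```python
import json


def validate_json(text: str):
    """Return (True, None) if text parses, else (False, message with position)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno are 1-based positions where parsing stopped.
        return False, f"{err.msg} at line {err.lineno}, column {err.colno}"
    return True, None
```

The position is where the parser stopped, which is not always where the typo is; the exact wording of the message also varies between Python versions.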
4. Converting JSON, CSV, and YAML
Converting an API response to a spreadsheet, or a CSV export to JSON, is exactly the case where people upload sensitive data to converter sites, because the CLI options (jq, yq, csvkit) have a real learning curve. Whatever tool you use, the requirement is the same: the conversion must happen locally. If the converter is a website, check whether it advertises client-side processing, and be aware you are trusting that claim.
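For the simple case of flat records, the conversion needs nothing beyond Python's standard library and never touches the network. The sketch below assumes the JSON is a list of flat objects and the CSV has a header row; the function names are hypothetical, and nested data or YAML would need more work (YAML is not in the standard library).

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV with a header row to a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects to CSV."""
    records = json.loads(json_text)  # assumed: a list of flat dicts
    # Collect column names in first-seen order across all records.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # missing keys are written as empty cells
    return out.getvalue()
```

CSV carries no type information, so every value coming out of csv_to_json is a string; convert numbers afterwards if the consumer needs them.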
5. Stripping Tracking Parameters From URLs
Links copied from newsletters and social feeds carry utm_source, fbclid, gclid, and friends. You can delete them by hand, but a stripper that knows the common tracking params cleans a pasted link in one click and leaves legitimate query params alone.
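A stripper like that can be sketched with urllib.parse from the standard library. The blocklist here covers only the parameters named above (anything starting with utm_, plus fbclid and gclid); a real tool would carry a longer list, and the names strip_tracking, TRACKING_PREFIXES, and TRACKING_NAMES are illustrative.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed blocklist: extend as needed.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Remove known tracking parameters, keeping all other query params."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_NAMES
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Because the query string is re-encoded on the way out, percent-escaping in the remaining parameters may be normalized even though their values are unchanged.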
6. Removing EXIF Data From Photos
Photos embed camera details and often GPS coordinates. Uploading a photo to an "EXIF remover" site to protect your privacy is self-defeating; the removal has to happen on your device. macOS and Windows can strip some metadata natively (image re-export), or use a local tool that re-encodes the image entirely in the browser.
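For JPEG files specifically, there is a lighter alternative to re-encoding: EXIF (and XMP) metadata lives in APP1 segments near the start of the file, so those segments can be dropped without touching the pixel data. The following is a rough sketch of that approach, not a complete metadata scrubber. It leaves other segments such as comments and IPTC blocks in place, it also discards the orientation tag, and the function name strip_jpeg_app1 is hypothetical.

```python
import struct


def strip_jpeg_app1(data: bytes) -> bytes:
    """Return a copy of a JPEG with its APP1 (EXIF/XMP) segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"unexpected byte at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:  # start of scan: compressed image data follows
            out += data[pos:]
            return bytes(out)
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker != 0xE1:  # keep every segment except APP1
            out += data[pos:end]
        pos = end
    raise ValueError("no start-of-scan marker found")
```

To use it on a file, read the bytes with open(path, "rb"), pass them through the function, and write the result to a new file; check the output in an EXIF viewer before sharing it.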
One Toolbox Instead of Six Tabs
Everything above can be done with a mix of CLI tools and native tricks, and if that is your workflow, keep it. We built Scrub for the other case: a Chrome extension and offline web app that bundles all of these jobs (text cleanup, JSON, CSV/YAML conversion, URL de-junking, Base64/hash/UUID/timestamp utilities, and image EXIF removal) behind one consistent interface, running 100% on your device.
The honest disclosure: the text tools, JSON formatting and validation, and the encode utilities are free forever with no account. The conversions, URL stripper, image tools, batch mode, and history are a $9 one-time Pro unlock. The only network request the app ever makes is verifying the Pro license key with Gumroad. There are no analytics and no servers of ours involved.