buncha.tools
Bunch · 3–4 min

Clean text from PDF

Copy-paste from a PDF and you almost always get a mess: hyphens wherever a word wrapped at the end of a line, random capitalisation, page footers in the middle of paragraphs. This bunch is the pipeline that turns it back into prose you can actually use.

The workflow

4 steps, in order.

  1. PDF to Text (PDF)
    Pull the text out of the PDF.
  2. Find & Replace (Text)
    Strip random line breaks and weird whitespace.
  3. Case Converter (Text)
    Fix the case if everything came out CAPS or lower.
  4. Word Counter (Text)
    Confirm the final word count for whatever you're doing next.
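
If you'd rather script steps 2 to 4 than click through them, here is a minimal Python sketch of the same idea. It assumes you already have the extracted text as a string (step 1 is not covered), and the function names `clean_pdf_text`, `fix_case`, and `word_count` are hypothetical helpers made up for this example, not part of any tool in the bunch.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Rejoin hyphenated words, unwrap hard line breaks, tidy whitespace."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line end: "exam-\nple" -> "example".
    # Assumption: every end-of-line hyphen is a wrap, not a real compound.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Blank lines mark paragraph breaks; keep those, unwrap everything else.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


def fix_case(text: str) -> str:
    """Sentence-case text that came out entirely upper- or lower-case."""
    if not (text.isupper() or text.islower()):
        return text  # mixed case already, leave it alone
    lowered = text.lower()
    # Capitalise the first letter of the text and of each new sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


raw = "THIS IS AN EXAM-\nPLE OF TEXT\nFROM A PDF.\n\nSECOND PARA-\nGRAPH HERE."
result = fix_case(clean_pdf_text(raw))
print(result)
print(word_count(result))
```

Two caveats with this sketch: the hyphen rule will also glue together genuine compounds that happened to wrap ("well-" / "known" becomes "wellknown"), and sentence-casing lowercases acronyms and proper nouns, so a quick read-through afterwards is still worth it.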