How to Split a Large PDF Into Smaller Files

Large PDFs are everywhere — scanned contracts, exported reports, e-books, bank statements — and they’re almost always inconvenient. They’re slow to open, hard to email, and painful to navigate when you only need pages 47–52. Splitting a PDF into smaller files is one of those tasks that sounds trivial until you actually have to do it on 200 documents, or on a machine where you can’t install anything, or in a script that runs every night.

This article walks through the practical options, from “I just need to do this once” to “I need to automate it across thousands of files,” with the tradeoffs of each.

First, Decide How You Want to Split

Before picking a tool, get clear on what “split” actually means for your case. The choice drives everything else.

Strategy                   | Example                               | When to use
By page range              | “Pages 1–10, 11–20, 21–end”           | You know exactly which pages you want
By fixed chunk size        | “Every 50 pages becomes a new file”   | Long uniform documents (logs, scans)
By bookmarks / outline     | “One file per chapter”                | Structured documents (books, reports)
By blank page or separator | “Split whenever a blank page appears” | Batch-scanned documents
By text content            | “Split whenever ‘Invoice #’ appears”  | Merged invoices, statements, tickets
By file size               | “Each output ≤ 10 MB”                 | Email attachment limits

Most tools handle the first two well. Bookmark and size splitting need a more capable desktop tool, and blank-page or text-based splitting usually needs a script.
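
Whichever strategy you choose, it eventually reduces to a list of page ranges. As a rough sketch, here is one way to turn a spec like "1-10,11-20,21-end" into 0-based ranges in Python. The function name parse_ranges and the spec grammar are illustrative assumptions, not part of any tool mentioned here.

```python
def parse_ranges(spec: str, total_pages: int) -> list[tuple[int, int]]:
    """Turn a 1-based spec such as "1-10,11-20,21-end" into 0-based,
    half-open (start, stop) tuples suitable for range() or slicing."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        start = int(first)
        # A bare number ("7") is a single page; "end" means the last page.
        if not last:
            stop = start
        elif last.strip().lower() == "end":
            stop = total_pages
        else:
            stop = int(last)
        if not 1 <= start <= stop <= total_pages:
            raise ValueError(f"range {part!r} is outside 1-{total_pages}")
        ranges.append((start - 1, stop))  # 1-based inclusive -> 0-based half-open
    return ranges
```

Doing the 1-based to 0-based conversion in exactly one place keeps the rest of a script free of off-by-one arithmetic.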

Option 1: The Browser (Zero Install, One-Off Jobs)

If you just need to split one PDF and the contents aren’t sensitive, the fastest path is a web tool such as Adobe’s online splitter, Smallpdf, iLovePDF, or PDF24. They all work the same way: upload, pick page ranges, download.

Use this when: the file is small, non-confidential, and you’ll never need to do it again.

Don’t use this when: the PDF contains personal data, financial records, medical information, or anything covered by an NDA. You’re uploading the file to a third party — read the privacy policy or pick a different option.

Option 2: A Desktop App (One-Off, Confidential, or Visual)

For sensitive files or when you want to see what you’re splitting, a desktop tool is the right call.

  • macOS Preview: built in. Open the PDF, show the sidebar (View → Thumbnails), drag selected thumbnails to the desktop. Each drag becomes a new PDF. Great for plucking out a few pages; awkward for long ranges.
  • Adobe Acrobat Pro: Organize Pages → Split. Lets you split by number of pages, file size, or top-level bookmarks. Polished, paid.
  • PDFsam Basic: free, open source, cross-platform. Has explicit modes for “split by page numbers,” “split by every N pages,” “split by bookmarks,” and “split by size.” This is the tool to install if you do this even occasionally.
  • PDF24 Creator (Windows): free, offline, similar feature set.

Use this when: you want a GUI, the file is confidential, or you want bookmark-aware splitting without writing code.

Option 3: The Command Line (Repeatable, Scriptable)

This is where it gets interesting. Once you can do it in a terminal, you can do it 10,000 times.


qpdf — the modern default

qpdf is fast, lossless, and available everywhere (brew install qpdf, apt install qpdf, choco install qpdf).

# Extract pages 5–12 into a new file
qpdf input.pdf --pages input.pdf 5-12 -- out_5-12.pdf

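# Split EVERY page into its own file; %d becomes a zero-padded page number
# (page-001.pdf, page-002.pdf, … for a document with 100–999 pages)
qpdf --split-pages=1 input.pdf page-%d.pdf

# Split into 50-page chunks; %d becomes a page range (chunk-001-050.pdf, …)
qpdf --split-pages=50 input.pdf chunk-%d.pdf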

# Pull non-contiguous pages: 1, 3, 5–10, last page
qpdf input.pdf --pages input.pdf 1,3,5-10,z -- selected.pdf

z means “the last page,” and the r prefix counts from the end: r1 is also the last page, r2 the second-to-last, and so on. Both are handy when you don’t know the page count.
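
To drive the same extraction from Python, one option is to build the argument list and hand it to subprocess. The sketch below uses only the qpdf invocation shown above; the function names are hypothetical, and it assumes qpdf is on the PATH.

```python
import subprocess


def qpdf_extract_args(src: str, page_spec: str, dst: str) -> list[str]:
    """Build the argv for: qpdf SRC --pages SRC SPEC -- DST."""
    if not page_spec.strip():
        raise ValueError("page_spec must not be empty")
    return ["qpdf", src, "--pages", src, page_spec, "--", dst]


def extract_pages(src: str, page_spec: str, dst: str) -> None:
    """Run qpdf and raise if it reports an error."""
    result = subprocess.run(qpdf_extract_args(src, page_spec, dst))
    # qpdf exits 0 on success and 3 on success with warnings; 2 means errors.
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit status {result.returncode}")
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces. A call such as extract_pages("input.pdf", "1,3,5-10,z", "selected.pdf") mirrors the last command above.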

pdftk — the old reliable

Still works fine, and its syntax reads almost like plain English:

pdftk input.pdf cat 1-10 output part1.pdf
pdftk input.pdf cat 11-end output part2.pdf

Ghostscript — when you need re-rendering

Useful when you also want to compress, rasterize, or strip features:

# -dBATCH and -dNOPAUSE make gs exit when done instead of waiting at its prompt
gs -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=10 \
   -sOutputFile=part1.pdf input.pdf

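Slower, and potentially lossy: Ghostscript rewrites the whole file and may recompress or downsample images depending on its settings. In exchange, it can shrink the output dramatically.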

Use this when: you’ll repeat the operation, or the split is part of a larger pipeline (download → split → upload).

Option 4: A Script (Conditional or Content-Aware Splitting)

When the split rule depends on the contents of the PDF — “split before each invoice,” “split at every chapter heading” — you need code. Python with pypdf is the standard.

Split into fixed-size chunks

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
chunk_size = 50

for start in range(0, len(reader.pages), chunk_size):
    writer = PdfWriter()
    for page in reader.pages[start:start + chunk_size]:
        writer.add_page(page)
    with open(f"chunk_{start // chunk_size + 1:03d}.pdf", "wb") as f:
        writer.write(f)

Split by bookmarks (one file per chapter)

from pypdf import PdfReader, PdfWriter

reader = PdfReader("book.pdf")

# Top-level outline entries only
chapters = [item for item in reader.outline if not isinstance(item, list)]

bounds = [reader.get_destination_page_number(c) for c in chapters]
bounds.append(len(reader.pages))  # sentinel for the last chapter

for i, chapter in enumerate(chapters):
    writer = PdfWriter()
    for p in range(bounds[i], bounds[i + 1]):
        writer.add_page(reader.pages[p])
    safe_title = "".join(c for c in chapter.title if c.isalnum() or c in " -_").strip()
    with open(f"{i + 1:02d}_{safe_title}.pdf", "wb") as f:
        writer.write(f)
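
The bookmark example assumes every outline entry resolves to a page and that entries appear in page order. A more defensive sketch of the boundary computation follows. It works on plain (title, start_page) pairs so it can be tested without a PDF; the names safe_filename and chapter_spans are hypothetical.

```python
def safe_filename(title: str, fallback: str = "untitled") -> str:
    """Keep letters, digits, spaces, hyphens and underscores only."""
    cleaned = "".join(c for c in title if c.isalnum() or c in " -_").strip()
    return cleaned or fallback


def chapter_spans(entries, total_pages):
    """entries: iterable of (title, start_page) with 0-based start_page,
    or None when the bookmark does not resolve to a page.
    Returns (title, start, stop) tuples with half-open page spans."""
    # Drop unresolved bookmarks, then order by page (stable for ties).
    resolved = sorted(
        ((t, p) for t, p in entries if p is not None and 0 <= p < total_pages),
        key=lambda tp: tp[1],
    )
    spans = []
    for i, (title, start) in enumerate(resolved):
        stop = resolved[i + 1][1] if i + 1 < len(resolved) else total_pages
        if stop > start:  # skip bookmarks that share a page with the next one
            spans.append((safe_filename(title), start, stop))
    return spans
```

With pypdf, the entries could be built as [(c.title, reader.get_destination_page_number(c)) for c in chapters]. Note that pages before the first bookmark (cover, table of contents) are not part of any span, so handle them separately if you need them.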

Split by text content (e.g., one file per invoice)

import re
from pypdf import PdfReader, PdfWriter

reader = PdfReader("merged_invoices.pdf")
boundaries = [
    i for i, page in enumerate(reader.pages)
    if re.search(r"Invoice\s+#\s*\d+", page.extract_text() or "")
]
boundaries.append(len(reader.pages))

for i in range(len(boundaries) - 1):
    writer = PdfWriter()
    for p in range(boundaries[i], boundaries[i + 1]):
        writer.add_page(reader.pages[p])
    with open(f"invoice_{i + 1:03d}.pdf", "wb") as f:
        writer.write(f)
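
The regex above starts a new file on every page that matches, which over-splits if each page of a multi-page invoice repeats the "Invoice # 123" header. One possible refinement is to start a new group only when the captured number changes. This sketch takes a list of page texts (for example [p.extract_text() or "" for p in reader.pages]); the name group_by_invoice is hypothetical.

```python
import re

INVOICE_RE = re.compile(r"Invoice\s+#\s*(\d+)")


def group_by_invoice(page_texts):
    """Return (invoice_number, start, stop) tuples with half-open, 0-based
    page spans. Pages without a match are attached to the current invoice;
    pages before the first match are skipped."""
    groups = []
    current, start = None, None
    for i, text in enumerate(page_texts):
        match = INVOICE_RE.search(text or "")
        number = match.group(1) if match else None
        if number is not None and number != current:
            if current is not None:
                groups.append((current, start, i))
            current, start = number, i
    if current is not None:
        groups.append((current, start, len(page_texts)))
    return groups
```

Each tuple can feed the same writer loop as before, and the invoice number makes a more useful file name than a running counter.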

Use this when: the rule is data-driven, the volume is large, or the split is one step in a longer automation.
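
The strategy table also lists splitting at blank separator pages, which the examples above don’t cover. Here is a rough sketch, assuming “blank” means a page whose extracted text is empty or whitespace. That assumption only holds for PDFs with a text layer, and a page holding nothing but an image would also count as blank. The name split_on_blank is hypothetical.

```python
def split_on_blank(page_texts):
    """Return half-open, 0-based (start, stop) spans of consecutive
    non-blank pages; blank pages act as separators and are dropped."""
    spans = []
    start = None
    for i, text in enumerate(page_texts):
        is_blank = not (text or "").strip()
        if is_blank:
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(page_texts)))
    return spans
```

Consecutive blank pages collapse into a single separator, so a double-sided scan with blank backs doesn’t produce empty output files.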

Option 5: Bulk Processing Many PDFs

If you have a folder of PDFs and want to split each one the same way, wrap a CLI tool in a shell loop:

# Split every PDF in the folder into 25-page chunks
for f in *.pdf; do
    mkdir -p "split/${f%.pdf}"
    qpdf --split-pages=25 "$f" "split/${f%.pdf}/page-%d.pdf"
done

For thousands of files, run it in parallel:

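mkdir -p split
# Null-delimited names survive spaces; sh -c strips the .pdf suffix
printf '%s\0' *.pdf | xargs -0 -P 8 -I{} \
    sh -c 'qpdf --split-pages=25 "$1" "split/${1%.pdf}-%d.pdf"' _ {}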

-P 8 runs up to eight splits concurrently. Each qpdf process is single-threaded, so parallelism helps until the disk becomes the bottleneck; tune the number to your core count and storage.
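
For a production pipeline, the same fan-out can be done from Python with per-file error reporting. This is a sketch using concurrent.futures: split_one shells out to the qpdf --split-pages command shown above, while run_batch and its injectable worker parameter are illustrative names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def split_one(pdf: Path, out_dir: Path, pages_per_file: int = 25) -> Path:
    """Split one PDF with qpdf into out_dir/<stem>/chunk-%d.pdf."""
    target = out_dir / pdf.stem
    target.mkdir(parents=True, exist_ok=True)
    # check=True also treats qpdf's "succeeded with warnings" status (3) as failure.
    subprocess.run(
        ["qpdf", f"--split-pages={pages_per_file}", str(pdf),
         str(target / "chunk-%d.pdf")],
        check=True,
    )
    return target


def run_batch(pdfs, out_dir, worker=split_one, max_workers=8):
    """Run worker(pdf, out_dir) for every file; return (done, failed).
    Threads suffice because the real work happens in qpdf subprocesses."""
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, Path(p), Path(out_dir)): Path(p) for p in pdfs}
        for future in as_completed(futures):
            pdf = futures[future]
            try:
                future.result()
                done.append(pdf)
            except Exception as exc:  # keep going; report failures at the end
                failed[pdf] = exc
    return sorted(done), failed
```

One corrupt file no longer stops the batch: failed maps each problem file to its exception, which can be logged or retried afterward.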

Things That Will Bite You

A few realities that aren’t obvious until you hit them:

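  1. “Splitting” doesn’t shrink each piece proportionally. Pages in a PDF share fonts, images, and metadata. Splitting a 100 MB / 100-page PDF into 10 files often gives you ten ~10 MB files, but it can give you ten ~30 MB files if every page references the same embedded fonts and images, because each output needs its own copy. If size is the goal, run the output through Ghostscript with -dPDFSETTINGS=/ebook afterward (lossy: it downsamples images). qpdf --linearize won’t help here, since it reorganizes the file for fast web viewing rather than shrinking it; qpdf --object-streams=generate may give modest lossless savings.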
  2. Encrypted / password-protected PDFs need to be unlocked first. qpdf --password=… --decrypt in.pdf out.pdf.
  3. Scanned PDFs have no text layer. Text-based splitting (Option 4, last example) won’t work until you OCR them: ocrmypdf input.pdf searchable.pdf writes a copy with a searchable text layer added.
  4. Bookmarks and form fields don’t always survive. pypdf and qpdf preserve most structure, but interactive forms, JavaScript, and digital signatures often break when pages are extracted. Test with a known-good document before processing important files.
  5. Page numbers in tools are 1-indexed; in code they’re usually 0-indexed. Mixing the two up is one of the most common sources of off-by-one bugs when scripting PDF splits.
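
Because of point 1, splitting by file size can’t be done by dividing page counts; the reliable measure is the size of an actual output file. Here is a greedy sketch that grows each chunk one page at a time. measure(start, stop) is a hypothetical caller-supplied callback that returns the byte size of a file containing pages [start, stop), for example by writing them to an in-memory buffer with pypdf.

```python
def pack_by_size(total_pages, measure, limit_bytes):
    """Greedily build half-open (start, stop) spans whose measured size
    stays <= limit_bytes. A single page that exceeds the limit still gets
    its own span, since it cannot be split further."""
    spans = []
    start = 0
    while start < total_pages:
        stop = start + 1
        # Extend the chunk while the next page still fits.
        while stop < total_pages and measure(start, stop + 1) <= limit_bytes:
            stop += 1
        spans.append((start, stop))
        start = stop
    return spans
```

This calls measure once per page, and each call writes a growing chunk, so it is slow on very long documents. If size grows steadily with page count, a binary search for stop would cut the number of writes.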

Picking the Right Tool — Quick Decision Guide

  • One file, one time, not sensitive → web tool.
  • One file, one time, sensitive → Preview, PDFsam, or Acrobat.
  • Several files, repeatable rule, simple ranges → qpdf in a shell loop.
  • Content-aware rule (text, bookmarks, blank pages) → Python + pypdf.
  • Pipeline / production / thousands of files → qpdf or pypdf in a script, with logging and error handling around it.

The Takeaway

Splitting a PDF is a five-minute job the first time and a five-second job every time after — if you pick the right tool for the situation. Web tools are fine for disposable one-offs. PDFsam and qpdf cover 90% of real work. The moment your splitting rule depends on what’s inside the document, switch to a script and stop fighting GUIs.

The skill worth investing in isn’t memorizing tool flags — it’s recognizing, before you start, which of the strategies in the first table you actually need. Get that right, and the tool falls out of the answer.

David Cao
