How to Split a Large PDF Into Smaller Files

Large PDFs are everywhere — scanned contracts, exported reports, e-books, bank statements — and they’re almost always inconvenient. They’re slow to open, hard to email, and painful to navigate when you only need pages 47–52. Splitting a PDF into smaller files is one of those tasks that sounds trivial until you actually have to do it on 200 documents, or on a machine where you can’t install anything, or in a script that runs every night.

This article walks through the practical options, from “I just need to do this once” to “I need to automate it across thousands of files,” with the tradeoffs of each.

First, Decide How You Want to Split

Before picking a tool, get clear on what “split” actually means for your case. The choice drives everything else.

Strategy	Example	When to use
By page range	“Pages 1–10, 11–20, 21–end”	You know exactly which pages you want
By fixed chunk size	“Every 50 pages becomes a new file”	Long uniform documents (logs, scans)
By bookmarks / outline	“One file per chapter”	Structured documents (books, reports)
By blank page or separator	“Split whenever a blank page appears”	Batch-scanned documents
By text content	“Split whenever ‘Invoice #’ appears”	Merged invoices, statements, tickets
By file size	“Each output ≤ 10 MB”	Email attachment limits

Most tools do the first two well. The other four need either a more capable desktop tool (bookmarks, file size) or a script (blank pages, text content).

Option 1: The Browser (Zero Install, One-Off Jobs)

If you just need to split one PDF and the contents aren’t sensitive, the fastest path is a web tool: Adobe’s online splitter, Smallpdf, iLovePDF, PDF24. They all do the same thing — upload, pick page ranges, download.

Use this when: the file is small, non-confidential, and you’ll never need to do it again.

Don’t use this when: the PDF contains personal data, financial records, medical information, or anything covered by an NDA. You’re uploading the file to a third party — read the privacy policy or pick a different option.

Option 2: A Desktop App (One-Off, Confidential, or Visual)

For sensitive files or when you want to see what you’re splitting, a desktop tool is the right call.

macOS Preview — built in. Open the PDF, show the sidebar (View → Thumbnails), drag selected thumbnails to the desktop. Each drag becomes a new PDF. Great for plucking out a few pages; awkward for long ranges.
Adobe Acrobat Pro — Organize Pages → Split. Lets you split by number of pages, file size, or top-level bookmarks. Polished, paid.
PDFsam Basic — free, open source, cross-platform. Has explicit modes for “split by page numbers,” “split by every N pages,” “split by bookmarks,” and “split by size.” This is the tool to install if you do this even occasionally.
PDF24 Creator (Windows) — free, offline, similar feature set.

Use this when: you want a GUI, the file is confidential, or you want bookmark-aware splitting without writing code.

Option 3: The Command Line (Repeatable, Scriptable)

This is where it gets interesting. Once you can do it in a terminal, you can do it 10,000 times.

`qpdf` — the modern default

qpdf is fast, lossless, and available everywhere (brew install qpdf, apt install qpdf, choco install qpdf).

# Extract pages 5–12 into a new file
qpdf input.pdf --pages input.pdf 5-12 -- out_5-12.pdf

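# Split EVERY page into its own file (page-001.pdf, page-002.pdf, …;
# the zero-padding width depends on the total page count)
qpdf --split-pages=1 input.pdf page-%d.pdf

# Split into 50-page chunks; %d becomes a page range (chunk-001-050.pdf, …)
qpdf --split-pages=50 input.pdf chunk-%d.pdf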

# Pull non-contiguous pages: 1, 3, 5–10, last page
qpdf input.pdf --pages input.pdf 1,3,5-10,z -- selected.pdf

z means “the last page,” and rN counts from the end: r1 is also the last page, r2 the second-to-last, so r3-r1 selects the final three pages. Both are handy when you don’t know the page count.

`pdftk` — the old reliable

Still works fine, syntax is friendlier to read aloud:

pdftk input.pdf cat 1-10 output part1.pdf
pdftk input.pdf cat 11-end output part2.pdf

`Ghostscript` — when you need re-rendering

Useful when you also want to compress, rasterize, or strip features:

gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH \
   -dFirstPage=1 -dLastPage=10 \
   -sOutputFile=part1.pdf input.pdf

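Slower, and not a byte-for-byte copy (Ghostscript re-interprets the pages and writes a new PDF, and may recompress images), but it can shrink the output dramatically. -dNOPAUSE and -dBATCH keep it from stopping at an interactive prompt, which matters in scripts.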

Use this when: you’ll repeat the operation, or the split is part of a larger pipeline (download → split → upload).
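For the pipeline case, the qpdf call shown earlier can be driven from Python. This is a minimal sketch, assuming qpdf is installed and on PATH. The names qpdf_extract_cmd and extract_pages are hypothetical helpers, and the only qpdf syntax used is the --pages form from the examples above.

```python
import subprocess
from pathlib import Path


def qpdf_extract_cmd(src: str, page_spec: str, dest: str) -> list:
    """Build the argv for: qpdf SRC --pages SRC SPEC -- DEST."""
    if not page_spec.strip():
        raise ValueError("page_spec must not be empty")
    return ["qpdf", src, "--pages", src, page_spec, "--", dest]


def extract_pages(src: str, page_spec: str, dest: str) -> Path:
    """Run qpdf (assumed to be on PATH) and return the output path."""
    cmd = qpdf_extract_cmd(src, page_spec, dest)
    # A list argv (no shell=True) keeps odd file names away from shell parsing.
    result = subprocess.run(cmd, capture_output=True, text=True)
    # qpdf exits 0 on success and 3 when it succeeded with warnings;
    # anything else is treated as a failure here.
    if result.returncode not in (0, 3):
        raise RuntimeError(
            f"qpdf failed ({result.returncode}): {result.stderr.strip()}"
        )
    return Path(dest)
```

Keeping the command construction separate from the call makes the argv easy to log or unit-test without touching a real PDF.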

Option 4: A Script (Conditional or Content-Aware Splitting)

When the split rule depends on the contents of the PDF (“split before each invoice,” “split at every chapter heading”), you need code. Python with pypdf (pip install pypdf) is a common choice.

Split into fixed-size chunks

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
chunk_size = 50

for start in range(0, len(reader.pages), chunk_size):
    writer = PdfWriter()
    for page in reader.pages[start:start + chunk_size]:
        writer.add_page(page)
    with open(f"chunk_{start // chunk_size + 1:03d}.pdf", "wb") as f:
        writer.write(f)

Split by bookmarks (one file per chapter)

from pypdf import PdfReader, PdfWriter

reader = PdfReader("book.pdf")

# Top-level outline entries only
chapters = [item for item in reader.outline if not isinstance(item, list)]

bounds = [reader.get_destination_page_number(c) for c in chapters]
bounds.append(len(reader.pages))  # sentinel for the last chapter

for i, chapter in enumerate(chapters):
    writer = PdfWriter()
    for p in range(bounds[i], bounds[i + 1]):
        writer.add_page(reader.pages[p])
    safe_title = "".join(c for c in chapter.title if c.isalnum() or c in " -_").strip()
    with open(f"{i + 1:02d}_{safe_title}.pdf", "wb") as f:
        writer.write(f)
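The list comprehension above keeps only top-level entries. If a book's chapters sit one level down (under "Part I", "Part II", and so on), the nested lists in reader.outline need walking. A sketch, assuming the outline layout where a nested list holds the children of the entry before it; entries_at_depth is a hypothetical helper, and the demo uses plain strings in place of pypdf outline objects.

```python
def entries_at_depth(outline, depth=0):
    """Return outline entries found exactly `depth` levels down.

    Nested lists are treated as the children of the preceding entry,
    so depth=0 is the top level and depth=1 is one level below it.
    """
    found = []
    for item in outline:
        if isinstance(item, list):
            if depth > 0:
                found.extend(entries_at_depth(item, depth - 1))
        elif depth == 0:
            found.append(item)
    return found


# Stand-in for reader.outline: strings instead of pypdf outline entries.
demo = ["Part I", ["Ch 1", "Ch 2", ["Sec 2.1"]], "Part II", ["Ch 3"]]
print(entries_at_depth(demo, 1))
```

With a real file, chapters = entries_at_depth(reader.outline, 1) would take the place of the comprehension; the rest of the loop stays the same.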

Split by text content (e.g., one file per invoice)

import re
from pypdf import PdfReader, PdfWriter

reader = PdfReader("merged_invoices.pdf")
boundaries = [
    i for i, page in enumerate(reader.pages)
    if re.search(r"Invoice\s+#\s*\d+", page.extract_text() or "")
]
boundaries.append(len(reader.pages))

for i in range(len(boundaries) - 1):
    writer = PdfWriter()
    for p in range(boundaries[i], boundaries[i + 1]):
        writer.add_page(reader.pages[p])
    with open(f"invoice_{i + 1:03d}.pdf", "wb") as f:
        writer.write(f)
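Note that the loop above starts writing at the first match, so pages before the first "Invoice #" hit (a cover sheet, say) are left out, and a file with no matches produces no output. The sketch below shows one way to turn marker pages into ranges that avoids both. The name ranges_from_markers and its drop_markers flag are hypothetical. With drop_markers=True the same helper fits the blank-separator strategy from the table, where the marker page itself is discarded.

```python
def ranges_from_markers(markers, total, drop_markers=False):
    """Turn 0-indexed marker pages into a list of range objects.

    drop_markers=False: each marker page starts a new document (text rule).
    drop_markers=True:  each marker page is a separator and is discarded
                        (blank-page rule).
    Pages before the first marker are kept as their own document.
    """
    marks = sorted({m for m in markers if 0 <= m < total})
    ranges, start = [], 0
    for m in marks:
        if m > start:
            ranges.append(range(start, m))
        # Either skip the separator page or let it open the next document.
        start = m + 1 if drop_markers else m
    if start < total:
        ranges.append(range(start, total))
    return ranges


print(ranges_from_markers([2, 5], total=8))
```

Each returned range can be fed to the same writer loop as before: for p in r: writer.add_page(reader.pages[p]).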

Use this when: the rule is data-driven, the volume is large, or the split is one step in a longer automation.
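None of the examples above cover the "each output ≤ 10 MB" strategy from the table. Because pages share fonts and images, page sizes are not additive, so one approach is to grow a chunk page by page and measure the real output each time. A sketch only: split_by_size and the injected measure callable are hypothetical, and the toy size model stands in for a real measurement.

```python
def split_by_size(n_pages, limit, measure):
    """Greedily group pages 0..n_pages-1 so each group measures <= limit.

    measure(pages) must return the output size in bytes for a list of
    page indices. A single page that exceeds the limit still gets its
    own group, since it cannot be split further.
    """
    groups, current = [], []
    for page in range(n_pages):
        candidate = current + [page]
        if current and measure(candidate) > limit:
            groups.append(current)  # close the chunk before it overflows
            current = [page]
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


# Toy size model for demonstration: 100 bytes shared overhead + 40 per page.
def toy_measure(pages):
    return 100 + 40 * len(pages)


print(split_by_size(7, 220, toy_measure))
```

With pypdf, measure could add the candidate pages to a PdfWriter, write it to an io.BytesIO buffer, and return the buffer length. That re-serializes the chunk for every page added, so it trades speed for accuracy.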

Option 5: Bulk Processing Many PDFs

If you have a folder of PDFs and want to split each one the same way, wrap a CLI tool in a shell loop:

# Split every PDF in the folder into 25-page chunks
for f in *.pdf; do
    mkdir -p "split/${f%.pdf}"
    qpdf --split-pages=25 "$f" "split/${f%.pdf}/page-%d.pdf"
done

For thousands of files, run it in parallel:

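mkdir -p split
printf '%s\0' *.pdf | xargs -0 -P 8 -I{} \
    qpdf --split-pages=25 {} "split/{}-page-%d.pdf"

-P 8 runs up to eight qpdf processes at once, and printf with xargs -0 keeps file names containing spaces or quotes intact (parsing ls output does not). Output names include the full source name, extension and all (split/report.pdf-page-…). How much parallelism helps depends on your disk and core count, so treat 8 as a starting point.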
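The same fan-out can be done in Python when per-file logging and error handling are needed. A sketch using concurrent.futures: run_batch and the split_one callable are hypothetical names, and split_one stands in for whatever does the real work (for example a subprocess call to qpdf).

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("pdf-split")


def run_batch(paths, split_one, workers=8):
    """Run split_one(path) for every path; return (succeeded, failed).

    failed maps each path to the exception it raised, so one bad PDF
    does not stop the rest of the batch.
    """
    succeeded, failed = [], {}
    # Threads are enough when split_one mostly waits on a subprocess or disk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(split_one, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                fut.result()
                succeeded.append(path)
            except Exception as exc:  # keep going; report at the end
                log.error("split failed for %s: %s", path, exc)
                failed[path] = exc
    return sorted(succeeded), failed
```

Returning the failures instead of raising on the first one lets a nightly job finish the batch and then decide whether to retry, alert, or move the bad files aside.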

Things That Will Bite You

A few realities that aren’t obvious until you hit them:

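“Splitting” doesn’t shrink each piece proportionally. PDFs share fonts, images, and metadata across pages. Splitting a 100 MB / 100-page PDF into 10 files often gives you ten ~10 MB files, but it can give you ten ~30 MB files if every page references the same embedded fonts and images, because each piece needs its own copy. If size is the goal, recompress the pieces afterward with Ghostscript and -dPDFSETTINGS=/ebook (which downsamples images), or try qpdf --object-streams=generate for a smaller, lossless gain. qpdf --linearize only reorders a file for fast web viewing; it does not make it smaller.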
Encrypted / password-protected PDFs need to be unlocked first. qpdf --password=… --decrypt in.pdf out.pdf.
Scanned PDFs have no text layer. Text-based splitting (Option 4, last example) won’t work until you OCR them: ocrmypdf input.pdf searchable.pdf writes a new copy with a text layer added.
Bookmarks and form fields don’t always survive. pypdf and qpdf preserve most structure, but interactive forms and JavaScript often break when pages are extracted, and a digital signature no longer validates on the extracted file. Test with a known-good document before processing important files.
Page numbers in tools are 1-indexed; in code they’re usually 0-indexed. qpdf and pdftk call the first page 1, while in pypdf it is reader.pages[0]. Mixing the two conventions is a common source of off-by-one bugs when scripting PDF splits.
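One way to keep the two conventions apart is to convert at a single boundary: accept 1-indexed, inclusive ranges the way the command-line tools do, and hand 0-indexed range objects to the code. A sketch under stated assumptions: parse_page_ranges is a hypothetical helper, and its spec syntax (commas, a-b, and the word end) is modeled on the examples in this article, not on any tool's full grammar.

```python
def parse_page_ranges(spec, total):
    """Convert '1-10,11-20,21-end' (1-indexed, inclusive) to 0-indexed ranges."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        first, sep, last = part.partition("-")
        if not sep:
            last = first  # single page such as "7"
        lo = total if first == "end" else int(first)
        hi = total if last == "end" else int(last)
        if not 1 <= lo <= hi <= total:
            raise ValueError(f"bad range {part!r} for a {total}-page file")
        # 1-indexed inclusive [lo, hi] -> 0-indexed half-open [lo - 1, hi)
        ranges.append(range(lo - 1, hi))
    return ranges


print(parse_page_ranges("1-10,11-20,21-end", total=25))
```

Every index that reaches reader.pages[...] then comes from one function, which leaves a single place to check when a split is off by a page.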

Picking the Right Tool — Quick Decision Guide

One file, one time, not sensitive → web tool.
One file, one time, sensitive → Preview, PDFsam, or Acrobat.
Several files, repeatable rule, simple ranges → qpdf in a shell loop.
Content-aware rule (text, bookmarks, blank pages) → Python + pypdf.
Pipeline / production / thousands of files → qpdf or pypdf in a script, with logging and error handling around it.

The Takeaway

Splitting a PDF is a five-minute job the first time and a five-second job every time after, provided you pick the right tool for the situation. Web tools are fine for disposable one-offs. PDFsam and qpdf cover most everyday work. The moment your splitting rule depends on what’s inside the document, switch to a script and stop fighting GUIs.

The skill worth investing in isn’t memorizing tool flags — it’s recognizing, before you start, which of the strategies in the first table you actually need. Get that right, and the tool falls out of the answer.


David Cao
