Word Counter
Unicode-aware text analysis that runs fully in your browser.
Top 5 Word Frequency
| Word | Count | Percent |
|---|---|---|
About this tool
This word counter provides live text analysis for writers, developers, and content teams. Beyond a basic word count, it calculates character counts (with and without spaces), line and paragraph counts, estimated reading time (at 200 words/minute), speaking time range, and a word frequency table showing your top five most-used words. A regex exclude filter lets you strip comment lines, boilerplate headers, or any pattern before counting. All analysis is Unicode-aware and runs entirely in your browser - paste text, drop a file, or load from the file picker to get instant stats.
Real example
Input: a 500-word blog post draft pasted into the tool.
Stats output:
- Words: 498
- Characters (with spaces): 2,941
- Characters (no spaces): 2,447
- Lines: 32
- Sentences: 21
- Paragraphs: 8
- Read time: 149 sec
- Speaking time: 199-229 sec
Top 5 word frequency: "the" (28, 5.6%), "and" (19, 3.8%), "api" (14, 2.8%), "request" (11, 2.2%), "data" (9, 1.8%). Seeing "the" at 5.6% is expected - academic writing style guides typically recommend keeping common filler words below 7%. Seeing a technical term like "api" at 2.8% confirms the post stays on topic.
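The timing numbers above follow directly from the rates the tool documents (200 words/minute reading; 130-150 words/minute speaking). A minimal sketch, using floor rounding to match the example output (the function names here are illustrative, not the tool's internals):

```javascript
// Reading time at 200 wpm; speaking time bracketed by
// 150 wpm (fast) and 130 wpm (measured), in seconds.
function readingTimeSec(words) {
  return Math.floor((words / 200) * 60);
}

function speakingTimeSec(words) {
  return {
    fast: Math.floor((words / 150) * 60),
    measured: Math.floor((words / 130) * 60),
  };
}

readingTimeSec(498);  // → 149
speakingTimeSec(498); // → { fast: 199, measured: 229 }
```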
Common use cases
- SEO content length verification: Paste a draft blog post or landing page to confirm it meets your target word count before publishing. Most competitive SEO topics benefit from 1,000-2,000 words of substantive content.
- Social media and ad copy: Twitter/X limits posts to 280 characters; Google Ads headlines to 30. Paste your copy here to check character counts before switching between platforms.
- Presentation and script timing: A measured delivery runs at roughly 130-150 words/minute, so a 10-minute conference talk needs roughly 1,300-1,500 words. Use the speaking time output to calibrate your script length.
- Code comment and documentation audits: Use the regex exclude filter with a pattern like `^//|^#|^\*` to strip comment lines before counting, so you measure only prose content in a mixed source file or runbook.
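A per-line exclude filter of this kind can be sketched as follows; the `stripExcludedLines` helper is illustrative, not the tool's actual implementation:

```javascript
// Drop every line that matches the exclude pattern before counting.
// `^//|^#|^\*` removes //, #, and * comment lines.
function stripExcludedLines(text, pattern) {
  const re = new RegExp(pattern);
  return text
    .split("\n")
    .filter((line) => !re.test(line))
    .join("\n");
}

const source = "# heading\nreal prose here\n// a comment\nmore prose";
const prose = stripExcludedLines(source, "^//|^#|^\\*");
// prose → "real prose here\nmore prose"
```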
How it works
Word tokenization uses a Unicode-aware regex (`[\p{L}\p{M}]+(?:[-'][\p{L}\p{M}]+)*`) that correctly handles accented characters, CJK scripts, and hyphenated compounds like "well-formed". CJK characters (Chinese, Japanese, Korean) are counted separately and divided by two to approximate word units, since those scripts do not separate words with spaces. Reading time divides the word count by 200 (average adult reading speed). Speaking time brackets use 150 words/minute (fast) and 130 words/minute (measured). The regex exclude filter applies per-line and per-token, so you can strip entire lines (e.g., `^#.*$` to remove Markdown headers) or specific words.
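That tokenization pattern can be sketched in a few lines; the `tokenize` wrapper is illustrative, and note the `u` flag, which JavaScript requires for `\p{…}` property escapes:

```javascript
// \p{L} matches letters in any script, \p{M} matches combining marks;
// the optional non-capturing group keeps hyphen/apostrophe compounds
// such as "well-formed" together as one token.
const WORD_RE = /[\p{L}\p{M}]+(?:[-'][\p{L}\p{M}]+)*/gu;

function tokenize(text) {
  return text.match(WORD_RE) ?? [];
}

tokenize("a well-formed café, naïvely");
// → ["a", "well-formed", "café", "naïvely"]
```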
Common mistakes
- Counting with "ignore numbers" off: If your text contains phone numbers, IP addresses, or code snippets, the word count will include numeric tokens. Enable "Ignore numbers in word count" to exclude them and get a cleaner prose count.
- Reading time for technical content: The 200 words/minute rate is for standard prose. Dense technical documentation, code-heavy tutorials, or legal text is read at 100-130 words/minute. Adjust your expectations accordingly - double the reading time estimate for deeply technical material.
- Frequency table misleading due to stop words: Common words like "the", "a", "is", and "and" will always dominate the frequency table. For meaningful keyword density analysis, add a stop-word regex filter (e.g., `\b(the|and|is|a|in|of)\b`) before running the frequency analysis.
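Combining a stop-word filter with a frequency count can be sketched like this; the tiny stop-word list and the `topWords` helper are illustrative only, not the tool's built-in behavior:

```javascript
// Strip a small stop-word set, then return the top-N tokens by count.
const STOP_RE = /\b(the|and|is|a|in|of)\b/gi;

function topWords(text, n = 5) {
  const cleaned = text.replace(STOP_RE, " ");
  const counts = new Map();
  for (const w of cleaned.toLowerCase().match(/[\p{L}\p{M}]+/gu) ?? []) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1])
    .slice(0, n);
}

topWords("the api and the api of data");
// → [["api", 2], ["data", 1]]
```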
FAQ
How accurate is the word count for non-English text?
The tokenizer uses Unicode property escapes (\p{L}) which cover all alphabetic scripts including Arabic, Thai, Hebrew, and Cyrillic. CJK text (Chinese, Japanese, Korean) does not mark word boundaries with spaces, so those characters are counted individually and divided by two to approximate words - this is a common convention, not an exact count.
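The counted-then-halved convention can be sketched as follows; the script classes listed here (Han, Hiragana, Katakana, Hangul) are an illustrative subset, and `cjkWordEstimate` is not the tool's actual function:

```javascript
// Count CJK characters via Unicode script properties, then divide
// by two to approximate word units, per the convention above.
const CJK_RE =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/gu;

function cjkWordEstimate(text) {
  const chars = (text.match(CJK_RE) ?? []).length;
  return Math.round(chars / 2);
}

cjkWordEstimate("日本語のテキスト"); // 8 CJK characters → 4 word units
```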
What does the regex exclude filter do?
The filter is applied both per-line (lines matching the pattern are dropped entirely) and per-token (matching substrings within remaining lines are removed). This lets you strip comment lines, boilerplate, or specific words before counting. Invalid regex patterns are caught and reported in the status field.
Does this tool upload my text?
No. All analysis - word tokenization, frequency counting, regex filtering, and normalization - runs in your browser using JavaScript. No text is sent to any server.
Can I analyze a file directly?
Yes. Use the file picker or drag and drop a .txt, .md, .csv, .json, or .yaml file onto the input area. The file's text content is read locally and loaded into the input for immediate analysis.
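Local file reading of this kind typically relies on the browser's `File.text()` method (inherited from `Blob`), which resolves to the file's contents without any network request. A minimal sketch; the element id in the commented wiring is illustrative:

```javascript
// Read a picked or dropped file entirely client-side; nothing is uploaded.
async function loadFile(file) {
  // File inherits Blob.text(), which resolves to the file's full contents.
  return await file.text();
}

// Wiring sketch for a file input (runs in a browser, not Node):
// document.querySelector("#file-picker").addEventListener("change", async (e) => {
//   const text = await loadFile(e.target.files[0]);
//   // feed `text` into the analyzer
// });
```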
Related tools
- Diff Checker — compare two document versions after editing
- Case Converter — normalize text case before counting or comparing
- Line Sorter — sort and deduplicate lines in your text block