Word Counter
Unicode-aware text analysis that runs fully in your browser.
Top 5 Word Frequency
| Word | Count | Percent |
|---|---|---|
About this tool
This word counter provides live text analysis for writers, developers, and content teams. Beyond a basic word count, it calculates character counts (with and without spaces), line and paragraph counts, estimated reading time (at 200 words/minute), speaking time range, and a word frequency table showing your top five most-used words. A regex exclude filter lets you strip comment lines, boilerplate headers, or any pattern before counting. All analysis is Unicode-aware and runs entirely in your browser - paste text, drop a file, or load from the file picker to get instant stats.
Real example
Input: a 500-word blog post draft pasted into the tool.
Stats output:
- Words: 498
- Characters (with spaces): 2,941
- Characters (no spaces): 2,447
- Lines: 32
- Sentences: 21
- Paragraphs: 8
- Read time: 149 sec
- Speaking time: 199-229 sec
Top 5 word frequency: "the" (28, 5.6%), "and" (19, 3.8%), "api" (14, 2.8%), "request" (11, 2.2%), "data" (9, 1.8%). Seeing "the" at 5.6% is expected - academic writing style guides typically recommend keeping common filler words below 7%. Seeing a technical term like "api" at 2.8% confirms the post stays on topic.
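The timing numbers above follow directly from the rates the tool documents (200 words/minute reading; 130-150 words/minute speaking). A minimal sketch, using floor rounding to match the example output (the function names here are illustrative, not the tool's internals):

```javascript
// Reading time at 200 wpm; speaking time bracketed by
// 150 wpm (fast) and 130 wpm (measured), in seconds.
function readingTimeSec(words) {
  return Math.floor((words / 200) * 60);
}

function speakingTimeSec(words) {
  return {
    fast: Math.floor((words / 150) * 60),
    measured: Math.floor((words / 130) * 60),
  };
}

readingTimeSec(498);  // → 149
speakingTimeSec(498); // → { fast: 199, measured: 229 }
```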
Common use cases
- SEO content length verification: Paste a draft blog post or landing page to confirm it meets your target word count before publishing. Most competitive SEO topics benefit from 1,000-2,000 words of substantive content.
- Social media and ad copy: Twitter/X limits posts to 280 characters; Google Ads headlines to 30. Paste your copy here to check character counts before switching between platforms.
- Presentation and script timing: A measured delivery runs at roughly 130-150 words/minute, so a 10-minute conference talk needs roughly 1,300-1,500 words. Use the speaking time output to calibrate your script length.
- Code comment and documentation audits: Use the regex exclude filter with a pattern like `^//|^#|^\*` to strip comment lines before counting, so you measure only prose content in a mixed source file or runbook.
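A per-line exclude filter of this kind can be sketched as follows; the `stripExcludedLines` helper is illustrative, not the tool's actual implementation:

```javascript
// Drop every line that matches the exclude pattern before counting.
// `^//|^#|^\*` removes //, #, and * comment lines.
function stripExcludedLines(text, pattern) {
  const re = new RegExp(pattern);
  return text
    .split("\n")
    .filter((line) => !re.test(line))
    .join("\n");
}

const source = "# heading\nreal prose here\n// a comment\nmore prose";
const prose = stripExcludedLines(source, "^//|^#|^\\*");
// prose → "real prose here\nmore prose"
```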
How it works
Word tokenization uses a Unicode-aware regex (`[\p{L}\p{M}]+(?:[-'][\p{L}\p{M}]+)*`) that correctly handles accented characters, CJK scripts, and hyphenated compounds like "well-formed". CJK characters (Chinese, Japanese, Korean) are counted separately and divided by two to approximate word units, since those scripts do not separate words with spaces. Reading time divides the word count by 200 (average adult reading speed). Speaking time brackets use 150 words/minute (fast) and 130 words/minute (measured). The regex exclude filter applies per-line and per-token, so you can strip entire lines (e.g., `^#.*$` to remove Markdown headers) or specific words.
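That tokenization pattern can be sketched in a few lines; the `tokenize` wrapper is illustrative, and note the `u` flag, which JavaScript requires for `\p{…}` property escapes:

```javascript
// \p{L} matches letters in any script, \p{M} matches combining marks;
// the optional non-capturing group keeps hyphen/apostrophe compounds
// such as "well-formed" together as one token.
const WORD_RE = /[\p{L}\p{M}]+(?:[-'][\p{L}\p{M}]+)*/gu;

function tokenize(text) {
  return text.match(WORD_RE) ?? [];
}

tokenize("a well-formed café, naïvely");
// → ["a", "well-formed", "café", "naïvely"]
```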
Common mistakes
- Counting with "ignore numbers" off: If your text contains phone numbers, IP addresses, or code snippets, the word count will include numeric tokens. Enable "Ignore numbers in word count" to exclude them and get a cleaner prose count.
- Reading time for technical content: The 200 words/minute rate is for standard prose. Dense technical documentation, code-heavy tutorials, or legal text is read at 100-130 words/minute. Adjust your expectations accordingly - double the reading time estimate for deeply technical material.
- Frequency table misleading due to stop words: Common words like "the", "a", "is", and "and" will always dominate the frequency table. For meaningful keyword density analysis, add a stop-word regex filter (e.g., `\b(the|and|is|a|in|of)\b`) before running the frequency analysis.
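Combining a stop-word filter with a frequency count can be sketched like this; the tiny stop-word list and the `topWords` helper are illustrative only, not the tool's built-in behavior:

```javascript
// Strip a small stop-word set, then return the top-N tokens by count.
const STOP_RE = /\b(the|and|is|a|in|of)\b/gi;

function topWords(text, n = 5) {
  const cleaned = text.replace(STOP_RE, " ");
  const counts = new Map();
  for (const w of cleaned.toLowerCase().match(/[\p{L}\p{M}]+/gu) ?? []) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1])
    .slice(0, n);
}

topWords("the api and the api of data");
// → [["api", 2], ["data", 1]]
```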
FAQ
How accurate is the word count for non-English text?
The tokenizer uses Unicode property escapes (\p{L}) which cover all alphabetic scripts including Arabic, Thai, Hebrew, and Cyrillic. CJK text (Chinese, Japanese, Korean) does not mark word boundaries with spaces, so those characters are counted individually and divided by two to approximate words - this is a common convention, not an exact count.
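The counted-then-halved convention can be sketched as follows; the script classes listed here (Han, Hiragana, Katakana, Hangul) are an illustrative subset, and `cjkWordEstimate` is not the tool's actual function:

```javascript
// Count CJK characters via Unicode script properties, then divide
// by two to approximate word units, per the convention above.
const CJK_RE =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]/gu;

function cjkWordEstimate(text) {
  const chars = (text.match(CJK_RE) ?? []).length;
  return Math.round(chars / 2);
}

cjkWordEstimate("日本語のテキスト"); // 8 CJK characters → 4 word units
```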
What does the regex exclude filter do?
The filter is applied both per-line (lines matching the pattern are dropped entirely) and per-token (matching substrings within remaining lines are removed). This lets you strip comment lines, boilerplate, or specific words before counting. Invalid regex patterns are caught and reported in the status field.
Does this tool upload my text?
No. All analysis - word tokenization, frequency counting, regex filtering, and normalization - runs in your browser using JavaScript. No text is sent to any server.
Can I analyze a file directly?
Yes. Use the file picker or drag and drop a .txt, .md, .csv, .json, or .yaml file onto the input area. The file's text content is read locally and loaded into the input for immediate analysis.
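Local file reading of this kind typically relies on the browser's `File.text()` method (inherited from `Blob`), which resolves to the file's contents without any network request. A minimal sketch; the element id in the commented wiring is illustrative:

```javascript
// Read a picked or dropped file entirely client-side; nothing is uploaded.
async function loadFile(file) {
  // File inherits Blob.text(), which resolves to the file's full contents.
  return await file.text();
}

// Wiring sketch for a file input (runs in a browser, not Node):
// document.querySelector("#file-picker").addEventListener("change", async (e) => {
//   const text = await loadFile(e.target.files[0]);
//   // feed `text` into the analyzer
// });
```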
Related tools
- Diff Checker — compare two document versions after editing
- Case Converter — normalize text case before counting or comparing
- Line Sorter — sort and deduplicate lines in your text block