Word Count Differences


How can it be that the word count for the same file differs from (translation) tool to (translation) tool?

The way one translation tool counts words can differ from the way any other translation tool does, and from the word count you get in Word. The reason is how each tool defines words and word boundaries. Some tools treat a hyphenated word (like “tool-related”) as one word; others count it as two. The same is true for other delimiting characters, like slashes (/) or apostrophes (’). A character like a slash, if it is surrounded by spaces (as in “in / out”), may even be counted as a word of its own in one tool but not at all in another.

Some tools recognize combinations of letters and numbers (alphanumeric items) as one word, but only as long as there is no slash or hyphen that separates numbers from letters (ABC123 = 1 word, but ABC-123 = 2 words).
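To make the effect of these rules concrete, here is a minimal Python sketch of two counting rules. Neither is taken from a real translation tool; the function names and the default delimiter set are assumptions chosen only to mirror the examples above.

```python
import re

def count_whitespace_only(text):
    """Hypothetical rule A: only whitespace separates words."""
    return len(text.split())

def count_with_delimiters(text, delimiters="-/'"):
    """Hypothetical rule B: whitespace and the given characters separate words."""
    pattern = "[\\s" + re.escape(delimiters) + "]+"
    # Drop empty strings produced by leading or trailing delimiters.
    return len([token for token in re.split(pattern, text) if token])

for sample in ["tool-related", "ABC123", "ABC-123", "in / out"]:
    print(sample, count_whitespace_only(sample), count_with_delimiters(sample))
# tool-related 1 2
# ABC123 1 1
# ABC-123 1 2
# in / out 3 2
```

Under rule A the lone slash in “in / out” is counted as a word; under rule B it disappears entirely. Passing a different `delimiters` string, for example without the hyphen, plays the role of the hyphen checkbox some tools offer.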

Depending on the types of elements your file contains, the difference can be quite large. A recent file preparation turned up elements like this one:

/content/legal/privacy?cid=cookieprivacy#cookies-policy

One tool counted that whole expression as 1 word; the other counted 4 words, using the slashes and the equals sign as word delimiters. Imagine the word count difference if there are 1000 items like this one in the file.
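The two counts can be reproduced with a short Python sketch. The splitting rules here are assumptions that happen to give the same numbers; the actual logic inside either tool is not known.

```python
import re

expression = "/content/legal/privacy?cid=cookieprivacy#cookies-policy"

# Assumed rule 1: only whitespace separates words.
words_rule_1 = expression.split()

# Assumed rule 2: slashes and equals signs separate words; empty pieces are dropped.
words_rule_2 = [piece for piece in re.split(r"[/=]", expression) if piece]

print(len(words_rule_1))  # 1
print(words_rule_2)       # ['content', 'legal', 'privacy?cid', 'cookieprivacy#cookies-policy']
print(len(words_rule_2))  # 4

# Difference across 1000 such items in one file.
items = 1000
print((len(words_rule_2) - len(words_rule_1)) * items)  # 3000
```

With these rules, 1000 such items would add 3000 words to one tool's total and nothing extra to the other's.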

Of course it is debatable which part of the above expression needs to be translated, if any. That would be a nice exercise in regular expressions, either to tag the whole thing or to extract the translatable part. 🙂
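As a rough sketch of the tagging approach, the Python snippet below replaces path-like expressions with numbered placeholders before counting. The pattern, the `{1}` placeholder format, and the function name `protect_paths` are all assumptions for illustration; real tools have their own tag syntax and their own regular expression settings.

```python
import re

# Assumed pattern: a token that starts with "/" and contains at least one more "/".
PATH_PATTERN = re.compile(r"(?<!\S)/\S+/\S+")

def protect_paths(segment):
    """Replace each path-like token with a numbered placeholder such as {1}."""
    protected = []

    def _tag(match):
        protected.append(match.group(0))
        return "{" + str(len(protected)) + "}"

    return PATH_PATTERN.sub(_tag, segment), protected

text = "See /content/legal/privacy?cid=cookieprivacy#cookies-policy for details"
tagged, paths = protect_paths(text)
print(tagged)  # See {1} for details
print(paths)   # ['/content/legal/privacy?cid=cookieprivacy#cookies-policy']
```

A lone slash surrounded by spaces, as in “in / out”, does not match this pattern and is left untouched.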

And although some tools let you influence the count, with checkboxes that specify whether hyphenated words count as one word or two, it is almost impossible to achieve the exact same word count with any two tools when your documents contain delimiting characters like slashes or equals signs.

And here is the real-life comparison over 39 files:

Analysis tool A

Analysis tool B

Note that the segment count is quite close but the word count is very different.
