fix(dictionary_rare): remove “empress” and “empresses” words #3678
Kristinita wants to merge 1 commit into codespell-project:main
The words “empress” and “empresses” aren’t “rare”; they are still used when we talk about empresses.
Signed-off-by: Kristinita <Kristinita@users.noreply.github.com>
The rare dictionary is for rare English words, not non-English or deprecated words. As already explained, I believe the rare dictionary should be disabled by default - but currently it's not. In the meantime, you may simply ignore false positives in your projects.
You'll find more details scattered in existing codespell issues. You should be able to find them with a GitHub search. I think we would welcome a PR that would gather that scattered information and suggest more formal criteria.
Type: Reply 💬
1. Checking the frequency of English words
If the phrase “rare English words” means “words with a low frequency in the English language”, we can check the frequency online. See the list of the most widely used online English corpora. Google Books Ngram Viewer contains more words than the other corpora.
2. Queries to Google Books Ngram Viewer
2.1. “empress” vs. “impress”
“empress” query to Google Books Ngram Viewer: 0.0004093305%
“impress” query to Google Books Ngram Viewer: 0.0004247406%
In books from 2022 on Google Books, the words “empress” and “impress” were found with almost the same frequency.
2.2. “Empress” vs. “Impress”
“Empress” query to Google Books Ngram Viewer: 0.0004773066%
“Impress” query to Google Books Ngram Viewer: 0.0000105055%
In books from 2022 on Google Books, the word “Empress” was found about 45 times more often than the word “Impress”.
Thanks.
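The ratios quoted in the comment above can be re-derived from the four percentages with a few lines of Python. This is only a minimal sketch: the numbers are copied by hand from the comment and not fetched from Google Books Ngram Viewer, and the names `FREQ_2022` and `ratio` are hypothetical helpers introduced for illustration.

```python
# Frequencies (percent of all 1-grams in 2022 books), copied by hand from
# the comment above. Nothing here queries Google Books Ngram Viewer.
FREQ_2022 = {
    "empress": 0.0004093305,
    "impress": 0.0004247406,
    "Empress": 0.0004773066,
    "Impress": 0.0000105055,
}


def ratio(a: str, b: str) -> float:
    """Return how many times more frequent word `a` is than word `b`."""
    return FREQ_2022[a] / FREQ_2022[b]


# Case-sensitive comparisons, matching sections 2.1 and 2.2.
lowercase_ratio = ratio("empress", "impress")
capitalized_ratio = ratio("Empress", "Impress")

print(f"empress / impress: {lowercase_ratio:.2f}")
print(f"Empress / Impress: {capitalized_ratio:.1f}")
```

A lowercase ratio close to 1 corresponds to the "almost the same frequency" claim in 2.1, and a capitalized ratio between 45 and 46 corresponds to the "45 times" claim in 2.2.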
I have to agree there has been a recent surge of occurrences of “empress”. Maybe related to the current decline of democracy and the rise of authoritarian regimes around the world 😄




Question in case the pull request is declined
Please provide more details about the criteria for including words in the dictionary dictionary_rare.txt. I found only the information from codespell --help:
In my view, the dictionary dictionary_rare.txt must contain words and word forms that were used in the English language in previous centuries but have almost fallen out of use in the 21st century. The words “empress” and “empresses” don’t match this criterion: when people speak and write about empresses now, in 2025, they still use the word “empress”.
Thanks.