Zaliznyak’s Morphological Dictionary
- Zaliznyak’s Morphological Dictionary –
This file is the basis for many implementations of Russian morphology.
It contains the word entries from an early edition of Zaliznyak’s
morphological dictionary, though some of the unusual characters used
in the printed dictionary have been replaced with ASCII substitutes.
It does not include the information needed to interpret these entries.
For that you must refer to the printed dictionary or to scans of it.
Though this is ostensibly a text file, it is stored in what is now a very
inconvenient format:
- It is distributed as an installer which runs under Microsoft Windows and unpacks the text files.
- The files it creates are encoded in the ALT encoding for MS-DOS, which is now little used.
- The already obscure ALT encoding is altered to provide accented vowels at code points usually used for graphic characters.
- The files contain typographical errors from the OCR process.
- At the end of each line a Control-D character has been added, followed by what is apparently intended as an English translation of the word, but these translations are often unintelligible.
For these reasons you would likely do better to use Odict.
- Odict.ru – A version of Zaliznyak’s dictionary with about 8000 additional entries added by Sergey Slepov, the author of the site Morpher.ru. Versions before October 25, 2020 are licensed CC BY-SA 3.0. Later versions are licensed commercially. The site includes documentation of the format, adapted from the foreword to Zaliznyak’s dictionary.
- «Грамматический словарь» А. А. Зализняка – Official release of the data from the sixth edition. Licensed CC BY-NC.
- Викисловарь:Использование словаря Зализняка – Description of Zaliznyak’s grammatical notation as used in Wiktionary.
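The format problems listed above for the original text files suggest a simple cleanup pass, sketched here in Python. This is only a sketch under stated assumptions, not a tested converter: it assumes that the ALT encoding corresponds to Python's cp866 codec, and the names ACCENT_MAP and decode_entry are hypothetical. The mapping table is deliberately left empty, because the code points actually used for accented vowels have to be worked out by inspecting the files.

```python
# Hypothetical table: graphic character (as decoded by cp866) -> accented vowel.
# It is empty on purpose; fill it in after inspecting the real files.
ACCENT_MAP = {}


def decode_entry(raw, accent_map=None):
    """Decode one raw line of the legacy file into a Unicode dictionary entry.

    Assumptions: the bytes are in cp866 (taken here to be the ALT encoding),
    and everything from the first Control-D (0x04) onward is the appended
    English gloss, which is discarded.
    """
    if accent_map is None:
        accent_map = ACCENT_MAP
    # Keep only the part before the Control-D separator.
    entry, _sep, _gloss = raw.partition(b"\x04")
    text = entry.decode("cp866").rstrip("\r\n")
    # Replace repurposed graphic characters with accented vowels, if known.
    return text.translate(str.maketrans(accent_map))
```

Each line would be read in binary mode and passed to decode_entry. The OCR errors mentioned above are not addressed by this sketch and would still need to be corrected by hand or against another source.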
Other Downloadable Linguistic Data
- Modern Russian Frequency List – Serge Sharoff’s word frequency list based on a selection of works in modern Russian
- OpenCorpora – A Russian corpus with ambiguity resolved, so that the identity of each word and its morphological form are known
- OPUS – an Open Source Parallel Corpus – Translated texts aligned sentence by sentence. The corpus can be searched on this and other sites, and the datasets can be downloaded for use with your own tools. Texts come from sources including UN documents, government publications, Wikipedia, movie subtitles, and multilingual news sites.
- Openrussian.org – Database dumps of the Openrussian.org dictionary
Programs for Russian Morphology
- Pymorphy2 – Morphological analyzer and inflection engine for the Russian and Ukrainian languages. It is fast and of good quality, but does not provide information about stressed syllables.
- Jurta – A program written in Perl which takes the entries in Zaliznyak’s dictionary and produces the full paradigm of each word. Unfortunately the author has not posted all of the files, so the program does not really work. The output files are posted, though, and these may be useful.
- Russtress – Uses a statistical model to guess the stress of Russian words in context
- StressRNN – Uses a statistical model to guess the stress of Russian words in context
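As an illustration of the kind of analysis and inflection Pymorphy2 offers, the following Python sketch wraps two common calls. The helper names analyze, to_genitive, and demo are hypothetical. The sketch assumes the pymorphy2 package is installed and that "gent" is the grammeme for the genitive case in its tag set. The analyzer is passed in as an argument so that the helpers do not depend on how it was created.

```python
def analyze(word, analyzer):
    """Return (normal form, tag) pairs for every parse of `word`."""
    return [(p.normal_form, str(p.tag)) for p in analyzer.parse(word)]


def to_genitive(word, analyzer):
    """Inflect the first parse of `word` into the genitive case.

    Returns None if there is no parse or the form cannot be produced.
    """
    parses = analyzer.parse(word)
    if not parses:
        return None
    form = parses[0].inflect({"gent"})
    return form.word if form is not None else None


def demo():
    # Third-party dependency (assumed installed): pip install pymorphy2
    import pymorphy2

    morph = pymorphy2.MorphAnalyzer()
    print(analyze("стали", morph))       # an ambiguous form: several parses
    print(to_genitive("словарь", morph))
```

Calling demo() prints the candidate parses of an ambiguous word and one inflected form. As noted above, none of this output carries stress information.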
Natural Language Processing in Russian
- NLTK4RUSSIAN – The Natural Language Toolkit adapted for Russian
- Natasha – Rule-based named entity recognition library for Russian
- Polyglot – NLP library in Python with support for dozens to hundreds of languages, depending on the function
- Link Grammar – Natural language parser with support for Russian