The General Internet-Corpus of Russian (GICR) is a megacorpus (more than 15 GT) built with fully automated technology for collecting and tagging texts from the Russian Internet, drawing on recent advances in computational linguistics.
The project is being implemented with the technological and organizational support of ABBYY.
As of May 2015, the corpus includes materials from the largest Russian Internet resources (news sites, VKontakte, LiveJournal, and Mail.ru blogs) as well as from the Russian Journal’s Magazine Hall.
The project has educational and scientific status: students of the Department of Computational Linguistics of RSUH and of MIPT take part in it, along with staff of those departments and experts from ABBYY, MSU, and the University of Leeds (UK).
The project is open to external researchers, currently with some limitations because it is still in active development and testing.
The project is accompanied by scientific seminars, which are open to anyone interested in contributing to the creation of GICR or in conducting linguistic experiments on it.