The Cambridge and Nottingham Corpus of Discourse in English (CANCODE) is a collection of spoken English recorded at hundreds of locations across the British Isles in a wide variety of situations. The English Web Corpus (enTenTen) is an English corpus made up of texts collected from the Internet.
A Corpus of English Dialogues 1560–1760 (CED) was compiled as a tool for the study of the Early Modern period; the focus was placed on dialogues because interactive face-to-face communication is known to be an important factor in language change.
The Cambridge English Corpus (CEC, formerly the Cambridge International Corpus) is a multi-billion-word corpus of the English language, containing both written and spoken data. It includes a number of specialized corpora. The Cambridge Business English Corpus is a large collection of British and American business language, including reports and documents, books relating to different aspects of business, and the business sections of many national newspapers. Spoken business language is covered as well: formal and informal meetings, presentations, telephone conversations, lunchtime conversations, and spoken language from other business situations. The CEC also contains the Cambridge Learner Corpus (CLC), a 40-million-word corpus made up of English exam responses written by English language learners. A related project, whose founding partners are Cambridge University Press, Cambridge English Language Assessment, the University of Cambridge, the University of Bedfordshire, the British Council and English UK, aims to describe what learners know and can do in English at each level of the Common European Framework of Reference (CEFR).
At present the Old English section of the Corpus contains 413,300 words, the Middle English section 608,600 words and the British English section 551,000 words, a total of 1,572,800 words.
It was created by Mark Davies, Professor of Corpus Linguistics.
One comprehensive archive of English newswire text data has been acquired over several years by the Linguistic Data Consortium (LDC); four distinct international sources of English newswire are represented in it.
While the spoken language of the past is not directly accessible to modern speakers, it is recorded in speech-related texts.
TV Corpus: 325 million words / 75,000 episodes. The Corpus of Contemporary American English (COCA) contains more than one billion words of text (25+ million words each year, 1990-2019) from eight genres, including spoken, fiction, popular magazines, newspapers, and academic texts. One collection contains a corpus of 75 million words of literature, though not all of it is English literature.
The enTenTen corpus belongs to the TenTen corpus family. The corpora are built using technology specialized in collecting only linguistically valuable web content. Sketch Engine currently provides access to TenTen corpora in more than 40 languages.
The Cambridge Corpus of Spoken North American English (CAMSNAE) is a large collection of spoken American English. The Cambridge Learner Corpus (CLC) contains scripts from over 180,000 students from around 200 countries, speaking 138 different first languages, and it is growing all the time.
Other English corpora include the American National Corpus; the Bank of English; the British National Corpus; the Bergen Corpus of London Teenage Language (COLT); the Brown Corpus, which forms part of the "Brown Family" of corpora together with LOB, Frown and F-LOB; and the Corpus of Contemporary American English (COCA; 425 million words in an earlier count).
Released in spring 2006, A Corpus of English Dialogues 1560–1760 (CED) is a 1.2-million-word computerized corpus of Early Modern English speech-related texts.
Wikipedia Corpus: 1.9 billion words / 4.4 million texts. It is presented as the best corpus for specialized language across an almost unlimited range of topics: science, entertainment, technology, history, sports, etc.
Corpus of Historical American English (COHA): 400 million words / 107,000 texts. COHA contains more than 400 million words of text from the 1810s to the 2000s, which makes it 50 to 100 times as large as other comparable historical corpora of English, and the corpus is balanced by genre decade by decade.
One corpus consists of 500 samples of Australian English (60% speech, 40% writing) and matches the structure of the other ICE corpora (those associated with the International Corpus of English). Another corpus was completed in 1993 and contains texts from the 1970s through the early 1990s; no texts have been added since.
The data is based on the one-billion-word Corpus of Contemporary American English (COCA), described as the only corpus of English that is large, up to date, and balanced between many genres.