Student writing corpora

Concordance in Sketch Engine

By Milos Jakubicek, acting on behalf of Lexical Computing (Own work) [CC BY-SA 4.0], via Wikimedia Commons

A blog post to keep track of the essay and student writing corpora I’ve encountered that might be useful in the development of writing analytics. (There is also a curated list, ‘Learner corpora around the world’, which is more extensive.)

  1. Uppsala Student English Corpus (USE) – has a range of student essay genres in the subject of English, including reflective, argumentative, and discursive
  2. Michigan Corpus of Upper-Level Student Papers, including various paper genres and levels
  3. The Louvain Corpus of Native English Essays (LOCNESS)
  4. The International Corpus of Learner English (purchase necessary) contains argumentative essays written by higher intermediate to advanced learners of English from several mother tongue backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian, Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Tswana, Turkish)
  5. CALE, the Corpus of Academic Learner English (in progress)
  6. British Academic Written English corpus
  7. Linguistic Data Consortium (paid subscription) essay corpora, including non-native written English
  8. MERLIN corpus contains 2,286 texts for learners of Italian, German and Czech that were taken from written examinations of acknowledged test institutions
  9. There are some other interesting lists available

Source: SK Blog
