One of the things I’m interested in is how to use assessment (and especially peer and self-assessment) as a learning activity. A method I used as a teacher, and more recently in my research, was a kind of ‘diagnostic and training’ exercise in which students would assess (on a rubric) sample assignments about which I had prior knowledge (i.e., I knew their grades), before doing any other assessment.
The benefits of this model are that (1) students gain understanding of the assessment system, (2) students see assessments other than their own, (3) feedback can be richer, more frequent, and more collaborative, and (4) you know the feedback will be of a certain quality because of the diagnostic/training element.
At UTS we have some wonderful practice on this very thing. Cathy Gorrie and Andy Leigh (both in life sciences, and probably others!) have worked on a model using ‘benchmarking’ with exemplar cases to be marked by students (much like my diagnostic), alongside flagging common errors in student responses (things to ‘watch for’ in assessment).
In Cathy’s talk today, she indicated this has largely gone well: students tended to rate slightly lower on highly marked texts (i.e., being too harsh) and higher on low-marked texts (i.e., being too generous), and while there was a big range of results given across the exemplars, the written feedback was excellent. Markers were notified if they did a poor job and asked to re-do the benchmarking. On appeal, students could write half a page to justify the mark they thought they deserved (3/307 took that opportunity, with 2 awarded).
It also looks like students are broadly satisfied with the model, with few complaints (4 students, who just really don’t like peer marking and generally thought it was lazy teaching). Two things might help with this: (1) emphasising the pedagogic value of the exercise (rather than relief of educator burden), and (2) (suggested to me by someone later in the day) flagging the range of results given on the benchmarking, to show how varied people’s interpretations of the marking criteria are.
This area excites me because I think it’s a great teaching tool, and it was something I enjoyed doing with my students (as well as asking them to write sample exams, and asking them to improve poor answers). I also think there’s a lot of rich data from the technique, and there are a lot of small changes that could be explored for their impact, some of which would adapt a generic model to be closer to the kind of calibrated peer assessment at UCLA (but without buying into the particular product), for example:
- Around the benchmark, it would be interesting to know:
- whether students who perform poorly on the benchmarking also perform poorly on their own assignments
- whether students who perform poorly on the benchmarking also, in fact, perform poorly as assessors of their peers’ work
- To reduce the need for moderation and (hopefully) increase quality, it would be interesting to run some reliability analysis of student ratings, and to see whether targeting discrepancies (or tracking students who tended to disagree) could reduce variance in grades, and perhaps also support those students in their assessments and assignments (per the first point above)
- Analysis of the written feedback would also be interesting. In my work, I asked students to suggest 3 improvements that could be made, using a drop-down menu to categorise those improvements according to the rubric facets they were assessed on. Looking at the kinds of suggestions made is much easier with that kind of semantic annotation, but even just taking the written feedback and analysing the topics/themes should provide useful insight. It would also be interesting to explore whether ‘poorer’ raters also gave poorer written feedback (and, especially if not, why that is the case)
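The discrepancy-tracking idea above can be sketched quite simply. Here is a minimal, hypothetical example (all student names, grades, and the threshold are invented for illustration, not taken from Cathy’s or my data): compare each student’s marks on the benchmark exemplars against the known grades, and flag anyone whose average discrepancy is large, as a candidate for re-doing the benchmarking or for extra support.

```python
# Hypothetical sketch: flagging discrepant raters against benchmark exemplars.
# All data and the threshold are invented for illustration.

from statistics import mean

# Known "true" grades for three benchmark exemplars (out of 100)
benchmark = {"exemplar_A": 85, "exemplar_B": 60, "exemplar_C": 40}

# Each student's ratings of the same exemplars (hypothetical)
ratings = {
    "student_1": {"exemplar_A": 80, "exemplar_B": 62, "exemplar_C": 45},
    "student_2": {"exemplar_A": 60, "exemplar_B": 75, "exemplar_C": 70},
    "student_3": {"exemplar_A": 88, "exemplar_B": 58, "exemplar_C": 38},
}

def mean_abs_error(student_ratings):
    """Average absolute gap between a student's marks and the known grades."""
    return mean(abs(student_ratings[k] - benchmark[k]) for k in benchmark)

# Flag anyone whose average discrepancy exceeds a (hypothetical) threshold
THRESHOLD = 10
flagged = {s for s, r in ratings.items() if mean_abs_error(r) > THRESHOLD}
print(flagged)  # student_2's marks run counter to the benchmark grades
```

The same per-student error score could then be correlated against assignment grades, or against the quality of written feedback, to probe the questions above; proper inter-rater reliability statistics (e.g., Krippendorff’s alpha) would be the more rigorous version of this.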
I’d forgotten that I actually posted a short PowerPoint of the model I use, along with some teaching resources, in an earlier post on this issue!