Identifying gaps in textual cohesion

One of the things I’m thinking about at the moment is how we analyse student writing and present those analyses as feedback. In our work on rhetorical parsing, we currently highlight sentences that make particular rhetorical ‘moves’ (contrast, highlighting novel ideas, etc.) and give an overview of the topics covered through a word cloud. The OpenEssayist tool visually foregrounds the key concepts in a submitted text and where they occur, to encourage students to reflect on whether they seem to be talking about the right things and how their argument is developing. Coh-Metrix is another tool, which looks for ‘cohesion’ indicators (think of cohesion as how well the text ‘hangs together’ at different levels of granularity). In its rawest form, it outputs a set of numeric indicators for researchers to use. The original proposal, though, also intended to build a ‘gap analysis’ feature to highlight areas in which the target features were missing.

This is mostly me making notes for my own reference on how we might implement this kind of gap analysis. In that proposal (p.15 on), each pair of elements (sentences, paragraphs) receives a set of bidirectional cohesion indicators, so for any pair of elements we can identify how well they cohere across a set of measures. Coh-GIT, then, was intended to foreground this information for the author, to help them target areas for improvement. Formally:

…the computer tool will automatically determine how text elements and constituents are connected for specific types of cohesion. Suppose there are 20 (2 x 2 x 5) types of cohesion (local and global, vocabulary- and grammar-driven, referential, locational, temporal, causal, and structural). Suppose that there are N elements and constituents in a particular text. There would be N x (N-1) directional cohesion connections and [N x (N-1)/2] bidirectional connections with respect to any one of the 20 cohesion relations. We can capture the resulting set of connections in the form of a matrix (Graesser, Karnavat et al., 2000; Kintsch, 1998; Zwaan et al., 1995). A cell in the matrix is 0 if there is no particular relation between a pair of elements/constituents, a 1 if a solid relation, and intermediate values if there is a non-discrete metric. The full matrix has the entire set of connections with respect to a type of relation R, whereas the contiguous sub-matrix includes only those elements/constituents that are contiguous in the explicit text. We can define a density measure as the summation of such cell values. For example, the causal cohesion density would be computed as: [ΣR(ei, ej)]/[N x (N-1)], which is the average cell value for pairs of cells with respect to the causal relation. One could restrict this to a contiguous, bi-directional, causal cohesion density, which only considers the contiguous elements and constituents, computed as [ΣR(ei, ej | ei & ej are contiguous)]/[N+1]. When integrating over all types of cohesion markers, one can compute an overall cohesion density score for a particular text. More importantly, however, these density scores would specify how much an entire text has cohesion markers with respect to a particular type of relation. For instance, it might be the case that different readers (low vs. high knowledge, low vs. high reading ability) rely more on one type of cohesion relation than another.
A fine-grained recall analysis can be used to test the validity of the coherence and cohesion metrics produced by Coh-Metrix. Suppose that a recall protocol is collected from a sample of subjects
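
To make the density definitions concrete for myself, here is a minimal Python sketch. It is my own reading of the quoted formulas, not anything taken from Coh-Metrix: the function names and the example matrix are made up, and I assume a square N x N matrix where cell [i][j] holds R(ei, ej) in the range 0 to 1. For the contiguous version I average the two directions of each adjacent pair and divide by the N-1 adjacent pairs, which is an assumption on my part; the quoted passage gives its denominator as N+1.

```python
def cohesion_density(matrix):
    """Average off-diagonal cell value: sum of R(ei, ej) over i != j, divided by N x (N-1)."""
    n = len(matrix)
    if n < 2:
        return 0.0
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def contiguous_density(matrix):
    """Average bidirectional value over adjacent pairs only.

    Assumption: each adjacent pair (i, i+1) is scored as the mean of its two
    directions, and the sum is divided by the N-1 adjacent pairs.
    """
    n = len(matrix)
    if n < 2:
        return 0.0
    total = sum((matrix[i][i + 1] + matrix[i + 1][i]) / 2 for i in range(n - 1))
    return total / (n - 1)


# Hypothetical causal-relation matrix for a four-sentence text.
causal = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

print(cohesion_density(causal))    # 0.25
print(contiguous_density(causal))  # 0.5
```

In this toy example the last sentence has no causal link to anything, which pulls both densities down; that is exactly the kind of spot a gap analysis would want to point at.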

To assess how well this translates into gap analysis, they propose:

Preliminary experiments will be conducted to examine the face validity of Coh-GIT by comparing its output to reading experts’ and readers’ judgments of points of difficulty in texts. For our subsequent experiments, we will take advantage of eye-tracking technology that we have available to verify that Coh-GIT reliably predicts participants’ fixation times. McNamara and Kintsch (1996) found longer reading times for less cohesive expository texts and Zwaan et al. (1995) reported that reading times for sentences in narrative texts increase robustly with the number of coherence categories that have breaks in continuity. Therefore, gaze durations should be longer for words that are in sentences associated with coherence gaps than words in other sentences.

(p.23)

I would actually be more interested in whether foregrounding these gaps helps students to improve their texts overall, and their cohesion in particular. The challenge here, as I see it, is the vast range of cohesion measures and element pairs one could foreground. For example, how do we foreground cohesion at every grain, from sentence-level to whole-text, simultaneously and in an actionable way, while making use of the full set of cohesion indicators across those granularities?
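
As a naive starting point, and one that deliberately sidesteps the granularity question, one could flag adjacent sentence pairs whose score falls below some cut-off on any relation type. The sketch below is purely illustrative: `find_gaps`, the threshold of 0.3 and the example matrices are all my own assumptions, not part of Coh-Metrix or the Coh-GIT proposal.

```python
def find_gaps(matrices, threshold=0.3):
    """Flag adjacent element pairs with weak cohesion on any relation type.

    matrices: dict mapping a relation name to a square matrix of R(ei, ej) values.
    Returns a list of (i, i + 1, relation, score) tuples sorted by position,
    where score is the mean of the two directional values for the pair.
    """
    gaps = []
    for relation, matrix in matrices.items():
        for i in range(len(matrix) - 1):
            score = (matrix[i][i + 1] + matrix[i + 1][i]) / 2
            if score < threshold:
                gaps.append((i, i + 1, relation, score))
    return sorted(gaps)


# Hypothetical scores for a three-sentence text on two relation types.
example = {
    "causal": [[0.0, 1.0, 0.0], [1.0, 0.0, 0.2], [0.0, 0.2, 0.0]],
    "referential": [[0.0, 0.8, 0.0], [0.8, 0.0, 0.9], [0.0, 0.9, 0.0]],
}

for first, second, relation, score in find_gaps(example):
    print(f"Sentences {first} and {second}: weak {relation} cohesion ({score})")
```

Even this toy version shows the problem: with 20 relation types and several granularities, the list of flags would quickly become too long to act on, so some prioritisation or aggregation would be needed before showing anything to a student.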

