Rubrics for Assessing Writing

An Annotated Bibliography

Shurli Makmillen

Writing Centre

“Contemporary writing programs need to discover, document, and negotiate their evaluative landscape before they move to standardize and simplify it—if indeed they choose to do the latter at all” (Broad 126).

Introduction

The following is a work-in-progress aimed at teasing out some of the complexities of using rubrics in different courses and disciplines. It does not include any “how-to” advice, as is presented in the work of John Bean, Roger Graves, and others; my goal was rather to engage with peer-reviewed studies that ask what, why, or why not, as well as to explore some of the relevant contingencies in writing assessment in general, and in rubrics in particular. Before proceeding to the bibliography itself, here are a few major considerations in contemplating rubrics for your courses that emerged for us in the Writing Centre from the research summarized below:

 

  • Rubrics may not save time. And if they do, we should pause to consider whether that motivation exploits economies of scale at the expense of pedagogical benefit. 

 

  • If rubrics are used, keep in mind it can be discriminatory to give too much (some would say any) weight to “grammatical correctness” and Standard English in current contexts of linguistic (and often concurrently racial) diversity.   

 

  • Time is often better spent on non-evaluative, formative, and encouraging feedback on the content of students’ drafts and interim (scaffolded) assignments, rather than spending it filling out summative rubrics at the end of term.

 

  • Criteria described in rubrics can detract from the rhetorical nature of academic writing, such that its positive attributes can be reduced to residing “in the text,” rather than in the social action of the writing in a disciplinary community of readers.

 

  • Rubrics can be useful in guiding the feedback and revision cycle in longer, project-based assignments, creating department-wide reliability, and structuring (both in terms of timing and content) interactions between students and their supervisors.

 

I have devoted a lot of space to Bob Broad’s book-length study What We Really Value: Beyond Rubrics in Teaching and Assessing Writing, and for this reason its summary is placed last, after the alphabetically listed articles.

 

 

Annotated Bibliography

 

Anson, Chris. “Black Holes: Writing Across the Curriculum, and the Gravitational Invisibility of Race.” In Race and Writing Assessment. Eds. Asao B. Inoue and Mya Poe. 2012. 15-28. Print.

 

Like other articles in this collection about how race informs writing assessment, Writing Across the Curriculum (WAC) scholar Chris Anson’s chapter engages with how strategies such as rubrics can “create outliers of” (23) students from racialized groups whose language practices might reside further from the norms of standard English than those of non-racialized groups. His extensive literature review demonstrates that though WAC actually attracts “people who marvel at the diversity and unpredictability of culture” (16; drawing from Thaiss), the field nonetheless puzzlingly lacks much exploration of how racial and ethnic diversity informs evaluation and assessment. (One exception is Pendergast, who speaks of how “race becomes subsumed into the powerful tropes of ‘basic writer,’ ‘stranger’ to the academy, or the trope of the generalized, marginalized ‘other’” [in Anson 20].) Anson points to some of the reasons why race does not figure much in WAC discourse (and here “WAC” includes all initiatives that seek to integrate writing-to-learn practices and pedagogies across disciplines and courses), including, most notably, the observation that writing specialists’ outreach efforts can be viewed with suspicion if they are seen to be motivated by such “politically charged subjects” (21) as racial politics, and/or because of assumptions that WAC faculty are, or should be, narrowly focused on “grammar, style and principles of classical rhetoric” (21) or on “’methods that work’ rather than interrogating prevailing attitudes about knowledge, language, and learning” (Mahala, qtd. in Anson 22). Racial diversity can thus be an invisible and therefore unreflected-upon contingency in assessment practices like rubrics—which Anson suspects are for the most part designed to exploit economies of scale rather than to deliver pedagogical benefit. Also unhelpfully at work in our assessment practices and rubric design may be what Anson describes as the “nervous energy” departments and universities as a whole can experience “from the spectre of accreditation agencies” and other outside interested parties, putting pressure on us all to “narrow [our] agenda to the eradication of surface features of ‘bad English’” (27).

 

Balester, Valerie. “How Writing Rubrics Fail: Toward a Multicultural Model.” In Race and Writing Assessment. Eds. Asao B. Inoue and Mya Poe. 2012. 63-78. Print.

 

Citing numerous scholars who have studied error correction in the teaching of writing, Balester begins with an adamant claim that focusing on error is antithetical to students’ development of rhetorical awareness and privileges mainstream speakers. Accordingly, she advises “revising rather than rejecting rubrics, at least for the short term” (64), and adds that they need to “embrace language variety” (64). Her own study analyzes 13 rubrics and how they characterize the category of grammar and style, sorting them into three categories: Acculturationism (aims to eradicate “bad” English, and tends to count and name errors); Accommodationism (accepts home languages and dialects as a “bridge” to standard edited academic English; embraces code-switching according to a public/private divide; and relates error to comprehension rather than to the standard); and Multiculturalism (embraces the College Composition and Communication position statement on “Students’ Right to Their Own Language”; sees the fluidity of World Englishes as empowering for all; and includes the identity of the writer in the rhetorical situation being responded to in writing). Returning to the questions raised by critics—namely, “whether [rubrics] can fairly and reliably assess writing without standardizing it and without creating an unequal playing field for writers anything different from the norm defined by rubrics and raters” (71)—her answer is that rubrics need to respond to rhetorical effectiveness and appropriateness, and avoid terms like “standard,” “error,” and “rules” (74). The multiculturalism framework leaves avenues for writers to exercise more control and choice, including that of “thoughtful experimentation” (74). A multiculturalist approach to teaching writing would therefore also include education in the politics of language and making room for “alternative discourses” (71).

 

Birol, Gülnur, Andrea Han, Ashley Welsh, and Joanne Fox. “Impact of a First-Year Seminar in Science on Student Writing and Argumentation.” Journal of College Science Teaching 43.1 (2013): 82-91. Web. 8 Mar. 2014.

 

These authors address evaluation in a first-year science course. The study doesn’t reflect on assignment rubrics per se so much as use them in an effort to demonstrate the positive effects of a writing-intensive approach to a first-year science course at UBC. The goal in these courses is to teach students “scientific argument,” relying on ideas about what constitutes “strong” thesis development and concerns that student writers address “both sides of an argument.” Instructions for the paper included: “develop an argument that supports your claim and explores the reasoning behind your answer. Remember to use specific examples and/or course materials to support your case.” Their example of a good argument was: “Open-net salmon farming in British Columbia should cease to expand because it damages local ecosystems, has a negative effect on employment, and poses a risk to human health.” This three-part thesis is presumably exemplary to all concerned in the project, but to this observer the instructions and example thesis seem to reflect a somewhat over-simplified version of the world—and a lack of faith in students to produce a more complex and original claim—leaving little room for critical thinking beyond the agree/disagree model. The study draws from data gathered via the Student Assessment of their Learning Gains (SALG) survey (www.salgsite.org), a free web-based tool that allows instructors to design their own questions for surveys of student learning. They also used a website forum for something called “Calibrated Peer Review,” https://cpr.molsci.ucla.edu/Home.aspx, another software application, not free, that supposedly “enables frequent writing assignments in any discipline, with any class size, even in large classes with limited instructional resources. … reduc[ing] the time an instructor now spends reading and assessing student writing.” One can detect the value of efficiency here, but others might argue that spending time reading and responding meaningfully to student writing is a worthwhile investment.

 

Dryer, Dylan. “Scaling Writing Ability: A Corpus-Driven Inquiry.” Written Communication 30.3 (2013): 3-35. Web. 29 Apr. 2014.

 

Dryer conducts a corpus-based study of writing rubrics used in public research institutions across the U.S. to draw conclusions about the theoretical construct of academic writing being enacted in this genre, especially at the border of success/fail. First Year Composition (FYC) writing assessment has seen a shift from what could be called “connoisseurship”—more of an intuition about good writing that “grounds the appraisal in the reader’s taste” (5)—to strategies that would minimize such subjective influence. Dryer collected 83 writing scales from first-year writing courses across the U.S. His corpus analysis paid attention to how language is used to describe writing across the evaluation continuum, and it revealed, for one thing, that “descriptive language diminishes with performance level, a tendency that is especially visible in the way that predicates such as ‘demonstrates,’ ‘structures,’ ‘establishes,’ and ‘engages’ dwindle to simple existential verbs, such as ‘is,’ ‘are,’ and ‘be’” (16). In other words, “poor writing ‘shows,’ where good writing ‘demonstrates’ and ‘addresses’” (22). Dryer finds three aspects of these rubrics to value: the language of assessment has gone beyond the “connoisseurship” language of the past (originality at one end of the scale, and childishness at the other); there is more acknowledgement of how writers attend to readers’ needs in context; and errors are seen as not always interfering with meaning. On the latter topic, even in the highest-ranked categories, descriptions specifying “few” errors outnumbered those specifying “no” errors. Importantly, Dryer notices how different subjects are constructed as agents in these rubrics. Students are often constructed as agents of their own successes—which typically involve obediently following conventions—but not of their failings, which conceivably might, he speculates, have originated in students’ choices to resist or even satirize the called-for conventions (27). And though evaluations are typically made on the basis of reader experience, readers are hardly ever mentioned. Dryer concludes by noting that for the most part “in all traits, at all performance levels, readers’ experiences of the texts are presented as intrinsic qualities of those texts” (26), and warns of the resultant dangers of “an overgeneralized and brittle theoretical construct of writing” (28): “The ubiquitous framing of what is local, temporal, and contingent as if it were generalizable, ahistorical, and definitive is inconsistent with what is generally known to be true [in Writing Studies scholarship] about the embeddedness of writing in complex social systems of activity” (26-27).

 

Patton, Chloe. “‘Some Kind of Weird, Evil Experiment’: Student Perceptions of Peer Assessment.” Assessment & Evaluation in Higher Education (2011): 719-731. Web. 13 Mar. 2014.

 

Have you ever thought of using rubrics for the purposes of peer assessment? Patton’s data from focus groups in an Australian university indicate that students are under no illusions about the impetus for faculty to engage them in peer assessment, intuiting that the goal is not pedagogical but structural, stemming from instructors’ efforts to save time in a context of an “increasing scarcity of resources” (728). Her goal is to step outside of the “taken-for-granted positivist framework” (721) that undergirds studies of student self-assessment (e.g. reliability studies) and to do a theory-driven study that puts these practices in the context of larger power structures. Patton draws from Tan’s distinctions between the different kinds of power governing relations within universities—“sovereign power” describes the finite power of the individual as agent; “epistemological power” governs what kinds of knowledge are valid, and who has intellectual authority; and “disciplinary power” (from Foucault) defines the way that assessment “subjects the learner to normalising judgement” (727)—to add a fourth she calls “structural power.” This names the ways that organizational and societal structures interact to inflect assessment practices in the new university with its “‘new managerialist’ policies and programmes” (729). The take-away for faculty from this article could be to instigate a practice of asking students to engage in peer assessment of work-in-progress, with well-thought-out and discussed criteria, so that they recognize assessment’s role in their learning. Being formative rather than summative, such peer assessment is more explicitly pedagogical and less a veiled instrument for reducing instructor workload.

 

Reynolds, Julie, Robin Smith, Cary Moskovitz, and Amy Sayle. “BioTAP: A Systematic Approach to Teaching Scientific Writing and Evaluating Undergraduate Theses.” BioScience 59.10 (2009): 896-903. Web. 13 Mar. 2014.

 

Strategic plans that promote the undergraduate student research experience put strains on faculty, because more than just the most highly able and motivated students are expected to engage in capstone research projects with faculty mentors. With this in mind, these authors report on a rubric they developed for the Biology department at Duke University, designed to monitor, facilitate, and assess students’ capstone research projects. Developed in consultation with Writing in the Disciplines faculty and their Office of Assessment, this rubric, say the authors, could be well adapted for use in all of the STEM (Science, Technology, Engineering, and Mathematics) disciplines. Reynolds et al. list three benefits of the rubric: it provides a protocol for formative feedback and revision that mimics the professional peer review process; it provides a fair basis for summative assessment based on departmentally agreed-upon learning outcomes; and it provides a context for wider discussions amongst faculty, “promot[ing] valuable conversations about teaching and learning” (897). There is much to appreciate in the details of this rubric: sensible and discipline-specific details about audience: “readers who are not necessarily specialists in the particular area of the student’s research but who have a solid understanding of basic biology—specifically, any faculty member in the biology department regardless of subdiscipline” (897); a peer review process that sees “writing a thesis [as] a great opportunity for students to transition from the role of passive student to that of engaged writer” (898); directions for faculty—rooted in Writing Studies research into what is pedagogically more efficient—to give reader-based feedback and avoid editing; advice to create, in consultation with writing pedagogy experts, thesis-writing courses for students; and a website to support those who want to use the rubric for undergraduate theses in biology, or adapt it for other STEM disciplines: http://www.science-writing.org/biotap.html. The success and usefulness of this rubric, then, seems to be rooted in the process it initiated amongst faculty, and in the routines faculty and students maintained throughout the capstone research project.

 

Broad, Bob. What We Really Value: Beyond Rubrics in Teaching and Assessing Writing. Logan, UT: Utah State UP, 2003. Print.

 

This extensive study of a university’s assessment practices in its First Year Composition (FYC) courses has as a starting point the claim that “a teacher of writing cannot provide an adequate account of his rhetorical values just by sitting down and reflecting on them” (Broad 3). Broad’s overall goal is to promote what he calls Dynamic Criteria Mapping (DCM)—a protocol for writing faculty to undertake en masse, and an idea that came to him while he explored the “mainly uncharted territory in the study of rhetorical judgment” (117), namely contextually dependent influences on how we evaluate student writing. Broad provides a comprehensive and insightful history of the assessment of writing and its roots in the Educational Testing Service. The ETS initially responded to research that showed chronic and profound shortcomings in consistency amongst readers evaluating the same assignments. But instead of gaining a deeper understanding of the complexities of “what we really value,” they responded with a simplified, systematized, and standardized protocol for evaluation in the form of a rubric. As Broad puts it: “Confronted with an apparent wilderness of rhetorical values, they retreated to a simplified, ordered, well-controlled representation that would keep future writing assessment efforts clean of disturbing features as dissent, diversity, context-sensitivity, and ambiguity” (6). On the positive side, the ETS did at least make writing assessment a topic of interest and focus, and it paved the way for future considerations of “legitimacy, affordability, and accountability” (9) in writing assessment. Things improved in the field once attention shifted to pedagogy, where it was noted that validity of assessment can be maintained without inter-rater reliability, validity being defined as “not a quality of an assessment instrument (a test or a portfolio assessment, for example), but rather a quality of the decisions people make on the basis of such instruments” (10). Rubrics, in this model, are designed not on the basis of compliance with rules about writing or its genres; they are useful primarily to engender “open and meaningful discussion” (12).

 

Broad developed Dynamic Criteria Mapping (DCM)—a site-based, rhetorically sensitive system for departments to explore what they value in their students’ writing—as a result of a long-term study at one institution, filling a research gap in the otherwise mainly theoretical work on this topic. His question about what instructors really value came to him when he was well into the last half of the project (29), which involved numerous faculty coming to terms with a scoring guide in a series of norming sessions, and a focus on the particular salience of the criteria as pass/fail judgements were made for introductory writing students. The study is incredibly detailed in ways too extensive to do justice to here, but there is a wealth of material for those interested. A main finding was that instructors in this writing program could assess many factors in contradictory ways. For example, Broad came to realize that evaluators resided in one of two camps regarding revision, some privileging revised writing over unrevised (e.g. in-class writing) and vice versa. The mission statement for the Writing Program stated that revision was a value in writing development, but some saw unrevised writing “as a more reliable indicator of a writer’s ‘true’ ability and a test of whether the author may have plagiarized or received undue assistance in the revised texts,” a situation which is “likely to lead to dramatically different pass/fail judgments” (46). DCM, says Broad, was able to “clarif[y] this rift”—as well as others that led to major discrepancies in pass/fail decisions—for the benefit of the program. It became apparent, for example, that one person’s subtlety can be another person’s cliché. Intentionality also played a role for some evaluators, as in whether a literary effect could be praised or not according to assumptions about whether it was evidence of control over language or whether it happened accidentally (53-54). The question of intentionality is very hard to answer, yet decisions were being made based on assumptions about it. Similarly, there is the problem of self-expression, and whether or not it is seen as “sincere,” “honest,” and “raw,” or as evidence of someone not in control of their subject matter or its expression (55). [It is worth noting here that this could also be seen as a reflection of assignment design, in that the corpus of essays—as is typical of first-year writing in US colleges—would fit more into the expressivist tradition (write about an event from your past; present a portrait of a person who influenced you) than assignments designed around the genres of academic writing.]

 

Of particular interest is Broad’s chapter on contextual criteria, which play a role but are typically delegitimized and therefore excluded from rubrics. Broad mined his transcription data to rank comments fitting into this category, and noted that the most discussed contextual criterion was “standards.” (The most discussed textual criterion, by the way, was “mechanics.”) Standards reflected three areas of concern: the big (and, it turns out, unanswerable) question of the main function of the course; the fuzzy boundary between passing and failing; and the moving boundary between passing and failing. It turns out that the department hadn’t stipulated whether “’basic competency’ should be viewed as minimal or substantial” (76); in other words, should importance be placed on whether the student passed a minimal threshold, or on whether they were substantially prepared for the next course? Whether considering individual assignments or end-of-term portfolios, rather than treating contextual factors as part of the rhetorical situation—and embracing that as a more holistic guide—the instructors sought to nail down and agree upon the actual threshold concretely. Broad sums up that the program “could conceivably have committed itself more fully to the rhetorical, postpositivist paradigm on which its portfolio program already drew heavily. Had it done so, it might have been able to loosen its grip on the ‘standard-setting’ goal of securing independent and prior evaluative agreement among evaluators” (78).

 

Issues in the norming sessions amongst faculty and TAs varied according to whether the texts were sample essays (where assessors could only imagine the real student author) or whether they were essays in the courses currently being taught (where at least one assessor had knowledge of the student in question, and their other written work). In the latter case, there was more impetus for the “world [to] intrude” (79), including the consequences (emotional, financial) to real students of having to repeat the course. For example, about one student, an instructor said: “It’s taken him a lot to open out and really work in the class. . . . I know that if he fails he’s going to quit trying” (85). Broad was struck by how much “teacher’s special knowledge” of students affected decisions, even though this type of contextual knowledge was “contraband” in this setting. Higher and more stringent demands were made for the sample essays where there was no opportunity for this type of knowledge to enter into the discussion, although student information could be “imagined,” and this imagined information—about age and ethnicity, for example—did find its way into rationalizations for pass/fail decisions.

 

The last contextual factor I will summarize here is that of plagiarism/originality. Assessors were divided on the degree to which modeling was okay, as when a student uses the form of a published model essay and plugs in their own content, as illustrated by these two comments: “That’s not plagiarism when you take a form and re-apply it. That’s how—that’s the way we used to teach writing . . .” versus “I don’t want them to use exact sentence structure and phrasing just like some of the things we’ve read, or that they’ve read” (102). Of course, Broad’s goal here is to promote his DCM, arguing that “rather than drive [such contingencies] underground by insisting that instructors evaluate according to a conventional rubric, DCM can make such criteria available for discussion, negotiation, and informed policy decision. Writing programs can then publish their positions on such issues for the benefit of students and other stakeholders” (94). He continues: “Since evaluative deliberations are clearly based on not only textual but also contextual considerations, any effort to tell the truth of assessment in a writing program must include mapping or charting of both [textual and contextual] realms and of their interrelationships” (117-118). One contingency he mentioned may be even more difficult to address: the institutional positions of those doing the ranking, and related time factors and working conditions. “In the end, the level of performance we expect from students in a given context may be inevitably linked to issues of pedagogy, ethics, and professional status like those in which City University’s writing instructors found themselves entangled” (81).

 

If Broad’s primary goal here is to facilitate student learning, his secondary goal would be to end the situation of faculty grading in isolation in favour of communal writing assessment. “DCM is fun—an intellectual, rhetorical, and pedagogical party,” he writes. “DCM reveals and highlights the complex, conflicted, communal quilt of rhetorical values that any group of composition instructors inevitably creates in the very act of teaching and assessing writing together. For all these reasons, students and teachers of writing need the truth about writing assessment. It is our responsibility to help them discover, compose, revise, and publish that truth” (120). On the question of using rubrics or not, he doesn’t proffer an opinion, but rather seeks to make public the ongoing realities of evaluation in first-year writing programs: “Whereas traditional rubric development seems to focus on qualities of students’ texts, Dynamic Criteria Mapping brings to light the dynamics by which instructors assess students’ writing. DCM therefore constitutes a ‘phenomenology of shared writing assessment’” (127-8; drawing from Elbow). Broad is intent on bringing what we really value to light, using a system that involves conversations with other faculty and with students in an ongoing qualitative inquiry: “Contemporary writing programs need to discover, document, and negotiate their evaluative landscape before they move to standardize and simplify it—if indeed they choose to do the latter at all” (126).
