College Composition Weekly: Summaries of research for college writing professionals

Read, Comment On, and Share News of the Latest from the Rhetoric and Composition Journals


Pruchnic et al. Mixed Methods in Direct Assessment. J of Writing Assessment, 2018. Posted 12/01/2018.

Pruchnic, Jeff, Chris Susak, Jared Grogan, Sarah Primeau, Joe Torok, Thomas Trimble, Tanina Foster, and Ellen Barton. “Slouching Toward Sustainability: Mixed Methods in the Direct Assessment of Student Writing.” Journal of Writing Assessment 11.1 (2018). Web. 27 Nov. 2018.

[Page numbers refer to a PDF generated from the print dialogue]

Jeff Pruchnic, Chris Susak, Jared Grogan, Sarah Primeau, Joe Torok, Thomas Trimble, Tanina Foster, and Ellen Barton report on an assessment of “reflection argument essay[s]” from the first-year-composition population of a large, urban, public research university (6). Their assessment used “mixed methods,” including a “thin-slice” approach (1). The authors suggest that this method can address difficulties faced by many writing programs in implementing effective assessments.

The authors note that many of the stakeholders to whom writing programs must report place a high value on large-scale quantitative assessments (1). They write that the validity of such assessments is often measured in terms of statistically determined interrater reliability (IRR) and samples considered large enough to adequately represent the population (1).

Administrators and faculty of writing programs often find that implementing this model requires time and resources that may not be readily available, even for smaller programs. Critics of this model note that one of its requirements, high interrater reliability, can too easily come to stand in for validity (2); in the view of Peter Elbow, such assessments favor “scoring” over “discussion” of the results (3). Moreover, according to the authors, critics point to the “problematic decontextualization of program goals and student achievement” that large-scale assessments can foster (1).

In contrast, Pruchnic et al. report, writing programs have tended to value the “qualitative assessment of a smaller sample size” because such models more likely produce the information needed for “the kinds of curricular changes that will improve instruction” (1). Writing programs, the authors maintain, have turned to redefining a valid process as one that can provide this kind of information (3).

Pruchnic et al. write that this resistance to statistically sanctioned assessments has created a bind for writing programs. Citing scholars like Peggy O’Neill (2) and Richard Haswell (3), they posit that when writing programs refuse the measures of validity required by external stakeholders, they risk having their conclusions dismissed and may well find themselves subject to outside intervention (3). Haswell’s article “Fighting Number with Number” proposes producing quantitative data as a rhetorical defense against external criticism (3).

In the view of the authors, writing programs are still faced with “sustainability” concerns:

The more time one spends attempting to perform quantitative assessment at the size and scope that would satisfy statistical reliability and validity, the less time . . . one would have to spend determining and implementing the curricular practices that would support the learning that instructors truly value. (4)

Hoping to address this bind, Pruchnic et al. write of turning to a method developed in the social sciences to analyze “lengthy face-to-face social and institutional interactions” (5). In a “thin-slice” methodology, raters use a common rubric to score small segments of the longer event. The authors report that raters using this method were able to predict outcomes, such as the number of surgery malpractice claims or teacher-evaluation results, as accurately as those scoring the entire data set (5).

To test this method, Pruchnic et al. created two teams, a “Regular” and a “Research” team. The study compared interrater reliability, “correlation of scores,” and the time involved to determine how closely the Research raters, scoring thin slices of the assessment data, matched the work of the Regular raters (5).

Pruchnic et al. provide a detailed description of their institution and writing program (6). The university’s assessment approach is based on Edward White’s “Phase 2 assessment model,” which involves portfolios with a final reflective essay, the prompt for which asks students to write an evidence-based argument about their achievements in relation to the course outcomes (8). The authors note that limited resources gradually reduced the amount of student writing that was actually read, as raters moved from full-fledged portfolio grading to reading only the final essay (7). The challenges of assessing even this limited amount of student work led to a sample that consisted of only 6-12% of the course enrollment.

The authors contend that this is not a representative sample; as a result, “we were making decisions about curricular and other matters that were not based upon a solid understanding of the writing of our entire student body” (7). The assessment, in the authors’ view, therefore did not meet necessary standards of reliability and validity.

The authors describe developing the rubric to be used by both the Research and Regular teams from the precise prompt for the essay (8). They used a “sampling calculator” to determine that, given the total of 1,174 essays submitted, 290 papers would constitute a representative sample; instructors were asked for specific, randomly selected papers to create a sample of 291 essays (7-8).
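The article does not say which calculator or confidence parameters the authors used, but the conventional finite-population sample-size formula, under the common assumptions of a 95% confidence level and a 5% margin of error, reproduces their figure of 290. The Python sketch below is illustrative only; the parameter values are assumptions, not details from the study.

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample-size formula with a finite-population correction.

    z (95% confidence), margin (5%), and p (maximum variability) are
    assumed defaults; the article does not specify which settings its
    "sampling calculator" used.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)   # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)          # finite-population correction
    return math.ceil(n)

print(sample_size(1174))  # -> 290, matching the sample size Pruchnic et al. report
```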

The Regular team worked in two-member pairs, both members of each pair reading the entire essay, with third readers called in as needed (8): “[E]ach essay was read and scored by only one two-member team” (9). The authors used “double coding” in which one-fifth of the essays were read by a second team to establish IRR (9). In contrast, the 10-member Research team was divided into two groups, each of which scored half the essays. These readers were given material from “the beginning, middle, and end” of each essay: the first paragraph, the final paragraph, and a paragraph selected from the middle page or pages of the essay, depending on its length. Raters scored the slices individually; the average of the five team members’ scores constituted the final score for each paper (9).
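As a rough sketch of the thin-slice procedure described above, the selection rule and the averaging step might look like the following; the middle-paragraph rule is simplified and the example scores are hypothetical, not values from the study.

```python
def thin_slice(paragraphs):
    """Return a beginning, middle, and end slice of an essay.

    A simplified stand-in for the authors' selection rule, which chose
    the first paragraph, the final paragraph, and one paragraph from
    the middle page or pages depending on essay length.
    """
    if len(paragraphs) <= 3:
        return paragraphs
    return [paragraphs[0], paragraphs[len(paragraphs) // 2], paragraphs[-1]]

def final_score(rater_scores):
    """Average the five Research-team raters' scores for one essay."""
    return sum(rater_scores) / len(rater_scores)

print(final_score([3, 4, 3, 3, 4]))  # hypothetical rubric scores -> 3.4
```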

Pruchnic et al. discuss in detail their process for determining reliability and for correlating the scores given by the Regular and Research teams to determine whether the two groups were scoring similarly. Analysis of interrater reliability revealed that the Research team’s IRR was “one full classification higher” than that of the Regular readers (12). Scores correlated at the “low positive” level, but the correlation was statistically significant (13). Finally, the Research team as a whole spent “a little more than half the time” that the Regular group spent scoring, while the average scoring time for individual Research team members was less than half that of the Regular members (13).
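The article reports its reliability and correlation results without the underlying data, so the following is only a generic illustration of how paired team scores might be correlated; the score lists are invented, and the choice of Pearson’s r is an assumption rather than a detail from the study.

```python
from scipy import stats

# Invented paired scores: one value per essay from each team.
regular_scores  = [3, 2, 4, 3, 2, 3, 4, 1, 3, 2]
research_scores = [3.2, 2.4, 3.0, 3.4, 2.0, 2.6, 3.8, 1.8, 2.6, 2.8]

# Pearson's r and its p-value: one common way to test whether two sets
# of scores correlate significantly, in the spirit of the "low positive"
# but statistically significant correlation the authors report.
r, p = stats.pearsonr(regular_scores, research_scores)
print(f"r = {r:.2f}, p = {p:.3f}")
```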

Additionally, the assessment included holistic readings of 16 essays randomly representing the four quantitative result classifications of Poor through Good (11). This assessment allowed the authors to determine the qualities characterizing essays ranked at different levels and to address the pedagogical implications within their program (15, 16).

The authors conclude that thin-slice scoring, while not always the best choice in every context (16), “can be added to the Writing Studies toolkit for large-scale direct assessment of evaluative reflective writing” (14). Future research, they propose, should address the use of this method to assess other writing outcomes (17). Paired with a qualitative assessment, they argue, a mixed-method approach that includes thin-slice analysis as an option can help satisfy the need for statistically grounded data in administrative and public settings (16) while enabling strong curricular development, ideally resulting in “the best of both worlds” (18).


Bowden, Darsie. Student Perspectives on Paper Comments. J of Writing Assessment, 2018. Posted 04/14/2018.

Bowden, Darsie. “Comments on Student Papers: Student Perspectives.” Journal of Writing Assessment 11.1 (2018). Web. 8 Apr. 2018.

Darsie Bowden reports on a study of students’ responses to teachers’ written comments in a first-year writing class at DePaul University, a four-year, private Catholic institution. Forty-seven students recruited from thirteen composition sections provided first drafts with comments and final drafts, and participated in two half-hour interviews. Students received a $25 bookstore gift certificate for completing the study.

Composition classes at DePaul use the 2000 version of the Council of Writing Program Administrators’ (WPA) Outcomes to structure and assess the curriculum. Of the thirteen instructors whose students were involved in the project, four were full-time non-tenure track and nine were adjuncts; Bowden notes that seven of the thirteen “had graduate training in composition and rhetoric,” and all “had training and familiarity with the scholarship in the field.” All instructors selected were regular attendees at workshops that included guidance on responding to student writing.

For the study, instructors used Microsoft Word’s comment tool in order to make student experiences consistent. Both comments and interview transcripts were coded. Comment types were classified as “in-draft” corrections (actual changes made “in the student’s text itself”); “marginal”; and “end,” with comments further classified as “surface-level” or “substance-level.”

Bowden and her research team of graduate teaching assistants drew on “grounded theory methodologies” that relied on observation to generate questions and hypotheses rather than on preformed hypotheses. The team’s research questions were

  • How do students understand and react to instructor comments?
  • What influences students’ process of moving from teacher comments to paper revision?
  • What comments do students ignore and why?

Ultimately the third question was subsumed by the first two.

Bowden’s literature review focuses on ongoing efforts by Nancy Sommers and others to understand which comments actually lead to effective revision. Bowden argues that research often addresses “the teachers’ perspective rather than that of their students” and that it tends to assess the effectiveness of comments by how they “manifest themselves in changes in subsequent drafts.” The author cites J. M. Fife and P. O’Neill to contend that the relationship between comments and effects in drafts is not “linear” and that clear causal connections may be hard to discern. Bowden presents her study as an attempt to understand students’ actual thinking processes as they address comments.

The research team found that on 53% of the drafts, no in-draft notations were provided. Bowden reports on variations in length and frequency in the 455 marginal comments they examined, as well as in the end comments that appeared in almost all of the 47 drafts. The number of substance-level comments exceeded that of surface-level comments.

Her findings accord with much research in discovering that students “took [comments] seriously”; they “tried to understand them, and they worked to figure out what, if anything, to do in response.” Students emphasized comments that asked questions, explained responses, opened conversations, and “invited them to be part of the college community.” Arguing that such substance-level comments were “generative” for students, Bowden presents several examples of interview exchanges, some illustrating responses in which the comments motivated the student to think beyond the specific content of the comment itself. Students often noted that teachers’ input in first-year writing was much more extensive than that of their high school teachers.

Concerns about “confusion” occurred in 74% of the interviews. Among strategies for dealing with confusion were “ignor[ing] the comment completely,” trying to act on the comment without understanding it, or writing around the confusing element by changing the wording or structure. Nineteen students “worked through the confusion,” and seven consulted their teachers.

The interviews revealed that in-class activities like discussion and explanation impacted students’ attempts to respond to comments, as did outside factors like stress and time management. In discussions about final drafts, students revealed that they had sought feedback from additional readers, such as parents or friends. They were also more likely to mention peer review in the second interview; although some mentioned the writing center, none made use of it for drafts included in the study.

Bowden found that students “were significantly preoccupied with grades.” As a result, determining “what the teacher wants” and concerns about having “points taken off” were salient issues for many. Bowden notes that interviews suggested a desire of some students to “exert their own authority” in rejecting suggested revisions, but she maintains that this effort often “butts up against a concern about grades and scores” that may attenuate the positive effects of some comments.

Bowden reiterates that students spoke appreciatively of comments that encouraged “conversations about ideas, texts, readers, and their own subject positions as writers” and of those that recognized students’ own contributions to their work. Yet, she notes, the variety of factors influencing students’ responses to comments, including, for example, cultural differences and social interactions in the classroom, make it difficult to pinpoint the most effective kind of comment. Given these variables, Bowden writes, “It is small wonder, then, that even the ‘best’ comments may not result in an improved draft.”

The author discusses strategies to ameliorate the degree to which an emphasis on grades may interfere with learning, including contract grading, portfolio grading, and reflective assignments. However, she concludes, even reflective papers, which are themselves written for grades, may disguise what actually occurs when students confront instructor comments. Ultimately Bowden contends that the interviews conducted for her study contain better evidence of “the less ‘visible’ work of learning” than do the draft revisions themselves. She offers three examples of students who were, in her view,

thinking through comments in relationship to what they already knew, what they needed to know and do, and what their goals were at this particular moment in time.

She considers such activities “problem-solving” even though the problem could not be solved in time to affect the final draft.

Bowden notes that her study population is not representative of the broad range of students in writing classes at other kinds of institutions. She recommends further work geared toward understanding how teacher feedback can encourage the “habits of mind” denoted as the goal of learning by the 2010 Framework for Success in Postsecondary Writing produced by the WPA, the National Council of Teachers of English, and the National Writing Project. Such understanding, she contends, can be effective in dealing with administrators and stakeholders outside of the classroom.


Litterio, Lisa M. Contract Grading: A Case Study. J of Writing Assessment, 2016. Posted 04/20/2017.

Litterio, Lisa M. “Contract Grading in a Technical Writing Classroom: A Case Study.” Journal of Writing Assessment 9.2 (2016). Web. 05 Apr. 2017.

In an online issue of the Journal of Writing Assessment, Lisa M. Litterio, who characterizes herself as “a new instructor of technical writing,” discusses her experience implementing a contract grading system in a technical writing class at a state university in the northeast. Her “exploratory study” was intended to examine student attitudes toward the contract-grading process, with a particular focus on how the method affected their understanding of “quality” in technical documents.

Litterio’s research into contract grading suggests that it can have the effect of supporting a process approach to writing as students consider the elements that contribute to an “excellent” response to an assignment. Moreover, Litterio contends, because it creates a more democratic classroom environment and empowers students to take charge of their writing, contract grading also supports critical pedagogy in the Freirean model. Litterio draws on research to support the additional claim that contract grading “mimic[s] professional practices” in that “negotiating and renegotiating a document” as students do in contracting for grades is a practice that “extends beyond the classroom into a workplace environment.”

Much of the research she reports dates to the 1970s and 1980s, often reflecting work in speech communication, but she cites as well models from Ira Shor, Jane Danielewicz and Peter Elbow, and Asao Inoue from the 2000s. In a common model, students can negotiate the quantity of work that must be done to earn a particular grade, but the instructor retains the right to assess quality and to assign the final grade. Litterio depicts her own implementation as a departure from some of these models in that she did make the final assessment, but applied criteria devised collaboratively by the students; moreover, her study differs from earlier reports of contract grading in that it focuses on the students’ attitudes toward the process.

Her Fall 2014 course, which she characterizes as a service course, enrolled twenty juniors and seniors representing seven majors. Neither Litterio nor any of the students were familiar with contract grading, and no students withdrew after learning of Litterio’s grading intentions from the syllabus and class announcements. At mid-semester and again at the end of the course, Litterio administered an anonymous open-ended survey to document student responses. Adopting the role of “teacher-researcher,” Litterio hoped to learn whether involvement in the generation of criteria led students to a deeper awareness of the rhetorical nature of their projects, as well as to “more involvement in the grading process and more of an understanding of principles discussed in technical writing, such as usability and document design.”

Litterio shares the contract options, which allowed students to agree to produce a stated number of assignments of either “excellent,” “great,” or “good” quality, an “entirely positive grading schema” that draws on Frances Zak’s claim that positive evaluations improved student “authority over their writing.”

The criteria for each assignment were developed in class discussion through an open voting process that resulted in general, if not absolute, agreement. Litterio provides the class-generated criteria for a résumé, which included length, format, and expectations for “specific and strong verbs.” As the instructor, Litterio ultimately decided whether these criteria were met.

Mid-semester surveys indicated that students were evenly split in their preferences for traditional grading models versus the contract-grading model being applied. At the end of the semester, 15 of the 20 students expressed a preference for traditional grading.

Litterio coded the survey responses and discovered specific areas of resistance. First, some students cited the unfamiliarity of the contract model, which made it harder for them to “track [their] own grades,” in one student’s words. Second, the students noted that the instructor’s role in applying the criteria did not differ appreciably from instructors’ traditional role as it retained the “bias and subjectivity” the students associated with a single person’s definition of terms like “strong language.” Students wrote that “[i]t doesn’t really make a difference in the end grade anyway, so it doesn’t push people to work harder,” and “it appears more like traditional grading where [the teacher] decide[s], not us.”

In addition, students resisted seeing themselves and their peers as qualified to generate valid criteria and to offer feedback on developing drafts. Students wrote of the desire for “more input from you vs. the class,” their sense that student-generated criteria were merely “cosmetics,” and their discomfort with “autonomy.” Litterio attributes this resistance concerning expertise to students’ actual novice status as well as to the nature of the course, which required students to write for different discourse communities because of their differing majors. She suggests that contract grading may be more appropriate for writing courses within majors, in which students may be more familiar with the specific nature of writing in a particular discipline.

However, students did confirm that the process of generating criteria made them more aware of the elements involved in producing exemplary documents in the different genres. Incorporating student input into the assessment process, Litterio believes, allows instructors to be more reflective about the nature of assessment in general, including the risk of creating a “yes or no . . . dichotomy that did not allow for the discussions and subjectivity” involved in applying a criterion. Engaging students throughout the assessment process, she contends, provides them with more agency and more opportunity to understand how assessment works. Student comments reflect an appreciation of having a “voice.”

This study, Litterio contends, challenges the assumption that contract grading is necessarily “more egalitarian, positive, [and] student-centered.” The process can still strike students as biased and based entirely on the instructor’s perspective, she found. She argues that the reflection on the relationship between student and teacher roles enabled by contract grading can lead students to a deeper understanding of “collective norms and contexts of their actions as they enter into the professional world.”


Addison, Joanne. Common Core in College Classrooms. Journal of Writing Assessment, Nov. 2015. Posted 12/03/2015.

Addison, Joanne. “Shifting the Locus of Control: Why the Common Core State Standards and Emerging Standardized Tests May Reshape College Writing Classrooms.” Journal of Writing Assessment 8.1 (2015): 1-11. Web. 20 Nov. 2015.

Joanne Addison offers a detailed account of moves by testing companies and philanthropists to extend the influence of the Common Core State Standards Initiative (CCSSI) to higher education. Addison reports that these entities are building “networks of influence” (1) that will shift agency from teachers and local institutions to corporate interests. She urges writing professionals to pay close attention to this movement and to work to retain and restore teacher control over writing instruction.

Addison writes that a number of organizations are attempting to align college writing instruction with the CCSS movement currently garnering attention in K-12 institutions. This alignment, she documents, is proceeding despite criticisms of the Common Core Standards for demanding skills that are “not developmentally appropriate,” for ignoring crucial issues like “the impact of poverty on educational opportunity,” and for the “massive increase” in investment in and reliance on standardized testing (1). But even if these challenges succeed in scaling back the standards, she contends, too many teachers, textbooks, and educational practices will have been influenced by the CCSSI for its effects to dissipate entirely (1). Control of professional development practices by corporations and specific philanthropies, in particular, will link college writing instruction to the Common Core initiative (2).

Addison connects the investment in the Common Core to the “accountability movement” (2) in which colleges are expected to demonstrate the “value added” by their offerings as students move through their curriculum (5). Of equal concern, in Addison’s view, is the increasing use of standardized test scores in college admissions and placement; she notes, for example, “640 colleges and universities” in her home state of Colorado that have “committed to participate” in the Partnership for Assessment of Readiness for College and Career (PARCC) by using standardized tests created by the organization in admissions and placement; she points to an additional 200 institutions that have agreed to use a test generated by the Smarter Balanced Assessment Consortium (SBAC) (2).

In her view, such commitments are problematic not only because they use single-measure tools rather than more comprehensive, pedagogically sound decision-making protocols but also because they result from the efforts of groups like the English Language Arts Work Group for CCSSI, the membership of which is composed of executives from testing companies, supplemented with only one “retired English professor” and “[e]xactly zero practicing teachers” (3).

Addison argues that materials generated by organizations committed to promoting the CCSSI show signs of supplanting more pedagogically sound initiatives like NCTE’s Read-Write-Think program (4). To illustrate how she believes the CCSSI has challenged more legitimate models of professional development, she discusses the relationship between CCSSI-linked coalitions and the National Writing Project.

She writes that in 2011, funds for the National Writing Project were shifted to the president’s Race to the Top (3). Some funding was subsequently restored, but grants from the Bill and Melinda Gates Foundation specifically supported National Writing Project sites that worked with an entity called the Literacy Design Collaborative (LDC) to promote the use of the Common Core Standards in assignment design and to require the use of a “jurying rubric” intended to measure the fit with the Standards in evaluating student work (National Writing Project, 2014, qtd. in Addison 4). According to Addison, “even the briefest internet search reveals a long list of school districts, nonprofits, unions, and others that advocate the LDC approach to professional development” (4). Addison contends that teachers have had little voice in developing these course-design and assessment tools and are unable, under these protocols, to refine instruction and assessment to fit local needs (4).

Addison expresses further concern about the lack of teacher input in the design, administration, and weight assigned to the standardized testing used to measure “value added” and thus hold teachers and institutions accountable for student success. A number of organizations largely funded by the Bill and Melinda Gates Foundation promote the use of “performance-based” standardized tests given to entering college students and again to seniors (5-6). One such test, the Collegiate Learning Assessment (CLA), is now used by “700 higher education institutions” (5). Addison notes that nine English professors were among the 32 college professors who worked on the development and use of this test; however, all were drawn from “CLA Performance Test Academies” designed to promote the “use of performance-based assessments in the classroom,” and the professors’ specialties were not provided (5-6).

A study conducted using a similar test, the Common Core State Standards Validation Assessment (CCSSAV), indicated that the test did provide some predictive power, but high-school GPA was a better indicator of student success in higher education (6). In all, Addison reports four different studies that similarly found that the predictor of choice was high-school GPA, which, she says, improves on the snapshot of a single moment supplied by a test, instead measuring a range of facets of student abilities and achievements across multiple contexts (6).

Addison attributes much of the movement toward CCSSI-based protocols to the rise of “advocacy philanthropy,” which shifts giving from capital improvements and research to large-scale reform movements (7). While scholars like Cassie Hall see some benefits in this shift, for example in the ability to spotlight “important problems” and “bring key actors together,” concerns, according to Addison’s reading of Hall, include

the lack of external accountability, stifling innovation (and I would add diversity) by offering large-scale, prescriptive grants, and an unprecedented level of influence over state and government policies. (7)

She further cites Hall’s concern that this shift will siphon money from “field-initiated academic research” and will engender “a growing lack of trust in higher education” that will lead to even more restrictions on teacher agency (7).

Addison’s recommendations for addressing the influx of CCSSI-based influences include aggressively questioning our own institutions’ commitments to facets of the initiative, using the “15% guideline” within which states can supplement the Standards, building competing coalitions to advocate for best practices, and engaging in public forums, even where such writing is not recognized in tenure-and-promotion decisions, to “place teachers’ professional judgment at the center of education and help establish them as leaders in assessment” (8). Such efforts, in her view, must serve the effort to identify assessment as a tool for learning rather than control (7-8).

Access this article at http://journalofwritingassessment.org/article.php?article=82