Automated Essay Scoring | College Composition Weekly: Summaries of research for college writing professionals

October 4, 2016
by vanderso

Moore & MacArthur. Automated Essay Evaluation. JoWR, June 2016. Posted 10/04/2016.

Moore, Noreen S., and Charles A. MacArthur. “Student Use of Automated Essay Evaluation Technology During Revision.” Journal of Writing Research 8.1 (2016): 149-75. Web. 23 Sept. 2016.

Noreen S. Moore and Charles A. MacArthur report on a study of 7th- and 8th-graders’ use of Automated Essay Evaluation technology (AEE) and its effects on their writing.

Moore and MacArthur define AEE as “the process of evaluating and scoring written prose via computer programs” (M. D. Shermis and J. Burstein, qtd. in Moore and MacArthur 150). The current study was part of a larger investigation of the use of AEE in K-12 classrooms (150, 153-54). Moore and MacArthur focus on students’ revision practices (154).

The authors argue that such studies are necessary because “AEE has the potential to offer more feedback and revision opportunities for students than may otherwise be available” (150). Teacher feedback, they posit, may not be “immediate” and may be “ineffective” and “inconsistent” as well as “time consuming,” while the alternative of peer feedback “requires proper training” (151). The authors also posit that AEE will increasingly become part of the writing education landscape and that teachers will benefit from “participat[ing]” in explorations of its effects (150). They argue that AEE should “complement” rather than replace teacher feedback and scoring (151).

Moore and MacArthur review extant research on two kinds of AEE, one that uses “Latent Semantic Analysis” (LSA) and one that has been “developed through model training” (152). Studies of an LSA program owned by Pearson and designed to evaluate summaries compared the program with “word-processing feedback” and showed enhanced improvement across many traits, including “quality, organization, content, use of detail, and style” as well as time spent on revision (152). Other studies also showed improvement. Moore and MacArthur note that some of these studies relied on scores from the program itself as indices of improvement and did not demonstrate any transfer of skills to contexts outside of the program (153).

Moore and MacArthur contend that their study differs from previous research in that it does not rely on “data collected by the system” but rather uses “real time” information from think-aloud protocols and semi-structured interviews to investigate students’ use of the technology. Moreover, their study reveals the kinds of revision students actually do (153). They ask:

How do students use AEE feedback to make revisions?

Are students motivated to make revisions while using AEE technology?

How well do students understand the feedback from AEE, both the substantive feedback and the conventions feedback? (154)

The researchers studied six students selected to be representative of a 12-student 7th- and 8th-grade “literacy class” at a private northeastern school whose students exhibited traits “that may interfere with school success” (154). The students were in their second year of AEE use, and the teacher was in the third year of use. Students “supplement[ed]” their literacy work with in-class work using the “web-based MY Access!” program (154).

Moore and MacArthur report that “intellimetric” scoring used by MY Access! correlates highly with scoring by human raters (155). The software is intended to analyze “focus/coherence, organization, elaboration/development, sentence structure, and mechanics/conventions” (155).

MY Access! provides feedback through MY Tutor, which responds to “non-surface” issues, and MY Editor, which addresses spelling, punctuation, and other conventions. MY Tutor provides a “one sentence revision goal”; “strategies for achieving the goal”; and “a before and after example of a student revising based on the revision goal and strategy” (156). The authors further note that “[a]lthough the MY Tutor feedback is different for each score point and genre, the same feedback is given for the same score in the same genre” (156). MY Editor responds to specific errors in each text individually.

Each student submitted a first and revised draft of a narrative and an argumentative paper, for a total of 24 drafts (156). The researchers analyzed only revisions made during the think-aloud; any revision work prior to the initial submission did not count as data (157).

Moore and MacArthur found that students used MY Tutor for non-surface feedback only when their submitted essays earned low scores (158). Two of the three students who used the feature appeared to understand the feedback and used it successfully (163). The authors report that for the students who used it successfully, MY Tutor feedback inspired a larger range of changes and more effective changes in the papers than feedback from the teacher or from self-evaluation (159). These students’ changes addressed “audience engagement, focusing, adding argumentative elements, and transitioning” (159), whereas teacher feedback primarily addressed increasing detail.

One student who scored high made substantive changes rated as “minor successes” but did not use the MY Tutor tool. This student used MY Editor and appeared to misunderstand the feedback, concentrating on changes that eliminated the “error flag” (166).

Moore and MacArthur note that all students made non-surface revisions (160), and 71% of these efforts were suggested by AEE (161). However, 54.3% of the total changes did not succeed, and MY Editor suggested 68% of these (161). The authors report that the students lacked the “technical vocabulary” to make full use of the suggestions (165); moreover, they state that “[i]n many of the instances when students disagreed with MY Editor or were confused by the feedback, the feedback seemed to be incorrect” (166). The authors report other research that corroborates their concern that grammar checkers in general may often be incorrect (166).

As limitations, the researchers point to the small sample, which, however, allowed access to “rich data” and “detailed description” of actual use (167). They note also that other AEE programs might yield different results. Lack of data on revisions students made before submitting their drafts also may have affected the results (167). The authors supply appendices detailing their research methods.

Moore and MacArthur propose that because the AEE scores prompt revision, such programs can effectively augment writing instruction, but they recommend that scores track student development so that as students score near the maximum at a given level, new criteria and scores encourage more advanced work (167-68). Teachers should model the use of the program and provide vocabulary so students better understand the feedback. Moore and MacArthur argue that effective use of such programs can help students understand criteria for writing assessment and refine their own self-evaluation processes (168).

Research recommendations include asking whether scores from AEE continue to encourage revision and investigating how AEE programs differ in procedures and effectiveness. The study did not examine teachers’ approaches to the program. Moore and MacArthur urge that stakeholders, including “the people developing the technology and the teachers, coaches, and leaders using the technology . . . collaborate” so that AEE “aligns with classroom instruction” (168-69).

April 18, 2016
by vanderso

Comer and White. MOOC Assessment. CCC, Feb. 2016. Posted 04/18/2016.

Comer, Denise K., and Edward M. White. “Adventuring into MOOC Writing Assessment: Challenges, Results, and Possibilities.” College Composition and Communication 67.3 (2016): 318-59. Print.

Denise K. Comer and Edward M. White explore assessment in the “first-ever first-year-writing MOOC,” English Composition I: Achieving Expertise, developed under the auspices of the Bill & Melinda Gates Foundation, Duke University, and Coursera (320). Working with “a team of more than twenty people” with expertise in many areas of literacy and online education, Comer taught the course (321), which enrolled more than 82,000 students, 1,289 of whom received a Statement of Accomplishment indicating a grade of 70% or higher. Nearly 80% of the students “lived outside the United States” and for a majority, English was not the first language, although 59% of these said they were “proficient or fluent in written English” (320). Sixty-six percent had bachelor’s or master’s degrees.

White designed and conducted the assessment, which addressed concerns about MOOCs as educational options. The authors recognize MOOCs as “antithetical” (319) to many accepted principles in writing theory and pedagogy, such as the importance of interpersonal instructor/student interaction (319), the imperative to meet the needs of a “local context” (Brian Huot, qtd. in Comer and White 325) and a foundation in disciplinary principles (325). Yet the authors contend that as “MOOCs are persisting,” refusing to address their implications will undermine the ability of writing studies specialists to influence practices such as Automated Essay Scoring, which has already been attempted in four MOOCs (319). Designing a valid assessment, the authors state, will allow composition scholars to determine how MOOCs affect pedagogy and learning (320) and from those findings to understand more fully what MOOCs can accomplish across diverse populations and settings (321).

Comer and White stress that assessment processes extant in traditional composition contexts can contribute to a “hybrid form” applicable to the characteristics of a MOOC (324) such as the “scale” of the project and the “wide heterogeneity of learners” (324). Models for assessment in traditional environments as well as online contexts had to be combined with new approaches that addressed the “lack of direct teacher feedback and evaluation and limited accountability for peer feedback” (324).

For Comer and White, this hybrid approach must accommodate the degree to which the course combined the features of an “xMOOC” governed by a traditional academic course design with those of a “cMOOC,” in which learning occurs across “network[s]” through “connections” largely of the learners’ creation (322-23).

Learning objectives and assignments mirrored those familiar to compositionists, such as the ability to “[a]rgue and support a position” and “[i]dentify and use the stages of the writing process” (323). Students completed four major projects, the first three incorporating drafting, feedback, and revision (324). Instructional videos and optional workshops in Google Hangouts supported assignments like discussion forum participation, informal contributions, self-reflection, and peer feedback (323).

The assessment itself, designed to shed light on how best to assess such contexts, consisted of “peer feedback and evaluation,” “Self-reflection,” three surveys, and “Intensive Portfolio Rating” (325-26).

The course supported both formative and evaluative peer feedback through “highly structured rubrics” and extensive modeling (326). Students who had submitted drafts each received responses from three other students, and those who submitted final drafts received evaluations from four peers on a 1-6 scale (327). The authors argue that despite the level of support peer review requires, it is preferable to more expert-driven or automated responses because they believe that

what student writers need and desire above all else is a respectful reader who will attend to their writing with care and respond to it with understanding of its aims. (327)

They found that the formative review, although taken seriously by many students, was “uneven,” and students varied in their appreciation of the process (327-29). Meanwhile, the authors interpret the evaluative peer review as indicating that “student writing overall was successful” (330). Peer grades closely matched those of the expert graders, and, while marginally higher, were not inappropriately high (330).

The MOOC provided many opportunities for self-reflection, which the authors denote as “one of the richest growth areas” (332). They provide examples of student responses to these opportunities as evidence of committed engagement with the course; a strong desire for improvement; an appreciation of the value of both receiving and giving feedback; and awareness of opportunities for growth (332-35). More than 1400 students turned in “final reflective essays” (335).

Self-efficacy measures revealed that students exhibited an unexpectedly high level of confidence in many areas, such as “their abilities to draft, revise, edit, read critically, and summarize” (337). Somewhat lower confidence levels in their ability to give and receive feedback persuade the authors that a MOOC emphasizing peer interaction served as an “occasion to hone these skills” (337). The greatest gain occurred in this domain.

Nine “professional writing instructors” (339) assessed portfolios for 247 students who had both completed the course and opted into the IRB component (340). This assessment confirmed that while students might not be able to “rely consistently” on formative peer review, peer evaluation could effectively supplement expert grading (344).

Comer and White stress the importance of further research in a range of areas, including how best to support effective peer response; how ESL writers interact with MOOCs; what kinds of people choose MOOCs and why; and how MOOCs might function in WAC/WID situations (344-45).

The authors stress the importance of avoiding “extreme concluding statements” about the effectiveness of MOOCs based on findings such as theirs (346). Their study suggests that different learners valued the experience differently; those who found it useful did so for varied reasons. Repeating that writing studies must take responsibility for assessment in such contexts, they emphasize that “MOOCs cannot and should not replace face-to-face instruction” (346; emphasis original). However, they contend that even enrollees who interacted briefly with the MOOC left with an exposure to writing practices they would not have gained otherwise and that the students who completed the MOOC satisfactorily amounted to more students than Comer would have reached in 53 years teaching her regular FY sessions (346).

In designing assessments, the authors urge, compositionists should resist the impulse to focus solely on the “Big Data” produced by assessments at such scales (347-48). Such a focus can obscure the importance of individual learners who, they note, “bring their own priorities, objectives, and interests to the writing MOOC” (348). They advocate making assessment an activity for the learners as much as possible through self-reflection and through peer interaction, which, when effectively supported, “is almost as useful to students as expert response and is crucial to student learning” (349). Ultimately, while the MOOC did not succeed universally, it offered many students valuable writing experiences (346).

