Moore, Noreen S., and Charles A. MacArthur. “Student Use of Automated Essay Evaluation Technology During Revision.” Journal of Writing Research 8.1 (2016): 149-75. Web. 23 Sept. 2016.
Noreen S. Moore and Charles A. MacArthur report on a study of 7th- and 8th-graders’ use of Automated Essay Evaluation technology (AEE) and its effects on their writing.
Moore and MacArthur define AEE as “the process of evaluating and scoring written prose via computer programs” (M. D. Shermis and J. Burstein, qtd. in Moore and MacArthur 150). The current study was part of a larger investigation of the use of AEE in K-12 classrooms (150, 153-54). Moore and MacArthur focus on students’ revision practices (154).
The authors argue that such studies are necessary because “AEE has the potential to offer more feedback and revision opportunities for students than may otherwise be available” (150). Teacher feedback, they posit, may not be “immediate” and may be “ineffective” and “inconsistent” as well as “time consuming,” while the alternative of peer feedback “requires proper training” (151). The authors also posit that AEE will increasingly become part of the writing education landscape and that teachers will benefit from “participat[ing]” in explorations of its effects (150). They argue that AEE should “complement” rather than replace teacher feedback and scoring (151).
Moore and MacArthur review extant research on two kinds of AEE, one that uses “Latent Semantic Analysis” (LSA) and one that has been “developed through model training” (152). Studies of an LSA program owned by Pearson and designed to evaluate summaries compared the program with “word-processing feedback” and found greater improvement in many traits, including “quality, organization, content, use of detail, and style,” as well as more time spent on revision (152). Other studies of AEE programs also showed improvement. Moore and MacArthur note, however, that some of these studies relied on scores from the program itself as indices of improvement and did not demonstrate any transfer of skills to contexts outside the program (153).
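For readers unfamiliar with the technique, the following is a minimal, hypothetical sketch of how an LSA-based comparison can work; it is illustrative only and is not the Pearson program the reviewed studies evaluated. It represents texts as TF-IDF vectors, projects them into a low-dimensional “latent semantic” space with truncated SVD, and compares them by cosine similarity.

```python
# Illustrative LSA similarity sketch (not the software discussed in the study).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Automated feedback can supplement teacher response to student writing.",
    "Automated essay evaluation may complement, not replace, teacher feedback.",
    "Photosynthesis converts sunlight into chemical energy in plants.",
]

# Build a TF-IDF term-document matrix.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)

# Reduce to a small number of latent semantic dimensions
# (n_components is illustrative; real systems train on large corpora).
lsa = TruncatedSVD(n_components=2, random_state=0)
X_lsa = lsa.fit_transform(X)

# Cosine similarity in the latent space approximates semantic overlap:
# the two related texts should score higher than the unrelated one.
print(cosine_similarity(X_lsa[0:1], X_lsa[1:2]))
print(cosine_similarity(X_lsa[0:1], X_lsa[2:3]))
```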
Moore and MacArthur contend that their study differs from previous research in that it does not rely on “data collected by the system” but rather uses “real time” information from think-aloud protocols and semi-structured interviews to investigate students’ use of the technology. Moreover, their study reveals the kinds of revision students actually do (153). They ask:
- How do students use AEE feedback to make revisions?
- Are students motivated to make revisions while using AEE technology?
- How well do students understand the feedback from AEE, both the substantive feedback and the conventions feedback? (154)
The researchers studied six students selected to be representative of a 12-student 7th- and 8th-grade “literacy class” at a private northeastern school whose students exhibited traits “that may interfere with school success” (154). The students were in their second year of AEE use and the teacher in the third. Students “supplement[ed]” their literacy work with in-class work using the “web-based MY Access!” program (154).
Moore and MacArthur report that the IntelliMetric scoring engine used by MY Access! correlates highly with scoring by human raters (155). The software is intended to analyze “focus/coherence, organization, elaboration/development, sentence structure, and mechanics/conventions” (155).
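As a hypothetical illustration of what “correlates highly with human raters” typically means in this literature (the numbers below are invented, not data from the study or from the vendor), machine scores and human scores for the same essays can be compared with Pearson’s correlation coefficient:

```python
# Invented example scores on a 1-6 scale; not data from Moore and MacArthur.
from statistics import correlation  # Python 3.10+

human_scores   = [4, 3, 5, 2, 4, 3, 6, 4]
machine_scores = [4, 3, 5, 2, 5, 3, 6, 4]

# A Pearson's r near 1.0 indicates the automated engine ranks and scales
# essays much as human raters do.
print(round(correlation(human_scores, machine_scores), 2))
```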
MY Access! provides feedback through MY Tutor, which responds to “non-surface” issues, and MY Editor, which addresses spelling, punctuation, and other conventions. MY Tutor provides a “one sentence revision goal”; “strategies for achieving the goal”; and “a before and after example of a student revising based on the revision goal and strategy” (156). The authors further note that “[a]lthough the MY Tutor feedback is different for each score point and genre, the same feedback is given for the same score in the same genre” (156). MY Editor, by contrast, responds to the specific errors in each individual text.
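To make the contrast concrete, the sketch below models the feedback structure the authors describe as a simple lookup keyed by genre and score point. It is a hypothetical illustration, not MY Access!’s actual implementation, and all names and sample texts are invented.

```python
# Hypothetical model of score/genre-keyed tutor feedback (not MY Access! code).
from dataclasses import dataclass

@dataclass
class TutorFeedback:
    revision_goal: str      # one-sentence revision goal
    strategies: list[str]   # strategies for achieving the goal
    before_example: str     # sample draft before revision
    after_example: str      # the same sample after revision

# Every essay that earns the same score in the same genre receives the
# same stock feedback entry.
TUTOR_FEEDBACK = {
    ("argumentative", 2): TutorFeedback(
        revision_goal="State a clear claim and support it with reasons.",
        strategies=["Identify your claim", "List two supporting reasons"],
        before_example="I think school uniforms are bad.",
        after_example="Schools should not require uniforms because they "
                      "limit self-expression and add costs for families.",
    ),
}

def my_tutor(genre: str, score: int) -> TutorFeedback:
    """Return the generic feedback for this genre/score combination."""
    return TUTOR_FEEDBACK[(genre, score)]

print(my_tutor("argumentative", 2).revision_goal)
```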
Each student submitted a first and a revised draft of a narrative and an argumentative paper, for a total of 24 drafts (156). The researchers analyzed only revisions made during the think-aloud sessions; any revision work done before the initial submission did not count as data (157).
Moore and MacArthur found that students used MY Tutor for non-surface feedback only when their submitted essays earned low scores (158). Two of the three students who used the feature appeared to understand the feedback and used it successfully (163). The authors report that for the students who used it successfully, MY Tutor feedback inspired a larger range of changes and more effective changes in the papers than feedback from the teacher or from self-evaluation (159). These students’ changes addressed “audience engagement, focusing, adding argumentative elements, and transitioning” (159), whereas teacher feedback primarily addressed increasing detail.
One student who scored high made substantive changes rated as “minor successes” but did not use the MY Tutor tool. This student used MY Editor and appeared to misunderstand the feedback, concentrating on changes that eliminated the “error flag” (166).
Moore and MacArthur note that all students made non-surface revisions (160) and that 71% of these attempts were prompted by AEE feedback (161). However, 54.3% of the total changes were unsuccessful, and 68% of those unsuccessful changes stemmed from MY Editor suggestions (161). The authors report that the students lacked the “technical vocabulary” to make full use of the suggestions (165); moreover, they state that “[i]n many of the instances when students disagreed with MY Editor or were confused by the feedback, the feedback seemed to be incorrect” (166). The authors cite other research corroborating their concern that grammar checkers in general may often be incorrect (166).
As limitations, the researchers point to the small sample, which nevertheless allowed access to “rich data” and “detailed description” of actual use (167). They also note that other AEE programs might yield different results and that the lack of data on revisions students made before submitting their drafts may have affected the results (167). The authors supply appendices detailing their research methods.
Moore and MacArthur propose that because AEE scores prompt revision, such programs can effectively augment writing instruction, but they recommend that scoring track student development so that, as students approach the maximum score at a given level, new criteria and scores encourage more advanced work (167-68). Teachers should model use of the program and supply the vocabulary students need to make sense of its feedback. Moore and MacArthur argue that effective use of such programs can help students understand criteria for writing assessment and refine their own self-evaluation processes (168).
Research recommendations include asking whether scores from AEE continue to encourage revision and investigating how AEE programs differ in procedures and effectiveness. The study did not examine teachers’ approaches to the program. Moore and MacArthur urge that stakeholders, including “the people developing the technology and the teachers, coaches, and leaders using the technology . . . collaborate” so that AEE “aligns with classroom instruction” (168-69).