February 1, 2025
Vol. 82
No. 5
Research Alert

Can AI Assess Student Learning?

      Teachers who have found themselves racking their brains to create suitable test items or staring forlornly at a stack of ungraded papers have likely wished they could wave a magic wand, Harry Potter-style, to get that test to write itself or whisk those ungraded papers into a stack of graded ones. Could AI grant that wish?
      Two recent conference papers offer some interesting insights into whether AI, specifically the ChatGPT-4 large language model (LLM), can serve as just such a magic wand.
      In the first study, researchers at Stanford University examined the viability of using ChatGPT-4 to generate test items to assess sentence reading efficiency, a specific aspect of reading fluency that requires students to read simple sentences (such as “Children play on the playground.”) and indicate whether each statement is true or false.
      Regularly tracking student progress on this measure requires building a high-quality item bank of hundreds of sentences—no small task. So, the research team wanted to see if ChatGPT-4 could build those items. They compared the results of testing 234 students with expert-generated sentences versus those created by the LLM (130 true and 130 false sentences in each set). Remarkably, the results showed that students scored similarly on the expert- and AI-created sentences. However, humans were still needed to initially screen the AI sentences to ensure they weren’t ambiguous (“A hill is flat and square.”), dangerous (“Babies drink gasoline.”), or subjective (“Dolls are fun to play with.”).
      In the second study, researchers in the UK used a dataset of 1,700 human-scored student responses to open-ended science and history questions to train ChatGPT-4 to score the responses (as correct or incorrect). ChatGPT-4's scoring matched human scoring 85 percent of the time, similar, as it turns out, to the level of agreement among the human scorers themselves (87 percent). Short-answer responses to open-ended questions are a more effective way to assess student learning than multiple-choice items, the researchers note, yet teachers often rely heavily on multiple-choice items because they are easier to grade. If AI can effectively evaluate short-answer responses, it has the potential to elevate the quality of student learning and save teachers time.
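      For readers curious about what "matched human scoring 85 percent of the time" means in practice, the following is a minimal sketch of a percent-agreement calculation. The function name and the ten sample scores are hypothetical and chosen only for illustration; they are not data from the study, and the researchers' own analysis may have differed.

```python
def percent_agreement(scores_a, scores_b):
    """Return the percentage of responses on which two scorers gave the same mark."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both scorers must mark the same number of responses.")
    if not scores_a:
        raise ValueError("At least one scored response is required.")
    # Count the responses where both scorers assigned the same mark.
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Hypothetical marks for ten short-answer responses (1 = correct, 0 = incorrect).
human_scores = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
ai_scores = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

agreement = percent_agreement(human_scores, ai_scores)
print(f"The two scorers agreed on {agreement:.0f} percent of responses.")
```

      The same calculation applied to two human scorers gives a baseline for judging the AI's result, which is the comparison the study reports.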
      Neither study suggests AI offers a magic wand for developing and scoring classroom assessments, at least not yet. With some caution and common sense, though, LLMs may help teachers do both, making the essential work of assessment and grading a bit easier.
      End Notes

      1 Zelikman, E., Ma, W., Tran, J., Yang, D., Yeatman, J., & Haber, N. (2023). Generating and evaluating tests for K-12 students with language model simulations: A case study on sentence reading efficiency. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (pp. 2190–2205). Singapore: Association for Computational Linguistics.

      2 Henkel, O., Hills, L., Boxer, A., Roberts, B., & Levonian, Z. (2024, July). Can large language models make the grade? An empirical study evaluating LLMs ability to mark short answer questions in K-12 education. In Proceedings of the Eleventh ACM Conference on Learning @ Scale (pp. 300–304). Atlanta: Association for Computing Machinery.

      Bryan Goodwin is the president and CEO of McREL International, a Denver-based nonprofit education research and development organization. Goodwin, a former teacher and journalist, has been at McREL for more than 20 years, serving previously as chief operating officer and director of communications and marketing. Goodwin writes a monthly research column for Educational Leadership and presents research findings and insights to audiences across the United States and in Canada, the Middle East, and Australia.
