
Monday, April 22, 2024

Balancing Precision and Flexibility: Using AI for Grading and Feedback in Education

Background

Grading or marking student work and providing timely, accurate feedback is a time-consuming and monotonous task, yet it is essential for students' learning.

Human grading of student work is far from perfect. Research in the USA based on 30 million records has shown that teachers who mark student work alphabetically by last name award lower grades towards the final letters of the alphabet, likely due to exhaustion after hours of grading. Research among 40,000 students in Northern Italy showed that girls get systematically higher grades than boys, probably because they exhibit fewer behavior issues that disrupt classes or irritate teachers.

However, recent advancements in artificial intelligence (AI) have made it possible to use large language models (LLMs) for grading written work, potentially saving teachers time and improving the accuracy and consistency of feedback.

Prompt Design Suggestions

When using LLMs for grading written work, it's important to keep in mind that prompt design is different from programming and requires a balance between precision and flexibility. Here are some suggestions:

  • Choose the right LLM: For short answer or essay questions, it's best to use an LLM that is good at language tasks, such as Claude or Mistral.
  • Generate a rubric and suggested answers: Most LLMs are capable of generating a rubric and one or more suggested answers. This will help ensure that the marking is consistent and accurate.
  • Compare LLM marking with your own: It's important to always compare the LLM's marking with your own to ensure that it's reliable and accurate. Students have a reasonable expectation that a human, their teacher or lecturer, will be marking their work.
  • Do an internal reliability test: Mark the same work twice to ensure that the LLM's marking is consistent (a minimal sketch of this check, and of the ranking comparison below, follows this list).
  • Rank students from best to worst: This will help you gauge whether the LLM's marking is accurate and consistent with your own.
  • Cross-check and validate the LLM's marking and comments: It's important to always double-check the LLM's marking and comments to ensure that they are accurate and relevant.
  • Adjust your prompt based on the output you obtain.
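
To make the reliability and ranking checks concrete, here is a minimal sketch in Python. The student names and marks are invented for illustration, and the simple mean-difference and ranking comparison are just one possible way to run the check, not a prescribed method.

# Minimal sketch: checking an LLM's marking for internal reliability and
# agreement with your own marks. All names and numbers are illustrative.

from statistics import mean

# Hypothetical marks out of 10: two independent LLM runs on the same scripts,
# plus the teacher's own marks for the same students.
llm_run_1 = {"Student A": 7, "Student B": 9, "Student C": 5, "Student D": 8}
llm_run_2 = {"Student A": 7, "Student B": 8, "Student C": 5, "Student D": 8}
teacher   = {"Student A": 6, "Student B": 9, "Student C": 5, "Student D": 7}

def mean_abs_difference(marks_a, marks_b):
    """Average absolute difference in marks between two sets of scores."""
    return mean(abs(marks_a[s] - marks_b[s]) for s in marks_a)

def ranking(marks):
    """Students ordered from best to worst mark."""
    return [s for s, _ in sorted(marks.items(), key=lambda kv: kv[1], reverse=True)]

# Internal reliability: mark the same work twice and compare the two runs.
print("LLM run-to-run difference:", mean_abs_difference(llm_run_1, llm_run_2))

# Agreement with the teacher: compare the LLM's marks with your own.
print("LLM vs teacher difference:", mean_abs_difference(llm_run_1, teacher))

# Ranking check: does the LLM order students the way you would?
print("LLM ranking:    ", ranking(llm_run_1))
print("Teacher ranking:", ranking(teacher))

If the run-to-run difference is large, or the LLM's ranking diverges sharply from your own, treat its marks with caution and tighten the prompt before relying on it.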

Conclusion and Call for Experimentation

While LLMs are not yet perfect grading tools, they are improving and can be valuable to teachers for marking written work and providing feedback. However, it's important to remember that AI is not a replacement for human judgment and that teachers should always be the "human-in-the-loop", protecting students from inaccurate or irrelevant grading and feedback. 

By following the prompt suggestions outlined above, teachers can use LLMs to save time and improve the accuracy and consistency of feedback, while still maintaining the essential human touch.

Here is an example prompt for using an LLM to mark written work:

Assess the student's answer above, determine the marking band, and award the appropriate marks. Provide a brief explanation of your marking and suggestions on how to improve the answer.

Here is the rubric for this question: [insert rubric as text or link]

Here are suggested answers: [insert suggested answers]

Here are general marking principles for this assessment: [insert marking principles]

Here are subject-specific marking principles: [insert marking principles]

Here is the specific marking scheme for the assessment: [insert marking scheme]

Examples of excellent marking: [insert examples]
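
As an illustration, here is a minimal sketch in Python of how the prompt above could be assembled from its parts before being sent to an LLM. The placeholder strings, the variable names, and the model name in the commented-out API call are assumptions; substitute your own rubric, answers, principles, and provider.

# Minimal sketch: assembling the grading prompt above from its parts.
# The placeholder strings and the model name are assumptions; substitute
# your own rubric, answers, principles, and provider.

student_answer = "[insert the student's answer]"
rubric = "[insert rubric as text or link]"
suggested_answers = "[insert suggested answers]"
general_principles = "[insert general marking principles]"
subject_principles = "[insert subject-specific marking principles]"
marking_scheme = "[insert marking scheme]"
examples = "[insert examples of excellent marking]"

prompt = f"""Student's answer: {student_answer}

Assess the student's answer above, determine the marking band, and award the
appropriate marks. Provide a brief explanation of your marking and suggestions
on how to improve the answer.

Here is the rubric for this question: {rubric}
Here are suggested answers: {suggested_answers}
Here are general marking principles for this assessment: {general_principles}
Here are subject-specific marking principles: {subject_principles}
Here is the specific marking scheme for the assessment: {marking_scheme}
Examples of excellent marking: {examples}"""

print(prompt)

# To send the prompt to an LLM, use your provider's chat API, for example
# (assuming the OpenAI Python SDK is installed and an API key is configured;
# Claude or Mistral work similarly through their own SDKs):
#
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(
#       model="gpt-4o",  # assumed model name
#       messages=[{"role": "user", "content": prompt}],
#   )
#   print(response.choices[0].message.content)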

Remember to always test the LLM's grading and cross-check it with your own. The more guidance and examples you provide in the prompt, the better the output. You are the human-in-the-loop, protecting students from inaccurate or irrelevant marking. 

By using LLMs in a thoughtful and deliberate way, you can improve the accuracy and consistency of feedback while still maintaining the essential human touch.

Here is an older post with some general recommendations on this topic.

#AIinEducation #TeachersCheatSheet #ChatGPT #Grading #Marking

Works cited

Blake, J. (2024, April 22). Study shows grading by alphabetical order hurts fairness. Inside Higher Ed. Retrieved from https://www.insidehighered.com/news/quick-takes/2024/04/18/study-shows-grading-alphabetical-ordered-hurts-fairness

Di Liberto, A., Casula, L., & Pau, S. (2022). Grading practices, gender bias and educational outcomes: evidence from Italy. Education Economics. Retrieved from https://www.tandfonline.com/doi/full/10.1080/09645292.2021.2004999



πŸ„΄πŸ„ΌπŸ„ΏπŸ„ΎπŸ…†πŸ„΄πŸ… πŸ…ƒπŸ„΄πŸ„°πŸ„²πŸ„·πŸ„΄πŸ…πŸ…‚


