Thursday, March 19, 2026

Why Your AI Teaching Assistant Keeps Getting It Wrong (And the One Skill That Fixes It)

#EdTech #AIinEducation #PromptCraft #ContextEngineering #TeacherAI #DigitalLearning


Background

In January 2026, The Economist published "Failing the Screen Test," a sweeping investigation into educational technology [1]. The verdict was stark: ed tech is "mostly useless," a $165 billion global industry that delivers marginal gains while student achievement collapses worldwide. The piece opened with Principal Inge Esping in a Kansas middle school, watching laptops go back into closets after three years of broken promises from adaptive math software. Paper and pencil returned. The magic never came.



The Economist got a lot right. It also got some critical things terribly wrong. But the most important lesson from that article has nothing to do with whether technology works in classrooms. It has everything to do with how we ask technology to work for us, and what has changed in the last two months that makes that question more urgent than ever.

The Vendor Trap: How $165 Billion Bought Declining Test Scores

The numbers are brutal. From 1994 to 2012, student achievement rose steadily across 21 nationwide benchmark tests in the United States. Then, precisely as screen usage began its exponential climb between 2012 and 2015, achievement started falling. Not slowly. Consistently. Year after year [2].

The OECD's Programme for International Student Assessment confirmed this pattern internationally. Students who use computers moderately at school tend to have somewhat better learning outcomes than students who rarely use computers, but students who use computers very frequently at school do much worse in most learning outcomes, even after accounting for social background and student demographics [3]. The gap can equal an entire year of learning.

Fourth-graders who used tablets in "all or almost all" classes scored 14 points lower on federal reading tests than students who never used tablets. On the NAEP scale, that is roughly one full year of schooling, lost [2]. And by 2024, 90% of high school students and 84% of elementary students had school-issued devices. Eighty percent of kindergarteners were given personal learning devices [1].

How did we get here? The answer, as The Economist puts it, is that the prevalence of technology in schools owes less to rigorous evidence than to aggressive marketing [1]. Teachers report being flooded with daily offers for free technology. Districts make procurement decisions based on vendor presentations, not randomized controlled trials. The sales pitch replaces the research paper. The demo replaces the pilot study. And the result is predictable: when vendors drive acquisition, outcomes suffer. Vendor-led software procurement is, almost by design, guaranteed to produce the worst possible results. The vendor's incentive is adoption, not learning. The metric they optimize for is licenses sold, not test scores raised.

This is the core problem The Economist identified correctly. The procurement model is broken. Most of the $165 billion global market and $30 billion in annual U.S. K-12 spending flows toward generic devices that enable distraction, low-quality drill software, and platforms optimized for engagement metrics rather than learning transfer [2]. Roughly 60 to 70% goes to hardware and generic devices. Another 20 to 30% funds low-quality software licenses. Only 5 to 10% reaches the adaptive, intelligent systems that research actually supports.

Meanwhile, the systems that do work sit on the margins. A meta-analysis of findings from 50 controlled evaluations of intelligent computer tutoring systems found that the median effect of intelligent tutoring was to raise test scores 0.66 standard deviations over conventional levels [4]. Students learn significantly more in less time when using a well-designed AI tutor, and they also feel more engaged and more motivated [5]. A meta-analysis across 119 studies revealed positive effects for decoding (0.33), language comprehension (0.30), reading comprehension (0.23), and writing proficiency (0.81) [6]. These represent a sliver of total spending. The problem was never technology itself. It was which technology, chosen by whom, and deployed how.

But here The Economist's analysis stopped short. The article created a false binary: technology helps or technology hurts. The actual evidence reveals something more nuanced. The relationship between technology use and learning outcomes follows an inverted U-curve [3]. At zero technology use, students miss essential digital literacy and practice tools. At moderate, targeted use of around 30 minutes daily for specific skills, gains peak. At heavy use spanning multiple hours of screen time, distraction costs overwhelm learning benefits. The Economist reported only the extremes and ignored the middle entirely [2].

The honest headline would have been: "We're Spending $87 Billion on the Wrong Ed Tech." That is a very different story from "technology is useless."

What Changed in February 2026

Now fast-forward to the present. In the last two months, something fundamental shifted in the AI models available to educators and everyone else. Models with autonomous agent capabilities shipped in rapid succession, changing both what is possible and what is required from the humans using them [7].

These models do not just answer questions better. They can work autonomously for hours, sometimes days, against structured specifications without checking in. As Nate B Jones argued in his recent analysis of the new prompting skills required in 2026, this changes what "good at prompting" means at a fundamental level [7]. The word "prompting" now conceals four completely different skill sets, and most people, including most teachers, are practicing only one of them.

The old model of prompting was conversational. You type a question in a chat window, read the output, iterate, refine. That skill still matters, but it has become table stakes [7]. What matters now is what sits above basic prompt craft: context engineering, which involves curating the information environment the AI operates within; intent engineering, which encodes your goals and quality standards; and specification engineering, which means writing complete, structured descriptions of what the output should be [7]. For educators, the most immediately relevant of these is context engineering, because it speaks directly to the biggest risk of using AI in teaching and learning: hallucination.

The Hallucination Problem Is a Context Problem

Whether AI can act as an effective teaching assistant or student tutor depends critically on preventing it from hallucinating. When a large language model does not receive enough context, it fills in the gaps based on its own statistical guesses. Those guesses can be subtly wrong, confidently stated, and deeply misleading for a student who has no basis to evaluate them [7].

This is the connection between the ed tech failures The Economist documented and the prompting revolution Jones describes. The Kansas middle school did not just suffer from bad software. It suffered from a deployment model that provided no context, no structure, and no quality controls around how the technology interacted with students and teachers [2]. The same failure pattern applies to AI tools: give a language model a vague prompt with no context, no examples, no constraints, and you will get vague, unreliable, sometimes fabricated output. As Jones puts it, AI does not fill in gaps reliably; it fills them with statistical plausibility, which is a polite way of saying it guesses in ways that are often subtly wrong [7].

The Harvard AI tutoring study is instructive here. The researchers did not simply hand students a chatbot and hope for the best. Through a design that involves targeted, content-rich prompt engineering, they developed an online tutor that uses generative AI and best practices from pedagogy and educational psychology [5]. The AI was given rich context: specific learning objectives, structured problem sequences, pedagogical principles, and defined quality criteria. The result was that students learned significantly more, and faster, than those in traditional active learning classrooms. The context made the technology work.

Jones cites Shopify CEO Tobi Lütke's definition of the core challenge: can you state a problem with enough context that, without any additional information, the task becomes plausibly solvable [7]? Applied to education, this means a teacher asking an AI to "explain photosynthesis" will get a generic, possibly inaccurate response. But a teacher who provides the grade level, the curriculum standard, the students' prior knowledge, the common misconceptions to address, specific examples to include and to exclude, and the assessment criteria will get something usable and reliable. The technology is identical. The context is everything.
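To make the contrast concrete, here is a minimal sketch in Python of how a bare prompt and a context-rich prompt for the same photosynthesis lesson might be assembled before being sent to any AI tool. All field names and the standard code are illustrative, not a prescribed template:

```python
# A minimal sketch of context-rich prompt assembly. Every field name and
# value below is illustrative; substitute your own curriculum documents.

BARE_PROMPT = "Explain photosynthesis."

def build_rich_prompt(topic, grade, standard, prior_knowledge,
                      misconceptions, include, exclude, assessment):
    """Assemble one prompt string from explicit teaching context."""
    return "\n".join([
        f"Topic: {topic}",
        f"Audience: grade {grade} students",
        f"Curriculum standard: {standard}",
        f"Assume students already know: {prior_knowledge}",
        f"Explicitly address these misconceptions: {misconceptions}",
        f"Include: {include}",
        f"Do NOT include: {exclude}",
        f"The explanation will be assessed on: {assessment}",
        "If you are unsure about any fact, say so instead of guessing.",
    ])

prompt = build_rich_prompt(
    topic="photosynthesis",
    grade=7,
    standard="NGSS MS-LS1-6",  # illustrative standard code
    prior_knowledge="cells, basic chemical reactions",
    misconceptions="plants get their mass mainly from soil",
    include="a labeled word equation and one everyday analogy",
    exclude="the light-dependent/light-independent reaction split",
    assessment="accuracy, age-appropriate vocabulary, one check question",
)
print(prompt)
```

The point is not the code itself but the discipline it enforces: every line of the assembled prompt answers a question the model would otherwise guess at.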

The gap is not marginal. Jones describes a 10x productivity difference between people who provide structured, context-rich specifications and people who simply chat [7]. The same gap holds in education. An AI given rich context becomes a powerful tutor. An AI given a bare prompt becomes a confident but unreliable source of misinformation.

The practical framework is straightforward. Good educational prompts need clear instructions that specify exactly what the AI should produce. They need relevant examples and counterexamples so the model understands boundaries. They need explicit quality standards defining what counts as a correct explanation, what level of complexity is appropriate, and what errors are unacceptable. And they need what Jones calls constraint architecture: what the AI must do, what it must not do, what it should prefer when multiple approaches exist, and when it should flag uncertainty rather than guess [7].

This is not abstract theory. It is the difference between an AI tutor that tells a struggling math student that dividing by zero equals infinity and one that correctly explains the concept, flags the common misconception, and suggests a specific practice problem to work through.

The Real Lesson from a Decade of Mistakes

The ed tech disaster chronicled by The Economist and the prompting revolution described by Jones are two chapters of the same story. In both cases, the technology is not the bottleneck. The bottleneck is how humans set it up, what context they provide, what quality standards they enforce, and whether decisions are driven by evidence or by sales pitches.

The intelligent tutoring systems that produce effect sizes of 0.66 SD work because they are structured [4]. They adapt to the student's level. They provide immediate, targeted feedback. They limit scope to specific skills where digital practice is most effective. Among the programs that showed positive effects were those that provided adaptive instruction based on embedded assessments, those that had clear scope and sequences with skill building over time, and those that included opportunities for practice and feedback [6]. They are, in essence, well-engineered specifications for learning.

The Chromebooks sitting in closets across Kansas failed because they were generic, unstructured, and deployed without the contextual infrastructure that makes technology effective [2]. They were the educational equivalent of a bare prompt typed into a chat window: vague input producing unreliable output.

A Call to Action for Teachers

If you are a teacher using AI in any capacity, whether for lesson planning, assessment design, student tutoring, or administrative work, the single most important thing you can do right now is improve your prompt craft and start providing more context in every interaction: for example, the full syllabus or subject guide, your annual planning, and other relevant documents on assessment and instructional design.

This means moving beyond simple questions and requests. Specify your audience, your learning objectives, your constraints, your quality standards, and your examples every time you interact with an AI tool. Treat the AI not as a search engine you query with a few words, but as a new colleague who needs a thorough briefing before producing useful work.

Start small. Take one task you regularly ask AI to help with. Rewrite your prompt as if the recipient has never taught your subject, does not know your students, and has access to no information beyond what you include [7]. Then compare the output to what you were getting before. The improvement will be immediate and obvious.

An LLM badly used will always end up disappointing. It will fill gaps with plausible-sounding guesses, miss your intent, and produce work that requires more effort to fix than it saved. But an LLM given rich context, clear constraints, and explicit quality standards can be the teaching partner that the $165 billion ed tech industry has promised and failed to deliver for over a decade.

The technology was never the problem. The instructions were. And now, with models more capable than anything we have seen before, the quality of those instructions matters more than ever. Brush up on your prompt craft. Provide the context. Define the standard. The tools are ready. The question is whether we are.


References

[1] The Economist. (2026, January). Failing the screen test. The Economist.

[2] Brown, N. B. (2026, February 8). What The Economist got right (and terribly wrong) about education technology. Skepticism.ai. https://skepticism.ai/p/the-165-billion-question-what-the

[3] OECD. (2015). Students, computers and learning: Making the connection. PISA, OECD Publishing. https://doi.org/10.1787/9789264239555-en

[4] Kulik, J. A., & Fletcher, J. D. (2016). Effectiveness of intelligent tutoring systems: A meta-analytic review. Review of Educational Research, 86(1), 42-78. https://doi.org/10.3102/0034654315581420

[5] Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI tutoring outperforms in-class active learning: An RCT introducing a novel research-based design in an authentic educational setting. Scientific Reports, 15, 17458. https://doi.org/10.1038/s41598-025-97652-6

[6] Silverman, R. D., Keane, K., Darling-Hammond, E., & Khanna, S. (2024). The effects of educational technology interventions on literacy in elementary school: A meta-analysis. Review of Educational Research, 95(3), 972-1012. https://doi.org/10.3102/00346543241261073

[7] Jones, N. B. (2026, February). Prompting just split into 4 skills. You only know one. Here's why you need the other 3 in 2026. AI News & Strategy Daily. https://natesnewsletter.substack.com


