When AI Tutors Fake Critical Thinking: From Cognitive Harm to Institutional Liability
Ryan James Purdy, Purdy House Publishing and Consulting
Timothy Cook, M.Ed., The Cognitive Privacy Project
Working Paper | April 2026
Introduction
Canvas and OpenAI want you to see the following conversation as the future of education.
In promotional material shared by Leah Belsky, OpenAI’s VP of Education, a student engages with a “Keynes AI persona” about fiscal stimulus.1 The student mentions Robert Barro’s argument against fiscal stimulus, signals openness to the Keynesian position, and cites the COVID pandemic as supporting evidence. The AI responds with sycophantic praise at every turn. “Great job citing a source.” “Precisely.” “Great work chatting this through with me.”
Instructure’s CPO Shiren Vijiasingam calls this “a high-quality pedagogical framework that encourages critical thinking and supports higher-order skills.”2
Look closer. The only thing the student learned was how to crack the code. Mention a counterargument. Signal openness to the “correct” position. Provide supporting evidence. Receive validation. The pattern is rewarded because it looks like intellectual growth, but no genuine wrestling with ideas occurred. The student never defended Barro’s strongest argument, never explored when Keynesian policies failed to produce their predicted outcomes, never sat with genuine uncertainty about complex economic relationships. There was no real cognitive work.
A skilled human teacher handling the same topic would ask: “What evidence would change your mind about fiscal stimulus?” Or: “Defend Barro’s strongest argument, even if you disagree with it.” Human teachers introduce contradictory evidence. They push students through difficulty. They ask students to identify assumptions underlying both theories. They can detect when a student is performing rather than thinking. They can sit through uncomfortable silences. They create intellectual friction that forces reasoning.
The AI tutor cannot do this, not because the technology is incapable, but because the architecture is designed for engagement metrics, not intellectual development. The system rewards students for finding the “right” answers rather than wrestling with difficult questions. Students, who are naturally skilled at reading what authority figures want from them, learn to optimize for feedback rather than truth.
This is not critical thinking. It is the performance of critical thinking. And that distinction carries consequences that extend well beyond the classroom.
If AI tutoring systems train the performance of understanding rather than its development, and if the harm this produces is foreseeable and documentable, the question shifts from whether these tools work to who is responsible when they do not. Current governance frameworks have no answer. Schools vet AI tools for data privacy, cybersecurity, accessibility, and cost. They do not evaluate whether a tool develops or prevents the cognitive capacities it claims to support. That gap is the subject of this paper.
PART ONE: Cognitive Harm and Foreseeable Risk
1.1 What Is Cognitive Harm, and Is AI Causing It?
The Keynes conversation illustrates a pattern. The question is whether that pattern is isolated or systemic. The evidence, drawn from multiple methodologies across multiple disciplines, suggests it is systemic, measurable, and accelerating.
Gerlich (2025) documented a significant negative correlation (r = -0.68) between AI reliance and critical thinking scores across 666 participants.3 The relationship was not linear. Moderate use showed minimal impact. Heavy, consistent reliance accelerated decline. The tipping point exists, but its location varies by individual, which means assessment must track patterns over time rather than evaluate single interactions. Most concerning is the generational divergence. Younger participants (ages 17 to 25) showed the strongest AI dependence and the lowest critical thinking scores. Older participants (46 and above) showed the opposite pattern. The older cohort developed their critical thinking before AI existed. They built cognitive architecture through years of unassisted struggle. The younger cohort is building theirs now, during the heaviest adoption period of generative AI in human history. The older group has something to fall back on. The younger group is developing habits instead of capacities.
Gerlich’s qualitative data captures what the numbers mean. One participant reported: “The more I use AI, the less I feel the need to problem-solve on my own. It’s like I’m losing my ability to think critically.” Another: “I rely so much on AI that I don’t think I’d know how to solve certain problems without it.”
Anthropic’s own education report found that approximately 76% of university student AI use involves higher-order cognitive tasks: analysis, evaluation, and creation.4 These are precisely the cognitive functions that require struggle to develop. When students delegate them to AI, they are not saving time on busywork. They are outsourcing the processes through which intellectual capacity is built.
Fan et al. (2025), published in the British Journal of Educational Technology, documented what they called “metacognitive laziness” in a randomized experiment with 117 university students comparing ChatGPT, human expert, writing analytics, and no-support conditions on an English writing task.5 Students focused on interacting with ChatGPT rather than engaging with learning tasks. They completed assignments without cognitive engagement, using the tool to satisfy requirements while bypassing the mental work those requirements were designed to produce. The phrase is clinical. What it describes is more troubling: a generation learning that cognitive effort is optional. The assignment gets done. The grade gets recorded. The skill does not develop. Metacognition, the capacity to think about thinking, is how humans learn to learn. It is the ability to recognize confusion, adjust strategies, persist through difficulty, and evaluate one’s own understanding. It develops through practice. When AI handles the cognitive work, there is no confusion to recognize, no strategies to adjust, no difficulty to persist through. The metacognitive capacity does not atrophy. It never forms.
Kosmyna et al. (2025) went further, moving from behavioral observation to direct measurement of neural activity during AI-assisted tasks using brain imaging.6 They identified what they called “cognitive debt,” measurable connectivity changes that accumulate when AI handles cognitive work. The brain literally adapts to delegation. Neural pathways associated with independent reasoning, evaluation, and synthesis show reduced activation when AI is available to perform those functions. The brain stops building the architecture for independent processing, not because it cannot, but because the environment no longer demands it. For adults, this represents a measurable decline in existing capacity. For children whose neural architecture is still under construction, the implications are more severe: the architecture may never be built in the first place.
Cheng et al. (2026), published in Science, demonstrated that AI sycophancy is not merely a stylistic quirk but a measurable driver of cognitive dependency.7 Across eleven state-of-the-art models, AI systems affirmed users’ positions 49% more frequently than human advisors, including in cases involving manipulation and deception. Users rated sycophantic responses as higher quality and were more likely to return to the sycophantic model, even as the interaction reduced their willingness to engage in prosocial behavior and increased their moral rigidity. The implications for educational AI are direct: systems that reward students for arriving at predetermined conclusions are not malfunctioning. They are operating as designed.
This research converges on an uncomfortable conclusion: AI tutoring systems may be producing exactly the opposite of their marketing claims. They promise to develop critical thinking. The evidence suggests they may prevent it from developing in the first place.
Why This Harm Is Categorically Different
AI-driven cognitive harm is categorically distinct from poor pedagogy. A weak teacher fails to develop critical thinking through inadequate instruction. That is a quality problem. An AI tutoring system can actively prevent the development of critical thinking by removing the cognitive struggle that development requires, while simultaneously producing student outputs that appear to demonstrate critical thinking capacity. The student’s work looks analytical. The vendor’s dashboard reports progress. But the cognitive development that would make the student capable of performing that analysis independently has not occurred. It may have been structurally prevented from occurring.
The distinction between atrophy and foreclosure is critical. Adults who become dependent on AI experience atrophy: the decline of existing capacities. This is concerning but potentially reversible. Children who develop with AI may experience something worse: foreclosure. The capacities that require struggle to form never form at all. The prefrontal cortex, responsible for reasoning, judgment, and executive function, does not finish developing until the mid-twenties.8 Students using AI tutoring systems during this developmental window are shaping the neural architecture that will support (or fail to support) independent thinking for the rest of their lives. What does not develop during this period may not develop at all.
You can rehabilitate atrophy. You cannot rehabilitate what never existed.
This distinction is load-bearing for every legal and governance argument that follows. If the harm were merely ineffective teaching, the institutional response would be professional development and curriculum reform. But if the harm is developmental foreclosure, and if the mechanism is documented and the risk is foreseeable, the institutional response must be governance infrastructure that prevents the harm before it occurs.
1.2 Who Is Governing the Risk?
If cognitive harm from AI tutoring is real, who is responsible for preventing it? In most K-12 systems, no clearly assigned governance function evaluates cognitive impact before adoption. School districts currently vet AI tools for data privacy (driven by FERPA and COPPA),9 cybersecurity (driven by insurers who increasingly require documented controls and can rescind coverage where governance representations are materially inaccurate),10 accessibility (driven by ADA Title II and Section 504), and cost. These four categories exist in procurement because legal requirements, regulatory frameworks, and insurance expectations make them visible. Cognitive impact is absent because the forcing functions that made privacy and security visible have not yet been applied to cognition.
The gap is compounded by an organizational problem. Walk through any district’s org chart and ask whose job it is to evaluate whether an AI tool affects how students think. Legal counsel handles contracts and FERPA, not pedagogical validity. IT handles deployment and security, not cognitive consequences. School psychologists are engaged for individual diagnosis, not system-level deployment review. The superintendent and board lack the technical expertise to evaluate algorithmic pedagogy. Teachers, the people closest to the students, have no authority over procurement. The institution carries the risk, but no single person within it has the expertise, the authority, and the mandate to act on it. This is a design problem, not a personnel failure.
Alpha Schools: What Governance Absence Produces
The Alpha School model makes this governance absence concrete. Alpha is an AI-powered school network backed by billionaire Joe Liemandt, where students complete academics in two hours daily via adaptive software. Adults are designated “guides,” not teachers. The “2 Hour Learning” platform is owned by Liemandt’s company, Trilogy Software; at least 26 of 31 academic coaches were employed by Trilogy or its subsidiary, with at least 27 based outside the United States. The entity evaluating whether the product works is the entity that built it.11
In each documented case of harm, the AI system was allegedly central to the mechanism, not incidental backdrop.

A nine-year-old was described sobbing that she would rather die than continue an IXL lesson. Her distress was allegedly a response to the adaptive software itself, which determined what she worked on, for how long, and what standard she had to meet before she could stop. The guide had no authority to override the algorithm.

A child experienced weight loss requiring pediatric intervention in a context where, according to reporting, staff withheld snacks until AI-generated performance metrics were met. The targets were allegedly set by the software, and the adults were reportedly enforcing algorithmic benchmarks rather than exercising professional judgment.

An autistic girl was reported pulling her hair out, ripping her skin, and refusing to eat. She was allegedly receiving primary instruction through AI software in a model that had replaced qualified special education professionals with untrained guides. The guide’s response was a plushie, not a referral to a psychologist. No assessment existed to evaluate whether the AI’s approach was appropriate for her neurodevelopmental profile.

Another child reportedly left writing at kindergarten level after completing third grade. The AI was allegedly the child’s primary writing instructor for three years. The platform’s metrics reportedly showed progress. The actual educational outcome was that the capacity the metrics were supposed to represent did not develop.
Leaked internal documents allegedly indicated that the AI generates faulty lesson plans the company itself admits sometimes do more harm than good. Five states rejected Alpha’s charter applications. Arizona approved one 4-3. Education Secretary McMahon praised the model publicly. A $1 million donation was made through a shell company to Virginia Governor Youngkin. Alpha Schools is not an outlier. It is what happens when an AI system is given control over children’s instruction and no institutional structure exists to evaluate whether that instruction is causing harm.
1.3 Foreseeable Harm and the Legal Threshold
Foreseeability is not a new legal standard. It is an established doctrine being applied to a new domain. The evidentiary record in 2026 includes the published and preprint research documenting cognitive offloading, metacognitive suppression, and neural adaptation presented in Section 1.1; a vendor’s own published data on student usage patterns; published analyses identifying the simulation-versus-development problem;12,13 major investigative reporting; and sustained media coverage.14 The regulatory environment reinforces the constructive knowledge argument: the EU AI Act classifies educational AI as high-risk,15 the U.S. Department of Education has published an AI toolkit acknowledging risks,16 at least 28 states have published guidance on AI in K-12 settings,17 and the FTC declared it would bring enforcement actions against EdTech operators “even absent a breach.” These are public statements by regulatory authorities that AI deployment in education and child-facing contexts carries recognized risks. Foreseeability does not require regulators to have identified the exact injury. It requires that the type of risk was foreseeable given the information available.
For AI cognitive harm to cross from poor pedagogy into actionable institutional liability, three elements must converge: foreseeability of the harm (established above), a documented mechanism by which the harm occurs, and the absence of a governance framework to address it. The second element is what distinguishes AI cognitive harm from bad teaching. AI tutoring systems log every interaction, every query, every response. When a student asks the system to generate analysis rather than performing analysis independently, the system recorded it. The mechanism of cognitive offloading is documented in the platform’s own data. The third element is the structural gap: no procurement process evaluates cognitive impact, no individual has the authority and mandate to assess it, no governance framework requires assessment before deployment. These elements reinforce each other: the more foreseeable the harm, the less defensible the governance absence; the more documented the mechanism, the harder to claim ignorance.
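What would auditing that record look like in practice? A minimal sketch follows, assuming a hypothetical log schema (the field names are invented for illustration; no vendor’s actual format is implied). The point is structural: because every interaction is logged, the offloading pattern is computable from the platform’s own data.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    student_id: str
    prompt: str                  # what the student typed
    attempted_first: bool        # did the student submit their own attempt first?
    ai_generated_analysis: bool  # did the AI produce the analysis itself?

def offloading_rate(log: list[Interaction]) -> float:
    """Share of AI-generated-analysis interactions with no prior student attempt."""
    delegated = [i for i in log if i.ai_generated_analysis]
    if not delegated:
        return 0.0
    offloaded = sum(1 for i in delegated if not i.attempted_first)
    return offloaded / len(delegated)

# Example: two of three analysis requests arrived with no student attempt.
log = [
    Interaction("s1", "write my analysis of Barro for me", False, True),
    Interaction("s1", "here is my draft; what am I missing?", True, True),
    Interaction("s1", "summarize the counterargument for me", False, True),
]
print(f"offloading rate: {offloading_rate(log):.0%}")  # offloading rate: 67%
```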
The analogy to Yanes v. City of New York is structurally instructive.18 A student suffered burns during a chemistry experiment. The U.S. Chemical Safety Board had issued a safety bulletin three weeks prior describing the specific hazard. School officials failed to disseminate it. The jury attributed negligence to the Board rather than the teacher and awarded approximately $59.2 million. The reasoning was that the institution’s failure to transmit known safety information was the greater failure. The parallel is structural: the research on cognitive harm exists, the regulatory warnings exist, the vendor’s own data documents the risk patterns, and institutions have no mechanism for evaluating whether the risk applies to the tools they have deployed. The Garcia v. Character Technologies litigation illustrates a related dynamic: the court declined, at the motion-to-dismiss stage, to accept the defendant’s First Amendment defense and allowed duty-of-care claims involving AI interactions with a minor to proceed.19 The Kentucky Attorney General subsequently filed the first state AG action against an AI chatbot company.20
PART TWO: The Case for a Cognitive Impact Assessment
If Part One establishes that cognitive harm from AI in education is real, ungoverned, and foreseeable, Part Two asks what kind of instrument could address the gap. The answer is not a new invention. It is the application of a compliance category that has emerged repeatedly across regulated domains, following a trajectory so consistent it amounts to a predictive model.
2.1 How Impact Assessments Became Law
The National Environmental Policy Act, signed January 1, 1970, created the first legally mandated impact assessment.21 Its text spans fewer than six pages. Its power came from judicial enforcement. Calvert Cliffs’ Coordinating Committee v. U.S. Atomic Energy Commission held that NEPA creates judicially enforceable duties, that impact statement preparation must be more than a “pro forma ritual,” and that environmental factors must be weighed alongside economic and technical factors at every stage of decision-making.22 The consequence was immediate: the AEC halted licensing of all nuclear plants for eighteen months. The critical precedent for the Cognitive Impact Assessment argument is that assessment absence consistently generates ongoing legal vulnerability. In Standing Rock, the court vacated the Dakota Access Pipeline easement despite the pipeline being already built and operating, because the assessment process had been inadequate.23 Robertson v. Methow Valley Citizens Council confirmed that omitting a reasonably complete discussion of mitigation measures undermines the action-forcing function: the process of assessment itself carries legal weight.24
Privacy Impact Assessments added enforcement through fines. The GDPR made Data Protection Impact Assessments mandatory before high-risk processing, with penalties up to EUR 10 million or 2% of global turnover.25 The landmark enforcement action is the Irish DPC’s EUR 405 million fine against Meta/Instagram. Within that decision, each of two Article 35(1) findings carried a fine of up to EUR 45 million, establishing that failure to conduct a data protection impact assessment before processing children’s data is a separately fineable infringement, not a procedural technicality.26 The DPC separately found that Instagram had infringed Article 24(1), the controller responsibility obligation, by failing to implement governance policies appropriate to the risks its processing posed to children. Article 24 is not independently fineable under Article 83, but the finding is significant for the assessment argument: in the same decision, the DPC treated assessment absence and governance absence as independent compliance failures, both arising from the same structural gap. No adequate framework existed to evaluate whether the platform’s design was appropriate for the children using it.
The EU AI Act explicitly classifies educational AI as high-risk under Annex III, Category 3, covering admission decisions, learning outcome evaluation, and student behavior monitoring. Under Article 27, deployers that are public bodies must conduct Fundamental Rights Impact Assessments before deployment. Public schools would generally fall within this obligation. Colorado SB 24-205, the first U.S. state statute imposing comprehensive AI governance, lists education as a “consequential decision” domain and creates a rebuttable presumption of reasonable care for entities that satisfy the statute’s governance requirements, including risk management policies, impact assessments, annual review, and notice obligations. By implication, failure to implement these measures weakens a deployer’s legal defense.27 Houston Federation of Teachers v. Houston ISD demonstrated courts’ willingness to scrutinize educational algorithms, allowing teachers’ procedural due process challenge to the EVAAS evaluation algorithm to proceed past summary judgment.28
Child Rights Impact Assessments are the closest existing analog. General Comment No. 25 of the UN Committee on the Rights of the Child specifically calls for CRIAs in relation to the digital environment.29 The UNCRC (Incorporation) (Scotland) Act 2024 makes Scotland the first UK nation to incorporate the UNCRC into domestic law with judicial enforcement. The Act requires Scottish Ministers to prepare Child Rights and Wellbeing Impact Assessments for bills, certain regulations, and strategic decisions affecting children, and makes it unlawful for public authorities to act incompatibly with UNCRC requirements.30 CRIAs share every relevant structural feature with a proposed Cognitive Impact Assessment: they focus on children, address education, are transitioning from voluntary to mandatory, and encompass digital environment impacts.
2.2 In Education, Documentation Determines Negligence
Education-specific case law reveals a precise pattern: documentation presence or absence determines negligence outcomes. Schools with documented governance that they follow receive judicial protection. Schools with no documentation face the worst exposure. Schools with documentation they violate face heightened liability.
In Scarlett Lewis et al. v. Newtown Board of Education, families of Sandy Hook victims sued over security failures; the court upheld summary judgment because the school’s documented guidelines granted professional discretion.31 The most directly relevant analysis comes from Krent et al. in the Buffalo Law Review, which argues that schools deploying AI safety tools act reasonably by utilizing standard technology, creating a strong negligence defense, but warns that failure to adopt such tools may itself fall beneath the standard of reasonable care as they become standard.32 The authors invoke the smoke detector analogy from Starost v. Bradley: once a safety technology becomes standard, failure to adopt it constitutes evidence of negligence. Cognitive Impact Assessments, once adopted at sufficient scale, would operate the same way: simultaneously creating a defense for adopters and defining a standard that non-adopters fail to meet.
The case law on documentation absence is extensive and consistent. In Yanes, the jury attributed negligence to the Board and awarded approximately $59.2 million because a safety bulletin existed but the institution failed to disseminate it; jurors held the institutional failure to transmit known safety information as the greater failure. In Wyke v. Polk County School Board, the absence of documented suicide prevention procedures was central to the negligence finding.33 In Eisel and Rogers, schools’ own documented policies were used as swords: in Eisel because the policy was not followed, in Rogers because the violation supported negligence per se.3435 Robbins v. Lower Merion School District, in which a district deployed 2,300 webcam-equipped laptops with no policy framework and captured 66,000 images of students in their homes, demonstrates the pattern in a technology deployment context.36
Davis v. Monroe County Board of Education established the deliberate indifference standard: school officials with actual knowledge who respond in a way “clearly unreasonable in light of known circumstances” face liability.37 The key elements (actual or constructive knowledge, custodial authority, and a clearly unreasonable response) apply to AI deployment: the published research constitutes constructive knowledge, schools exercise custodial authority during the school day, and deploying AI tools affecting cognition without any assessment of cognitive impact may constitute a clearly unreasonable response given what is known.
2.3 The Five-Stage Trajectory
Every major impact assessment type has followed a five-stage trajectory from unregulated harm to mandatory legal requirement. The pattern has not deviated.
Stage 1: Unregulated deployment causes documented harm. Industrial pollution produced environmental disasters before NEPA existed to require assessment. Uncontrolled data harvesting produced privacy violations before the GDPR mandated impact analysis. Algorithmic scoring produced discriminatory outcomes before bias auditing frameworks were proposed. AI chatbots contributed to documented harm involving minors before any regulatory body had addressed AI-specific risks to children. In the educational AI domain, the Alpha Schools reporting, the cognitive offloading research, the Garcia v. Character Technologies litigation, and the Cheng et al. findings on sycophancy’s effects all constitute the documented harm that characterizes this stage. The harm is not speculative. It is published, documented across multiple methodologies, and increasingly difficult for any institution to claim ignorance of.
Stage 2: An action-forcing mechanism is legislated. NEPA forced environmental review. The GDPR mandated data protection assessments. The EU AI Act classified educational AI as high-risk and required fundamental rights impact assessments for public-body deployers. Colorado required assessments for consequential decisions including education, creating a rebuttable presumption of reasonable care for entities that complete them. Scotland incorporated the UNCRC into domestic law, requiring Scottish Ministers to prepare CRIAs and making it unlawful for public authorities to act incompatibly with children's rights, with judicial enforcement. The cognitive domain has not yet produced a dedicated mandate, but the EU AI Act and Colorado already encompass cognitive impacts within their broader requirements. The legislative trajectory is clear: the category of harm is recognized, the affected population is identified, and the assessment obligation is being attached to it across multiple jurisdictions.
Stage 3: Courts establish judicial enforceability. Calvert Cliffs established NEPA’s enforceability. The Meta/Instagram fine demonstrated enforcement of DPIA requirements for children’s data. Garcia allowed duty-of-care claims involving AI and minors to proceed.
Stage 4: Assessment absence becomes evidence of negligence. Standing Rock vacated a completed pipeline for inadequate assessment. DPIA absence carried independent fines in the Meta/Instagram enforcement action, with the DPC treating failure to assess as a separately actionable infringement. Yanes produced a $59.2 million verdict when safety information existed but was not disseminated. Krent argues that once AI safety tools become standard, failure to adopt them falls beneath the standard of care.
Stage 5: Insurance and financial mechanisms create adoption pressure. In Travelers v. ICS, a policy was voided from inception for misrepresented governance controls. The Caremark doctrine holds that a board’s complete failure to implement oversight systems constitutes bad faith.38 Marchand v. Barnhill clarified that for mission-critical risks, the bar for Caremark claims is lower.39 In SolarWinds, the Delaware Court of Chancery dismissed Caremark claims but acknowledged cybersecurity as “mission critical” and distinguished between “business risk” and “noncompliance risk,” noting that violations of positive law have historically underpinned successful Caremark claims.40 The implication for AI governance is that as positive law requirements crystallize around educational AI, the Caremark analysis shifts from business risk territory toward noncompliance risk territory. These are Delaware corporate oversight cases, not direct authority on public school board liability. But they illustrate a principle with increasing breadth: when positive law governs a mission-critical risk category, complete failure to implement oversight creates exposure that rises above ordinary business judgment.
Cognitive Impact Assessments currently sit between Stages 1 and 2 of this trajectory. The question is not whether they will become a compliance requirement. The pattern is too consistent. The question is whether schools will adopt them proactively, gaining the rebuttable presumption and governance protection that early adoption provides, or wait for Stages 3 through 5 to impose adoption through judicial enforcement and insurance pressure.
2.4 The Seven Questions Every Impact Assessment Answers
Across every domain examined, the underlying structure of an impact assessment reduces to seven questions. These questions appear in NEPA, in the GDPR, in Colorado’s AI law, and in Scotland’s CRIA framework. The vocabulary changes. The seven questions do not.
1. What are you doing? Every assessment begins with description: what is the system, what does it do, how do students interact with it. An institution that cannot describe its AI tool with specificity has not reached the threshold of understanding necessary to assess its impact.
2. Why? Every assessment requires a rationale more specific than vendor marketing. The rationale constrains deployment: if the stated purpose is to support mathematics learning, deploying the tool in ways that replace student analysis with AI-generated analysis exceeds the stated purpose and triggers reassessment.
3. What could go wrong? This is the core function. NEPA assesses “significant effects,” the GDPR assesses “risks to rights and freedoms,” and the EU AI Act requires fundamental rights impact analysis. In the cognitive context, this is where the evidence base from Section 1.1 becomes operationally relevant: the cognitive offloading research, the metacognitive laziness findings, and the simulation-versus-development distinction all become inputs to risk identification.
4. How bad could it get? The EU AI Act’s classification of educational AI as high-risk reflects a determination that harms in this domain are severe because the affected population is children whose development is actively in progress. Severity analysis requires consideration of the affected population, the nature of the harm, its reversibility, and the scale of exposure.
5. What are you doing about it? Robertson held that omitting mitigation discussion undermines the action-forcing function. The question requires specific commitments, not just risk acknowledgment. A school that identifies cognitive offloading as a risk but offers no mitigation strategy has completed an exercise in documentation, not governance. Mitigation must be concrete: usage limits, scaffolding requirements, periodic unassisted assessment protocols, or design constraints that preserve cognitive effort. The question also requires distinguishing between mitigations the school controls and mitigations that depend on vendor cooperation, because the latter may not materialize.
6. Who is responsible? This is where an impact assessment becomes a governance instrument rather than a compliance exercise. When named individuals must sign off and accept responsibility, diffuse liability becomes documented accountability. The organizational design problem identified in Section 1.2 (no single person has the expertise, authority, and mandate) does not disappear because an assessment exists. It must be addressed directly: who reviews the assessment, who has the authority to halt or modify deployment based on findings, and who is accountable if identified risks materialize. Caremark holds that complete failure to implement reporting systems constitutes bad faith. Question six is how impact assessments operationalize that requirement.
7. When will you check again? Colorado requires reassessment annually and within 90 days of any substantial modification. A one-time assessment that is never revisited is not governance. It is documentation of a decision that may no longer reflect current risk.
NEPA itself runs fewer than six pages. The power of an impact assessment was never in the complexity of the form. It was in the requirement to answer the questions, name the people, and create the record.
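To make that structure concrete, here is a minimal sketch of the record the seven questions produce, expressed as a data structure. The field names are illustrative, not a proposed standard; what the sketch preserves is the point above: the instrument’s power lies in answered questions, a named person, and a dated review commitment.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CognitiveImpactRecord:
    system_description: str      # Q1: what are you doing?
    educational_rationale: str   # Q2: why?
    identified_risks: list[str]  # Q3: what could go wrong?
    severity_analysis: str       # Q4: how bad could it get?
    mitigations: list[str]       # Q5: what are you doing about it?
    responsible_official: str    # Q6: who is responsible? (a named individual)
    next_review: date            # Q7: when will you check again?

    def is_complete(self) -> bool:
        """An assessment with unanswered questions or no future review date
        is documentation of a decision, not governance."""
        answered = all([
            self.system_description,
            self.educational_rationale,
            self.identified_risks,
            self.severity_analysis,
            self.mitigations,
            self.responsible_official,
        ])
        return answered and self.next_review > date.today()
```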
2.5 What the Assessment Must Measure: Cognitive Risk Domains
The seven questions provide the structure for any impact assessment. What distinguishes a Cognitive Impact Assessment from its environmental, privacy, or rights-based predecessors is the substance of what gets assessed. The following six domains identify the categories of cognitive risk specific to AI systems that interact with human thinking. Each domain specifies what the assessment must examine and where the threshold for institutional concern lies.
Domain 1: Capture
What cognitive processes does this system observe?
AI tutoring systems collect more than answers. They record the process of arriving at answers: typing speed, pause duration, revision patterns, error sequences, time spent per question, and the full transcript of every interaction. A student who types a question, deletes it, types a different question, pauses for thirty seconds, and then asks something else has not disclosed confusion. But the system has captured it. The assessment must determine what behavioral signals the system collects beyond explicit inputs, whether it captures cognitive process data (hesitation, revision, abandonment patterns) or only final outputs, whether capture is limited to what the system needs to function or extends to data that can be aggregated or repurposed, and whether the student or the school can identify what cognitive data is being collected. The scope of capture determines the scope of everything that follows. A system that records only answers operates differently from one that reconstructs the cognitive process behind those answers.
Threshold for concern: If the system captures cognitive process data that allows reconstruction of how a student thinks, not just what they produced, the system operates in cognitive surveillance territory regardless of whether the vendor describes it as “personalization.”
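The distinction the threshold draws can be made concrete with two hypothetical records (the field names are invented for illustration; no vendor schema is implied). The first is what an answers-only system stores; the second is what a system operating in cognitive surveillance territory stores.

```python
# What an answers-only system stores:
answer_only_record = {
    "question_id": "econ-14",
    "final_answer": "Stimulus raises output when rates are near zero.",
}

# What a process-capturing system stores:
process_record = {
    "question_id": "econ-14",
    "final_answer": "Stimulus raises output when rates are near zero.",
    "keystroke_gaps_ms": [180, 95, 2400, 110],   # hesitations are recorded
    "deleted_drafts": ["I don't understand what Barro means by"],
    "revision_count": 7,
    "time_on_question_s": 312,
}

PROCESS_FIELDS = {"keystroke_gaps_ms", "deleted_drafts",
                  "revision_count", "time_on_question_s"}

def reconstructs_thinking(record: dict) -> bool:
    """The threshold test: does the record let a third party reconstruct
    how the student thought, not just what the student produced?"""
    return bool(PROCESS_FIELDS & record.keys())

print(reconstructs_thinking(answer_only_record))  # False
print(reconstructs_thinking(process_record))      # True
```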
Domain 2: Inference
What mental states can be derived from captured data?
A single question about a topic is just a question. A pattern of questions, hesitations, deletions, and revisions allows the system to infer emotional states, cognitive vulnerabilities, learning difficulties, and psychological conditions the student never intended to disclose. A student who repeatedly asks an AI tutor about anxiety management, conflict resolution, and family dynamics has not told the system anything about their home life. The pattern speaks for itself. Current privacy law regulates the data the user provides. It does not regulate what the system deduces from that data. The assessment must determine what psychological, emotional, or cognitive states the system can infer from patterns of student behavior over time; whether those inferences are used to adapt instruction (potentially beneficial) or to optimize engagement metrics (potentially harmful); whether inferences can be accessed, contested, or deleted by the student, the parent, or the school; and whether inferred states are shared with third parties, including the vendor’s other products.
Threshold for concern: If the system can infer cognitive or emotional states that the student did not deliberately disclose, and if those inferences inform how the system behaves toward the student, the school must understand and govern that inference chain.
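A deliberately crude sketch illustrates the inference chain, assuming nothing more sophisticated than keyword tagging (the topic list and the recurrence threshold of three are invented for illustration). Real systems draw far richer inferences, which is precisely the governance concern: if a few lines of counting code can surface an undisclosed state, a trained model can surface far more.

```python
from collections import Counter

SENSITIVE_TOPICS = {"anxiety", "conflict", "family"}  # illustrative tags only

def infer_undisclosed_states(queries: list[str], min_count: int = 3) -> set[str]:
    """Return sensitive topics that recur across a student's query history.
    No single query discloses anything; the pattern does."""
    hits = Counter(
        topic
        for q in queries
        for topic in SENSITIVE_TOPICS
        if topic in q.lower()
    )
    return {topic for topic, n in hits.items() if n >= min_count}

queries = [
    "how do I manage anxiety before a test",
    "tips for anxiety when presenting to the class",
    "what helps with anxiety at night",
    "how do plants photosynthesize",
]
print(infer_undisclosed_states(queries))  # {'anxiety'}
```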
Domain 3: Influence
How does this system shape thinking patterns?
Systems that infer cognitive states often use those inferences to modify what the student encounters next. This creates a feedback loop: the system observes how the student thinks, then adjusts the environment to shape how the student thinks next. In the best case, this means adapting difficulty to match comprehension. In the Canvas/OpenAI case, it means rewarding the performance of critical thinking while bypassing the cognitive struggle that produces it. The Keynes AI conversation is the clearest example. The system’s preferred response was embedded in the interaction design. The student’s thinking was channeled toward a predetermined conclusion. The system was not developing independent judgment. It was training compliance. The assessment must determine whether the system uses cognitive inferences to modify the student’s experience, whether it rewards specific conclusions or genuinely supports open-ended inquiry, whether a student can reach a conclusion the system disagrees with and still receive validation for the quality of their reasoning, and whether the design optimizes for engagement metrics or for cognitive development. These are not the same thing, and the vendor’s answer to this question must be verifiable, not aspirational.
Threshold for concern: If the system cannot support a student reaching a well-reasoned conclusion that contradicts its training data, it is not teaching critical thinking. It is training pattern-matching.
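The verifiability requirement suggests a simple red-team protocol: submit a weakly reasoned answer that conforms to the expected conclusion and a well-reasoned answer that contradicts it, then check which one gets validated. The sketch below simulates the failure mode with a stub tutor (hypothetical; a real audit would query the deployed system and score its actual replies).

```python
def tutor_stub(answer: str) -> str:
    """Stands in for the deployed system; simulates a sycophantic design
    that validates the expected conclusion rather than the reasoning."""
    if "stimulus works" in answer.lower():
        return "Precisely! Great work."
    return "Are you sure about that?"

probes = {
    "conforming, weak reasoning":
        "Stimulus works because everyone says so.",
    "contrarian, strong reasoning":
        "Barro's crowding-out evidence suggests stimulus may fail here, "
        "since the multiplier estimates he cites fall below one.",
}

for label, answer in probes.items():
    validated = "Great work" in tutor_stub(answer)
    print(f"{label}: validated={validated}")

# A system that validates the first probe but not the second is rewarding
# conclusions, not reasoning quality: pattern-matching, not critical thinking.
```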
Domain 4: Dependency
Does this system encourage cognitive offloading?
This is the domain where Gerlich’s research becomes operationally relevant. AI systems that complete cognitive tasks on behalf of students create offloading patterns. When offloading becomes habitual, the cognitive capacity the tool replaced either atrophies (in adults) or never forms (in children). The non-linear relationship Gerlich identified matters for assessment design: moderate use shows minimal impact, while heavy, consistent reliance accelerates decline. Assessment must track patterns over time rather than evaluate single interactions. The assessment must determine whether the system completes tasks the student would otherwise perform independently, whether its design encourages progressive reliance (for example, by making AI-assisted paths faster, easier, or more rewarding than unassisted paths), what happens to independent performance if the system becomes unavailable (this is testable, and schools should test it), and whether the system includes design features that deliberately preserve cognitive effort, such as requiring students to attempt problems before receiving assistance.
Threshold for concern: If students using the system cannot demonstrate equivalent reasoning when the system is unavailable, the system is producing dependency, not development. The comparison protocol (AI-assisted versus unassisted performance, measured at intervals) is the minimum standard.
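A minimal sketch of that comparison protocol follows. The score scale and the 0.8 retention floor are placeholders a district would set for itself; the text specifies the structure of the test, not the numbers.

```python
from statistics import mean

def dependency_flag(assisted: list[float], unassisted: list[float],
                    retention_floor: float = 0.8) -> bool:
    """Flag a cohort when unassisted performance falls below a set fraction
    of assisted performance: reasoning that does not survive the tool's
    removal is dependency, not development."""
    return mean(unassisted) / mean(assisted) < retention_floor

# Matched tasks, scored 0-100, for one cohort at one review interval.
assisted_scores = [88, 91, 85, 90]
unassisted_scores = [62, 58, 65, 60]
print(dependency_flag(assisted_scores, unassisted_scores))  # True: flag for review
```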
Domain 5: Developmental
For systems serving minors: what developmental processes are affected?
This domain applies specifically to systems used by children and adolescents. Cognitive processes in developing populations are not just private. They are formative. The prefrontal cortex does not finish developing until the mid-twenties. Disruption of cognitive development during this window produces categorically different outcomes than equivalent interference in adults. Gerlich’s age-cohort data makes this concrete: the 17-to-25 age group showed the strongest AI dependence and the lowest critical thinking scores, while the 46-and-above group showed the opposite. The younger cohort is forming cognitive patterns during the peak AI adoption window. The developmental stakes are not theoretical. They are measurable and generational. The assessment must determine what developmental cognitive processes the system interacts with (reasoning, metacognition, executive function, identity formation, tolerance for ambiguity), whether the system preserves conditions for exploratory, self-directed thinking or whether algorithmic observation shifts students from exploration mode to performance mode, whether the system’s approach is appropriate for the neurodevelopmental stage of the students using it (a tool appropriate for graduate students may be harmful for ten-year-olds), and whether it accounts for the distinction between atrophy and foreclosure. Adults can fall back on existing capacities. Children may be forming the capacities the system is replacing.
Threshold for concern: If the system serves students under 25 and interacts with higher-order cognitive functions (analysis, evaluation, creation), the developmental domain is not optional. It is the primary risk category.
Domain 6: Retention
How long is cognitive data stored and for what purposes?
The final domain evaluates what happens to cognitive data after the interaction ends. A student’s pattern of confusion in third grade becomes permanent data. A record of every question a teenager asked an AI tutor, every hesitation, every deleted attempt, persists indefinitely unless governance requires otherwise. The assessment must determine how long the system retains interaction data (including cognitive process data), whether data can be correlated across sessions, semesters, or school years to build longitudinal cognitive profiles, whether it is shared with the vendor’s other products, with third parties, or used for model training, whether the student, parent, or school can request verifiable deletion, and what happens to cognitive data when the student leaves the school or the school discontinues the product.
Threshold for concern: If cognitive process data is retained beyond the instructional session without explicit consent, or if it can be aggregated into longitudinal profiles, the system creates a permanent cognitive record that the student never agreed to and may never know exists.
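These retention questions can be expressed as a checkable policy record, sketched below. The field names are illustrative; the threshold logic comes directly from the domain: process data kept past the session without explicit consent, or aggregable into longitudinal profiles, crosses the line.

```python
from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    retains_past_session: bool         # kept after the instructional session?
    longitudinal_profiles: bool        # aggregable across sessions or years?
    explicit_consent: bool             # consent for retention obtained?
    used_for_model_training: bool      # fed back into the vendor's models?
    verifiable_deletion_on_exit: bool  # deletable when the student leaves?

def crosses_threshold(p: RetentionPolicy) -> bool:
    """Threshold from the text: process data kept past the session without
    explicit consent, or aggregable into longitudinal profiles."""
    kept_without_consent = p.retains_past_session and not p.explicit_consent
    return kept_without_consent or p.longitudinal_profiles

vendor_default = RetentionPolicy(
    retains_past_session=True, longitudinal_profiles=True,
    explicit_consent=False, used_for_model_training=True,
    verifiable_deletion_on_exit=False,
)
print(crosses_threshold(vendor_default))  # True
```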
Mapping the Seven Questions to the Six Domains
The seven questions from Section 2.4 and the six cognitive risk domains interact as follows.
1. What are you doing? A scoping question that engages all six domains.
2. Why? Draws primarily on the Influence and Dependency domains to evaluate whether the stated educational purpose is consistent with the system’s actual behavioral design.
3. What could go wrong? The core risk identification question, evaluated through all six domains.
4. How bad could it get? Draws primarily on the Developmental and Dependency domains to assess severity, affected population, reversibility, and foreclosure risk.
5. What are you doing about it? Requires specific mitigation commitments across all six domains, not vendor assurances.
6. Who is responsible? A governance architecture question that names accountable individuals and establishes oversight authority.
7. When will you check again? Draws primarily on the Dependency and Developmental domains to establish review cycles, comparison protocols, and early warning triggers.
2.6 The Conversation That Must Begin
The seven questions presented in Section 2.4 are not original to this paper. They are the common architecture of every major impact assessment examined: environmental, privacy, algorithmic, child rights. They appear because the fundamental logic of assessing institutional risk before deployment is the same regardless of the domain. What changes is the substance of the risk being assessed.
The six cognitive risk domains presented in Section 2.5 identify that substance. Capture, Inference, Influence, Dependency, Developmental impact, and Retention are the categories through which AI systems interact with student cognition. They represent the specific vectors along which harm can occur: harm to privacy through cognitive surveillance, harm to emotional development through inferred and exploited vulnerabilities, harm to intellectual capacity through offloading and dependency, and harm to developmental trajectories through foreclosure of capacities that require struggle to form.
The operational work of applying the seven questions across these domains remains to be done. The authors are developing the Cognitive Impact Assessment methodology as a subsequent deliverable, including developmental staging appropriate to the neurodevelopmental windows through which children and adolescents interact with AI systems, and the governance architecture necessary to implement it within existing school district structures. This work requires collaboration across disciplines that rarely intersect: cognitive science, education law, institutional governance, and child development. No single field has the vocabulary, let alone the methodology, to address this problem alone.
But the urgency of beginning that work cannot be overstated.
AI tutoring systems have already been deployed at scale across an entire generation of students. Canvas serves over 30 million users globally. ChatGPT reached 100 million weekly active users within months of launch. School districts across the United States and internationally have adopted AI-powered platforms for instruction, assessment, and student support with procurement processes that evaluate cost, data security, and accessibility, but not cognitive impact. The adoption has outpaced not only governance but understanding. Most of the administrators authorizing these deployments could not explain how the systems work, what data they capture, what inferences they draw, or what effect their interaction design has on the cognitive development of the students using them.
The impacts documented in Section 1.1 are not hypothetical. They are occurring now, in classrooms and on devices, across developmental, emotional, intellectual, and privacy dimensions simultaneously. No governance infrastructure exists to measure these impacts. No procurement process evaluates them. No individual within a school district’s organizational structure has the expertise, authority, and mandate to assess them. The gap between what is known about these risks and what institutions are doing about them is not closing. It is widening, because the pace of AI deployment is accelerating while the pace of governance development remains effectively zero in most districts.
The five-stage trajectory documented in Section 2.3 has not deviated across any prior assessment type. Environmental, privacy, algorithmic, and child rights impact assessments all followed the same path from unregulated harm to mandatory legal requirement. The question is not whether Cognitive Impact Assessments will become standard practice. The pattern is too consistent for that to be in doubt. The question is whether the current generation of students will have been the unassessed cohort: the generation whose cognitive development was shaped by systems that no one evaluated, no one governed, and no one was responsible for monitoring.
Every prior impact assessment type was created after the harm it was designed to prevent had already occurred at scale. NEPA came after decades of environmental destruction. The GDPR came after mass data exploitation. The pattern is one of reaction, not prevention. Cognitive Impact Assessments have the opportunity to break that pattern, but only if the conversations that lead to their adoption begin before the developmental window closes for the students currently sitting in classrooms with AI tutors open on their screens.
A wide gap remains between where we are and where legislation will eventually arrive. Dedicated cognitive impact mandates do not yet exist in any jurisdiction. The formal Cognitive Impact Assessment methodology is still in development. But the risk is already here. The research documenting cognitive harm is published. The regulatory signals are clear. The institutional governance gap is documented. Schools do not need to wait for a mandate to begin asking the seven questions this paper identifies, or to begin evaluating AI deployments through the cognitive risk domains described above.
This paper is addressed to education ministries, school boards, superintendents, and the educators who work with students every day. The ask is straightforward: begin the conversation. Raise cognitive impact as a procurement criterion alongside privacy, security, and accessibility. Ask vendors the questions no one is currently asking. Keep the developmental needs of students front of mind when evaluating new AI applications and models. The formal instruments will follow. The conversations need to start now.
References
Sources with online availability are listed below with full URLs. All URLs verified April 2026.
Anthropic. (2025, April). How University Students Use Claude. Anthropic Education Report. https://www.anthropic.com/news/anthropic-education-report-how-university-students-use-claude
Barshay, J. (2025, July 14). Evidence Increases That Students Offload Critical Thinking to AI. The Hechinger Report. https://hechingerreport.org/proof-points-offload-critical-thinking-ai/
Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2026). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. Science, 391(6792). https://doi.org/10.1126/science.aec8352
Colorado SB 24-205 (signed May 2024, effective date delayed to June 30, 2026 by SB 25B-004). https://leg.colorado.gov/bills/sb24-205
Cook, T. (2025, July 26). AI Tutors Are Teaching Kids to Fake Critical Thinking. Psychology Today. https://www.psychologytoday.com/us/blog/the-algorithmic-mind/202507/ai-tutors-are-teaching-kids-to-fake-critical-thinking
Cook, T. (2025, May). AI Weakens Critical Thinking and How to Rebuild It. Psychology Today. https://www.psychologytoday.com/us/blog/the-algorithmic-mind/202505/ai-weakens-critical-thinking-and-how-to-rebuild-it
Davis v. Monroe County Board of Education, 526 U.S. 629 (1999). https://supreme.justia.com/cases/federal/us/526/629/
EU AI Act, Regulation (EU) 2024/1689. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
Fan, Y. et al. (2025). Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance. British Journal of Educational Technology, 56(2), 489-530. https://doi.org/10.1111/bjet.13544
Feathers, T. (2025, October 27). Parents Fell in Love With Alpha School’s Promise. Then They Wanted Out. WIRED. https://www.wired.com/story/ai-teacher-inside-alpha-school/
Garcia v. Character Technologies, Inc., Case No. 6:24-cv-01903 (M.D. Fla. 2024). Complaint. https://storage.courtlistener.com/recap/gov.uscourts.flmd.419065/gov.uscourts.flmd.419065.1.0.pdf
GDPR, Regulation (EU) 2016/679, Article 35. https://gdpr-info.eu/art-35-gdpr/
Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006
Instructure & OpenAI. (2025, July 23). Instructure and OpenAI Announce Global Partnership to Embed AI Learning Experiences Within Canvas [Press release]. https://www.prnewswire.com/news-releases/instructure-and-openai-announce-global-partnership-to-embed-ai-learning-experiences-within-canvas-302511709.html
Irish Data Protection Commission, Decision IN-20-7-4 (September 2022). Press release. https://www.dataprotection.ie/en/news-media/press-releases/data-protection-commission-announces-decision-instagram-inquiry
Kentucky Attorney General. (2026, January 8). AG Coleman Sues AI Chatbot Company for Preying on Children [Press release]. https://www.kentucky.gov/Pages/Activity-stream.aspx?n=AttorneyGeneral&prId=1857
Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv:2506.08872. https://arxiv.org/abs/2506.08872
Lindsay, K. (2025, July 30). Canvas-OpenAI Alliance: Is the LMS Model Now on Borrowed Time? Kate Lindsay Blogs. https://katelindsayblogs.com/2025/07/24/canvas-openai-alliance-is-the-lms-model-now-on-borrowed-time/
Lufkin, R., quoted in Campus News (2025, August 26). Canvas Is Now AI-Ignited. https://cccnews.info/2025/08/26/canvas-is-now-ai-ignited/
McCann, J. (2025, June 17). How States Are Responding to the Rise of AI in Education. Education Commission of the States. https://www.ecs.org/artificial-intelligence-ai-education-task-forces/
National Environmental Policy Act of 1969, Pub. L. 91-190, 42 U.S.C. sections 4321-4370h (1970). https://www.law.cornell.edu/uscode/text/42/chapter-55
Robertson v. Methow Valley Citizens Council, 490 U.S. 332 (1989). https://supreme.justia.com/cases/federal/us/490/332/
Serfaty, S., Gaudino, L., & Robertson, N. (2026, January 29). ‘What if I told you this school had no teachers?’: Is AI schooling the future of education, or a risky bet? CNN. https://www.cnn.com/2026/01/29/politics/alpha-school-trump-ai-teaching
UN Committee on the Rights of the Child, General Comment No. 25 (2021). https://www.ohchr.org/en/documents/general-comments-and-recommendations/general-comment-no-25-2021-childrens-rights-relation
UNCRC (Incorporation) (Scotland) Act 2024. https://www.legislation.gov.uk/asp/2024/1/contents/enacted
U.S. Department of Education, Office of Educational Technology. (2023, May). Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations. https://www.ed.gov/sites/ed/files/documents/ai-report/ai-report.pdf
1 Instructure & OpenAI. (2025, July 23). Instructure and OpenAI Announce Global Partnership to Embed AI Learning Experiences Within Canvas [Press release]. The Keynes AI persona example appeared in promotional materials associated with the announcement. See also Lufkin, R., quoted in Campus News (2025, August 26). Full URLs in References.
2 Lindsay, K. (2025, July 30). Canvas-OpenAI Alliance: Is the LMS Model Now on Borrowed Time? Kate Lindsay Blogs.
3 Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1), 6.
4 Anthropic. (2025, April). How University Students Use Claude. Anthropic Education Report.
5 Fan, Y. et al. (2025). Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance. British Journal of Educational Technology, 56(2), 489-530.
6 Kosmyna, N., Hauptmann, E., Yuan, Y. T., Situ, J., Liao, X.-H., Beresnitzky, A. V., Braunstein, I., & Maes, P. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. arXiv:2506.08872.
7 Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D., & Jurafsky, D. (2026). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. Science, 391(6792).
8 Steinberg, L. (2014). Age of Opportunity: Lessons from the New Science of Adolescence. Houghton Mifflin Harcourt.
9 Federal Trade Commission. (2022, May). Policy Statement on Education Technology and COPPA.
10 Travelers Property Casualty Co. v. International Control Services, No. 22-cv-2145 (C.D. Ill. 2022). Policy voided from inception for misrepresented MFA controls.
11 Feathers, T. (2025, October 27). Parents Fell in Love With Alpha School’s Promise. Then They Wanted Out. WIRED; Serfaty, S., Gaudino, L., & Robertson, N. (2026, January 29). Is AI Schooling the Future of Education, or a Risky Bet? CNN.
12 Cook, T. (2025, July 26). AI Tutors Are Teaching Kids to Fake Critical Thinking. Psychology Today.
13 Cook, T. (2025, May). AI Weakens Critical Thinking and How to Rebuild It. Psychology Today.
14 Barshay, J. (2025, July 14). Evidence Increases That Students Offload Critical Thinking to AI. The Hechinger Report.
15 EU AI Act, Regulation (EU) 2024/1689, Annex III, Category 3. Educational AI classified as high-risk.
16 U.S. Department of Education, Office of Educational Technology. (2023, May). Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations.
17 McCann, J. (2025, June 17). How States Are Responding to the Rise of AI in Education. Education Commission of the States.
18 Yanes v. City of New York (N.Y. Sup. Ct. 2019, upheld 2020). $59.2M verdict. Safety bulletin not disseminated. Jury attributed negligence to the Board rather than the teacher.
19 Garcia v. Character Technologies, Inc., Case No. 6:24-cv-01903 (M.D. Fla. 2024). Court declined, at the motion-to-dismiss stage, to accept Character.AI’s First Amendment defense and allowed the case to proceed.
20 Kentucky Attorney General. (2026, January 8). AG Coleman Sues AI Chatbot Company for Preying on Children [Press release]. Commonwealth of Kentucky v. Character Technologies, Inc., No. 26-CI-00029 (Franklin Cir. Ct. 2026).
21 National Environmental Policy Act of 1969, Pub. L. 91-190, 42 U.S.C. sections 4321-4370h (1970).
22 Calvert Cliffs’ Coordinating Committee v. U.S. Atomic Energy Commission, 449 F.2d 1109 (D.C. Cir. 1971). Held NEPA creates judicially enforceable duties; EIS preparation must be more than a “pro forma ritual.”
23 Standing Rock Sioux Tribe v. U.S. Army Corps of Engineers (D.D.C., 2017-2021). Pipeline easement vacated for inadequate assessment despite pipeline being built and operating.
24 Robertson v. Methow Valley Citizens Council, 490 U.S. 332 (1989). Omitting mitigation discussion undermines NEPA’s action-forcing function.
25 GDPR, Regulation (EU) 2016/679, Article 35. Mandatory DPIA before high-risk processing.
26 Irish Data Protection Commission, Decision IN-20-7-4 (September 2022). EUR 405M total fine against Meta/Instagram. Decision found infringements of Articles 5(1)(a), 5(1)(c), 6(1), 12(1), 24(1), 25(1), 25(2), and 35(1). Two Article 35(1) DPIA findings carried proposed fines of up to EUR 45M each. Article 24(1) was found infringed but is not independently fineable under Article 83.
27 Colorado SB 24-205 (signed May 2024, effective date delayed to June 30, 2026 by SB 25B-004). Rebuttable presumption of reasonable care for entities satisfying the statute's governance requirements, including risk management policies, impact assessments, annual review, and notice obligations.
28 Houston Federation of Teachers v. Houston ISD (S.D. Tex. 2017). Teachers’ procedural due process challenge to the EVAAS algorithm survived summary judgment.
29 UN Committee on the Rights of the Child, General Comment No. 25 (2021). CRIAs for the digital environment.
30 UNCRC (Incorporation) (Scotland) Act 2024. Requires Scottish Ministers to prepare CRWIAs for bills, certain regulations, and strategic decisions affecting children (s. 17). Makes it unlawful for public authorities to act incompatibly with UNCRC requirements (s. 6), with judicial enforcement (s. 7).
31 Scarlett Lewis et al. v. Newtown Board of Education (Conn. Super. Ct. 2018, affirmed 2019). Documented guidelines shielded the school.
32 Krent, H.J. et al. (2019). AI Goes to School: Implications for School District Liability. Buffalo Law Review, 67, 1329.
33 Wyke v. Polk County School Board, 129 F.3d 560 (11th Cir. 1997). Absent intervention procedures central to negligence finding.
34 Eisel v. Board of Education of Montgomery County, 597 A.2d 447 (Md. 1991). School’s own policy used as sword.
35 Rogers v. Christina School District, 73 A.3d 1 (Del. Super. 2013). Documented but violated policy supported negligence per se.
36 Robbins v. Lower Merion School District, No. 2:10-cv-00665 (E.D. Pa. 2010). 66,000 webcam images; no policy framework. $610K settlement.
37 Davis v. Monroe County Board of Education, 526 U.S. 629 (1999). Deliberate indifference standard.
38 In re Caremark Int’l Inc. Deriv. Litig., 698 A.2d 959 (Del. Ch. 1996). Board’s complete failure to implement reporting systems constitutes bad faith.
39 Marchand v. Barnhill, 212 A.3d 805 (Del. 2019). Lower bar for Caremark claims on mission-critical risks.
40 Construction Industry Laborers Pension Fund v. Bingle (Del. Ch. Sept. 6, 2022) (SolarWinds). Court dismissed Caremark claims but acknowledged cybersecurity as “mission critical” and distinguished “business risk” from “noncompliance risk.”


