Featured Work | CAIRE Research Lab | Washington State University

CAIRE’s Academic Highlights

Bridging the gap between artificial intelligence and real-world classrooms. Our research provides evidence-based insights and solutions that empower educators to enhance instructional quality.

Track 1: AI-Assisted Instruction & Teacher Support

AIED

SciEval: A Benchmark for Automatic Evaluation of K–12 Science Instructional Materials

Zhaohui Li, Peng He, Honglu Liu, Zeyuan Wang, Zhiyuan Chen, Tingting Li, Jinjun Xiong

Accepted as Full Paper at AIED 2026 (March 2026)

Manual evaluation of AI-generated science materials is difficult to scale. We introduce SciEval, a benchmark dataset of 273 materials evaluated across 13 criteria. Our results show that domain-aligned fine-tuning of LLMs yields significant gains in automated pedagogical evaluation.

PDF

AIED

DrawSim-PD: Simulating Student Science Drawings to Support NGSS-Aligned Teacher Diagnostic Reasoning

Arijit Chakma, Peng He, Honglu Liu, Zeyuan Wang, Tingting Li, Tiffany D. Do, Feng Liu

Accepted as Full Paper at AIED 2026 (March 2026)

To address privacy restrictions on sharing student work, we present DrawSim-PD, a generative framework that simulates NGSS-aligned student science drawings with controllable imperfections. We release a corpus of 10,000 artifacts to overcome data scarcity in visual assessment research.

arXiv

JSET

The Transformative Collaboration of Human Intelligence and Artificial Intelligence in Designing Knowledge-in-Use Science Assessment for Learning

Tingting Li, Joseph S. Krajcik, Rand Spiro

Published in Journal of Science Education and Technology (November 2025)

This study investigates how human experts and GPT-4 collaborate to design NGSS-aligned, three-dimensional (3D) science assessments. Using a design-based research approach, we demonstrate that principled human scaffolding, in the form of structured prompts and iterative expert evaluation, enables AI to co-produce high-quality, equitable tasks. This work offers a transferable refinement framework, positioning generative AI as a collaborative design partner rather than a mere automated tool.

Article

AIED

Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos

Yixuan Shen, Peng He, Honglu Liu, Jinxuan Fan, Yuyang Ji, Tingting Li, Tianlong Chen, Kaidi Xu, Feng Liu

Accepted as Short Paper at AIED 2026 (March 2026)

Existing benchmarks for classroom discourse overlook visual artifacts and model-based reasoning. We address this gap with SciIBI, the first video benchmark for analyzing science classroom discourse, featuring 113 NGSS-aligned clips. Our evaluation reveals that current multimodal LLMs struggle to distinguish pedagogically similar practices, suggesting that models should accelerate human expert review rather than replace it.

arXiv

EDM

Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation

Yucheng Chu, Peng He, Hang Li, Haoyu Han, Kaiqi Yang, Yu Xue, Tingting Li, Joseph Krajcik, Jiliang Tang

Accepted as Short Paper at EDM 2025

Large language models show promise in automated grading but often lack specific domain knowledge. We propose an adaptive Retrieval-Augmented Generation (RAG) framework that dynamically retrieves and incorporates curated educational sources to evaluate complex science understanding. Our system significantly improves grading accuracy compared to baseline LLM approaches.

arXiv
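
For readers unfamiliar with retrieval-augmented grading, the general idea can be sketched as follows. This is a minimal illustration, not the paper's system: the passages, the bag-of-words cosine retriever, and the function names (`retrieve`, `build_grading_prompt`) are all assumptions made for the example, and the adaptive retrieval described in the abstract is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True
    )
    return ranked[:top_k]


def build_grading_prompt(question, student_answer, passages, top_k=1):
    """Assemble a grading prompt that includes the retrieved reference text."""
    context = retrieve(question + " " + student_answer, passages, top_k)
    return (
        "Reference material:\n" + "\n".join(context) + "\n\n"
        "Question: " + question + "\n"
        "Student answer: " + student_answer + "\n"
        "Score the answer against the reference material."
    )


# Hypothetical curated sources, invented for this example.
passages = [
    "Photosynthesis converts light energy, water, and carbon dioxide into glucose and oxygen.",
    "Newton's second law states that force equals mass times acceleration.",
    "The water cycle includes evaporation, condensation, and precipitation.",
]
question = "How do plants make their food?"
answer = "Plants use sunlight and carbon dioxide to make glucose."
prompt = build_grading_prompt(question, answer, passages)
```

The resulting prompt would then be passed to an LLM of choice. A production system would likely replace the bag-of-words retriever with dense embeddings.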

JSET

Utilizing Deep Learning AI to Analyze Scientific Models: Overcoming Challenges

Tingting Li, Kevin Haudek, Joseph Krajcik

Published in Journal of Science Education and Technology (April 2025)

Assessing complex student scientific models with AI is challenging due to data imbalances. This study employs deep learning and SMOTE (Synthetic Minority Over-sampling Technique) to enhance the fairness and accuracy of automated scoring. Our results demonstrate significant improvements in mirroring human judgment, while highlighting areas where AI must evolve to better interpret creative student expressions.

Article
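
To make the SMOTE idea concrete, here is a minimal sketch of how synthetic minority-class samples can be generated by interpolating between a sample and one of its nearest minority neighbors. It is a simplified variant written for illustration, not the paper's pipeline: the function name `smote`, the toy feature vectors, and the choice to pick base samples at random are assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points on line segments between minority samples
    and their k nearest minority-class neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a base sample at random (simplification of the original
        # algorithm, which iterates over every minority sample).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Interpolate a random fraction of the way toward the neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority-class feature vectors, invented for this example.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6)
```

Because each synthetic point lies between two real minority samples, the minority class is enlarged without simply duplicating rows. In practice, oversampling is applied to the training split only, so that evaluation data stays untouched.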

IJCAI

Enhancing Automated Grading in Science Education through LLM-Driven Causal Reasoning and Multimodal Analysis

Haohao Zhu, Tingting Li, Peng He, Jiayu Zhou

Published at the International Joint Conference on Artificial Intelligence (IJCAI) 2025 (November 2025)

Assessing multimodal student work in science education is challenging, and traditional text-only methods often introduce bias. We propose an LLM-augmented multimodal evaluation framework that generates causal knowledge graphs to capture the essential conceptual relationships in student responses. Experimental results show that this approach significantly improves grading accuracy and consistency by mitigating biases such as handwriting neatness and answer length.

PDF

Track 2: Learning Sciences & Science Learning

JLS

Manufacturing authenticity as part of written PBL curriculum: Contrived versus spontaneous events

Emily Adah Miller, Tingting Li

Published in Journal of the Learning Sciences (September 2025)

Project-based learning (PBL) centers on authenticity, yet prepackaged curricula struggle to predict genuine classroom events. Using Portraiture, this study examines a third-grade bilingual class to contrast pre-planned lessons with a spontaneous departure caused by a spring snowstorm. Findings reveal that while contrived events support learning, spontaneous events uniquely enable students to actively craft authentic disciplinary tools. We highlight the critical role of teacher expertise in seizing these moments and advocate for trusting teachers to adapt curricula for authentic engagement.

Article