{"id":140,"date":"2025-11-07T09:14:44","date_gmt":"2025-11-07T17:14:44","guid":{"rendered":"https:\/\/labs.wsu.edu\/caire\/?page_id=140"},"modified":"2026-04-03T10:56:47","modified_gmt":"2026-04-03T17:56:47","slug":"blog","status":"publish","type":"page","link":"https:\/\/labs.wsu.edu\/caire\/blog\/","title":{"rendered":"Featured Work"},"content":{"rendered":"\n<h2 class=\"wp-block-heading has-text-align-left  wsu-heading--style-marked wsu-font-size--xlarge\">CAIRE&#8217;s Academic Highlights<\/h2>\n\n\n\n<p class=\"wsu-spacing-after--none wsu-spacing-bottom--sxxsmall wsu-font-size--large\">Bridging the gap between artificial intelligence and real-world classrooms. Our research provides evidence-based insights and solutions that empower educators to enhance instructional quality.<\/p>\n\n\n\n<section class=\"featured-work-section\">\n  <div class=\"featured-work-container\">\n  <h2 class=\"featured-work-section-title\">Track 1: AI-Assisted Instruction &amp; Teacher Support<\/h2>\n\n    <div class=\"featured-work-card reveal\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">AIED<\/div>\n        <img decoding=\"async\" src=\"https:\/\/wpcdn.web.wsu.edu\/wp-labs\/uploads\/sites\/3565\/2026\/03\/2c2105394dde0eedbb90092abad44b18.png\" alt=\"SciEval Benchmark\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">SciEval: A Benchmark for Automatic Evaluation of K\u201312 Science Instructional Materials<\/h3>\n        <p class=\"featured-work-authors\">Zhaohui Li, Peng He, Honglu Liu, Zeyuan Wang, Zhiyuan Chen, Tingting Li, Jinjun Xiong<\/p>\n        <p class=\"featured-work-meta\"><i>Accepted as FULL Paper at AIED 2026 (March 2026)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Manual evaluation of AI-generated science materials is difficult to scale. 
We introduce <strong>SciEval<\/strong>, a benchmark dataset of 273 materials evaluated across 13 criteria. Our results show that domain-aligned fine-tuning of LLMs yields significant gains in automated pedagogical evaluation.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"#\" class=\"featured-work-btn\">PDF<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n    <div class=\"featured-work-card reveal\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">AIED<\/div>\n        <img decoding=\"async\" src=\"https:\/\/wpcdn.web.wsu.edu\/wp-labs\/uploads\/sites\/3565\/2026\/03\/f4dc68deb7f32bae775484114e0c4e08.png\" alt=\"DrawSim-PD Framework\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">DrawSim-PD: Simulating Student Science Drawings to Support NGSS-Aligned Teacher Diagnostic Reasoning<\/h3>\n        <p class=\"featured-work-authors\">Arijit Chakma, Peng He, Honglu Liu, Zeyuan Wang, Tingting Li, Tiffany D. Do, Feng Liu<\/p>\n        <p class=\"featured-work-meta\"><i>Accepted as FULL Paper at AIED 2026 (March 2026)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          To address privacy restrictions on sharing student work, we present <strong>DrawSim-PD<\/strong>, a generative framework that simulates NGSS-aligned student science drawings with controllable imperfections. 
We release a corpus of 10,000 artifacts to overcome data scarcity in visual assessment research.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/arxiv.org\/abs\/2602.01578\" class=\"featured-work-btn\">arXiv<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"featured-work-card reveal\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">JSET<\/div>\n        <img decoding=\"async\" src=\"https:\/\/images.unsplash.com\/photo-1456406644174-8ddd4cd52a06?ixlib=rb-4.0.3&amp;auto=format&amp;fit=crop&amp;w=600&amp;q=80\" alt=\"Human-AI Collaboration Assessment\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">The Transformative Collaboration of Human Intelligence and Artificial Intelligence in Designing Knowledge-in-Use Science Assessment for Learning<\/h3>\n        \n        <p class=\"featured-work-authors\">Tingting Li, Joseph S. Krajcik, Rand Spiro <\/p> \n        \n        <p class=\"featured-work-meta\"><i>Published in Journal of Science Education and Technology (November 2025)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          This study investigates the collaboration between human experts and GPT-4 to design NGSS-aligned, 3D science assessments. Using a design-based research approach, we demonstrate that principled human scaffolding\u2014through structured prompts and iterative expert evaluation\u2014enables AI to co-produce high-quality, equitable tasks. 
This work offers a transferable refinement framework, positioning generative AI as a collaborative design partner rather than a mere automated tool.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10956-025-10275-4\" class=\"featured-work-btn\">Article<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"featured-work-card reveal hidden-paper\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">AIED<\/div>\n        <img decoding=\"async\" src=\"https:\/\/wpcdn.web.wsu.edu\/wp-labs\/uploads\/sites\/3565\/2026\/04\/6bd5c826a2621a8c965c7c5412fd915b.png\" alt=\"SciIBI Video Benchmark\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">Can Multimodal LLMs See Science Instruction? Benchmarking Pedagogical Reasoning in K-12 Classroom Videos<\/h3>\n        <p class=\"featured-work-authors\">Yixuan Shen, Peng He, Honglu Liu, Jinxuan Fan, Yuyang Ji, Tingting Li, Tianlong Chen, Kaidi Xu, Feng Liu<\/p>\n        <p class=\"featured-work-meta\"><i>Accepted as Short Paper at AIED 2026 (March 2026)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Existing benchmarks for classroom discourse overlook visual artifacts and model-based reasoning. We address this gap with <strong>SciIBI<\/strong>, the first video benchmark for analyzing science classroom discourse, featuring 113 NGSS-aligned clips. 
Our evaluation reveals that current multimodal LLMs struggle to distinguish pedagogically similar practices, suggesting models should accelerate human expert review rather than replace it.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/arxiv.org\/abs\/2602.18466\" class=\"featured-work-btn\">arXiv<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"featured-work-card reveal hidden-paper\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">EDM<\/div>\n        <img decoding=\"async\" src=\"https:\/\/images.unsplash.com\/photo-1555949963-ff9fe0c870eb?ixlib=rb-4.0.3&amp;auto=format&amp;fit=crop&amp;w=600&amp;q=80\" alt=\"RAG Grading Framework\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">Enhancing LLM-Based Short Answer Grading with Retrieval-Augmented Generation<\/h3>\n        <p class=\"featured-work-authors\">Yucheng Chu, Peng He, Hang Li, Haoyu Han, Kaiqi Yang, Yu Xue, Tingting Li, Joseph Krajcik, Jiliang Tang<\/p>\n        <p class=\"featured-work-meta\"><i>Accepted as Short Paper at EDM 2025<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Large language models show promise in automated grading but often lack specific domain knowledge. We propose an adaptive <strong>Retrieval-Augmented Generation (RAG)<\/strong> framework that dynamically retrieves and incorporates curated educational sources to evaluate complex science understanding. 
Our system significantly improves grading accuracy compared to baseline LLM approaches.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/arxiv.org\/abs\/2504.05276\" class=\"featured-work-btn\">arXiv<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"featured-work-card reveal hidden-paper\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">JSET<\/div>\n        <img decoding=\"async\" src=\"https:\/\/images.unsplash.com\/photo-1620712943543-bcc4688e7485?ixlib=rb-4.0.3&amp;auto=format&amp;fit=crop&amp;w=600&amp;q=80\" alt=\"Deep Learning AI Scientific Models\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">Utilizing Deep Learning AI to Analyze Scientific Models: Overcoming Challenges<\/h3>\n        <p class=\"featured-work-authors\">Tingting Li, Kevin Haudek, Joseph Krajcik<\/p>\n        <p class=\"featured-work-meta\"><i>Published in Journal of Science Education and Technology (April 2025)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Assessing complex student scientific models with AI is challenging due to data imbalances. This study employs deep learning and <strong>SMOTE<\/strong> (Synthetic Minority Over-sampling Technique) to enhance the fairness and accuracy of automated scoring. 
Our results demonstrate significant improvements in mirroring human judgment, while highlighting areas where AI must evolve to better interpret creative student expressions.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/link.springer.com\/article\/10.1007\/s10956-025-10217-0\" class=\"featured-work-btn\">Article<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"featured-work-card reveal hidden-paper\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">IJCAI<\/div>\n        <img decoding=\"async\" src=\"https:\/\/images.unsplash.com\/photo-1551288049-bebda4e38f71?ixlib=rb-4.0.3&amp;auto=format&amp;fit=crop&amp;w=600&amp;q=80\" alt=\"LLM-driven Causal Reasoning\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">Enhancing Automated Grading in Science Education through LLM-Driven Causal Reasoning and Multimodal Analysis<\/h3>\n        <p class=\"featured-work-authors\">Haohao Zhu, Tingting Li, Peng He, Jiayu Zhou<\/p>\n        <p class=\"featured-work-meta\"><i>Published at International Joint Conferences on Artificial Intelligence 2025 (November 2025)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Assessing multimodal student work in science education is challenging and often biased by traditional text-only methods. We propose a novel <strong>LLM-augmented multimodal evaluation framework<\/strong> that leverages LLMs to generate causal knowledge graphs, capturing essential conceptual relationships in student responses. 
Experimental results show that this approach significantly improves grading accuracy and consistency by mitigating biases related to handwriting neatness and answer length.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/www.researchgate.net\/profile\/Peng-He-8\/publication\/394603299_Enhancing_Automated_Grading_in_Science_Education_through_LLM-Driven_Causal_Reasoning_and_Multimodal_Analysis\/links\/68da845df3032e2b4be43c66\/Enhancing-Automated-Grading-in-Science-Education-through-LLM-Driven-Causal-Reasoning-and-Multimodal-Analysis.pdf\" class=\"featured-work-btn\">PDF<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n<div class=\"load-more-container\">\n        <button id=\"loadMoreTrack1\" class=\"load-more-btn\">Load More Publications<\/button>\n    <\/div>\n\n\n  <\/div>\n<\/section>\n\n\n\n<section class=\"featured-work-section\">\n  <div class=\"featured-work-container\">\n    \n    <h2 class=\"featured-work-section-title\">Track 2: Learning Sciences &amp; Science Learning<\/h2>\n\n    <div class=\"featured-work-card reveal\">\n      <div class=\"featured-work-image-col\">\n        <div class=\"featured-work-badge\">JLS<\/div>\n        <img decoding=\"async\" src=\"https:\/\/images.unsplash.com\/photo-1503676260728-1c00da094a0b?ixlib=rb-4.0.3&amp;auto=format&amp;fit=crop&amp;w=600&amp;q=80\" alt=\"PBL and Authentic Learning\">\n      <\/div>\n      \n      <div class=\"featured-work-content-col\">\n        <h3 class=\"featured-work-title\">Manufacturing authenticity as part of written PBL curriculum: Contrived versus spontaneous events<\/h3>\n        <p class=\"featured-work-authors\">Emily Adah Miller, Tingting Li<\/p>\n        <p class=\"featured-work-meta\"><i>Published in Journal of the Learning Sciences (September 2025)<\/i><\/p>\n        <p class=\"featured-work-abstract\">\n          Project-based learning (PBL) centers on authenticity, yet prepackaged curricula struggle to predict genuine classroom events. 
Using Portraiture, this study examines a third-grade bilingual class to contrast pre-planned lessons with a spontaneous departure caused by a spring snowstorm. Findings reveal that while contrived events support learning, spontaneous events uniquely enable students to actively craft authentic disciplinary tools. We highlight the critical role of teacher expertise in seizing these moments and advocate for trusting teachers to adapt curricula for authentic engagement.\n        <\/p>\n        <div class=\"featured-work-actions\">\n          <a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/10508406.2025.2557896\" class=\"featured-work-btn\">Article<\/a>\n        <\/div>\n      <\/div>\n    <\/div>\n\n    <\/div>\n<\/section>\n","protected":false},"excerpt":{"rendered":"<p>CAIRE&#8217;s Academic Highlights Bridging the gap between artificial intelligence and real-world classrooms. Our research provides evidence-based insights and solutions that empower educators to enhance instructional quality. 
Track 1: AI-Assisted Instruction &amp; Teacher Support AIED SciEval: A Benchmark for Automatic Evaluation of K\u201312 Science Instructional Materials Zhaohui Li, Peng He, Honglu Liu, Zeyuan Wang, Zhiyuan Chen, [&hellip;]<\/p>\n","protected":false},"author":44109,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"categories":[],"tags":[],"wsuwp_university_location":[],"wsuwp_university_org":[],"_links":{"self":[{"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/pages\/140"}],"collection":[{"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/users\/44109"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/comments?post=140"}],"version-history":[{"count":53,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/pages\/140\/revisions"}],"predecessor-version":[{"id":2247,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/pages\/140\/revisions\/2247"}],"wp:attachment":[{"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/media?parent=140"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/categories?post=140"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/tags?post=140"},{"taxonomy":"wsuwp_university_location","embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/wsuwp_university_location?post=140"},{"taxonomy":"wsuwp_university_org","embeddable":true,"href":"https:\/\/labs.wsu.edu\/caire\/wp-json\/wp\/v2\/wsuwp_university_org?post=140"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}