Beyond Memorization: A Cognitive Psychology-Based Framework for AI Intelligence Assessment
Disciplines
Artificial Intelligence and Robotics | Cognitive Science | Other Engineering
Abstract (300 words maximum)
Traditional AI evaluation methods often fail to measure true intelligence because they rely on static datasets that are easily incorporated into training corpora. As a result, many AI systems achieve inflated scores not by demonstrating genuine understanding but by recognizing patterns in previously encountered data. Recent research, including Apple’s study introducing GSM-Symbolic, underscores this issue and highlights the need for more robust testing frameworks. This project proposes a novel AI assessment methodology grounded in cognitive psychology, designed to evaluate intelligence beyond memorization and pattern recognition. Instead of relying on static datasets, our framework will present AI systems with dynamic, open-ended problems that test critical thinking, creativity, and ingenuity. These tests will incorporate principles from human intelligence research, such as problem-solving under novel conditions, generating original ideas, and adapting strategies when faced with unexpected challenges. Unlike existing benchmarks, which often evaluate AI on deterministic tasks with predefined solutions, our approach will focus on qualitative and emergent problem-solving abilities. Key research objectives include designing test scenarios that effectively differentiate between statistical inference and genuine reasoning, developing metrics for evaluating AI performance on open-ended tasks, and validating the framework across a range of AI models. The results of this study could provide a standardized methodology for assessing machine intelligence that is resistant to data contamination and memorization-based score inflation. By shifting the focus from recall-driven benchmarks to adaptive reasoning and creativity, this framework aims to establish a more accurate measure of artificial intelligence, contributing to both theoretical and practical advances in AI evaluation.
Academic department under which the project should be listed
CCSE - Computer Science
Principal Investigator (PI) Name
Razvan Voicu
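To illustrate the dynamic, memorization-resistant testing the abstract describes, the following is a minimal sketch in the spirit of GSM-Symbolic-style symbolic templates. All names, ranges, and the template itself are hypothetical examples, not part of the proposed framework's actual test battery; the point is only that each evaluation run instantiates a fresh (question, answer) pair, so no fixed pair can be recalled from training data.

```python
import random

# Hypothetical entity pools for the template (illustrative only).
NAMES = ["Ava", "Ben", "Carla", "Dev"]
ITEMS = ["apples", "marbles", "stickers", "coins"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate a symbolic word-problem template with random entities
    and values, returning the question text and its ground-truth answer.

    Value ranges are chosen so the answer is always non-negative
    (start >= 50, subtraction at most 9 * 5 = 45).
    """
    name = rng.choice(NAMES)
    item = rng.choice(ITEMS)
    start = rng.randint(50, 99)
    given = rng.randint(2, 9)
    friends = rng.randint(2, 5)
    question = (
        f"{name} has {start} {item} and gives {given} {item} "
        f"to each of {friends} friends. How many {item} remain?"
    )
    answer = start - given * friends
    return question, answer

if __name__ == "__main__":
    q, a = make_variant(random.Random(42))
    print(q)
    print("Answer:", a)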
