About the Project
Years of research document that multiple-choice tests are not good measures of complex cognition and inquiry processes. In the 1990s, researchers attempted to use performance assessments in accountability programs. However, the developers of both hands-on and virtual performance assessments encountered a number of technical, resource, and reliability problems in large-scale administration. At that time, these problems were substantial enough to undercut the potentially greater construct validity for science inquiry that performance assessments can provide over paper-and-pencil tests.
We believe that virtual performance assessments based on modern interactive media could mitigate many of these historical problems. Today's technologies offer capabilities that were not available a decade ago: data tracking, access to large data sets, GIS map visualizations, the ability to compare and contrast visualizations of different data, and the ability to model phenomena that cannot be observed with the naked eye.
We will develop three assessments that measure scientific inquiry and study their generalizability across students.
Purpose: This project will develop virtual performance assessments and study the feasibility of using them to assess middle-grade (6th and 7th grade) students' science inquiry learning in a standardized testing setting.
Outcomes: To develop virtual performance assessments that are a valid, reliable, and feasible way to measure middle school students' inquiry learning.
Content: The content will be based on the National Science Education Standards (NSES) for 6th and 7th grades.
Assessment Development/Framework: To ensure construct validity, we plan to use a modified version of the Evidence-Centered Design framework.
Research Questions
RQ 1: Can we construct a virtual assessment that measures scientific inquiry, as defined by the NSES? What is the evidence that our assessments are designed to test NSES inquiry abilities?
RQ 2: Are these assessments reliable?
Analytic Method:
RQ 1: We will conduct an alignment study to test the alignment of questions to inquiry standards and test performance expectations (Webb, 1999; Quellmalz, Kreikemeier, Haydel-DeBarger, & Haertel, 2006; Quellmalz, 2007), as well as cognitive studies to ensure our questions measure what we intend them to measure (Baxter & Glaser, 1998; Messick, 1989; Quellmalz, 2007).
RQ 2: We will conduct generalizability studies (G-studies) to estimate the reliability of the assessments (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991).
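To make the idea of a G-study concrete, the sketch below estimates variance components for the simplest case, a one-facet design in which every student (person) completes every task. The project's actual design is not stated here, so the crossed persons-by-tasks layout, the function name g_study, and the sample scores are all assumptions for illustration. The relative generalizability coefficient is computed as var_person / (var_person + var_residual / n_tasks).

```python
def g_study(scores, n_tasks_decision=None):
    """One-facet crossed (persons x tasks) G-study from a complete score matrix.

    scores: list of rows, one row per person, one column per task.
    n_tasks_decision: number of tasks assumed for the decision study
                      (defaults to the number of tasks observed).
    Illustrative sketch only; not the project's actual analysis.
    """
    n_p = len(scores)
    n_t = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_t)
    person_means = [sum(row) / n_t for row in scores]
    task_means = [sum(row[t] for row in scores) / n_p for t in range(n_t)]

    # Mean squares from a two-way ANOVA without replication.
    ms_p = n_t * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_t = n_p * sum((m - grand) ** 2 for m in task_means) / (n_t - 1)
    ss_res = sum(
        (scores[p][t] - person_means[p] - task_means[t] + grand) ** 2
        for p in range(n_p)
        for t in range(n_t)
    )
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))

    # Expected-mean-square estimates; negative estimates are set to zero.
    var_p = max(0.0, (ms_p - ms_res) / n_t)
    var_t = max(0.0, (ms_t - ms_res) / n_p)
    var_res = ms_res  # person-by-task interaction confounded with error

    n_dec = n_tasks_decision or n_t
    denom = var_p + var_res / n_dec
    g = var_p / denom if denom > 0 else float("nan")
    return {
        "var_person": var_p,
        "var_task": var_t,
        "var_residual": var_res,
        "g_coefficient": g,
    }


# Hypothetical scores: 3 students, 2 tasks.
result = g_study([[2, 4], [3, 3], [6, 6]])
print(result)
```

Passing a larger n_tasks_decision shows how the coefficient would be expected to change if more tasks were administered, which is the usual reason for separating the G-study from the decision study.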
Evidence-Centered Design is a comprehensive framework that contains four stages of design: domain analysis, domain modeling, the conceptual assessment framework and compilation, and a four-phase delivery architecture.
Phases 1 and 2 focus on the purposes of the assessment, the nature of knowing, and structures for observing and organizing knowledge. In Phase 3, assessment designers focus on the student model (what skills are being assessed), the evidence model (what behaviors or performances reveal the knowledge and skills being assessed), and the task model (situations that elicit those behaviors and that evidence). These aspects of the design are interrelated. In the compilation phase, tasks are created. The purpose is to develop models for schema-based task authoring and protocols for fitting and estimating psychometric models. Phase 4, the delivery architecture, focuses on the presentation and scoring of the task.
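The relationship among the three Phase 3 models can be sketched as a small data structure. This is only an illustration of how the models connect: the class names, the score function, and the example skills, behaviors, and task are hypothetical and are not taken from the project or from any Evidence-Centered Design tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class StudentModel:
    """Skills the assessment is meant to measure (hypothetical names)."""
    skills: List[str]


@dataclass
class EvidenceRule:
    """Links one observable behavior to the skill it is taken to reveal."""
    behavior: str
    skill: str


@dataclass
class TaskModel:
    """A situation designed to elicit particular behaviors."""
    name: str
    elicits: List[str]


def score(student_model: StudentModel, rules: List[EvidenceRule],
          task: TaskModel, observed: Set[str]) -> Dict[str, int]:
    """Count, per skill, the observed behaviors that the task elicits."""
    evidence = {skill: 0 for skill in student_model.skills}
    for rule in rules:
        elicited = rule.behavior in task.elicits
        seen = rule.behavior in observed
        if elicited and seen and rule.skill in evidence:
            evidence[rule.skill] += 1
    return evidence


# Hypothetical example: one task, two skills, one behavior observed.
student = StudentModel(skills=["forming hypotheses", "gathering data"])
rules = [
    EvidenceRule("states a testable claim", "forming hypotheses"),
    EvidenceRule("samples water at several sites", "gathering data"),
]
task = TaskModel(
    name="pond investigation",
    elicits=["states a testable claim", "samples water at several sites"],
)
observed = {"samples water at several sites"}
print(score(student, rules, task, observed))
```

The point of the sketch is the dependency: tasks are chosen because they elicit behaviors, and behaviors matter only because an evidence rule ties them to a skill in the student model.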
Mislevy, R., & Haertel, G. (2006). Implications of Evidence-Centered Design for Educational Testing (Draft PADI Technical Report 17). Menlo Park, CA: SRI International.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-62.