Researchers recruited schools from 12 districts to participate in the study for at least one year. The study team randomly assigned the four curricula to the participating schools within each study district. Though not a representative sample of all elementary schools in the United States, the study schools were geographically dispersed and located in areas with varying levels of urbanicity.
The participating schools also served a higher percentage of students eligible for free or reduced-price meals than the average elementary school. The study involved intense data collection during the —, —, and — school years. Researchers administered standardized tests to students, surveyed teachers, observed classroom instruction, observed teacher curriculum training, and collected student demographic information.
The first two study reports (Agodini et al.) presented earlier findings; new results (Agodini et al.) extend them to a second year of curriculum implementation. These second-year effects are based on 58 of the schools. As the United States and other nations strive to lead the world in scientific discovery and innovation, assessing the success of policies and programs in STEM (science, technology, engineering, and mathematics) becomes increasingly important.
Because mathematics has been stereotyped as cut and dried, some assessment designers have assumed that creating high-quality mathematics tasks is simple and straightforward.
That assumption is false. Because mathematics relies on precise reasoning, errors easily creep into the words, figures, and symbols in which assessment tasks are expressed.
Open-ended tasks can be especially difficult to design and administer because there are so many ways in which they can misrepresent what students know and can do with mathematics. Students may be confused about what constitutes an adequate answer, or they may simply be reluctant to produce more than a single answer when multiple answers are called for. In an internal assessment constructed by a teacher, the administration and scoring can be adapted to take account of misunderstanding and confusion.
In an external assessment, such adjustments are more difficult to make. The contexts in which assessment tasks are administered and the interpretations students make of them are critical in judging the significance of the content.
Difficulties arise when attempts are made to put mathematics into realistic settings. The setting may be so unfamiliar that students cannot see the mathematics in it. Or the designer of the task may have strained too hard to make the mathematics applicable, ending up with an artificial reality, as in an example discussed by Swan: a task about a board resting on stops that asks students to calculate an angle and to give the answer to the nearest degree. As Swan notes [8], the mathematical content is not incorrect, but mathematics is being misused in this task. A task designer who wants to claim the situation is realistic should pose a genuine question: Where should the stops be put under the board so that it will be convenient for people of different heights?
The thinking processes students are expected to use in an assessment are as important as the content of the tasks.
The process dimension of mathematics has not received sufficient attention in evaluations of traditional multiple-choice tests. The key issue is whether the assessment tasks actually call for students to use the kind of intellectual processes required to demonstrate mathematical power: reasoning, problem solving, communicating, making connections, and so on.
This kind of judgment becomes especially important as interesting tasks are developed that may have the veneer of mathematics but can be completed without students' ever engaging in serious mathematical thinking. To judge the adequacy of the thinking processes used in an assessment requires methods of analyzing tasks to reflect the steps that contribute to successful performance.
To paint a bathroom, a painter needs 2 gallons of light blue paint mixed in a proportion of 4 parts white to 3 parts blue. From a previous job, she has 1 gallon of a darker blue paint mixed in the proportion of 1 part white to 2 parts blue. How much white paint must be added and how much blue paint? Discuss in detail how to model this problem, and then use your model to solve it.
The analysis of task demands, however, is not sufficient. The question of what processes students actually use in tackling the tasks must also be addressed. For example, could a particular problem designed to assess proportional reasoning be solved satisfactorily by using less sophisticated operations and knowledge? A problem on mixing paint, described above, was written by a mathematics teacher to get at high-level understanding of proportions and to be approachable in a variety of ways. Does it measure what was intended?
Such questions can be answered by having experts in mathematics education and in cognitive science review tasks and evaluate student responses to provide information about the cognitive processes used. In the mixing paint example, there are solutions to the problem that involve computation with complicated fractions more than proportional reasoning, so that a student who finds a solution has not necessarily used the cognitive processes that were intended by the task developer.
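One computational solution, sketched here for illustration, shows how this can happen. The target is 2 gallons mixed 4 parts white to 3 parts blue, that is, 8/7 gallon of white and 6/7 gallon of blue. The gallon on hand is mixed 1 part white to 2 parts blue, that is, 1/3 gallon of white and 2/3 gallon of blue. The white paint to be added is therefore 8/7 - 1/3 = 24/21 - 7/21 = 17/21 gallon, and the blue paint to be added is 6/7 - 2/3 = 18/21 - 14/21 = 4/21 gallon. As a check, the 17/21 + 4/21 = 1 gallon added to the 1 gallon on hand yields the required 2 gallons. Every step after the initial setup is fraction arithmetic; a student can complete the task without reasoning further about proportionality.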
Students' responses to the task, including what they say when they think aloud as they work, can suggest what those processes might be.
Students can be given part of a task to work on, and their reactions can be used to construct a picture of their thinking on the task. Students also can be interviewed after an assessment to detect what they were thinking as they worked on it. Their written work and videotapes of their activity can be used to prompt their recollections. None of these approaches alone can convey a complete picture of the student's internal processes, but together they can help clarify the extent to which an assessment taps the kinds of mathematical thinking that designers have targeted with various tasks.
Researchers are beginning to examine the structure of complex performance assessments in mathematics, but few studies have appeared so far in which labor-intensive tasks such as projects and investigations are used. Innovative assessment tasks are often assumed to make greater cognitive demands on students than traditional test items do. Because possibilities for responses to alternative assessment tasks may be broader than those of traditional items, developers must work harder to specify the type of response they want to evoke from the task.
For example, the QUASAR project has developed a scheme for classifying tasks along four dimensions, the first of which is cognitive processes: understanding and representing problems, discerning mathematical relationships, organizing information, justifying procedures, and so on. By classifying tasks along these dimensions, the QUASAR researchers can capture much of the richness and complexity of high-level mathematical performance. The QCAI (QUASAR Cognitive Assessment Instrument) is a paper-and-pencil instrument for large-group administration to individual students.
At each school site, several dozen tasks might be administered, but each student might receive only 8 or 9 of them. A sample task developed for use with sixth-grade students appears below.
Yvonne is trying to decide whether she should buy a weekly bus pass. On Monday, Wednesday and Friday she rides the bus to and from work. On Tuesday and Thursday she rides the bus to work, but gets a ride home with her friends. The open-ended tasks used in the QCAI are in various formats. Some ask students to justify their answers; others ask students to show how they found their answers or to describe data presented to them. The tasks are tried out with samples of students and the responses are analyzed.
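Whatever fares the full task supplies, the structure of a justified answer to the bus-pass question is worth making explicit. Yvonne makes eight one-way bus trips in a week: three round trips plus two single rides. Writing p for the price of the weekly pass and f for the single-ride fare (symbols introduced here only for illustration; they are not part of the task), the pass is worthwhile exactly when p < 8f, and a complete response justifies the decision by comparing these two quantities.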
Tasks are given internal and external reviews. Internal reviews are iterative, so that tasks can be reviewed and modified before and after they are tried out. Tasks are reviewed to see whether the mathematics assessed is important, the wording is clear and concise, and various sources of bias are absent.
Data from pilot administrations, as well as interviews with students thinking aloud or explaining their responses, contribute to the internal review. Multiple variants of a task are pilot tested as a further means of making the task statement clear and unbiased.
External reviews consist of examinations of the tasks by mathematics educators, psychometricians, and cognitive psychologists. They look at the content and processes measured, clarity and precision of language in the task and the directions, and fairness.
They also look at how well the assessment as a whole represents the domain of mathematics. The scoring rubrics are both analytic and holistic. A general scoring rubric, similar to that used in the California Assessment Program, was developed to reflect the scheme used for classifying tasks, with criteria specified for each of three interrelated components of performance. A specific rubric is developed for each task, using the general scoring rubric for guidance. The process of developing the specific rubric is also iterative, with students' responses and the reactions of reviewers guiding its refinement.
Each year, before the QCAI is administered for program assessment, teachers are sent sample tasks, sample scored responses, and criteria for assigning scores that they use in discussing the assessment with their students. This helps ensure an equitable distribution of task familiarity across sites and gives students access to the performance criteria they need for an adequate demonstration of their knowledge and understanding.
The mathematics in an assessment may be of high quality, but it may not be taught in school or it may touch on only a minor part of the curriculum. For some purposes that may be acceptable. An external assessment might be designed to see how students approach a novel piece of mathematics. A teacher might design an assessment to diagnose students' misconceptions about a single concept. Questions of relevance may be easy to answer.
Other purposes, however, may call for an assessment to sample the entire breadth of a mathematics curriculum, whether of a course or a student's school career. The term alignment is often used to characterize the congruence that must exist between an assessment and the curriculum.
Such purposes require an evaluation of how adequately the assessment treats the depth and range of curriculum content at which it was aimed. Is each important aspect of content given the same weight in the assessment that it receives in the curriculum? Is the full extent of the curriculum content reflected in the assessment? Alignment should be looked at over time and across instruments. Although a single assessment may not be well aligned with the curriculum because it is too narrowly focused, it may be part of a more comprehensive collection of assessments.
The question of alignment is complicated by the multidimensional nature of the curriculum. There is the curriculum as intended by frameworks and syllabi, the curriculum as implemented in classroom instruction, and the curriculum as achieved by students. Depending on the purpose of the assessment, one of these dimensions may be more important than the others in determining alignment. Consider, for example, a curriculum domain consisting of a long list of specific, self-contained mathematical facts and skills.
Consider, in addition, an assessment made up of five complex open-ended mathematics problems to which students provide multi-page answers. Each problem might be scored with a quasi-holistic rubric on each of four themes emphasized in the NCTM Standards: reasoning, problem solving, connections, and communication. The assessment might be linked to an assessment framework that focused primarily on those four themes.
An evaluator interested in the intended curriculum might examine whether and with what frequency students actually use the specific content and skills from the curriculum framework list in responding to the five problems. This examination would no doubt require a reanalysis of the students' responses because the needed information would not appear in the scoring. The assessment and the intended curriculum would appear to be fundamentally misaligned.
An evaluator interested in the implemented curriculum, however, might be content with the four themes. To determine alignment, the evaluator might examine how well those themes had been reflected in the instruction and compare the emphasis they received in instruction with the students' scores.
The counting and matching procedures commonly used for checking alignment work best when both domains consist of lists or simple matrices, so that the match between them can be counted as the proportion of items in common.
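A minimal sketch of such a counting procedure, written in Python with hypothetical topic lists standing in for a real framework and a real assessment, makes the logic concrete:

    # Hypothetical topic lists; a real analysis would draw these from a
    # curriculum framework and from a coding of the assessment's tasks.
    curriculum = {"fractions", "ratio", "area", "graphs", "estimation"}
    assessment = {"fractions", "ratio", "graphs", "probability"}

    shared = curriculum & assessment  # topics appearing in both domains

    # Proportion of the curriculum that the assessment covers.
    coverage = len(shared) / len(curriculum)   # 3/5 = 60%

    # Proportion of the assessment devoted to curriculum topics.
    relevance = len(shared) / len(assessment)  # 3/4 = 75%

    print(f"coverage: {coverage:.0%}, relevance: {relevance:.0%}")

Counts like these are easy to compute, but they presuppose that both the curriculum and the assessment can be reduced to comparable lists.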
Curriculum frameworks that reflect important mathematics content and skills (e.g., the NCTM Standards) do not reduce to such lists, however. Better methods are needed to judge the alignment of new assessments with new characterizations of curriculum.
How are enhanced learning and good instruction supported by the assessment? Mathematics assessments should be judged as to how well they reflect the learning principle, with particular attention to two goals that the principle seeks to promote—improved learning and better instruction—and to its resulting goal of a high-quality educational system.
Assessments might enhance student learning in a variety of ways. Each needs careful investigation before a considered judgment is reached on the efficacy of specific assessment features. For example, a common claim is that assessment can and should raise both students' and teachers' expectations of performance, which will result in greater learning.
Research on new assessments should seek to document this assertion. Students are also presumed to need more active engagement in mathematics learning. Assessments support student learning to the extent that they succeed in engaging even those students with limited mathematical proficiency in solving meaningful problems. This support often involves activities about which students have some knowledge and interest or that otherwise motivate engagement. However, if challenging assessments are far beyond the grasp of students whose knowledge lags behind the goals of reform, so that such students are closed off from demonstrating what they do know, the assessments may well have negative effects on those students' learning.
This question, like many others, deserves further investigation. In any case, student engagement in assessment tasks should be judged through various types of evidence, including teacher reports, student reports, and observations. Learning to guide one's own learning and to evaluate one's own work is well recognized as important for developing students' mathematical power. Some new forms of assessment make scoring rubrics and sample responses available to students so they can learn to evaluate for themselves how they are doing.
There are indications that attention to this evaluative function in work with teachers and students has desirable effects. More research is needed to determine how best to design and use rubrics to help students assess their own work. This is another avenue that might be explored to help assessors evaluate an assessment's potential to improve mathematics learning.
Finally, changes in student learning can be assessed directly through changes in performance over time. The nature of the assessment used to reflect change is critical.
For example, should one use an assessment for which there is historical evidence, even if that assessment cannot capture changes in the mathematics considered most important for students to learn? Or should one use a new assessment reflecting the new goals but for which there is no historical evidence for comparison? The difficulty with the first situation is that it compromises the content principle.
For a short time, however, it may be desirable to make limited use of assessments for which there is historical evidence and to implement, as quickly as possible, measures that better reflect new goals in a systematic way. Attempts to investigate the consequences of an assessment program on instruction should include attention to changes in classroom activities and instructional methods, in the assignments given, in the classroom assessments used, and in beliefs about what mathematics is important.
Studies of the effects of standardized tests have made this point quite clearly. For example, a survey of eighth-grade teachers' perceptions of the impact of their state- or district-mandated testing programs revealed an increased use of direct instruction and a decreased emphasis on project work and on the use of calculator or computer activities.
Assessments fashioned in keeping with the learning principle should result in changes more in line with that vision. The change from multiple-choice tests to directed writing assessments, for example, seems to have refocused classroom instruction in California schools. Evaluating instructional changes in mathematics requires evidence about how teachers spend their instructional time, the types of classroom activities they initiate, and how they have changed what they see as most important for instruction.
Shortly after the publication of the NCTM Standards, a study of teachers who were familiar with the document and with its notions about important mathematics showed that they continued to teach much as they had always taught.
The topics and themes recommended in the Standards had not been fully integrated into instruction, and traditional teaching practices continued to dominate. Sustained attention to the professional development of teachers is critical to the success of reform. Some evidence of change can be seen in schools where teachers are experimenting with new, more powerful forms of assessment. Early observations also raise warnings about superficial changes and about lip service paid to views that teachers have not yet internalized.