Evaluation is a methodological area that is closely related to, but distinguishable from, more traditional social research. Evaluation utilizes many of the same methodologies used in traditional social research, but because evaluation takes place within a political and organizational context, it requires group skills, management ability, political dexterity, sensitivity to multiple stakeholders, and other skills that social research in general does not rely on as much. Here we introduce the idea of evaluation and some of the major terms and issues in the field.
Definitions of Evaluation
Probably the most frequently given definition is:
Evaluation is the systematic assessment of the worth or merit of some object
This definition is hardly perfect. There are many types of evaluations that do not necessarily result in an assessment of worth or merit -- descriptive studies, implementation analyses, and formative evaluations, to name a few. Better perhaps is a definition that emphasizes the information-processing and feedback functions of evaluation. For instance, one might say:
Evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object
Both definitions agree that evaluation is a systematic endeavor and both use the deliberately ambiguous term 'object' which could refer to a program, policy, technology, person, need, activity, and so on. The latter definition emphasizes acquiring and assessing information rather than assessing worth or merit because all evaluation work involves collecting and sifting through data, making judgements about the validity of the information and of inferences we derive from it, whether or not an assessment of worth or merit results.
The Goals of Evaluation
The generic goal of most evaluations is to provide "useful feedback" to a variety of audiences including sponsors, donors, client-groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as "useful" if it aids in decision-making. But the relationship between an evaluation and its impact is not a simple one -- studies that seem critical sometimes fail to influence short-term decisions, and studies that initially seem to have no influence can have a delayed impact when more congenial conditions arise. Despite this, there is broad consensus that the major goal of evaluation should be to influence decision-making or policy formulation through the provision of empirically-driven feedback.
Evaluation Strategies
'Evaluation strategies' are broad, overarching perspectives on evaluation. They encompass the most general groups or "camps" of evaluators, although, at its best, evaluation work borrows eclectically from the perspectives of all of these camps. Four major groups of evaluation strategies are discussed here.
Scientific-experimental models are probably the most historically dominant evaluation strategies. Taking their values and methods from the sciences -- especially the social sciences -- they emphasize impartiality, accuracy, objectivity, and the validity of the information generated. Included under scientific-experimental models would be: the tradition of experimental and quasi-experimental designs; objectives-based research that comes from education; econometrically-oriented perspectives including cost-effectiveness and cost-benefit analysis; and the recent articulation of theory-driven evaluation.
The second class of strategies are management-oriented systems models. Two of the most common of these are PERT, the Program Evaluation and Review Technique, and CPM, the Critical Path Method. Both have been widely used in business and government in the United States. It would also be legitimate to include the Logical Framework or "Logframe" model developed at the U.S. Agency for International Development, as well as general systems theory and operations research approaches, in this category. Two management-oriented systems models were originated by evaluators: the UTOS model, where U stands for Units, T for Treatments, O for Observing Operations, and S for Settings; and the CIPP model, where the C stands for Context, the I for Input, the first P for Process, and the second P for Product. These management-oriented systems models emphasize comprehensiveness in evaluation, placing evaluation within a larger framework of organizational activities.
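To make the Critical Path Method mentioned above concrete, the sketch below computes a project's duration and critical path from a small activity network. The activities, durations, and dependencies are entirely hypothetical, invented only to illustrate the calculation.

```python
# Minimal sketch of the Critical Path Method (CPM).
# All activity names, durations, and prerequisites are hypothetical.
activities = {
    # name: (duration_in_days, list_of_prerequisites)
    "design":    (3, []),
    "materials": (2, ["design"]),
    "training":  (4, ["design"]),
    "delivery":  (5, ["materials", "training"]),
    "review":    (1, ["delivery"]),
}

def critical_path(activities):
    """Return (project_duration, activities_on_the_critical_path)."""
    finish = {}  # earliest finish time for each activity

    def earliest_finish(name):
        # An activity finishes its own duration after its slowest prerequisite.
        if name not in finish:
            dur, preds = activities[name]
            finish[name] = dur + max((earliest_finish(p) for p in preds), default=0)
        return finish[name]

    total = max(earliest_finish(a) for a in activities)

    # Walk backwards from the latest-finishing activity, always following
    # a predecessor whose finish time equals the current activity's start.
    path = []
    current = max(finish, key=finish.get)
    while current is not None:
        path.append(current)
        dur, preds = activities[current]
        start = finish[current] - dur
        current = next((p for p in preds if finish[p] == start), None)
    return total, list(reversed(path))

duration, path = critical_path(activities)
print(duration, path)  # 13 ['design', 'training', 'delivery', 'review']
```

Here "training" rather than "materials" sits on the critical path because it is the slower of the two parallel prerequisites of "delivery"; shortening any critical activity shortens the whole project, while "materials" has slack.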
The third class of strategies are the qualitative/anthropological models. They emphasize the importance of observation, the need to retain the phenomenological quality of the evaluation context, and the value of subjective human interpretation in the evaluation process. Included in this category are the approaches known in evaluation as naturalistic or 'Fourth Generation' evaluation; the various qualitative schools; critical theory and art criticism approaches; and, the 'grounded theory' approach of Glaser and Strauss among others.
Finally, a fourth class of strategies is termed participant-oriented models. As the term suggests, they emphasize the central importance of the evaluation participants, especially clients and users of the program or technology. Client-centered and stakeholder approaches are examples of participant-oriented models, as are consumer-oriented evaluation systems.
With all of these strategies to choose from, how to decide? Debates that rage within the evaluation profession -- and they do rage -- are generally battles between these different strategists, with each claiming the superiority of their position. In reality, most good evaluators are familiar with all four categories and borrow from each as the need arises. There is no inherent incompatibility between these broad strategies -- each of them brings something valuable to the evaluation table. In fact, in recent years attention has increasingly turned to how one might integrate results from evaluations that use different strategies, are carried out from different perspectives, and employ different methods. Clearly, there are no simple answers here. The problems are complex and the methodologies needed will and should be varied.
Types of Evaluation
There are many different types of evaluations depending on the object being evaluated and the purpose of the evaluation. Perhaps the most important basic distinction in evaluation types is that between formative and summative evaluation. Formative evaluations strengthen or improve the object being evaluated -- they help form it by examining the delivery of the program or technology, the quality of its implementation, and the assessment of the organizational context, personnel, procedures, inputs, and so on. Summative evaluations, in contrast, examine the effects or outcomes of some object -- they summarize it by describing what happens subsequent to delivery of the program or technology; assessing whether the object can be said to have caused the outcome; determining the overall impact of the causal factor beyond only the immediate target outcomes; and, estimating the relative costs associated with the object.
Formative evaluation includes several evaluation types:
- needs assessment determines who needs the program, how great the need is, and what might work to meet the need
- evaluability assessment determines whether an evaluation is feasible and how stakeholders can help shape its usefulness
- structured conceptualization helps stakeholders define the program or technology, the target population, and the possible outcomes
- implementation evaluation monitors the fidelity of the program or technology delivery
- process evaluation investigates the process of delivering the program or technology, including alternative delivery procedures
Summative evaluation can also be subdivided:
- outcome evaluations investigate whether the program or technology caused demonstrable effects on specifically defined target outcomes
- impact evaluation is broader and assesses the overall or net effects -- intended or unintended -- of the program or technology as a whole
- cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values
- secondary analysis reexamines existing data to address new questions or use methods not previously employed
- meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or summary judgement on an evaluation question
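The meta-analytic integration described in the last bullet is, at its simplest, a precision-weighted average of effect sizes across studies. The sketch below shows a fixed-effect pooling with inverse-variance weights; the effect sizes and standard errors are hypothetical numbers chosen only for illustration.

```python
# Minimal sketch of a fixed-effect meta-analysis using inverse-variance
# weighting. The effect sizes and standard errors are hypothetical.
import math

# (effect_size, standard_error) from several hypothetical outcome studies
studies = [(0.30, 0.10), (0.45, 0.15), (0.20, 0.08)]

# More precise studies (smaller standard errors) get larger weights.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * es for (es, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
# → pooled effect = 0.270 +/- 0.113
```

Note that the pooled estimate falls closest to the effect from the most precise study, and its confidence interval is narrower than any single study's: this is the sense in which meta-analysis arrives at an overall or summary judgement.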
Evaluation Questions and Methods
Evaluators ask many different kinds of questions and use a variety of methods to address them. These are considered within the framework of formative and summative evaluation as presented above.
In formative research the major questions and methodologies are:
What is the definition and scope of the problem or issue, or what's the question?
Formulating and conceptualizing methods might be used here, including brainstorming, focus groups, nominal group techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis, and concept mapping.
Where is the problem and how big or serious is it?
The most common method used here is "needs assessment," which can include the analysis of existing data sources, sample surveys, interviews of constituent populations, qualitative research, expert testimony, and focus groups.
How should the program or technology be delivered to address the problem?
Some of the methods already listed apply here, as do detailing methodologies like simulation techniques; multivariate methods like multiattribute utility theory or exploratory causal modeling; decision-making methods; and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.
How well is the program or technology delivered?
Qualitative and quantitative monitoring techniques, the use of management information systems, and implementation assessment would be appropriate methodologies here.
The questions and methods addressed under summative evaluation include:
What type of evaluation is feasible?
Evaluability assessment can be used here, as well as standard approaches for selecting an appropriate evaluation design.
What was the effectiveness of the program or technology?
One would choose from observational and correlational methods for demonstrating whether desired effects occurred, and quasi-experimental and experimental designs for determining whether observed effects can reasonably be attributed to the intervention and not to other sources.
What is the net impact of the program?
Econometric methods for assessing cost effectiveness and cost/benefits would apply here, along with qualitative methods that enable us to summarize the full range of intended and unintended impacts.
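As a toy illustration of the cost-effectiveness side of this question, the sketch below compares the cost per unit of outcome for two hypothetical program variants. All program names, costs, and outcome counts are invented for illustration.

```python
# Minimal sketch of a cost-effectiveness comparison between two
# hypothetical program variants. All figures are invented.
programs = {
    # name: (total_cost_in_dollars, units_of_outcome_achieved)
    "program_a": (120_000, 300),   # e.g. 300 participants reaching a target
    "program_b": (200_000, 560),
}

for name, (cost, outcome) in programs.items():
    # Cost-effectiveness ratio: dollars spent per unit of outcome achieved.
    print(f"{name}: ${cost / outcome:,.2f} per unit of outcome")
# → program_a: $400.00 per unit of outcome
# → program_b: $357.14 per unit of outcome
```

Even though program_b costs more in total, it delivers outcomes more cheaply per unit; cost-benefit analysis goes one step further by also expressing the outcomes themselves in dollar terms.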
Clearly, this introduction is not meant to be exhaustive. Each of these methods, and the many not mentioned, are supported by an extensive methodological research literature. This is a formidable set of tools. But the need to improve, update and adapt these methods to changing circumstances means that methodological research and development needs to have a major place in evaluation work.
Copyright ©2006, William M.K. Trochim, All Rights Reserved
Last Revised: 10/20/2006
This article focuses on the use of formative and summative assessments for evaluating student progress. Discussion is presented on the positive and negative aspects of each, their commonalities, and how they differ from each other. Examples of formative and summative assessments are also included, along with comments on assessment for district reporting and high-stakes testing.
Keywords Assessment; Classroom Testing; Cognition; Diagnostic Assessment; Evaluation; Formative Assessment; High-Stakes Testing; Learning Styles; No Child Left Behind Act of 2001 (NCLB); Portfolio; Predictive Assessment; Reflection; Rubric; Summative Assessment; Valid Feedback
There are two major types of assessments being used in classrooms today: summative and formative. These assessments have very obvious differences but also share some similarities depending on how they are administered and evaluated. Summative assessments are intended to summarize what students have learned and occur after instruction has been completed, at the end of a predetermined point in time or instructional component. Summative assessment can occur at the end of the school year or term; at the end of an instructional unit or chapter; and at the end of elementary, middle, or high school. Formative assessments are generally considered part of the instructional process and are intended to provide information needed to help instructors adjust their instruction and help students learn while instruction is occurring. Formative assessment is not graded and is used as an ongoing diagnostic tool, which means it should occur regularly and the results should be shared with students in a timely manner in order to be effective. Any adjustments that need to be made in instruction are intended to ensure that all students meet pre-established learning goals within a specific timeframe (Garrison & Ehringhaus, n.d.).
There are quite a few similarities between formative and summative assessments. Both assessments require active instructor involvement to be effective. It also does not matter what kind of assessment is used; instructors must be able to help motivate their students to learn and get them excited about the learning process. However, the similarities stem more from how the two need to be codependent in order to produce the desired results for students. The formative assessment must align with the summative to produce valid grades and scores. This can be accomplished by reviewing student work and looking at past test questions and answers to determine any areas of weakness and then successfully addressing them before the summative assessment is administered (Harlen, 2005).
There are a few distinct differences between formative and summative assessments. The primary goal of summative assessment is to be able to provide an overall measure of student performance at a particular point in time in a grade or score format. This report can be given to parents, districts, states, and others and can have serious consequences attached to it for both the student and the school, such as students not being promoted to the next grade, not getting into their college of choice or the school not receiving funding. The primary goal of formative assessment is to provide feedback within the classroom with no real consequences attached (Starkman, 2006). Another way to distinguish between formative and summative assessments is that formative assessments can be considered a type of practice for students because they are not being graded, whereas summative assessments depend completely on a grade or score. Formative assessments depend on student involvement and feedback to be effective and summative assessments do not (Garrison & Ehringhaus, n.d.).
Examples of Formative & Summative Assessments
Almost any assessment can be either formative or summative, depending on whether a grade or score is given and recorded and whether or not feedback and reflection are involved in the process. Some of the more familiar examples of summative assessments include tests, final exams, graded projects, work portfolios, PSAT exams and SAT and ACT exams. Some examples of formative assessments, which can be both formal and informal, include ungraded quizzes, instructor questioning and observations, draft work, and portfolio reviews (McTighe & O'Connor, 2005). Other examples of formative assessments include “reviewing homework and classroom work for errors or misunderstandings; observing students as they read, work with others, carry out assignments, or solve problems; talking with students”; and listening to student responses during a lesson. Instructors may also give a pretest before beginning a unit or chapter to determine students' existing knowledge (Nitko, 1994, p.7). A post-test would work as a formative assessment if the knowledge of the completed unit or chapter is necessary for understanding the next unit or chapter. This can be especially true for mathematics where one concept can build on all previously learned concepts and a solid foundation is crucial for any future success (Nitko, 1994). Any opportunity for revisions on tests or any other type of assessment that gives students a chance to work through, think about, and eventually understand an area they did not understand or were not able to clearly articulate before, is a type of formative assessment ("The Relationship Between Formative," 2001).
Some formative assessment ideas that can work for practically any grade are having students write a paragraph, having students keep a journal, asking text-based questions, checking students' notebooks, conducting impromptu quizzes, creating and handing out worksheets, assigning homework, conducting oral questioning and having all students respond, having daily review questions before beginning the new lesson, and having students compare answers.
Some ideas for summative assessment include creating a newsletter, critiquing an article or book, having students create their own books, assigning research papers, having students present to the rest of the class, having students analyze a book or specific text, and basically assigning anything that can be graded once students have a clear understanding of the grading rubric and what is expected of them (Minneapolis Public Schools, 2005).
Predictive, or diagnostic, assessments are often used before more formal assessments. These can be considered a combination of both summative and formative assessment and have become more prevalent with the enactment of the No Child Left Behind Act of 2001 (NCLB). Predictive/diagnostic assessments may also be known as pre-assessments and are designed to closely follow what will more than likely be asked on a summative assessment. The intent of using predictive/diagnostic assessments is to predict how well students will perform on high-stakes tests used to meet NCLB guidelines and state standards. Diagnostic reports can show specific errors that students make so teachers can target instruction to classroom needs, which makes it simpler to increase student performance and help schools, districts, and states meet their achievement goals. In fact, at least one publishing company has developed predictive assessments that are specifically aligned to each state's high-stakes tests (Starkman, 2006). A purely diagnostic assessment can be used to profile students' interests and help determine their preferred learning styles. Diagnostic assessments can also help instructors plan their instruction and develop curriculum by helping to determine whether or not classroom instruction is closely aligned with federal and/or state high-stakes tests. Since these assessments are intended for diagnostic/predictive purposes, they are generally not graded (McTighe & O'Connor, 2005).
Positive Aspects of Formative Assessment
Provides Feedback to Instructors
Formative assessments occur at the same time as instruction. This means that formative assessments can provide specific feedback to both instructors and students regarding each student's learning, thus allowing instructors to modify and improve instruction midstream. Formative assessment can provide immediate, contextualized feedback. With formative assessment, there is improved feedback between students and their teachers; and students become actively involved in their own learning, which can help stimulate student motivation, engagement, and learning. Instructors can use the feedback attained to adapt their teaching practices to specific student needs. Since formative assessment does occur concurrently with instruction, teachers can determine what concepts and skills have been mastered and revisit as often as necessary those concepts and skills that have not been mastered. Formative assessment can also influence other factors that come into play when discussing student achievement. When instructors determine that they are simply not reaching the class using their own preferred teaching techniques and methods, they are forced to rethink how they teach. Attempting to use different techniques to reach students with different learning styles challenges instructors and keeps them engaged in the learning process since formative assessment provides immediate feedback about any success or failure that any particular technique may produce. Evaluation of all the feedback received from various formative assessments can show where there are deficiencies and strengths. This can help the instructor arrange for additional resources to help those students who are not thriving or doing well in class instead of waiting until after the high-stakes test (Irving, 2007).
Reveals Learning Gap
Formative assessments may also reveal that there is a learning gap since it may be assumed that students know concepts that were taught in previous classes, for example, in mathematics and algebra classes. It is generally assumed...