Trends in Program Evaluation Literature: The Emergence of Pragmatism
TCALL Occasional Research Paper No. 5
Roemer M.S. Visser
Texas Center for Adult Literacy & Learning
A review of recent
program evaluation literature reveals two forces that have influenced
program evaluation practice in the 1990s. The first is the drive toward
accountability, leading to the imposition of business practices and accountability
standards on government agencies and nonprofits. The second is the democratization
movement, driven by philosophically charged debates within the social
sciences (including program evaluation), and characterized by attempts
to make evaluation more inclusive, transparent, and democratic. These
trends are largely incompatible, and I argue that the persistence of philosophical
debates has spawned an emerging trend: the increased popularity of pragmatic
approaches that mix methods according to the characteristics of the particular
program to be evaluated.
Program evaluation is
rooted in two realms. It has one foot in the academic realm of the social
sciences with its philosophical and methodological debates. The other
foot is securely planted in the realm of program evaluation practice with
its shifting political winds. Changes in both of these realms have influenced
program evaluation practice in the 1990s. The purpose of this paper is
to discuss these changes and their effects as they emerge from a review
of the literature. First, a brief historical overview is presented, identifying
the genesis of mainstream evaluation practice – the traditional evaluation
– and an early alternative approach, responsive evaluation. Next, two
major forces are presented, one from the realm of practice, and one from
the academic realm, along with the influences they have had on program
evaluation. The trends are then summarized in a table. I make two arguments.
First, these two main forces are disparate and essentially reflect incompatible
philosophical underpinnings. Second, a new trend is emerging
as a result of this unresolved debate: the advent of mixed-method designs,
based on a pragmatic approach to program evaluation.
Early Program Evaluation:
Traditional Versus Responsive Evaluation
Stufflebeam (2001)
provides a succinct historical overview of the field of evaluation. After
a period of relative inactivity in the 1950s, several events and developments
sparked an increased interest in evaluation in the 1960s (Henry, 2001).
The Civil Rights Act of 1964 mandated equitable treatment for minorities.
Great Society programs (Greene, 2001) such as the War
on Poverty (Weiss, 1998) were initiated and needed to be evaluated. The
1960s were also a very successful period for the natural sciences. Achievements
such as putting a man on the moon helped create an almost unshakable faith
in the natural sciences and led social scientists to adopt their methods
to tackle society’s ills. Patton (1997) refers to this as “a new order
of rationality in government – a rationality undergirded by social scientists”
(p. 7). With the application of scientific methods to program evaluations,
traditional evaluation (TE) was born.
Traditional evaluation is characterized by its emphasis on scientific
methods. Reliability and validity of the collected data are key, while
the main criterion for a quality evaluation is methodological rigor. TE
requires the evaluator to be objective and neutral and to be outcome-focused
(Fine, Thayer, & Coghlan, 2000; Torres & Preskill, 2001). This
leads to a preoccupation with experimental methods, numbers (as opposed
to words), statistical tools, and an emphasis on summative evaluations
(aimed at determining whether or not to continue a particular program) rather
than formative ones (aimed at program improvement).
Although TE is still
widely used today, it is not the only available approach to program evaluation.
Competing approaches have since been developed, mostly in response to
one of TE’s most serious drawbacks – the fact that many TE reports are
not used or even read (Torres & Preskill, 2001; Fetterman, 2001; Patton,
1997). One of the earliest alternatives to TE is responsive
evaluation (Stake, 1973).
Briefly, responsive evaluation is an approach to evaluation that is less
objective and more tailored to the needs of those running the program.
In Stake’s own words, responsive evaluation “sacrifices some precision
in measurement, hopefully to increase the usefulness of the findings to
persons in and around the program” (Stake, 1973). It calls attention to
the complexity and the uncertainty of the program, the difficulty in measuring
outcomes, and the importance of descriptive and judgmental data. Rather
than oversimplifying through numbers, Stake argues for storytelling as
a means of conveying the “holistic impression, the mood, even the mystery
of the experience” (Stake, 1973). In essence, the difference between the two approaches hinges on legitimacy:
whereas TE draws legitimacy from scientific rigor, responsive evaluation
draws legitimacy from endorsements by a majority of important stakeholders.
Although Stake took
pains to suggest that responsive evaluation should supplement traditional
evaluation rather than replace it, it is easy to see the conflicting
orientations of the two approaches. Thus, the seeds were sown for the
debates discussed in subsequent sections of this paper. This early offshoot
of TE would be a precursor to what has since been referred to as the “paradigm
wars” (Caracelli, 2000).
This very brief depiction of the historical context of evaluation practice
is intended to provide a backdrop against which recent developments can
be assessed. In short, the 1970s were characterized by a predominantly
social-scientific approach to program evaluation. Other approaches were
not generally accepted as valid or scientific, so the variety of methods
at the evaluator’s disposal was limited.
Of course, much has
happened since the 1970s; as the subsequent sections aim to show, the
1980s and 1990s were characterized by a host of developments both in the
political realm and in the academic realm.
Developments in the Political Realm: Increased Accountability
Many of the articles
reviewed list examples of recent events that have affected evaluation
practice. First, there is a growing imbalance between the supply of and demand for funding.
While government funding is declining (Boardman & Vining, 2000), there
is a proliferation of agencies competing for funds (Kaplan, 2001; Rojas,
2000; Lindenberg, 2001), leading to increased funder demands and restrictions
(Poole, Davis, Reisman, & Nelson, 2001; Pratt, McGuigan, & Katzev,
2000), partially fed by publicized, high profile mismanagement cases (Rojas,
2000; Hoefer, 2000).
Second, the information
revolution (including e-government, data access, and real-time evaluation;
Mark, 2001; Love, 2001; Datta, 2001) and other improvements in technology have
combined with increased public demand for evaluation information and the
resulting media interest (Henry, 2001).
The third and most
important development, however, is the Government Performance and Results
Act (GPRA) of 1993 (Poole, Nelson, Carnahan, Chepenik, & Tubiak, 2000;
Pratt et al., 2000; Love, 2001; Datta, 2001; Tassey, 1999; Youtie, Bozeman,
& Shapira, 1999; Toffolon-Weiss, Bertrand, & Terrell, 1999), which
links government agencies’ performance results to future funding.
Taken together, these
influences suggest a political landscape of increased scrutiny (and increased
technological ability to scrutinize), increased competition for decreased
levels of funding, and as a result, increased demand to demonstrate results.
The Impact of Increased Accountability on Program Evaluation
As a consequence of
this increased emphasis on accountability, nonprofits and government agencies
are facing pressure to demonstrate results, be held accountable, show
high performance, and, more generally, to behave like businesses (Fine et al.,
2000; Bozzo, 2000; Hoefer, 2000; Renz, 2001; Lindenberg, 2001; Poole et
al., 2000; Love, 2001; Wholey, 2001). The underlying assumption appears
to be that agencies and nonprofits can and should be run the way businesses
are run, and that it would therefore be useful to adopt some of their practices.
The practices identified
in the literature can be divided into three broad categories: strategic
analysis/alignment and organizational effectiveness; impact evaluation;
and performance measurement. Methodological refinements and technological
innovation have facilitated the adoption of business practices by making
large-scale data collection and processing possible.
Strategic Analysis and Organizational Effectiveness
Articles in the strategic
analysis and organizational effectiveness category focus on improving
the alignment of nonprofits’ missions, goals, and strategies in order
to make them more effective. Such alignment is imperative for any evaluation
to be useful (Sawhill & Williamson, 2001). Several articles report
on applying common business tools and methods to nonprofits (Boardman
& Vining, 2000; Lindenberg, 2001; Rojas, 2000; Kaplan, 2001).
Impact Evaluations
Impact evaluations
are somewhat different from traditional evaluation: whereas TE historically
measured outputs, impact evaluations examine the eventual results of those
outputs. In other words, impact evaluations include one more step in the
causality chain. For example, in the case of a political advertisement,
TE might consider the amount of money spent, the number of people reached,
and perhaps an assessment of the quality of the ad. An impact evaluation,
however, would focus on polls to assess the extent to which voters have
been swayed by the ad.
While the articles
on strategy formulation and alignment largely report success, those dealing
with impact evaluations expose some of the problems associated with simply
running nonprofits as if they were for-profits. After all, nonprofits
do not have a “bottom line” in the way for-profits do. While Hoefer
(2000) links impact evaluations to the legal requirements of accountability,
such as those mandated by the GPRA, the problem with impact evaluation
is that the intended outcomes are often complex, intangible, or
both. Thus, most of the articles present novel ways of assessing impact.
For example, the program outcomes Owen (1998) reports were not predetermined,
which goes against what goal-setting and strategy formulation would suggest.
Reed and Brown (2001) recognize the complexity of outcomes: impacts occur
at various levels (individual, family, agency, interagency system, and
community) that are systemically linked. Programs may have outcomes at
all five levels. Similarly, Mohr (1999a) argues against trying to deliver
a single, composite score when different kinds of impacts are involved.
Aggregating those measures, he argues, would be futile and misleading.
Instead, the proper way to evaluate is to use an impact profile, where
each impact is presented and analyzed on its own terms and merits.
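Mohr’s (1999a) objection to composite scores can be illustrated with a small numerical sketch. The example below is entirely hypothetical: the two programs, the impact levels (loosely borrowed from the levels Reed and Brown (2001) describe), the 0 to 10 scores, and the equal weights are invented for illustration and are not taken from any study reviewed here. Both programs receive the same composite score, while their impact profiles show that they achieve quite different things.

```python
# Hypothetical illustration of Mohr's (1999a) argument against composite scores.
# The programs, impact levels, scores, and weights are invented for this sketch.
from typing import Dict

# Scores on an assumed 0-10 scale for three distinct kinds of impact.
program_a: Dict[str, float] = {"individual": 9.0, "family": 3.0, "community": 3.0}
program_b: Dict[str, float] = {"individual": 5.0, "family": 5.0, "community": 5.0}

# Equal weights are an arbitrary choice; any other choice would change the result.
weights: Dict[str, float] = {"individual": 1.0, "family": 1.0, "community": 1.0}


def composite_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse several impacts into a single weighted average."""
    total = sum(scores[name] * weights[name] for name in scores)
    return total / sum(weights[name] for name in scores)


def impact_profile(scores: Dict[str, float]) -> str:
    """Report each impact on its own terms instead of aggregating them."""
    return "; ".join(f"{name}: {value:g}" for name, value in scores.items())


if __name__ == "__main__":
    # Both composites come out equal, hiding how differently the programs perform.
    print("A composite:", composite_score(program_a, weights))
    print("B composite:", composite_score(program_b, weights))
    print("A profile:", impact_profile(program_a))
    print("B profile:", impact_profile(program_b))
```

In this sketch the composite also depends entirely on the arbitrarily chosen weights, which is one concrete way in which aggregation can mislead.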
Complexity applies not
only to the outcomes but also to the process of evaluating them.
According to Dunnagan, Duncan, and Paul (2000), the problem with one-shot
assessments is that they usually do not do justice to a program, regardless
of how comprehensive they are. This is especially true if there is a time
lag between the program’s intervention and the appearance of its results.
Instead, evaluation should be an ongoing process so that the process
of evaluation itself can be evaluated and improved.
With regard to intangible
outcomes, Stame (1999) argues that they need to be specifically included
(as well as quantified, however difficult that may be) in order to evaluate
a program realistically. Moore (1995) points out that while for-profits
create private value (for shareholders), nonprofits often create public
value. Quarter and Richmond (2001) argue that prevalent accounting practices
need to be adjusted in order to reflect the social value created by a
program.
Performance Measurement
Performance measurement
is another business tool that is being adapted to nonprofits. Although
related to both strategy formulation and impact evaluation, it appears
so often in the literature that it deserves separate treatment. Performance
measurement refers to the systematic monitoring of certain key variables
(e.g., money spent, people served, and raw materials used), often referred
to as indicators of program quality (IPQ). Monitoring these variables
allows significant changes to be detected early, so that adjustments can
be made before too much damage is done.
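As a rough illustration of the monitoring idea, the sketch below compares the latest readings of a few indicators against a baseline and flags any indicator whose relative change exceeds a threshold. The indicator names echo the examples in the text, but the figures, the quarterly framing, and the 20% threshold are hypothetical assumptions; actual performance measurement systems define their own indicators and decision rules.

```python
# Minimal sketch of indicator monitoring in a performance measurement system.
# Indicator names echo the text's examples; all figures and the 20% threshold
# are hypothetical assumptions.
from typing import Dict, List

BASELINE: Dict[str, float] = {
    "money_spent": 10_000.0,
    "people_served": 250.0,
    "materials_used": 400.0,
}


def flag_significant_changes(
    baseline: Dict[str, float],
    latest: Dict[str, float],
    threshold: float = 0.20,
) -> List[str]:
    """Return a note for each indicator whose relative change exceeds threshold.

    Assumes every baseline value is nonzero and every indicator has a new reading.
    """
    flagged: List[str] = []
    for name, base in baseline.items():
        change = (latest[name] - base) / base
        if abs(change) > threshold:
            flagged.append(f"{name}: {change:+.0%}")
    return flagged


if __name__ == "__main__":
    this_quarter = {
        "money_spent": 10_500.0,   # +5%: within the assumed tolerance
        "people_served": 180.0,    # -28%: flagged for review
        "materials_used": 410.0,   # +2.5%: within the assumed tolerance
    }
    for note in flag_significant_changes(BASELINE, this_quarter):
        print("Review:", note)
```

The rule flags both increases and decreases, since either direction of change could signal a problem worth investigating.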
Renz (2001) argues
that measuring and managing performance is the key to moving from a focus
on activity to one on long-term, sustainable impact. Wholey (1997; 2001)
also sees tremendous value in performance measurement systems as they
can improve government management of programs, decision-making, and the
public’s confidence in government. Toffolon-Weiss, Bertrand, and Terrell
(1999) report success with a performance measurement framework used
at USAID. Of course, performance measurement systems themselves are
not exempt from evaluation. Poole et al. (2000, 2001) introduce instruments
designed to assess performance measurement systems. Similarly, Youtie
et al. (1999) promote using evaluability assessments. All of these articles
report either success or promise of success for performance measurement
as a nonprofit management tool. Clearly, performance measurement is here
to stay (Poole et al., 2001; Newcomer, 1997).
Performance measurement
is not without critics, however. Campbell (2002) warns against too much
emphasis on performance management, saying that we should “never substitute
indicators for judgment” (p. 255). Perrin (1998) echoes this critique
when he warns against what he calls “goal displacement.” Goal displacement
occurs, for example, when cost-effectiveness, one possible measure
of a program’s success, takes priority over the overarching but less
measurable goal of, say, health education. Stake (2001) concurs: “we are
increasingly the promoters of impressionistic tallies, the façade
of technology” (p. 349).
Methodological Innovations
Besides the business
tools that nonprofits are being introduced to, there are also methodological
innovations that aim to help evaluators determine impact. Aside from some
statistical innovations (Hess, 2000), the most significant methodological
refinement is an evaluation approach called theory-based evaluation (TBE).
Davidson (2000) and Weiss (1998) are two proponents of the method, which
uses program logic models. These models are graphical depictions
of how the program is supposed to work, much like a flow chart.
In the words of Rogers, Petrosino, Huebner, and Hacsi (2000), such an evaluation “consists
of an explicit theory or model of how the program causes the intended
or observed outcomes” (p. 5). The program’s activities are listed and
their relationships to the desired end results are depicted by means of
arrows (Wandersman, Imm, Chinman, & Kaftarian, 2000). According to
Reynolds (1998) and McLaughlin and Jordan (1999), the main strength of
this approach is that it allows the evaluator to make causal inferences,
so that the achieved results can be attributed to the program rather
than to other influences. Although Birckmayer and Weiss (2000) found that
in research papers describing TBE practice, the relationship between data
and theory is not always clear, they maintain that the benefits outweigh
these drawbacks and even suggest that TBE is applicable to
small, atheoretical organizations. With its focus on causality and on
outcomes, TBE is clearly congruent with and an extension of TE.
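One way to see why proponents link logic models to causal inference is to treat a logic model as a directed graph in which each arrow is a claimed causal link. The sketch below does this for an invented adult literacy program (every node name is hypothetical and not drawn from the cited studies) and lists each chain of arrows from an activity to a final outcome; each chain is a sequence of links for which an evaluation would need evidence before attributing results to the program.

```python
# Hypothetical program logic model for an invented adult literacy program.
# Each arrow reads "is expected to lead to"; node names are illustrative only.
from typing import Dict, List

LOGIC_MODEL: Dict[str, List[str]] = {
    "recruit and train tutors": ["tutoring sessions delivered"],
    "community outreach": ["adult learners enrolled"],
    "adult learners enrolled": ["tutoring sessions delivered"],
    "tutoring sessions delivered": ["reading skills improve"],
    "reading skills improve": ["learners obtain or keep employment"],
    "learners obtain or keep employment": [],
}


def causal_paths(model: Dict[str, List[str]], start: str) -> List[List[str]]:
    """List every chain of arrows from `start` to a final outcome.

    Each chain is a sequence of causal claims that would need supporting
    evidence. Assumes the model has no cycles, as flow-chart style logic
    models typically do not.
    """
    successors = model.get(start, [])
    if not successors:
        return [[start]]
    paths: List[List[str]] = []
    for nxt in successors:
        for tail in causal_paths(model, nxt):
            paths.append([start] + tail)
    return paths


if __name__ == "__main__":
    for activity in ("recruit and train tutors", "community outreach"):
        for path in causal_paths(LOGIC_MODEL, activity):
            print(" -> ".join(path))
```

The sketch only enumerates the claimed links; it says nothing about whether the data support them, which is the point at which, according to Birckmayer and Weiss (2000), published TBE studies are not always clear.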
Technological Innovations
Lastly, technological
innovations are facilitating TE’s transition from activity to impact.
Rossi (1997) and Watt (1999) point out that the tremendous changes in
computing capacity and data availability over the last ten years have
led to faster, more complex, and more valid analysis techniques (e.g.,
modeling, meta-analysis, and inference from multistage samples).
In sum, the net result
of the increased popularity of business practices, methodological innovations
like theory-based evaluation, and improved technology could be called
a “hardening” of traditional evaluation. Still concerned with numbers,
objectivity, and rigor, TE has shifted its attention from activities (Sawhill
& Williamson, 2001) and indicators such as operating expense ratios
(Kaplan, 2001) to outcomes or impacts.
Developments in the Academic Realm:
The Drive Toward Democratization
Similar to the drive
toward accountability, the drive toward democratization is a collection
of separate, yet related forces. While the accountability drive seems
to come from government and business, the democratization drive appears
to originate in the academic world.
Although it is impossible
to identify a single event that triggered this drive, one seminal work
is worth mentioning. In 1962, Kuhn published a book (Kuhn, 1962) arguing
that scientific knowledge is not “discovered” (and therefore self-evident),
but “constructed” in a social context (and therefore not value-free or
objective). The knowledge “constructed” depended on the particular “paradigm”
within which the research was situated. (A paradigm is a set of preconceptions
through which the researcher habitually views the world.) This was the
first serious challenge to the supposed universality of truth from within
the natural sciences and led to a long-standing debate in scientific circles.
Lincoln and Guba (1985) brought the debate to the field of evaluation,
launching what has often been referred to as the “paradigm wars” (Caracelli,
2000) and challenging the privileged status of the traditional evaluation
over alternative approaches. Essentially, all disagreement boils down
to a philosophical argument: whether or not the world is ultimately knowable,
and whether or not there is such a thing as objectivity.
If, as constructivists
Lincoln and Guba (1985) argue, each “truth” is socially constructed, then
whose truth (i.e., assessment) matters most? That of the evaluator or that of program staff? Evaluators
can base the legitimacy of their findings on their research expertise
and their distance from the program (i.e., neutrality). However, program
staff members can claim legitimacy because of exactly the opposite – their
intimate familiarity with the program.
This critical stance
vis-à-vis the scientific establishment in general, and vis-à-vis
the legitimacy of traditional evaluations in particular, has spawned many
different debates and evaluation approaches. One debate centers on race
and ethnicity. The constructivists’ argument that the cultural context
of research is an important determinant of its outcomes has serious implications
for program evaluation. Since the vast majority of researchers have historically
been white males belonging to the middle class, it follows that mainstream
theories and methods are at least influenced by their value-orientations.
Stanfield (1999) argues that traditional evaluation draws legitimacy from
white male hegemony. In other words, since scientists are the evaluation
experts, and since scientists are predominantly white and male, white
males design and carry out most traditional evaluations – including those
of programs serving African-American populations. If Stanfield is right,
it is crucial to achieve compatibility between the researcher’s culture
and that of the program to be evaluated. Otherwise, evaluations risk being
irrelevant at best, and harmful at worst.
It should be noted
that traditional evaluators disagree with this position. According to
their research paradigm, values constitute error in the research process.
Therefore, they say, any empirical research that betrays the value orientation
of the researcher is inherently flawed (see Stufflebeam, 1994, for a detailed
explanation of this view). Again, it comes down to whether or not objectivity
can be achieved.
There is a closely
related debate in evaluation, also quite philosophical in nature and revolving
around the nature of language. Whereas the above discussion centered on
the neutrality of the researcher, this debate focuses on the neutrality
of language. The essential argument is that words are not necessarily
neutral vehicles for the delivery of a particular message (for example,
in an evaluation report). Instead, the choice of words can have a deep
impact on the meaning of the message (Patton, 2000). Hopson (2000) dedicated
an entire issue of the New Directions for Evaluation series to language
issues in evaluation. Its focus is “to consider and illuminate how language
shapes meanings of the social policies and programs we evaluate” (p. 2).
He challenges evaluators by arguing that “language needs to be used with
great care and attention to the subtleties and nuances of culture, context,
and setting” (p. 2). Ryan and DeStefano (2000) propose that “a critical
concern for evaluation theorists is to explore the meaning of dialogue
and the differences in the various meanings of dialogue” (p. 75). Zajano
and Lochtefeld (1999) point out that because legislators belong to an
oral culture, evaluation reports should include compelling stories that
are representative of the main findings. After Lincoln and Guba (1985)
dispelled the myth of an evaluator’s neutrality and objectivity, it appears
to be increasingly recognized that language itself is not neutral,
further undermining the authority of traditional evaluation.
Moves toward alternative
approaches to evaluation are driven not only by academic or philosophical
debates. They also flow from a major shortcoming of traditional evaluation:
its lack of use. Too many evaluations are conducted by distanced, outside
experts who write a final report that ends up in a drawer (Torres &
Preskill, 2001; Fetterman, 2001; Patton, 1997). If an objective of an
evaluation is program improvement, then traditional evaluations tend to
come up short. Participation of the stakeholders tends to increase buy-in
and, according to some, increase the quality and the credibility of the
findings. The aims of evaluation appear to be key in this debate. Mertens
(1999) and House and Howe (2000) are very clear in their espoused aims
for evaluation: rather than coming to an objective assessment of some
underlying truth or merit or value, the ultimate goal of the evaluation
profession is one of equality, social justice, and inclusion.
Taken together, the
challenges to the legitimacy of traditional research methods, the recognition
that language in itself is not neutral, the acknowledgment that the aims
of evaluation may vary, and TE’s underutilization have all significantly
undermined its authority. They can be identified as the driving forces
behind the second trend in evaluation practice, discussed below.
The Impact of Democratization on Program Evaluation
The erosion of the
legitimacy of TE in evaluation practice and the calls for more transparency
and democracy in scientific research have resulted in an increased popularity
of more participative approaches in program evaluation (Mertens, 2001;
Thayer & Fine, 2000), alternatively called community-based (Cockerill,
Myers, & Allman, 2000), participatory (Quintanilla & Packard,
2002), collaborative (Brandon, 1998), inclusive (Ryan, 1998), or empowerment
(Fetterman, 2001) evaluations. For the purpose of this paper, participative
approaches refer to those evaluation strategies that depend on input and
cooperation from program stakeholders (specifically, program staff and
clients) in order to succeed. Ryan (1998) argues that such approaches
improve decision-making, are more credible, and are consistent with evaluation’s
overall goal of being democratic and inclusive. Fine et al. (2000) found
that both the use of evaluation and the ongoing design of evaluation systems
increase in the nonprofit sector when stakeholders are involved in the
evaluation process.
Many of the articles
reviewed report their experiences with participative evaluation. Though
not all report unequivocal success (Schnoes, Murphy-Berman, & Chambers,
2000), many of them do, collectively building evidence in favor of the
utility and credibility of participatory program evaluations (Fine et
al., 2000; Thayer & Fine, 2000; Ryan, 1998; Johnson, Willeke, &
Steiner, 1998; Unrau, 2001).
Different participative
strategies call for different levels of stakeholder involvement and, by
extension, different roles for the evaluator. The three main categories
of participative approaches that are found in the literature are stakeholder-based
evaluation (SBE), empowerment evaluation (EE), and self-assessment (SA).
The main differences between the three relate to the primary goal(s) of
the evaluation and the relationship between the evaluator and the stakeholders.
Stakeholder-Based Evaluation
In stakeholder-based
evaluations, the (external) evaluator is the expert on evaluation methods.
She designs the process, collects the data, and writes up the report.
In contrast to TE, however, there is a recognition that the stakeholders
are the experts on their own program. In SBE, they have significant input
when it comes to the selection of the evaluation criteria and the interpretation
of the findings. The primary objective of SBE is to provide the stakeholders
with feedback for program improvement while not sacrificing any rigor,
validity, or objectivity in the process, so that the needs of the main
client (e.g., the funding agency) are met. Johnson et al. (1998) describe
their experiences with SBE. On the one hand, they found that involving
stakeholders indeed improved the evaluation’s credibility among stakeholders.
Other reported advantages include a focus on goals rather than activities,
staff development, and improved respect for cultural diversity. On the
other hand, they found that it was very time-intensive and that it was
driven mostly by program staff, while involvement from the clients remained
limited. Unrau (2001) reports that involving stakeholders in the
formulation of the program logic model may improve the evaluation, and Quintanilla
and Packard (2002) found that involving stakeholders increased their
sense of ownership of the evaluation process, which in turn facilitated
its integration into the daily activities of the program.
Empowerment Evaluation
Fetterman (2001) is
the most vocal proponent of the empowerment approach to evaluation (EE).
In the words of Torres and Preskill (2001), the goal is to “facilitate
learning and change” (p. 388) rather than merely evaluate after the fact.
The role of the evaluator, therefore, changes from content expert to facilitator.
EE puts the program stakeholders in the center of the process while the
evaluator assists and coaches them. Although this approach has received
considerable attention, empirical studies are limited. Schnoes et al. (2000) report
on their attempt to implement EE. They ran into problems, including disagreement
among participants and the amount of time required of everyone involved.
EE is not suitable for every evaluation context (nor is it intended to
be), and successful implementation requires foresight and a significant
amount of work in advance of the process.
Self-Assessment
One could argue that
self-assessment (SA) no longer qualifies as evaluation because there are
no guarantees that any kind of rigor or systematic approach is safeguarded.
It is included here because it is one intended outcome of empowerment
evaluation and because it can be very useful to the program’s staff and
other stakeholders. Empirical research is scarce, however. Paton, Foot,
and Payne (2000) worked with several nonprofits that assessed their own
programs’ quality by self-administering existing quality assessment instruments.
The results were mixed: on the one hand, the instruments were not used
as intended by their authors, thereby undermining the validity of their outcomes.
On the other hand, they did serve to generate dialogue, which in itself
was considered very useful.
In sum, it seems fair
to say that while TE has “hardened” because of its shift in emphasis from
activities and outputs to outcomes and results, the competing approaches
have “softened” because of the evaluator’s gradual move from content expert
to methodological expert and, finally, coach and mentor.
Synthesis:
An Increased Evaluation Spectrum
The “hardening” of
TE and the concurrent “softening” of the participative approaches strongly
imply that the field of evaluation practice has diversified. This is in
line with other authors’ observations (e.g., Caracelli, 2000; Smith, 2001).
As a result, evaluators have a more diverse set of tools to tackle evaluations,
and the days of the one-size-fits-all approach to evaluation are past.
An examination of the
spectrum of available approaches shows that the role of the evaluator,
as well as several other variables, changes according to the evaluation approach,
as summarized in Table 1.
The vertical
line in Table 1 (between SBE and EE) represents the parting line in the
paradigm war, suggesting that the debate has not yet been settled. Smith
(2001) agrees, saying that the debate “is and was about differences in
philosophy and ‘world view’ […] No sooner is it put to bed under one guise
than to raise its ugly head under another” (p. 292). Datta (2001), speaking
about the distance between the extremes (SA and TE), adds:
If anything, the
distance is greater, at least in terms of articulated positions, between
those who see evaluation as a quest for social justice which requires
advocacy for the disenfranchised and those who see evaluation as the
most nonpartisan, fair search we can mount for understanding what is
happening and why, and for reaching judgments on merit, worth, and value.
(p. 405)
In conclusion, while
the argument originally revolved around incompatible philosophical positions
on knowability and objectivity, it now focuses on the espoused purpose
of program evaluation. Those who argue for social justice tend to be the former
constructivists, whereas those who still subscribe to the assessment of value
or worth generally fall into the objectivist camp.
Table 1: Comparison of Different Evaluation Approaches

| | TE | SBE | | EE | SA |
|---|---|---|---|---|---|
| Stakeholders’ influence | None | In design and reporting only | | Throughout | Throughout |
| Extent of evaluators’ control | Complete | Majority | | Shared with stakeholders | None |
| Image(s) of evaluators | Doctor; scientist; professor | Chief executive; policy-maker | | Mentor; facilitator; teacher; coach | n/a |
| Purpose | Summative only | Mostly summative | | Mostly formative | Formative only |
| Utilization rate | Very low | Low | | High | Very high |
| Basis for credibility | Evaluator expertise; methodological rigor | Evaluator expertise and stakeholder involvement | | Utilization of findings and evaluator endorsement | Usefulness of findings |
The Emerging Trend: Advent of Pragmatic Approaches
In spite of the continued
paradigm war, which tends to polarize the field between two alternatives
(objectivist or constructivist assumptions; quantitative or qualitative
methods; summative or formative purpose; etc.), the literature shows an
increase in popularity of pragmatic approaches (e.g., Lawrenz & Huffman,
in press; Bengston & Fan, 1999; Mohr, 1999b; Pratt et al., 2000).
These approaches essentially ignore the paradigm debate and show no hesitation
to mix methods in ways that loyalists to either paradigm would never
do out of fear of compromising their findings. One might even speculate
that these pragmatic approaches are appearing because of the persistence
of the paradigm war: its abstract debates have not addressed the questions
and problems that evaluators in the “real world” wrestle with, which may
have led to the advent of “mixed-method approaches” (Rog & Fournier,
1997). For example, Johnson, McDaniel, and Willeke (2000) argue that assessments
of portfolios can satisfy psychometric demands of reliability. McConney,
Rudd, and Ayres (in press) suggest that when qualitative and quantitative
measures yield contradicting results, a synthesis can still be achieved
by assigning the measures a particular weight or importance. In a similar
vein, MacNeil (2000) introduces the reader to the possible utility of
including poetic representation in evaluation reports.
Possibly the best justification
for calling the advent of mixed-method approaches a trend is the work
by Henry, Julnes, and Mark (1997) and Mark, Henry, and Julnes (2000). These
authors attempt to give the pragmatic approach more legitimacy by providing
a theoretical basis for it, called emergent realism. Datta (2001) concurs:
“as the ends draw apart, the widening middle ground is getting filled
with new approaches to unify us, such as realistic evaluation” (p. 405).
Although a full treatment of realist evaluation falls beyond the scope of
this paper, it is a contribution worthy of further examination.
Thus far, no articles have reported an application of this philosophy
to program evaluation. Time will tell whether emergent realism
will catch on in the field.
Implications
If this trend continues,
it may have profound implications for program evaluation as an emerging
field of practice. For one, philosophically oriented academicians who
subscribed to a particular position in the paradigm debates are essentially
being ignored by peers and practitioners who go their own way under the
“whatever works” motto. A skeptic might argue that the recent work on
realist evaluation by Mark, Henry, and Julnes (2000) is an effort to salvage
academia’s credibility in leading the field through theory and research.
Nevertheless, attempts to find a sound philosophical basis for mixing
methods, if successful, might move the field forward, as it is
still struggling to respond to accountability requirements (General
Accounting Office, 1998).
If we extrapolate this
trend, however, another possible implication becomes clear: the purpose
of evaluation, itself a subject of debate as argued above, will be determined
not by academic evaluators but by evaluation stakeholders, in particular
those who fund the evaluation efforts. With no clear guidance or
agreement from academia on how to properly conduct evaluations, decision-makers
are most likely to approach those evaluators whose views of evaluation
best fit their needs. For example, a program director seeking program
improvement may turn to an empowerment evaluator, while a funder may turn
to a traditional evaluator. Although not necessarily a detriment to the
field, this points to the possibility that program evaluation may become
a service industry, much like the hospitality or entertainment industries,
rather than one oriented toward applied social science research.
As long as the funder and other stakeholders are satisfied with the evaluation
methods and, of course, the outcomes, who cares if the evaluation does not
adhere to stringent scientific principles?
The danger of this
erosion of academic evaluators’ credibility is, of course, that evaluations
may become even more politically charged than they already
are, with the evaluator reduced to just another stakeholder. For example,
program staff may not agree with the program funder’s choice of evaluator.
Evaluators’ voting records may become a point of concern.
A skeptic here would observe that the decreased credibility of academically
oriented evaluators strikes at the heart of program evaluation practice
because it allows political agendas and other motives to explicitly drive
the choice of evaluator and evaluation method, perhaps even the outcomes.
Time will tell which of the skeptics are right. Either way, scholar-practitioners
have their work cut out for them.
References
Bengston, D. N., &
Fan, D. P. (1999). An innovative method for evaluating strategic goals
in a public agency: Conservation leadership. Evaluation Review, 23(1),
77-10.
Birckmayer, J. D., & Weiss, C. H. (2000). Theory-based evaluation
in practice: What do we learn? Evaluation Review, 24(4), 407-431.
Boardman, A. E., & Vining, A. R. (2000). Using service-customer matrices
in strategic analysis of nonprofits. Nonprofit Management and Leadership,
10(4), 397-420.
Bozzo, S. L. (2000). Evaluation resources for nonprofit organizations.
Nonprofit Management and Leadership, 10(4), 463-472.
Brandon, P. R. (1998). Stakeholder participation for the purpose of helping
ensure evaluation validity: Bridging the gap between collaborative and
non-collaborative evaluation. American Journal of Evaluation, 19(3), 325-337.
Campbell, D. (2002). Outcomes assessment and the paradox of nonprofit
accountability. Nonprofit Management and Leadership, 12(3), 243-259.
Caracelli, V. J. (2000). Evaluation use at the threshold of the twenty-first
century. In V. J. Caracelli & H. Preskill (Eds.), The expanding scope
of evaluation use (pp. 99-111). New Directions for Evaluation, no. 88.
San Francisco: Jossey-Bass.
Cockerill, R., Myers, T., & Allman, D. (2000). Planning for community-based
evaluation. American Journal of Evaluation, 21(3), 351-357.
Datta, L. E. (2001).
Coming attractions. American Journal of Evaluation, 22(3), 403-408.
Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation.
In P. J. Rogers, T. A. Hacsi, A. Petrosino, & T. A. Huebner (Eds.),
Program theory in evaluation: Challenges and opportunities (pp. 5-13).
New Directions for Evaluation, no. 87. San Francisco: Jossey-Bass.
Dunnagan, T., Duncan, S. F., & Paul, L. (2000). Doing effective evaluations:
A case study of family empowerment due to welfare reform. Evaluation and
Program Planning, 23, 125-136.
Fetterman, D. M. (2001). Foundations of empowerment evaluation. Thousand
Oaks, CA: Sage Publications, Inc.
Fine, A. H., Thayer, C. E., & Coghlan, A. T. (2000). Program evaluation
practice in the nonprofit sector. Nonprofit Management and Leadership,
10(3), 331-339.
General Accounting Office. (1998). Program evaluation: Agencies challenged
by new demand for information on program results (No. GAO/GGD-98-53).
Washington, DC: GAO.
Greene, J. C. (2001). Evaluation extrapolations. American Journal of Evaluation,
22(3), 397-402.
Henry, G. T. (2001). How modern democracies are shaping evaluation and
the emerging challenges for evaluation. American Journal of Evaluation,
22(3), 419-429.
Henry, G. T., Julnes, G., & Mark, M. M. (Eds.). (1997). Realist evaluation:
An emerging theory in support of practice. New Directions for Evaluation,
no. 78. San Francisco: Jossey-Bass.
Hess, B. (2000). Assessing program impact using latent growth modeling:
A primer for the evaluator. Evaluation and Program Planning, 23, 419-428.
Hoefer, R. (2000). Accountability in action? Program evaluation in nonprofit
human service agencies. Nonprofit Management and Leadership, 11(2), 167-177.
Hopson, R. K. (Ed.). (2000). How and why language matters in evaluation.
New Directions for Evaluation, no. 86. San Francisco: Jossey-Bass.
House, E. R., & Howe, K. R. (2000). Deliberative democratic evaluation.
In K. E. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process:
Promoting inclusion, dialogue, and deliberation (pp. 3-12). New Directions
for Evaluation, no. 85. San Francisco: Jossey-Bass.
Johnson, R. L., McDaniel, F., & Willeke, M. J. (2000). Using portfolios
in program evaluation: An investigation of interrater reliability. American
Journal of Evaluation, 21(1), 65-80.
Johnson, R. L., Willeke, M. J., & Steiner, D. J. (1998). Stakeholder
collaboration in the design and implementation of a family literacy portfolio
assessment. American Journal of Evaluation, 19(3), 339-353.
Kaplan, R. S. (2001). Strategic performance measurement and management
in nonprofit organizations. Nonprofit Management and Leadership, 11(3),
353-370.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago:
University of Chicago Press.
Lawrenz, F., & Huffman, D. (in press). The archipelago approach to
mixed method evaluation. American Journal of Evaluation.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Thousand
Oaks, CA: Sage.
Lindenberg, M. (2001). Are we at the cutting edge or the blunt edge? Improving
NGO organizational performance with private and public sector strategic
management frameworks. Nonprofit Management and Leadership, 11(3), 247-270.
Love, A. J. (2001). The future of evaluation: Catching rocks with cauldrons.
American Journal of Evaluation, 22(3), 437-444.
MacNeil, C. (2000). The prose and cons of poetic representation in evaluation
reporting. American Journal of Evaluation, 21(3), 359-367.
Mark, M. M. (2001). Evaluation’s future: Furor, futile, or fertile? American
Journal of Evaluation, 22(3), 457-479.
Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated
framework for understanding, guiding, and improving public and nonprofit
policies and programs. San Francisco: Jossey-Bass.
McConney, A., Rudd, A., & Ayres, R. (in press). Getting to the bottom
line: A method for synthesizing findings within mixed-method program evaluations.
American Journal of Evaluation.
McLaughlin, J. A., & Jordan, G. B. (1999). Logic models: A tool for
telling your program’s performance story. Evaluation and Program Planning,
22, 65-72.
Mertens, D. M. (2001). Inclusivity and transformation: Evaluation in
2010. American Journal of Evaluation, 22(3), 367-374.
Mertens, D. M. (1999). Inclusive evaluation: Implications of transformative
theory for evaluation. American Journal of Evaluation, 20(1), 1-14.
Mohr, L. B. (1999a). The impact profile approach to policy merit: The
case of research grants and the university. Evaluation Review, 23(2),
212-249.
Mohr, L. B. (1999b). The qualitative method of impact analysis. American
Journal of Evaluation, 20(1), 69-84.
Moore, M. H. (1995). Creating public value: Strategic management in government.
Cambridge, MA: Harvard University Press.
Newcomer, K. E. (1997). Using performance measurement to improve programs.
In K. E. Newcomer (Ed.), Using performance measurement to improve public
and nonprofit programs (pp. 5-14). New Directions for Evaluation, no. 75.
San Francisco: Jossey-Bass.
Owen, J. M. (1998). Towards an outcome hierarchy for professional university
programs. Evaluation and Program Planning, 21, 315-321.
Paton, R., Foot, J., & Payne, G. (2000). What happens when nonprofits
use quality models for self-assessment? Nonprofit Management and Leadership,
11(1), 21-34.
Patton, M. Q. (1997). Utilization-focused evaluation: The new century
text (3rd ed.). Thousand Oaks, CA: Sage Publications.
Patton, M. Q. (2000). Overview: Language matters. In R. K. Hopson (Ed.),
How and why language matters in evaluation (pp. 5-16). New Directions
for Evaluation, no. 86. San Francisco: Jossey-Bass.
Perrin, B. (1998). Effective use and misuse of performance management.
American Journal of Evaluation, 19(3), 367-379.
Poole, D. L., Davis, J. K., Reisman, J., & Nelson, J. E. (2001). Improving
the quality of outcome evaluation plans. Nonprofit Management and Leadership,
11(4), 405-421.
Poole, D. L., Nelson, J., Carnahan, S., Chepenik, N. G., & Tubiak,
C. (2000). Evaluating performance measurement systems in nonprofit agencies:
The program accountability quality scale (PAQS). American Journal of Evaluation,
21(1), 15-26.
Pratt, C. C., McGuigan, W. M., & Katzev, A. R. (2000). Measuring program
outcomes: Retrospective pretest methodology. American Journal of Evaluation,
21(3), 341-349.
Quarter, J., & Richmond, B. J. (2001). Accounting for social value
in nonprofits and for-profits. Nonprofit Management and Leadership, 12(1),
75-85.
Quintanilla, G., & Packard, T. (2002). A participatory evaluation
of an inner-city science enrichment program. Evaluation and Program Planning,
25, 15-22.
Reed, C. S., & Brown, R. E. (2001). Outcome-asset impact model: linking
outcomes and assets. Evaluation and Program Planning, 24, 287-295.
Renz, D. O. (2001). Changing the face of nonprofit management. Nonprofit
Management and Leadership, 11(3), 387-396.
Reynolds, A. J. (1998). Confirmatory program evaluation: A method for
strengthening causal inference. American Journal of Evaluation, 19(2),
203-221.
Rog, D. J., & Fournier, D. (1997). Editors’ notes. In D. J. Rog &
D. Fournier (Eds.), Progress and future directions in evaluation: Perspectives
on theory, practice, and methods (pp. 1-3). New Directions for Evaluation,
no. 76. San Francisco: Jossey-Bass.
Rogers, P. J., Petrosino, A., Huebner, T. A., & Hacsi, T. A. (2000).
Program theory evaluation: Practice, promise, and problems. In P. J. Rogers,
T. A. Hacsi, A. Petrosino, & T. A. Huebner (Eds.), Program theory
in evaluation: Challenges and opportunities (pp. 5-13). New Directions
for Evaluation, no. 87. San Francisco: Jossey-Bass.
Rojas, R. R. (2000). A review of models for measuring organizational effectiveness
among for-profit and nonprofit organizations. Nonprofit Management and
Leadership, 11(1), 97-104.
Rossi, P. H. (1997). Advances in quantitative evaluation, 1987-1996. In
D. J. Rog & D. Fournier (Eds.), Progress and future directions in
evaluation: Perspectives on theory, practice, and methods (pp. 57-68).
New Directions for Evaluation, no. 76. San Francisco: Jossey-Bass.
Ryan, K. (1998). Advantages and challenges of using inclusive evaluation
approaches in evaluation practice. American Journal of Evaluation, 19(1),
101-122.
Ryan, K. E., & DeStefano, L. (2000). Disentangling dialogue: Issues
from practice. In K. E. Ryan & L. DeStefano (Eds.), Evaluation as
a democratic process: Promoting inclusion, dialogue, and deliberation
(pp. 63-76). New Directions for Evaluation, no. 85. San Francisco: Jossey-Bass.
Sawhill, J. C., & Williamson, D. (2001). Mission impossible? Measuring
success in nonprofit organizations. Nonprofit Management and Leadership,
11(3), 371-386.
Schnoes, C. J., Murphy-Berman, V., & Chambers, J. (2000). Empowerment
evaluation applied: Experiences, analysis, and recommendations from a
case study. American Journal of Evaluation, 21(1), 53-64.
Smith, M. F. (2001). Evaluation: Preview of the future #2. American Journal
of Evaluation, 22(3), 281-300.
Stake, R. E. (1973, October). Program evaluation, particularly responsive
evaluation. Keynote address at the conference “New trends in evaluation,”
Institute of Education, University of Goteborg, Sweden. In G. F. Madaus,
M. S. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models: Viewpoints
on educational and human services evaluation. Boston: Kluwer-Nijhoff,
1987.
Stake, R. E. (2001). A problematic heading. American Journal of Evaluation,
22(3), 349-354.
Stame, N. (1999). Small and medium enterprise aid programs: Intangible
effects and evaluation practice. Evaluation and Program Planning, 22,
105-111.
Stanfield, H. H. (1999). Slipping through the front door: Relevant social
science evaluation in the people of color century. American Journal of
Evaluation, 20(3), 415-431.
Stufflebeam, D. L. (1994). Empowerment evaluation, objectivist evaluation,
and evaluation standards: Where the future of evaluation should not go
and where it needs to go. Evaluation Practice, 15(3), 321-338.
Stufflebeam, D. L. (2001). Evaluation models. New Directions for Evaluation,
no. 89. San Francisco: Jossey-Bass.
Tassey, G. (1999). Lessons learned about the methodology of economic impact
studies: The NIST experience. Evaluation and Program Planning, 22, 113-119.
Thayer, C. E., & Fine, A. H. (2000). Evaluation and outcome measurement
in the non-profit sector: Stakeholder participation. Evaluation and Program
Planning, 23, 103-108.
Toffolon-Weiss, M. M., Bertrand, J. J., & Terrell, S. S. (1999). The
results framework – an innovative tool for program planning and evaluation.
Evaluation Review, 23(3), 336-359.
Torres, R. T., & Preskill, H. (2001). Evaluation and organizational
learning: Past, present, and future. American Journal of Evaluation, 22(3),
387-395.
Unrau, Y. A. (2001). Using client interviews to illuminate outcomes in
program logic models: A case example. Evaluation and Program Planning,
24, 353-361.
Wandersman, A., Imm, P., Chinman, M., & Kaftarian, S. (2000). Getting
to outcomes: A results-based approach to accountability. Evaluation and
Program Planning, 23, 389-395.
Watt, J. H. (1999). Internet systems for evaluation research. In G. Gay
& T. L. Bennington (Eds.), Information technologies in evaluation:
Social, moral, epistemological, and practical implications (pp. 23-43).
New Directions for Evaluation, no. 84. San Francisco: Jossey-Bass.
Weiss, C. H. (1998). Evaluation: Methods for studying programs and policies
(2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Wholey, J. S. (1997). Clarifying goals, reporting results. In D. J. Rog
& D. Fournier (Eds.), Progress and future directions in evaluation:
Perspectives on theory, practice, and methods (pp. 95-105). New Directions
for Evaluation, no. 76. San Francisco: Jossey-Bass.
Wholey, J. S. (2001). Managing for results: Roles for evaluators in a
new management era. American Journal of Evaluation, 22(3), 343-347.
Youtie, J., Bozeman, B., & Shapira, P. (1999). Using an evaluability
assessment to select methods for evaluating state technology development
programs: The case of the Georgia Research Alliance. Evaluation and Program
Planning, 22, 55-64.
Zajano, N. C., & Lochtefeld, S. S. (1999). The nature of knowledge
and language in the legislative arena. In R. K. Jonas (Ed.), Legislative
program evaluation: Utilization-driven research for decision makers (pp.
85-94). New Directions for Evaluation, no. 81. San Francisco: Jossey-Bass.