Trends in Program Evaluation Literature:
The Emergence of Pragmatism
TCALL Occasional Research Paper No. 5

by Roemer M.S. Visser (2003)
Texas Center for Adult Literacy & Learning

A review of recent program evaluation literature reveals two forces that have influenced program evaluation practice in the 1990s. The first is the drive toward accountability, leading to the imposition of business practices and accountability standards on government agencies and nonprofits. The second is the democratization movement, driven by philosophically charged debates within the social sciences (including program evaluation), and characterized by attempts to make evaluation more inclusive, transparent, and democratic. These trends are largely incompatible, and I argue that the persistence of philosophical debates has spawned an emerging trend: the increased popularity of pragmatic approaches that mix methods according to the characteristics of the particular program to be evaluated.

Program evaluation is rooted in two realms. It has one foot in the academic realm of the social sciences with its philosophical and methodological debates. The other foot is securely planted in the realm of program evaluation practice with its shifting political winds. Changes in both of these realms have influenced program evaluation practice in the 1990s. The purpose of this paper is to discuss these changes and their effects as they emerge from a review of the literature. First, a brief historical overview is presented, identifying the genesis of mainstream evaluation practice – the traditional evaluation – and an early alternative approach, responsive evaluation. Next, two major forces are presented, one from the realm of practice, and one from the academic realm, along with the influences they have had on program evaluation. The trends are then summarized in a table. I then make two arguments. The first is that these two main forces are disparate and essentially reflect incompatible philosophical underpinnings. The second is that a new trend is emerging as a result of this unresolved debate: the advent of mixed-method designs, based on a pragmatic approach to program evaluation.

Early Program Evaluation:
Traditional Versus Responsive Evaluation

Stufflebeam (2001) provides a succinct historical overview of the field of evaluation. After a period of relative inactivity in the 1950s, several events and developments sparked an increased interest in evaluation in the 1960s (Henry, 2001). The Civil Rights Act of 1964 mandated equitable treatment for minorities and the disabled. Great Society programs (Greene, 2001) such as the War on Poverty (Weiss, 1998) were initiated and needed to be evaluated. The 1960s were also a very successful period for the natural sciences. Achievements such as putting a man on the moon helped create an almost unshakable faith in the natural sciences, and led social scientists to adopt the methods of the natural sciences to tackle society’s ills. Patton (1997) refers to this as “a new order of rationality in government – a rationality undergirded by social scientists” (p. 7). With the application of scientific methods to program evaluations, traditional evaluation (TE) was born.

Traditional evaluation is characterized by its emphasis on scientific methods. Reliability and validity of the collected data are key, while the main criterion for a quality evaluation is methodological rigor. TE requires the evaluator to be objective and neutral and to be outcome-focused (Fine, Thayer, & Coghlan, 2000; Torres & Preskill, 2001). This leads to a preoccupation with experimental methods, numbers (as opposed to words), statistical tools, and an emphasis on summative evaluations (aimed at determining whether or not to continue a particular program) rather than formative ones (aimed at program improvement).

Although TE is still widely used today, it is not the only available approach to program evaluation. Competing approaches have since been developed, mostly in response to one of TE’s most serious drawbacks – the fact that many TE reports are not used or even read (Torres & Preskill, 2001; Fetterman, 2001; Patton, 1997). One of the earliest alternatives to TE is what is known as Responsive Evaluation (Stake, 1973).

Briefly, responsive evaluation is an approach to evaluation that is less objective and more tailored to the needs of those running the program. In Stake’s own words, responsive evaluation “sacrifices some precision in measurement, hopefully to increase the usefulness of the findings to persons in and around the program” (Stake, 1973). It calls attention to the complexity and the uncertainty of the program, the difficulty in measuring outcomes, and the importance of descriptive and judgmental data. Rather than oversimplifying through numbers, Stake argues for storytelling as a means of conveying the “holistic impression, the mood, even the mystery of the experience” (Stake, 1973). In essence, the debate hinges on legitimacy: whereas TE draws legitimacy from scientific rigor, responsive evaluation draws legitimacy from endorsements by a majority of important stakeholders.

Although Stake took pains to suggest that responsive evaluation should supplement traditional evaluation, rather than replacing it, it is easy to see the conflicting orientations of the two approaches. Thus, the seeds were sown for the debates discussed in subsequent sections of this paper. This early offshoot of TE would be a precursor to what has since been referred to as the “paradigm wars” (Caracelli, 2000).

This very brief depiction of the historical context of evaluation practice is intended to provide a backdrop against which recent developments can be assessed. In short, the 1970s were characterized by a predominantly social-scientific approach to program evaluation. Other approaches were not generally accepted as valid or scientific, so the variety of methods at the evaluator’s disposal was limited.

Of course, much has happened since the 1970s; as the subsequent sections aim to show, the 1980s and 1990s were characterized by a host of developments both in the political realm and in the academic realm.

Developments in the Political Realm: Increased Accountability

Many of the articles reviewed list examples of recent events that have affected evaluation practice. The first is a growing imbalance between the supply of and demand for funding. While government funding is declining (Boardman & Vining, 2000), there is a proliferation of agencies competing for funds (Kaplan, 2001; Rojas, 2000; Lindenberg, 2001), leading to increased funder demands and restrictions (Poole, Davis, Reisman, & Nelson, 2001; Pratt, McGuigan, & Katzev, 2000), partially fed by publicized, high-profile mismanagement cases (Rojas, 2000; Hoefer, 2000).

Second, the information revolution (e-government, data access, and real-time evaluation) (Mark, 2001; Love, 2001; Datta, 2001) and other improvements in technology have combined with an increased public demand for evaluation information and resulting media interest (Henry, 2001).

The third and most important development, however, is the Government Performance and Results Act (GPRA) of 1993 (Poole, Nelson, Carnahan, Chepenik, & Tubiak, 2000; Pratt et al., 2000; Love, 2001; Datta, 2001; Tassey, 1999; Youtie, Bozeman, & Shapira, 1999; Toffolon-Weiss, Bertrand, & Terrell, 1999), which links government agencies’ performance results to future funding.

Taken together, these influences suggest a political landscape of increased scrutiny (and increased technological ability to scrutinize), increased competition for decreased levels of funding, and as a result, increased demand to demonstrate results.

The Impact of Increased Accountability on Program Evaluation

As a consequence of this increased emphasis on accountability, nonprofits and government agencies are facing pressure to demonstrate results, be held accountable, show high performance, and generally behave like businesses (Fine et al., 2000; Bozzo, 2000; Hoefer, 2000; Renz, 2001; Lindenberg, 2001; Poole et al., 2000; Love, 2001; Wholey, 2001). The underlying assumption appears to be that agencies and nonprofits can and should be run the way businesses are run, and that it would therefore be useful for them to adopt some business practices.

The practices identified in the literature can be divided into three broad categories: strategic analysis/alignment and organizational effectiveness; impact evaluation; and performance measurement. Methodological refinements and technological innovation have facilitated the adoption of business practices by making large-scale data collection and processing possible.

Strategic Analysis and Organizational Effectiveness

Articles in the strategic analysis and organizational effectiveness category focus on improving the alignment of nonprofits’ missions, goals, and strategies in order to make them more effective. Such alignment is imperative for any evaluation to be useful (Sawhill & Williamson, 2001). Several articles report on applying common business tools and methods to nonprofits (Boardman & Vining, 2000; Lindenberg, 2001; Rojas, 2000; Kaplan, 2001).

Impact Evaluations

Impact evaluations are somewhat different from Traditional Evaluation: whereas TE historically measured outputs, impact evaluations examine the eventual results of those outputs. In other words, impact evaluations include one more step in the causality chain. For example, in the case of a political advertisement, TE might consider the amount of money spent, the number of people reached, and perhaps an assessment of the quality of the ad. An impact evaluation, however, would focus on polls to assess the extent to which voters have been swayed by the ad.

While the articles on strategy formulation and alignment largely report success, those dealing with impact evaluations expose some of the problems associated with simply running nonprofits as if they were for-profits. After all, nonprofits do not have a “bottom line” the same way for-profits do. While Hoefer (2000) links impact evaluations to the legal requirements of accountability, such as those mandated by the GPRA, the problem with impact evaluation is that the intended outcomes are often either complex, intangible, or both. Thus, most of the articles present novel ways of assessing impact. For example, the program outcomes Owen (1998) reports were not predetermined, which goes against what goal-setting and strategy formulation would suggest. Reed and Brown (2001) recognize the complexity of outcomes: impacts occur at various levels (individual, family, agency, interagency system, and community) that are systemically linked. Programs may have outcomes at all five levels. Similarly, Mohr (1999a) argues against trying to deliver a single, composite score when different kinds of impacts are involved. Aggregating those measures, he argues, would be futile and misleading. Instead, the proper way to evaluate is to use an impact profile, where each impact is presented and analyzed on its own terms and merits.
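Mohr’s contrast between a composite score and an impact profile can be illustrated with a minimal Python sketch. The impact levels borrow names from Reed and Brown’s list, but the numeric values, the function names, and the use of a simple average as the composite are all hypothetical and are not taken from Mohr (1999a). The sketch only shows how aggregation can hide a negative impact that a profile keeps visible.

```python
from statistics import mean

# Hypothetical standardized impact estimates (illustrative values only).
impacts = {
    "individual": 0.8,
    "family": 0.6,
    "community": -0.5,
}

def composite_score(impacts):
    """Collapse all impacts into one average (the kind of aggregation Mohr argues against)."""
    return mean(impacts.values())

def impact_profile(impacts):
    """Report each impact separately, sorted by name, so none is averaged away."""
    return sorted(impacts.items())

print(f"composite: {composite_score(impacts):.2f}")
for name, value in impact_profile(impacts):
    print(f"{name}: {value:+.1f}")
```

In this made-up case the composite is mildly positive, while the profile shows that the community-level impact is negative.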

Complexity applies not only to the outcomes but also to the process of evaluating them. According to Dunnagan, Duncan, and Paul (2000), the problem with one-shot assessments is that they usually do not do justice to a program, regardless of how comprehensive they are. This is especially true if there is a time lag between the program’s intervention and the appearance of its results. Instead, evaluation should be an ongoing process so that the process of evaluation itself can be evaluated and improved.

With regard to intangible outcomes, Stame (1999) argues that they need to be specifically included (as well as quantified, however difficult that may be) in order to evaluate a program realistically. Moore (1995) points out that while for-profits create private value (for shareholders), non-profits often create public value. Quarter and Richmond (2001) argue that prevalent accounting practices need to be adjusted in order to reflect the social value created by a program.

Performance Measurement

Performance measurement is another business tool that is being adapted to nonprofits. Although related to both strategy formulation and impact evaluation, it is mentioned so often in the literature that it deserves separate mention. Performance measurement refers to the systematic monitoring of certain key variables (e.g., money spent, people served, raw materials used), often referred to as indicators of program quality (IPQ). Detecting any significant change in these variables allows adjustments to be made before too much damage is done.
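The monitoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the indicator names, the figures, the function name, and the 10 percent threshold for a “significant” change are hypothetical, and the literature reviewed here does not prescribe any particular rule.

```python
def flag_changes(previous, current, threshold=0.10):
    """Return indicators whose relative change between periods exceeds the threshold."""
    flagged = {}
    for name, old in previous.items():
        new = current.get(name)
        if new is None or old == 0:
            continue  # skip indicators that cannot be compared
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[name] = change
    return flagged

# Hypothetical indicator values for two reporting periods.
previous = {"money_spent": 100_000, "people_served": 500}
current = {"money_spent": 104_000, "people_served": 400}

for name, change in flag_changes(previous, current).items():
    print(f"{name} changed by {change:+.0%}")
```

Here spending rose by 4 percent and is not flagged, while the 20 percent drop in people served is, which is the kind of early signal that would prompt an adjustment.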

Renz (2001) argues that measuring and managing performance is the key to moving from a focus on activity to one on long-term, sustainable impact. Wholey (1997; 2001) also sees tremendous value in performance measurement systems as they can improve government management of programs, decision-making, and the public’s confidence in government. Toffolon-Weiss, Bertrand, and Terrell (1999) report success using a performance measurement framework in use with USAID. Of course, performance measurement systems themselves are not exempt from evaluation. Poole et al. (2000, 2001) introduce instruments designed to assess performance measurement systems. Similarly, Youtie et al. (1999) promote using evaluability assessments. All of these articles report either success or promise of success for performance measurement as a nonprofit management tool. Clearly, performance measurement is here to stay (Poole et al., 2001; Newcomer, 1997).

Performance measurement is not without critics, however. Campbell (2002) warns against too much emphasis on performance management, saying that we should “never substitute indicators for judgment” (p. 255). Perrin (1998) echoes this critique when he warns against what he calls “goal displacement.” An example of goal displacement might be when cost-effectiveness, one possible measure of a program’s success, takes priority over the overarching, but less measurable goal of, say, health education. Stake (2001) concurs: “we are increasingly the promoters of impressionistic tallies, the façade of technology” (p. 349).

Methodological Innovations

Besides the business tools that nonprofits are being introduced to, there are also methodological innovations that aim to help evaluators determine impact. Aside from some statistical innovations (Hess, 2000), the most significant methodological refinement is an evaluation approach called theory-based evaluation (TBE). Davidson (2000) and Weiss (1998) are two proponents of the method, which uses what are called Program Logic Models. Essentially, these models are graphical depictions of the essence of the program, much like a flow chart. In the words of Rogers, Petrosino, Huebner, and Hacsi (2000), such a model “consists of an explicit theory or model of how the program causes the intended or observed outcomes” (p. 5). The program’s activities are listed and their relationships to the desired end results are depicted by means of arrows (Wandersman, Imm, Chinman, & Kaftarian, 2000). According to Reynolds (1998) and McLaughlin and Jordan (1999), the main strength of this approach lies in the evaluator’s ability to make causal inferences. That way, the achieved results can be attributed to the program rather than to other influences. Although Birckmayer and Weiss (2000) found that in research papers describing TBE practice, the relationship between data and theory is not always clear, they concur that the benefits outweigh these drawbacks, and they even suggest that TBE is applicable to small, atheoretical organizations. With its focus on causality and on outcomes, TBE is clearly congruent with and an extension of TE.
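Because a Program Logic Model is essentially a set of activities and outcomes connected by arrows, it can be represented as a small directed graph. The Python sketch below is a hypothetical illustration: the adult literacy program, its elements, and the function name are invented for the example and do not come from any of the cited authors.

```python
# Hypothetical logic model: each element maps to the results it is expected to produce.
# The model is assumed to contain no cycles.
logic_model = {
    "tutoring sessions": ["improved reading skills"],
    "improved reading skills": ["GED completion"],
    "GED completion": ["employment"],
    "employment": [],
}

def causal_paths(model, start):
    """List every chain of arrows leading from `start` to a final outcome."""
    next_steps = model.get(start, [])
    if not next_steps:
        return [[start]]  # a final outcome ends the chain
    paths = []
    for step in next_steps:
        for tail in causal_paths(model, step):
            paths.append([start] + tail)
    return paths

for path in causal_paths(logic_model, "tutoring sessions"):
    print(" -> ".join(path))
```

Writing the chain out this way makes explicit which intermediate results an evaluator would need to observe before attributing the final outcome to the program.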

Technological Innovations

Lastly, technological innovations are facilitating TE’s transition from activity to impact. Rossi (1997) and Watt (1999) point out that the tremendous changes in computing capacity and data availability over the last ten years have led to faster, more complex, and more valid analysis techniques (e.g., modeling, meta-analysis, and inference from multi-stage samples).

In sum, the net result of the increased popularity of business practices, methodological innovations like theory-based evaluation, and improved technology could be called a “hardening” of the traditional evaluation. Still concerned with numbers, objectivity, and rigor, TE has shifted its attention from activities (Sawhill & Williamson, 2001) and indicators such as operating expense ratios (Kaplan, 2001) to outcomes or impacts.

Developments in the Academic Realm:
The Drive Toward Democratization

Similar to the drive toward accountability, the drive toward democratization is a collection of separate, yet related forces. While the accountability drive seems to come from government and business, the democratization drive appears to originate in the academic world.

Although it is impossible to identify a single event that triggered this drive, one seminal work is worth mentioning. In 1962, a book was published (Kuhn, 1962) arguing that scientific knowledge is not “discovered” (and therefore self-evident), but “constructed” in a social context (and therefore not value-free or objective). The knowledge “constructed” depended on the particular “paradigm” within which the research was situated. (A paradigm is a set of preconceptions through which the researcher habitually views the world.) This was the first serious challenge to the supposed universality of truth from within the natural sciences and led to a long-standing debate in scientific circles. Lincoln and Guba (1985) brought the debate to the field of evaluation, launching what has often been referred to as the “paradigm wars” (Caracelli, 2000) and challenging the privileged status of the traditional evaluation over alternative approaches. Essentially, all disagreement boils down to a philosophical argument: whether or not the world is ultimately knowable, and whether or not there is such a thing as objectivity.

If, as constructivists Lincoln and Guba (1985) argue, each “truth” is socially constructed, then whose truth (i.e. assessment) matters most? That of the evaluator? Evaluators can base the legitimacy of their findings on their research expertise and their distance from the program (i.e., neutrality). However, program staff members can claim legitimacy because of exactly the opposite – their intimate familiarity with the program.

This critical stance vis-à-vis the scientific establishment in general, and vis-à-vis the legitimacy of traditional evaluations in particular, has spawned many different debates and evaluation approaches. One debate centers on race and ethnicity. The constructivists’ argument that the cultural context of research is an important determinant of its outcomes has serious implications for program evaluation. Since the vast majority of researchers have historically been white males belonging to the middle class, it follows that mainstream theories and methods are at least influenced by their value-orientations. Stanfield (1999) argues that traditional evaluation draws legitimacy from white male hegemony. In other words, since scientists are the evaluation experts, and since scientists are predominantly white and male, white males design and carry out all traditional evaluations – including those of programs serving African-American populations. If Stanfield is right, it is crucial to achieve compatibility between the researcher’s culture and that of the program to be evaluated. Otherwise, evaluations risk being irrelevant at best, and harmful at worst.

It should be noted that traditional evaluators disagree with this position. According to their research paradigm, values constitute error in the research process. Therefore, they say, any empirical research that betrays the value orientation of the researcher is inherently flawed (see Stufflebeam, 1994, for a detailed explanation of this view). Again, it comes down to whether or not objectivity can be achieved.

There is a closely related debate in evaluation, also quite philosophical in nature and revolving around the nature of language. Whereas the above discussion centered on the neutrality of the researcher, this debate focuses on the neutrality of language. The essential argument is that words are not necessarily neutral vehicles for the delivery of a particular message (for example, in an evaluation report). Instead, the choice of words can have a deep impact on the meaning of the message (Patton, 2000). Hopson (2000) dedicated an entire issue of the New Directions for Evaluation series to language issues in evaluation. Its focus is “to consider and illuminate how language shapes meanings of the social policies and programs we evaluate” (p. 2). He challenges evaluators by arguing that “language needs to be used with great care and attention to the subtleties and nuances of culture, context, and setting” (p. 2). Ryan and DeStefano (2000) propose that “a critical concern for evaluation theorists is to explore the meaning of dialogue and the differences in the various meanings of dialogue” (p. 75). Zajano and Lochtefeld (1999) point out that because legislators belong to an oral culture, evaluation reports should include compelling stories that are representative of the main findings. After Lincoln and Guba (1985) dispelled the myth of an evaluator’s neutrality and objectivity, it is increasingly being recognized that language itself is not neutral, further undermining the authority of the traditional evaluation.

Moves toward alternative approaches to evaluation are not only driven by academic or philosophical debates. They also flow from a major shortcoming of the traditional evaluation: its lack of use. Too many evaluations are conducted by distanced, outside experts who write a final report that ends up in a drawer (Torres & Preskill, 2001; Fetterman, 2001; Patton, 1997). If an objective of an evaluation is program improvement, then traditional evaluations tend to come up short. Participation of the stakeholders tends to increase buy-in and, according to some, increase the quality and the credibility of the findings. The aims of evaluation appear to be key in this debate. Mertens (1999) and House and Howe (2000) are very clear in their espoused aims for evaluation: rather than coming to an objective assessment of some underlying truth or merit or value, the ultimate goal of the evaluation profession is one of equality, social justice, and inclusion.

Taken together, the challenges to the legitimacy of traditional research methods, the recognition that language in itself is not neutral, the acknowledgment that the aims of evaluation may vary, and TE’s under-utilization all have significantly undermined its authority. They can be identified as the driving forces behind the second trend in evaluation practice, discussed below.

The Impact of Democratization on Program Evaluation

The erosion of the legitimacy of TE in evaluation practice and the calls for more transparency and democracy in scientific research have resulted in an increased popularity of more participative approaches in program evaluation (Mertens, 2001; Thayer & Fine, 2000), alternatively called community-based (Cockerill, Myers, & Allman, 2000), participatory (Quintanilla & Packard, 2002), collaborative (Brandon, 1998), inclusive (Ryan, 1998), or empowerment (Fetterman, 2001) evaluations. For the purpose of this paper, participative approaches refer to those evaluation strategies that depend on input and cooperation from program stakeholders (specifically, program staff and clients) in order to succeed. Ryan (1998) argues that such approaches improve decision-making, are more credible, and are consistent with evaluation’s overall goal of being democratic and inclusive. Fine et al. (2000) found that both the use of evaluation and the ongoing design of evaluation systems increase in the nonprofit sector when stakeholders are involved in the evaluation process.

Many of the articles reviewed report their experiences with participative evaluation. Though not all report unequivocal success (Schnoes, Murphy-Berman, & Chambers, 2000), many of them do, collectively building evidence in favor of the utility and credibility of participatory program evaluations (Fine et al., 2000; Thayer & Fine, 2000; Ryan, 1998; Johnson, Willeke, & Steiner, 1998; Unrau, 2001).

Different participative strategies call for different levels of stakeholder involvement and, by extension, different roles for the evaluator. The three main categories of participative approaches that are found in the literature are stakeholder-based evaluation (SBE), empowerment evaluation (EE), and self-assessment (SA). The main differences among the three relate to the primary goal(s) of the evaluation and the relationship between the evaluator and the stakeholders.

Stakeholder-Based Evaluation

In stakeholder-based evaluations, the (external) evaluator is the expert on evaluation methods. She designs the process, collects the data, and writes up the report. In contrast to TE, however, there is a recognition that the stakeholders are the experts on their own program. In SBE, they have significant input when it comes to the selection of the evaluation criteria and the interpretation of the findings. The primary objective of SBE is to provide the stakeholders with feedback for program improvement while not sacrificing any rigor, validity, or objectivity in the process, so that the needs of the main client (e.g., the funding agency) are met. Johnson et al. (1998) describe their experiences with SBE. On the one hand, they found that involving stakeholders indeed improved the evaluation’s credibility among stakeholders. Other reported advantages are a focus on goals rather than activities, staff development, and improved respect for cultural diversity. On the other hand, they found that it was very time-intensive and that it was driven mostly by program staff, while involvement from the clients remained limited. While Unrau (2001) reports that involving stakeholders in the formulation of the Program Logic Model may improve the evaluation, Quintanilla and Packard (2002) found that involving stakeholders increased their sense of ownership of the evaluation process, which in turn facilitated its integration into the daily activities of the program.

Empowerment Evaluation

Fetterman (2001) is the most vocal proponent of the empowerment approach to evaluation (EE). In the words of Torres and Preskill (2001), the goal is to “facilitate learning and change” (p. 388) rather than merely evaluate after the fact. The role of the evaluator, therefore, changes from content expert to facilitator. EE puts the program stakeholders in the center of the process while the evaluator assists and coaches them. Although this approach has received a lot of press, empirical studies are limited. Schnoes et al. (2000) report on their attempt to implement EE. They ran into problems, including disagreement among participants and the amount of time required of everyone involved. EE is not suitable for each evaluation context (nor is it intended to be), and successful implementation requires foresight and a significant amount of work in advance of the process.


Self-Assessment

One could argue that self-assessment (SA) no longer qualifies as evaluation because there are no guarantees that any kind of rigor or systematic approach is safeguarded. It is included here because it is one intended outcome of empowerment evaluation and because it can be very useful to the program’s staff and other stakeholders. Empirical research is scarce, however. Paton, Foot, and Payne (2000) worked with several nonprofits that assessed their own programs’ quality by self-administering existing quality assessment instruments. The results were mixed: on the one hand, the instruments were not used as intended by their authors, thereby undermining the validity of their outcomes. On the other hand, they did serve to generate dialogue, which in itself was considered very useful.

In sum, it seems fair to say that while TE has “hardened” because of its shift in emphasis from activities and outputs to outcomes and results, the competing approaches have “softened” because of the evaluator’s gradual move from content expert to methodological expert and, finally, coach and mentor.

Synthesis: An Increased Evaluation Spectrum

The “hardening” of TE and the concurrent “softening” of the participative approaches strongly imply that the field of evaluation practice has diversified. This is in line with other authors’ observations (e.g. Caracelli, 2000; Smith, 2001). As a result, evaluators have a more diverse set of tools to tackle evaluations, and the days of the one-type-fits-all approach to evaluation are past.

An examination of the spectrum of available approaches shows that the role of the evaluator as well as other variables change according to the evaluation approach, as summarized in Table 1.

The vertical line in Table 1 (between SBE and EE) represents the parting line in the paradigm war, suggesting that the debate has not yet been settled. Smith (2001) agrees, saying that the debate “is and was about differences in philosophy and ‘world view’ […] No sooner is it put to bed under one guise than to raise its ugly head under another” (p. 292). Datta (2001), speaking about the distance between the extremes (SA and TE), adds:

If anything, the distance is greater, at least in terms of articulated positions, between those who see evaluation as a quest for social justice which requires advocacy for the disenfranchised and those who see evaluation as the most nonpartisan, fair search we can mount for understanding what is happening and why, and for reaching judgments on merit, worth, and value (p. 405)

In conclusion, while the argument originally revolved around incompatible philosophical positions on knowability and objectivity, it now focuses on the espoused purpose of program evaluation. Those who argue for social justice are the former constructivists and those who still subscribe to the assessment of value or worth generally fall into the objectivist camp.

Table 1: Comparison of Different Evaluation Approaches

Dimension | TE | SBE || EE | SA
Stakeholders’ influence | None | In design and reporting only || Throughout | Throughout
Extent of evaluators’ control | Complete | Majority || Shared with stakeholders | None
Image(s) of evaluators | Doctor; scientist; professor | Chief executive; policy-maker || Mentor; facilitator; teacher; coach | n/a
Purpose | Summative only | Mostly summative || Mostly formative | Formative only
Utilization rate | Very low | Low || High | Very high
Basis for credibility | Evaluator expertise; methodological rigor | Evaluator expertise and stakeholder involvement || Utilization of findings and evaluator endorsement | Usefulness of findings

The Emerging Trend: Advent of Pragmatic Approaches

In spite of the continued paradigm war, which tends to polarize the field between two alternatives (objectivist or constructivist assumptions; quantitative or qualitative methods; summative or formative purpose; etc.), the literature shows an increase in the popularity of pragmatic approaches (e.g., Lawrenz & Huffman, in press; Bengston & Fan, 1999; Mohr, 1999b; Pratt et al., 2000). These approaches essentially ignore the paradigm debate and show no hesitation to mix approaches in ways that loyalists to either paradigm would never do out of fear of compromising their findings. One might even speculate that these pragmatic approaches are appearing because of the persistence of the paradigm war – its abstract debates have not addressed the questions and problems that evaluators in the “real world” wrestle with, and may have led to the advent of “mixed-method approaches” (Rog & Fournier, 1997). For example, Johnson, McDaniel, and Willeke (2000) argue that assessments of portfolios can satisfy psychometric demands of reliability. McConney, Rudd, and Ayres (in press) suggest that when qualitative and quantitative measures yield contradictory results, a synthesis can still be achieved by assigning the measures a particular weight or importance. In a similar vein, MacNeil (2000) introduces the reader to the possible utility of including poetic representation in evaluation reports.
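The idea of reconciling contradictory findings by weighting them can be sketched as a simple weighted average. This is one possible reading only and should not be taken as the procedure proposed by McConney, Rudd, and Ayres; the measure names, the rating scale from -1 (negative finding) to +1 (positive finding), the weights, and the function name are all hypothetical.

```python
def weighted_synthesis(findings):
    """Combine (rating, weight) pairs into a single weighted average rating."""
    total_weight = sum(weight for _, weight in findings.values())
    if total_weight == 0:
        raise ValueError("at least one measure needs a nonzero weight")
    weighted_sum = sum(rating * weight for rating, weight in findings.values())
    return weighted_sum / total_weight

# Hypothetical contradictory findings: (rating, weight) per measure.
findings = {
    "survey_scores": (1.0, 2.0),   # quantitative measure, positive result
    "interviews": (-1.0, 1.0),     # qualitative measure, negative result
}

print(f"synthesized rating: {weighted_synthesis(findings):+.2f}")
```

The choice of weights carries the real judgment here; the arithmetic only makes that judgment explicit and open to challenge by stakeholders.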

Possibly the best justification for calling the advent of mixed-method approaches a trend is the work by Henry, Julnes, and Mark (1997) and Mark, Henry, and Julnes (2000). These authors attempt to give the pragmatic approach more legitimacy by providing a theoretical basis for it, called emergent realism. Datta (2001) concurs: “as the ends draw apart, the widening middle ground is getting filled with new approaches to unify us, such as realistic evaluation” (p. 405). Although a full treatment of realistic evaluation falls beyond the scope of this paper, it is a noteworthy contribution that merits further examination. Thus far, no articles have reported on an application of this philosophy to program evaluation. Time will tell whether emergent realism will catch on in the field.


If this trend continues, it may have profound implications for program evaluation as an emerging field of practice. For one, philosophically oriented academicians who subscribe to a particular position in the paradigm debates are essentially being ignored by peers and practitioners who go their own way under the “whatever works” motto. A skeptic might argue that the recent work on realist evaluation by Mark, Henry, and Julnes (2000) is an effort to salvage academia’s credibility in leading the field through theory and research. Nevertheless, attempts to find a sound philosophical basis for mixing methods – if successful – might move the field forward, as it is still struggling to respond to accountability requirements (General Accounting Office, 1998).

If we extrapolate this trend, however, another possible implication becomes clear: that the purpose of evaluation – a subject of debate, as argued above – will be determined not by academic evaluators but by evaluation stakeholders, in particular those who are funding the evaluation efforts. With no clear guidance or agreement from academia on how to properly conduct evaluations, decision-makers are most likely to approach those evaluators whose views of evaluation best fit their needs. For example, a program director looking for program improvement may look to an empowerment evaluator, while a funder may look to a traditional evaluator. Although this is not necessarily a detriment to the field, it does point to the possibility that program evaluation may become a service industry, much like the hospitality or entertainment industries, rather than a field oriented toward applied social science research. As long as the funder and other stakeholders are satisfied with the evaluation methods and, of course, the outcomes, who cares if the evaluation does not adhere to stringent scientific principles?

The danger of this eroded credibility of academic evaluators is of course that evaluations may evolve into even more politically charged events than they already are, and the evaluator becomes just one of the stakeholders. For example, program staff may not agree with the choice of evaluator that the program funder has made. Evaluators’ voting records may become a point of concern. The skeptic here will observe that the decreased credibility of academically oriented evaluators will strike at the heart of program evaluation practice because it allows political agendas and other motives to explicitly drive the choice of evaluator and evaluation method, perhaps even the outcomes. Time will tell which of the skeptics are right. Either way, scholar-practitioners have their work cut out for them.


References

Bengston, D. N., & Fan, D. P. (1999). An innovative method for evaluating strategic goals in a public agency: Conservation leadership. Evaluation Review, 23(1), 77-10.

Birckmayer, J. D., & Weiss, C. H. (2000). Theory-based evaluation in practice: What do we learn? Evaluation Review, 24(4), 407-431.

Boardman, A. E., & Vining, A. R. (2000). Using service-customer matrices in strategic analysis of nonprofits. Nonprofit Management and Leadership, 10(4), 397-420.

Bozzo, S. L. (2000). Evaluation resources for nonprofit organizations. Nonprofit Management and Leadership, 10(4), 463-472.

Brandon, P. R. (1998). Stakeholder participation for the purpose of helping ensure evaluation validity: Bridging the gap between collaborative and non-collaborative evaluation. American Journal of Evaluation, 19(3), 325-337.

Campbell, D. (2002). Outcomes assessment and the paradox of nonprofit accountability. Nonprofit Management and Leadership, 12(3), 243-259.

Caracelli, V. J. (2000). Evaluation use at the threshold of the twenty-first century. In V. J. Caracelli & H. Preskill (Eds.), The expanding scope of evaluation use (pp. 99-111). New Directions for Evaluation, no. 88. San Francisco: Jossey-Bass.

Cockerill, R., Myers, T., & Allman, D. (2000). Planning for community-based evaluation. American Journal of Evaluation, 21(3), 351-357.

Datta, L. E. (2001). Coming attractions. American Journal of Evaluation, 22(3), 403-408.

Davidson, E. J. (2000). Ascertaining causality in theory-based evaluation. In P. J. Rogers, T. A. Hacsi, A. Petrosino, & T. A. Huebner (Eds.), Program theory in evaluation: Challenges and opportunities (pp. 5-13). New Directions for Evaluation, no. 87. San Francisco: Jossey-Bass.

Dunnagan, T., Duncan, S. F., & Paul, L. (2000). Doing effective evaluations: A case study of family empowerment due to welfare reform. Evaluation and Program Planning, 23, 125-136.

Fetterman, D. M. (2001). Foundations of empowerment evaluation. Thousand Oaks, CA: Sage Publications, Inc.

Fine, A. H., Thayer, C. E., & Coghlan, A. T. (2000). Program evaluation practice in the nonprofit sector. Nonprofit Management and Leadership, 10(3), 331-339.

General Accounting Office. (1998). Program evaluation: Agencies challenged by new demand for information on program results (No. GAO/GGD-98-53). Washington, DC: GAO.

Greene, J. C. (2001). Evaluation extrapolations. American Journal of Evaluation, 22(3), 397-402.

Henry, G. T. (2001). How modern democracies are shaping evaluation and the emerging challenges for evaluation. American Journal of Evaluation, 22(3), 419-429.

Henry, G. T., Julnes, G., & Mark, M. M. (Eds.). (1997). Realist evaluation: An emerging theory in support of practice. New Directions for Evaluation, no. 78. San Francisco: Jossey-Bass.

Hess, B. (2000). Assessing program impact using latent growth modeling: A primer for the evaluator. Evaluation and Program Planning, 23, 419-428.

Hoefer, R. (2000). Accountability in action? Program evaluation in nonprofit human service agencies. Nonprofit Management and Leadership, 11(2), 167-177.

Hopson, R. K. (Ed.). (2000). How and why language matters in evaluation. New Directions for Evaluation, no. 86. San Francisco: Jossey-Bass.

House, E. R., & Howe, K. R. (2000). Deliberative democratic evaluation. In K. E. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and deliberation (pp. 3-12). New Directions for Evaluation, no. 85. San Francisco: Jossey-Bass.

Johnson, R. L., McDaniel, F., & Willeke, M. J. (2000). Using portfolios in program evaluation: An investigation of interrater reliability. American Journal of Evaluation, 21(1), 65-80.

Johnson, R. L., Willeke, M. J., & Steiner, D. J. (1998). Stakeholder collaboration in the design and implementation of a family literacy portfolio assessment. American Journal of Evaluation, 19(3), 339-353.

Kaplan, R. S. (2001). Strategic performance measurement and management in nonprofit organizations. Nonprofit Management and Leadership, 11(3), 353-370.

Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.

Lawrenz, F., & Huffman, D. (in press). The archipelago approach to mixed method evaluation. American Journal of Evaluation.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Thousand Oaks, CA: Sage.

Lindenberg, M. (2001). Are we at the cutting edge or the blunt edge? Improving NGO organizational performance with private and public sector strategic management frameworks. Nonprofit Management and Leadership, 11(3), 247-270.

Love, A. J. (2001). The future of evaluation: Catching rocks with cauldrons. American Journal of Evaluation, 22(3), 437-444.

MacNeil, C. (2000). The prose and cons of poetic representation in evaluation reporting. American Journal of Evaluation, 21(3), 359-367.

Mark, M. M. (2001). Evaluation’s future: Furor, futile, or fertile? American Journal of Evaluation, 22(3), 457-479.

Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving public and nonprofit policies and programs. San Francisco: Jossey-Bass.

McConney, A., Rudd, A., & Ayres, R. (in press). Getting to the bottom line: A method for synthesizing findings within mixed-method program evaluations. American Journal of Evaluation.

McLaughlin, J. A., & Jordan, G. B. (1999). Logic models: A tool for telling your program’s performance story. Evaluation and Program Planning, 22, 65-72.

Mertens, D. M. (2001). Inclusivity and transformation: Evaluation in 2010. American Journal of Evaluation, 22(3), 367-374.

Mertens, D. M. (1999). Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20(1), 1-14.

Mohr, L. B. (1999a). The impact profile approach to policy merit: The case of research grants and the university. Evaluation Review, 23(2), 212-249.

Mohr, L. B. (1999b). The qualitative method of impact analysis. American Journal of Evaluation, 20(1), 69-84.

Moore, M. H. (1995). Creating public value: Strategic management in government. Cambridge, MA: Harvard University Press.

Newcomer, K. E. (1997). Using performance measurement to improve programs. In K. E. Newcomer (Ed.), Using performance measurement to improve public and nonprofit programs (pp. 5-14). New Directions for Evaluation, no. 75. San Francisco: Jossey-Bass.

Owen, J. M. (1998). Towards an outcome hierarchy for professional university programs. Evaluation and Program Planning, 21, 315-321.

Paton, R., Foot, J., & Payne, G. (2000). What happens when nonprofits use quality models for self-assessment? Nonprofit Management and Leadership, 11(1), 21-34.

Patton, M. Q. (1997). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: Sage Publications.

Patton, M. Q. (2000). Overview: Language matters. In R. K. Hopson (Ed.), How and why language matters in evaluation (pp. 5-16). New Directions for Evaluation, no. 86. San Francisco: Jossey-Bass.

Perrin, B. (1998). Effective use and misuse of performance management. American Journal of Evaluation, 19(3), 367-379.

Poole, D. L., Davis, J. K., Reisman, J., & Nelson, J. E. (2001). Improving the quality of outcome evaluation plans. Nonprofit Management and Leadership, 11(4), 405-421.

Poole, D. L., Nelson, J., Carnahan, S., Chepenik, N. G., & Tubiak, C. (2000). Evaluating performance measurement systems in nonprofit agencies: The program accountability quality scale (PAQS). American Journal of Evaluation, 21(1), 15-26.

Pratt, C. C., McGuigan, W. M., & Katzev, A. R. (2000). Measuring program outcomes: Retrospective pretest methodology. American Journal of Evaluation, 21(3), 341-349.

Quarter, J., & Richmond, B. J. (2001). Accounting for social value in nonprofits and for-profits. Nonprofit Management and Leadership, 12(1), 75-85.

Quintanilla, G., & Packard, T. (2002). A participatory evaluation of an inner-city science enrichment program. Evaluation and Program Planning, 25, 15-22.

Reed, C. S., & Brown, R. E. (2001). Outcome-asset impact model: Linking outcomes and assets. Evaluation and Program Planning, 24, 287-295.

Renz, D. O. (2001). Changing the face of nonprofit management. Nonprofit Management and Leadership, 11(3), 387-396.

Reynolds, A. J. (1998). Confirmatory program evaluation: A method for strengthening causal inference. American Journal of Evaluation, 19(2), 203-221.

Rog, D. J., & Fournier, D. (1997). Editors’ notes. In D. J. Rog & D. Fournier (Eds.), Progress and future directions in evaluation: Perspectives on theory, practice, and methods (pp. 1-3). New Directions for Evaluation, no. 76. San Francisco: Jossey-Bass.

Rogers, P. J., Petrosino, A., Huebner, T. A., & Hacsi, T. A. (2000). Program theory evaluation: Practice, promise, and problems. In P. J. Rogers, T. A. Hacsi, A. Petrosino, & T. A. Huebner (Eds.), Program theory in evaluation: Challenges and opportunities (pp. 5-13). New Directions for Evaluation, no. 87. San Francisco: Jossey-Bass.

Rojas, R. R. (2000). A review of models for measuring organizational effectiveness among for-profit and nonprofit organizations. Nonprofit Management and Leadership, 11(1), 97-104.

Rossi, P. H. (1997). Advances in quantitative evaluation, 1987-1996. In D. J. Rog & D. Fournier (Eds.), Progress and future directions in evaluation: Perspectives on theory, practice, and methods (pp. 57-68). New Directions for Evaluation, no. 76. San Francisco: Jossey-Bass.

Ryan, K. (1998). Advantages and challenges of using inclusive evaluation approaches in evaluation practice. American Journal of Evaluation, 19(1), 101-122.

Ryan, K. E., & DeStefano, L. (2000). Disentangling dialogue: Issues from practice. In K. E. Ryan & L. DeStefano (Eds.), Evaluation as a democratic process: Promoting inclusion, dialogue, and deliberation (pp. 63-76). New Directions for Evaluation, no. 85. San Francisco: Jossey-Bass.

Sawhill, J. C., & Williamson, D. (2001). Mission impossible? Measuring success in nonprofit organizations. Nonprofit Management and Leadership, 11(3), 371-386.

Schnoes, C. J., Murphy-Berman, V., & Chambers, J. (2000). Empowerment evaluation applied: Experiences, analysis, and recommendations from a case study. American Journal of Evaluation, 21(1), 53-64.

Smith, M. F. (2001). Evaluation: Preview of the future #2. American Journal of Evaluation, 22(3), 281-300.

Stake, R. E. (1973, October). Program evaluation, particularly responsive evaluation. Keynote address at the conference “New trends in evaluation,” Institute of Education, University of Goteborg, Sweden. In G. F. Madaus, M. S. Scriven, & D. L. Stufflebeam (Eds.), Evaluation models: Viewpoints on educational and human services evaluation. Boston: Kluwer-Nijhoff, 1987.

Stake, R. E. (2001). A problematic heading. American Journal of Evaluation, 22(3), 349-354.

Stame, N. (1999). Small and medium enterprise aid programs: Intangible effects and evaluation practice. Evaluation and Program Planning, 22, 105-111.

Stanfield, H. H. (1999). Slipping through the front door: Relevant social science evaluation in the people of color century. American Journal of Evaluation, 20(3), 415-431.

Stufflebeam, D. L. (1994). Empowerment evaluation, objectivist evaluation, and evaluation standards: Where the future of evaluation should not go and where it needs to go. Evaluation Practice, 15(3), 321-338.

Stufflebeam, D. L. (2001). Evaluation models. New Directions for Evaluation, no. 89. San Francisco: Jossey-Bass.

Tassey, G. (1999). Lessons learned about the methodology of economic impact studies: The NIST experience. Evaluation and Program Planning, 22, 113-119.

Thayer, C. E., & Fine, A. H. (2000). Evaluation and outcome measurement in the non-profit sector: Stakeholder participation. Evaluation and Program Planning, 23, 103-108.

Toffolon-Weiss, M. M., Bertrand, J. J., & Terrell, S. S. (1999). The results framework – an innovative tool for program planning and evaluation. Evaluation Review, 23(3), 336-359.

Torres, R. T., & Preskill, H. (2001). Evaluation and organizational learning: Past, present, and future. American Journal of Evaluation, 22(3), 387-395.

Unrau, Y. A. (2001). Using client interviews to illuminate outcomes in program logic models: A case example. Evaluation and Program Planning, 24, 353-361.

Wandersman, A., Imm, P., Chinman, M., & Kaftarian, S. (2000). Getting to outcomes: A results-based approach to accountability. Evaluation and Program Planning, 23, 389-395.

Watt, J. H. (1999). Internet systems for evaluation research. In G. Gay & T. L. Bennington (Eds.), Information technologies in evaluation: Social, moral, epistemological, and practical implications (pp. 23-43). New Directions for Evaluation, no. 84. San Francisco: Jossey-Bass.

Weiss, C. H. (1998). Evaluation: Methods for studying programs and policies (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Wholey, J. S. (1997). Clarifying goals, reporting results. In D. J. Rog & D. Fournier (Eds.), Progress and future directions in evaluation: Perspectives on theory, practice, and methods (pp. 95-105). New Directions for Evaluation, no. 76. San Francisco: Jossey-Bass.

Wholey, J. S. (2001). Managing for results: Roles for evaluators in a new management era. American Journal of Evaluation, 22(3), 343-347.

Youtie, J., Bozeman, B., & Shapira, P. (1999). Using an evaluability assessment to select methods for evaluating state technology development programs: The case of the Georgia Research Alliance. Evaluation and Program Planning, 22, 55-64.

Zajano, N. C., & Lochtefeld, S. S. (1999). The nature of knowledge and language in the legislative arena. In R. K. Jonas (Ed.), Legislative program evaluation: Utilization-driven research for decision makers (pp. 85-94). New Directions for Evaluation, no. 81. San Francisco: Jossey-Bass.