NEJM -- The Promise and Problems of Meta-Analysis
Volume 337:559-561 August 21, 1997 Number 8
Meta-analysis has acquired a substantial following among both statisticians and clinicians. The technique was developed as a way to summarize the results of different research studies of related problems. Meta-analysis may be applied even when the studies are small and there is substantial variation in the specific issues studied, the research methods applied, the source and nature of the study subjects, and other factors that may have an important bearing on the findings. In this issue of the Journal, LeLorier et al.¹ compare the findings of 12 large randomized, controlled trials with the results of meta-analyses of the same problems. They find important discrepancies. When a large randomized, controlled trial (commonly considered the gold standard for determining the effects of medical interventions) disagrees with a meta-analysis, what should the reader conclude? Perhaps more important, when only one of the two tools is used, how much uncertainty should the reader add to the confidence limits and other statistical measures of uncertainty reported by the author?
The core of meta-analysis is its systematic approach to the identification and abstracting of critical information from research reports. Doing a meta-analysis correctly demands expertise in both the method and the substance and hence almost always requires collaboration between clinicians and an experienced statistician. The questions must be defined carefully to maximize the relevance of the reports to be included and to reduce uncertainties about procedures. The investigators must then try to find every relevant report by searching data bases, reviewing bibliographies, and asking widely about unpublished work. The collected reports are then winnowed to the few (often less than 10 percent) that meet the requirements for the meta-analysis. The reports must be searched carefully to identify problems and validate the quantitative findings of interest. These findings must be expressed on a common scale (often as odds ratios), and some reports may have to be dropped for lack of information. Those doing a meta-analysis may also abstract information from each report to produce a quantitative measure of research quality. Each of the individual quantitative estimates must be scrutinized for problems, and this may require the efforts of a range of specialists. When the analysis is completed and submitted for publication, the editor and the reviewers must assure themselves of its quality. A rigorous technical review of a meta-analysis requires the reviewer to identify, reabstract, and interpret a fair sample of the original papers. Very few editors and reviewers will do this, which may be one reason why there are so many poor meta-analyses in the literature.
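Expressing each study's result as an odds ratio is the common-scale step described above. The minimal Python sketch below assumes that a study reports event counts for a treated group and a control group. Several details are illustrative assumptions and are not taken from the editorial or from any study it cites: the function name `odds_ratio`, the use of Woolf's log-scale standard error for the 95 percent interval, and the example counts.

```python
import math


def odds_ratio(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Odds ratio and approximate 95% interval for one study's 2x2 table.

    Assumption: Woolf's standard error of the log odds ratio. Every cell must
    be nonzero; a real analysis needs a continuity correction or an exact
    method when a cell is empty.
    """
    a = events_tx             # treated, event
    b = n_tx - events_tx      # treated, no event
    c = events_ctl            # control, event
    d = n_ctl - events_ctl    # control, no event
    if min(a, b, c, d) <= 0:
        raise ValueError("empty cell: needs a continuity correction or exact method")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the interval on the log scale, then transform back to the odds-ratio scale.
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical study: 15 of 100 treated and 25 of 100 control subjects have the event.
    print(odds_ratio(15, 100, 25, 100))
```

Working on the log scale makes the interval symmetric around the log odds ratio. The log scale is also where studies are usually weighted and combined.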
Although some meta-analyses stop with the presentation and discussion of the results of the individual studies, many others proceed further and combine the results into a single, comprehensive "best" estimate, generally with statistical confidence bounds, that is meant to summarize what is known about the clinical problem. This last step — preparing and presenting a single estimate as the distillation of all that is known — is the one that has drawn the most criticism. This is because there are often biologic reasons, statistical evidence, or both, showing that the studies included in the meta-analysis have in fact measured somewhat different things, so that a combined estimate cannot be meaningful unless additional, doubtful assumptions are made. One such assumption is that the effects reported in the studies actually performed can be seen as a random sample of the effects observed in all possible studies that might have met the author's criteria. Unfortunately, there is little evidence to support an assumption such as this.
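The pooling step criticized above can be sketched in Python. This minimal sketch assumes that each study has already been reduced to a log odds ratio and its standard error. It uses the inverse-variance fixed-effect method and the DerSimonian-Laird random-effects method; these are common choices, but not necessarily the methods used in any meta-analysis discussed here. The function name `pool_log_odds_ratios` is hypothetical. The random-effects weights embody the assumption questioned above: that the observed study effects are a random sample from a larger population of possible effects.

```python
import math


def pool_log_odds_ratios(log_ors, ses, z=1.96):
    """Combine per-study log odds ratios into single summary estimates.

    Returns the fixed-effect (inverse-variance) estimate, Cochran's Q, the
    DerSimonian-Laird between-study variance tau^2, and the random-effects
    estimate. Each pooled estimate is returned as (OR, lower, upper).
    """
    k = len(log_ors)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching standard errors")
    w = [1 / s ** 2 for s in ses]  # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sw
    se_fixed = math.sqrt(1 / sw)

    # Cochran's Q: scatter of the studies beyond what sampling error alone explains.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    # DerSimonian-Laird moment estimate of between-study variance, truncated at 0.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each study's variance. This is the step
    # that treats the observed effects as a random sample of possible effects.
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    sw_re = sum(w_re)
    random_ = sum(wi * y for wi, y in zip(w_re, log_ors)) / sw_re
    se_random = math.sqrt(1 / sw_re)

    def interval(est, se):
        return (math.exp(est), math.exp(est - z * se), math.exp(est + z * se))

    return {
        "fixed": interval(fixed, se_fixed),
        "q": q,
        "tau2": tau2,
        "random": interval(random_, se_random),
    }
```

Cochran's Q is one form of the statistical evidence of heterogeneity mentioned above. When tau² is large, the random-effects interval widens. Even so, its width is only as trustworthy as the random-sampling assumption behind it.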
LeLorier et al.¹ searched four leading general medical journals to identify all the large randomized, controlled trials (those with 1000 subjects or more) whose results were published from 1991 through 1994, then searched for meta-analyses of similar topics that were published before each trial. Twelve large randomized, controlled trials and 19 meta-analyses met their criteria. Because some of the trials and meta-analyses reported on more than one outcome, they studied a total of 40 outcomes. Overall, there was somewhat better than chance agreement between these meta-analyses and the subsequent large randomized, controlled trials, with kappa values in a range commonly considered to represent "fair-to-slight agreement." In terms of an ordinary diagnostic test, the results could be described as average. The results obtained with the two approaches usually pointed in the same direction, and there were no cases in which they gave statistically significant results in opposite directions. However, the discrepancies with regard to the estimated size of an effect were sometimes quite substantial, and occasionally the two estimates differed significantly even though they agreed in direction. Stewart and Parmar² have shown how some such differences can arise.
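Kappa measures agreement beyond what chance alone would produce. The minimal Python sketch below computes one common form, Cohen's kappa. It assumes that both the meta-analysis and the later trial have placed each outcome into the same set of categories. The function name `cohens_kappa`, the category labels, and the six paired outcomes in the example are invented for illustration. They are not the data of LeLorier et al., and not necessarily their classification scheme.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(ratings_a)
    # Proportion of outcomes on which the two sources agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance if the two sources classified outcomes independently.
    expected = sum(counts_a[cat] * counts_b[cat] for cat in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical classifications of six outcomes.
    meta = ["positive", "positive", "negative", "inconclusive", "positive", "negative"]
    trial = ["positive", "negative", "negative", "inconclusive", "positive", "positive"]
    print(round(cohens_kappa(meta, trial), 3))  # prints 0.455
```

Kappa equals 1 for perfect agreement and 0 when observed agreement is no better than chance. On the widely used Landis and Koch scale, values from 0.21 to 0.40 are labeled "fair" and values from 0 to 0.20 "slight."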
It is impossible to say, on the basis of present evidence alone, whether the results of a large randomized, controlled trial or those of a meta-analysis of many smaller studies are more likely to be close to the truth. Much depends on the details of both the research studies and the analyses. When both the trial and the meta-analysis seem to be of good quality, however, I tend to believe the results of the trial. A history of 40 years of generally successful use of randomized, controlled trials that have made important contributions to progress in many branches of medicine must not be overlooked. In addition, major problems with the implementation of meta-analyses have been common.³⁻⁶ There have been a wide variety of these, including failure of the investigator performing the meta-analysis to understand the basic issues, carelessness in abstracting and summarizing appropriate papers, failure to consider important covariates, bias on the part of the meta-analyst, and, perhaps most often, overstatements of the strength and precision of the results. It is not uncommon to find that two or more meta-analyses done at about the same time by investigators with the same access to the literature reach incompatible or even contradictory conclusions.⁷⁻¹⁰ Such disagreement argues powerfully against any notion that meta-analysis offers an assured way to distill the "truth" from a collection of research reports.
Other observers, including policy makers, also have reservations about meta-analyses, and there is some general concern about the credibility of the findings of meta-analysis. I know of no instance in medicine in which a meta-analysis led to a major change in policy before a careful, conventional review of the literature led to the same change. Showing that a sequence of meta-analyses performed over time converged on some value as published findings accumulated does not mean that it was the meta-analyses that gave a convincing answer to the particular clinical question.
In my own review of selected meta-analyses,³ problems were so frequent and so serious, including bias on the part of the meta-analyst, that it was difficult to trust the overall "best estimates" that the method often produces. On present evidence, we can generally accept the results of a well-done meta-analysis as a way to present the results of disparate studies on a common scale (as is shown by the two figures in the article by LeLorier et al.), but any attempt to reduce the results to a single value, with confidence bounds, is likely to lead to conclusions that are wrong, perhaps seriously so. I still prefer conventional narrative reviews of the literature, a type of summary familiar to readers of the countless review articles on important medical issues.
Meta-analysis may still be improved, by a combination of experience and theory, to the point at which its findings can be taken as sufficiently reliable when there is no other analysis or confirmation available, but that day seems to be well ahead of us. LeLorier et al. also imply, however, that large randomized, controlled trials should be regarded more circumspectly than published reports commonly suggest. We never know as much as we think we know.
John C. Bailar III, M.D., Ph.D.
University of Chicago
Chicago, IL 60637
1. LeLorier J, Grégoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536-542.
2. Stewart LA, Parmar MK. Meta-analysis of the literature or of individual patient data: is there a difference? Lancet 1993;341:418-422.
3. Bailar JC III. The practice of meta-analysis. J Clin Epidemiol 1995;48:149-157.
4. Macarthur C, Foran PJ, Bailar JC III. Qualitative assessment of studies included in a meta-analysis: DES and the risk of pregnancy loss. J Clin Epidemiol 1995;48:739-747.
5. Shapiro S. Meta-analysis shmeta-analysis. Am J Epidemiol 1994;140:771-778.
6. Shapiro S. Is meta-analysis a valid approach to the evaluation of small effects in observational studies? J Clin Epidemiol 1997;50:223-229.
7. Kerlikowske K, Grady D, Rubin SM, Sandrock C, Ernster VL. Efficacy of screening mammography: a meta-analysis. JAMA 1995;273:149-154.
8. Smart CR, Hendrick RE, Rutledge JH III, Smith RA. Benefit of mammography screening in women ages 40 to 49 years: current evidence from randomized controlled trials. Cancer 1995;75:1619-1626.
9. Rosenfeld RM, Post JC. Meta-analysis of antibiotics for the treatment of otitis media with effusion. Otolaryngol Head Neck Surg 1992;106:378-386.
10. Williams RL, Chalmers TC, Stange KC, Chalmers FT, Bowlin SJ. Use of antibiotics in preventing recurrent acute otitis media and in treating otitis media with effusion: a meta-analytic attempt to resolve the brouhaha. JAMA 1993;270:1344-1351. [Erratum, JAMA 1994;271:430.]