Systematic reviews vs meta-analysis: what’s the difference?
Posted on 24th July 2023 by Verónica Tanco Tellechea
You may hear the terms ‘systematic review’ and ‘meta-analysis’ being used interchangeably. Although they are related, they are distinctly different. Learn more in this blog for beginners.
What is a systematic review?
According to Cochrane (1), a systematic review attempts to identify, appraise and synthesize all the empirical evidence to answer a specific research question. Thus, a systematic review is where you might find the most relevant, adequate, and current information regarding a specific topic. In the levels of evidence pyramid, systematic reviews are only surpassed by meta-analyses.
To conduct a systematic review, you will need, among other things:
 A specific research question, usually in the form of a PICO question.
 Pre-specified eligibility criteria, to decide which articles will be included in or excluded from the review.
 To follow a systematic method that will minimize bias.
You can find protocols that will guide you from both Cochrane and the Equator Network, among other places, and if you are new to the topic, have a read of an overview of systematic reviews.
What is a meta-analysis?
A meta-analysis is a quantitative, epidemiological study design used to systematically assess the results of previous research (2). Usually, meta-analyses are based on randomized controlled trials, though not always. In essence, a meta-analysis is a statistical tool that allows researchers to combine outcomes from multiple studies mathematically.
When can a meta-analysis be implemented?
In principle, a meta-analysis can always be conducted; however, to yield the best possible results, it should be performed when the studies included in the systematic review are of good quality, have similar designs, and use similar outcome measures.
Why are meta-analyses important?
Outcomes from a meta-analysis may provide more precise information regarding the estimate of the effect of what is being studied, because it merges outcomes from multiple studies. In a meta-analysis, data from various trials are combined to generate an average result (1), which is portrayed in a forest plot diagram. Moreover, meta-analyses often also include a funnel plot diagram to detect publication bias visually.
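As a toy illustration of how trial results are merged into an average result, the sketch below applies fixed-effect, inverse-variance pooling, which is one common approach; every effect estimate and standard error here is invented for illustration:

```python
import math

# Hypothetical trial results: (name, effect estimate, standard error).
# All numbers are made up for illustration only.
studies = [
    ("Trial 1", 0.30, 0.10),
    ("Trial 2", 0.45, 0.20),
    ("Trial 3", 0.25, 0.15),
]

# Fixed-effect, inverse-variance pooling: weight each study by 1/SE^2.
weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * eff for w, (_, eff, _) in zip(weights, studies)) / sum(weights)
se_pooled = math.sqrt(1 / sum(weights))
low, high = pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

# One line per study plus a pooled summary, like the rows of a forest plot.
for name, eff, se in studies:
    print(f"{name}: {eff:+.2f} (95% CI {eff - 1.96 * se:+.2f} to {eff + 1.96 * se:+.2f})")
print(f"Pooled : {pooled:+.2f} (95% CI {low:+.2f} to {high:+.2f})")
```

A forest plot is essentially a graphical version of this output: one line per study, with the pooled summary at the bottom.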
Conclusions
A systematic review is an article that synthesizes the available evidence on a certain topic using a specific research question, pre-specified eligibility criteria for including articles, and a systematic method for its production. A meta-analysis, by contrast, is a quantitative, epidemiological study design used to assess the results of the articles included in a systematic review.
Definition: A systematic review is a synthesis of empirical evidence regarding a specific research question; a meta-analysis is a statistical tool used with the quantitative outcomes of various studies regarding a specific topic.
Results: A systematic review synthesizes relevant and current information regarding a specific research question (qualitative); a meta-analysis merges outcomes from different studies and provides an average result (quantitative).
Remember: All meta-analyses involve a systematic review, but not all systematic reviews involve a meta-analysis.
If you would like some further reading on this topic, we suggest the following:
The systematic review – a S4BE blog article
Meta-analysis: what, why, and how – a S4BE blog article
The difference between a systematic review and a meta-analysis – a blog article via Covidence
Systematic review vs meta-analysis: what’s the difference? A 5-minute video from Research Masterminds.
1. About Cochrane reviews [Internet]. Cochranelibrary.com [cited 2023 Apr 30]. Available from: https://www.cochranelibrary.com/about/about-cochrane-reviews
2. Haidich AB. Meta-analysis in medical research. Hippokratia. 2010;14(Suppl 1):29–37.
 Review Article
 Published: 08 March 2018
Meta-analysis and the science of research synthesis
Jessica Gurevitch, Julia Koricheva, Shinichi Nakagawa & Gavin Stewart
Nature volume 555, pages 175–182 (2018)
Meta-analysis is the quantitative, scientific synthesis of research results. Since the term and modern approaches to research synthesis were first introduced in the 1970s, meta-analysis has had a revolutionary effect in many scientific fields, helping to establish evidence-based practice and to resolve seemingly contradictory research outcomes. At the same time, its implementation has engendered criticism and controversy, in some cases general and in others specific to particular disciplines. Here we take the opportunity provided by the recent fortieth anniversary of meta-analysis to reflect on the accomplishments, limitations, recent advances and directions for future developments in the field of research synthesis.
Gurevitch, J., Koricheva, J., Nakagawa, S. et al. Meta-analysis and the science of research synthesis. Nature 555, 175–182 (2018). https://doi.org/10.1038/nature25753
Received: 04 March 2017; Accepted: 12 January 2018; Published: 08 March 2018
Meta-Analysis – Guide with Definition, Steps & Examples
Published by Owen Ingram on April 26th, 2023; revised on April 26th, 2023
“A meta-analysis is a formal, epidemiological, quantitative study design that uses statistical methods to generalise the findings of the selected independent studies.”
Meta-analysis and systematic review are two of the most rigorous strategies in research. When researchers start looking for the best available evidence concerning their research work, they are advised to begin at the top of the evidence pyramid. Evidence in the form of meta-analyses or systematic reviews addressing important questions is significant in academia because it informs decision-making.
What is a Meta-Analysis?
Meta-analysis estimates the overall effect across individual independent research studies by systematically synthesising, or merging, their results. Meta-analysis isn’t only about reaching a wider population by combining several smaller studies. It involves systematic methods to evaluate inconsistencies in participants, variability (also known as heterogeneity), and findings, and to check how sensitive the conclusions are to the chosen systematic review protocol.
When Should you Conduct a Meta-Analysis?
Meta-analysis has become a widely-used research method in medical sciences and other fields of work for several reasons. The technique summarises the results of independent studies identified through a systematic review.
The Cochrane Handbook explains that “an important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention” (section 10.2).
A researcher or a practitioner should choose meta-analysis when the following outcomes are desirable:
For generating new hypotheses or settling controversies arising from conflicting research studies. Meta-analysis makes it possible to quantify and evaluate variable results and to identify the extent of conflict in the literature.
To find research gaps left unfilled and address questions not posed by individual studies. Primary research studies involve specific types of participants and interventions. A review of these studies with variable characteristics and methodologies can allow the researcher to gauge the consistency of findings across a wider range of participants and interventions. With the help of meta-analysis, the reasons for differences in the effect can also be explored.
To provide convincing evidence. Estimating effects with a larger sample size and a wider range of interventions can provide convincing evidence. Many academic studies are based on very small datasets, so estimated intervention effects considered in isolation are not fully reliable.
Elements of a Meta-Analysis
Deeks et al. (2019), Haidich (2010), and Grant & Booth (2009) explored the characteristics, strengths, and weaknesses of conducting a meta-analysis. These are briefly explained below.
Characteristics:
 A systematic review must be completed before conducting the meta-analysis because it provides a summary of the findings of the individual studies to be synthesised.
 You can only conduct a meta-analysis by synthesising the studies included in a systematic review.
 The studies selected for statistical analysis in the meta-analysis should be similar in terms of comparison, intervention, and population.
Strengths:
 A meta-analysis takes place after the systematic review. The end product is a comprehensive quantitative analysis that is complex but reliable.
 It gives more value and weight to existing studies that hold little practical value on their own.
 Policymakers and academics cannot base their decisions on individual research studies. Meta-analysis provides them with a consolidated, robust analysis of the evidence on which to base informed decisions.
Criticisms:
 A meta-analysis uses studies exploring similar topics. Finding sufficiently similar studies can be challenging.
 If the individual studies are biased, or if biases related to reporting and specific research methodologies are involved, the results of the meta-analysis could be misleading.
Steps of Conducting the MetaAnalysis
The process of conducting a meta-analysis has remained a topic of debate among researchers and scientists. However, the following five-step process is widely accepted.
Step 1: Research Question
The first step in conducting clinical research involves identifying a research question and proposing a hypothesis. The potential clinical significance of the research question is then explained, and the study design and analytical plan are justified.
Step 2: Systematic Review
The purpose of a systematic review (SR) is to address a research question by identifying all relevant studies that meet the required quality standards for inclusion. While established journals typically serve as the primary source for identified studies, it is important to also consider unpublished data to avoid publication bias or the exclusion of studies with negative results.
While some meta-analyses may limit their focus to randomized controlled trials (RCTs) for the sake of obtaining the highest quality evidence, other experimental and quasi-experimental studies may be included if they meet the specific inclusion/exclusion criteria established for the review.
Step 3: Data Extraction
After selecting studies for the meta-analysis, researchers extract summary data or outcomes, as well as sample sizes and measures of data variability for both intervention and control groups. The choice of outcome measures depends on the research question and the type of study, and may include numerical or categorical measures.
For instance, numerical means may be used to report differences in scores on a questionnaire or changes in a measurement, such as blood pressure. In contrast, risk measures like odds ratios (OR) or relative risks (RR) are typically used to report differences in the probability of belonging to one category or another, such as vaginal birth versus cesarean birth.
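The risk measures mentioned above come straight from each study's 2×2 table. As a rough illustration (the function names and numbers below are hypothetical, not drawn from any study discussed here), the relative risk and odds ratio can be computed like this:

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table: (a, b) = events/non-events in the
    intervention arm, (c, d) = events/non-events in the control arm."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds ratio from the same table: (a/b) / (c/d) = (a*d) / (b*c)."""
    return (a * d) / (b * c)

# Hypothetical trial: 20/100 cesarean births in the intervention arm
# versus 40/100 in the control arm.
rr = risk_ratio(20, 80, 40, 60)   # 0.2 / 0.4 = 0.5
or_ = odds_ratio(20, 80, 40, 60)  # (20*60) / (80*40) = 0.375
```

Note that the odds ratio (0.375) and risk ratio (0.5) differ for the same data; they only approximate each other when events are rare.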
Step 4: Standardisation and Weighting Studies
After gathering all the required data, the fourth step involves computing suitable summary measures from each study for further examination. These measures are typically referred to as Effect Sizes and indicate the difference in average scores between the control and intervention groups. For instance, it could be the variation in blood pressure changes between study participants who used drug X and those who used a placebo.
Since the units of measurement often differ across the included studies, standardization is necessary to create comparable effect size estimates. Standardization is accomplished by determining, for each study, the average score for the intervention group, subtracting the average score for the control group, and dividing the result by the relevant measure of variability in that dataset.
In some cases, the results of certain studies should carry more weight than others. Larger studies, as measured by their sample sizes, are deemed to produce more precise estimates of effect size than smaller studies. Additionally, studies with less variability in data, such as a smaller standard deviation or narrower confidence intervals, are typically regarded as reflecting a higher-quality study design. A weighting statistic that incorporates both of these factors, known as inverse variance, is commonly employed.
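The standardization and weighting described in this step can be sketched in a few lines. This is an illustrative sketch only: the variance formula is the common large-sample approximation for Cohen's d, and all study numbers are invented:

```python
import math

def cohens_d(mean_tx, sd_tx, n_tx, mean_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference: intervention mean minus control mean,
    divided by the pooled standard deviation (the variability measure)."""
    pooled_sd = math.sqrt(((n_tx - 1) * sd_tx ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
                          / (n_tx + n_ctrl - 2))
    return (mean_tx - mean_ctrl) / pooled_sd

def inverse_variance_weight(d, n_tx, n_ctrl):
    """Weight = 1 / Var(d): larger samples and tighter estimates get more
    weight. Var(d) is the usual large-sample approximation."""
    var_d = (n_tx + n_ctrl) / (n_tx * n_ctrl) + d ** 2 / (2 * (n_tx + n_ctrl))
    return 1.0 / var_d

# Invented blood-pressure study: drug X changes BP by -8 mmHg versus
# -4 mmHg on placebo, both arms with SD 10 and 60 participants.
d = cohens_d(-8.0, 10.0, 60, -4.0, 10.0, 60)   # -0.4
w = inverse_variance_weight(d, 60, 60)
```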
Step 5: Absolute Effect Estimation
The ultimate step in conducting a meta-analysis is to choose and apply an appropriate model for comparing Effect Sizes across the included studies. Two popular models for this purpose are the Fixed Effects and Random Effects models. The Fixed Effects model relies on the premise that each study is evaluating a common treatment effect, implying that all studies would have estimated the same Effect Size if sample variability were equal across all studies.
Conversely, the Random Effects model posits that the true treatment effects in individual studies may vary from each other, and endeavors to account for this additional source of inter-study variation in Effect Sizes. The existence and magnitude of this latter variability is usually evaluated within the meta-analysis through a test for ‘heterogeneity.’
Forest Plot
The results of a meta-analysis are often presented visually in a “Forest Plot”. For each study included in the analysis, this type of plot displays a horizontal line indicating the 95% confidence interval around the study's standardized Effect Size estimate (here, a risk ratio). Figure A provides an example of a hypothetical Forest Plot in which drug X reduces the risk of death in all three studies.
However, the first study was larger than the other two, and the estimates from the smaller studies were not statistically significant: the lines emanating from their boxes cross the value of 1. The size of each box represents the relative weight assigned to that study by the meta-analysis. The diamond represents the combined estimate of the drug’s effect, which is more precise than any individual estimate; its position and width indicate the combined risk ratio estimate and its 95% confidence interval limits.
Figure A: Hypothetical Forest Plot
Relevance to Practice and Research
Evidence Based Nursing commentaries often include recently published systematic reviews and meta-analyses, as they can provide new insights and strengthen recommendations for effective healthcare practices. Additionally, they can identify gaps or limitations in current evidence and guide future research directions.
The quality of the data available for synthesis is a critical factor in the strength of conclusions drawn from meta-analyses, and this is influenced by the quality of individual studies and the systematic review itself. However, meta-analysis cannot overcome issues related to underpowered or poorly designed studies.
Therefore, clinicians may still encounter situations where the evidence is weak or uncertain, and where higher-quality research is required to improve clinical decision-making. While such findings can be frustrating, they remain important for informing practice and highlighting the need for further research to fill gaps in the evidence base.
Methods and Assumptions in Meta-Analysis
Ensuring the credibility of findings is imperative in all types of research, including meta-analyses. To validate the outcomes of a meta-analysis, the researcher must confirm that the research techniques used were accurate in measuring the intended variables. Typically, researchers establish the validity of a meta-analysis by testing the outcomes for homogeneity, the degree of similarity between the results of the combined studies.
Homogeneity is preferred in meta-analyses as it allows the data to be combined without needing adjustments to suit the study’s requirements. To determine homogeneity, researchers assess heterogeneity, its opposite. Two widely used statistical methods for evaluating heterogeneity in research results are Cochran’s Q and the I-squared (I²) index.
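As a rough sketch of how these two statistics relate (all data invented for illustration): Cochran's Q is a weighted sum of squared deviations of the study effects from the pooled estimate, and I² re-expresses Q relative to its degrees of freedom:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I-squared index for per-study effect
    estimates and their variances (illustrative sketch)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Three invented studies with identical effects: Q ≈ 0, I² = 0 (homogeneous).
q_same, i2_same = heterogeneity([0.5, 0.5, 0.5], [0.04, 0.09, 0.01])
# Widely scattered effects push Q well above its degrees of freedom.
q_diff, i2_diff = heterogeneity([0.1, 0.5, 0.9], [0.01, 0.01, 0.01])
```

An I² near 0% suggests the studies can reasonably be pooled; high values (conventionally above ~75%) signal substantial heterogeneity.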
Difference Between Meta-Analysis and Systematic Reviews
Meta-analysis and systematic reviews are both research methods used to synthesise evidence from multiple studies on a particular topic. However, there are some key differences between the two.
Systematic reviews involve a comprehensive and structured approach to identifying, selecting, and critically appraising all available evidence relevant to a specific research question. This process involves searching multiple databases, screening the identified studies for relevance and quality, and summarizing the findings in a narrative report.
Meta-analysis, on the other hand, involves using statistical methods to combine and analyze the data from multiple studies, with the aim of producing a quantitative summary of the overall effect size. Meta-analysis requires the studies to be similar enough in terms of their design, methodology, and outcome measures to allow for meaningful comparison and analysis.
Therefore, systematic reviews are broader in scope and summarize the findings of all studies on a topic, while meta-analyses are more focused on producing a quantitative estimate of the effect size of an intervention across multiple studies that meet certain criteria. In some cases, a systematic review may be conducted without a meta-analysis if the studies are too diverse or the quality of the data is not sufficient to allow for statistical pooling.
Software Packages for Meta-Analysis
Meta-analysis can be done with a range of software packages, both free and paid. One of the most commonly used is RevMan, from the Cochrane Collaboration.
Assessing the Quality of a Meta-Analysis
Assessing the quality of a meta-analysis involves evaluating the methods used to conduct the analysis and the quality of the studies included. Here are some key factors to consider:
 Study selection: The studies included in the meta-analysis should be relevant to the research question and meet predetermined criteria for quality.
 Search strategy: The search strategy should be comprehensive and transparent, including the databases and search terms used to identify relevant studies.
 Study quality assessment: The quality of included studies should be assessed using appropriate tools, and this assessment should be reported in the meta-analysis.
 Data extraction: The data extraction process should be systematic and clearly reported, including any discrepancies that arose.
 Analysis methods: The meta-analysis should use appropriate statistical methods to combine the results of the included studies, and these methods should be transparently reported.
 Publication bias: The potential for publication bias should be assessed and reported in the meta-analysis, including any efforts to identify and include unpublished studies.
 Interpretation of results: The results should be interpreted in the context of the study limitations and the overall quality of the evidence.
 Sensitivity analysis: Sensitivity analysis should be conducted to evaluate the impact of study quality, inclusion criteria, and other factors on the overall results.
Overall, a high-quality meta-analysis should be transparent in its methods and clearly report the limitations of the included studies and the overall quality of the evidence.
Examples of Meta-Analysis
 Stanley, T.D. and Jarrell, S.B. (1989), “Meta-regression analysis: a quantitative method of literature surveys”, Journal of Economic Surveys, vol. 3, no. 2, pp. 161-170.
 Datta, D.K., Pinches, G.E. and Narayanan, V.K. (1992), “Factors influencing wealth creation from mergers and acquisitions: a meta-analysis”, Strategic Management Journal, vol. 13, pp. 67-84.
 Glass, G. (1983), “Synthesising empirical research: Meta-analysis”, in S.A. Ward and L.J. Reed (eds), Knowledge Structure and Use: Implications for Synthesis and Interpretation, Philadelphia: Temple University Press.
 Wolf, F.M. (1986), Meta-analysis: Quantitative Methods for Research Synthesis, Sage University Paper no. 59.
 Hunter, J.E., Schmidt, F.L. and Jackson, G.B. (1982), Meta-analysis: Cumulating Research Findings Across Studies, Beverly Hills, CA: Sage.
Frequently Asked Questions
What is a meta-analysis in research?
Meta-analysis is a statistical method used to combine results from multiple studies on a specific topic. By pooling data from various sources, meta-analysis can provide a more precise estimate of the effect size of a treatment or intervention and identify areas for future research.
Why is meta-analysis important?
Meta-analysis is important because it combines and summarizes results from multiple studies to provide a more precise and reliable estimate of the effect of a treatment or intervention. This helps clinicians and policymakers make evidence-based decisions and identify areas for further research.
What is an example of a meta-analysis?
A meta-analysis of studies evaluating the effect of physical exercise on depression in adults is one example. Researchers gathered data from 49 studies involving a total of 2,669 participants. The studies used different types of exercise and measures of depression, which made it difficult to compare the results.
Through meta-analysis, the researchers calculated an overall effect size and determined that exercise was associated with a statistically significant reduction in depression symptoms. The study also identified that moderate-intensity aerobic exercise, performed three to five times per week, was the most effective. The meta-analysis provided a more comprehensive understanding of the impact of exercise on depression than any single study could provide.
What is the definition of meta-analysis in clinical research?
Meta-analysis in clinical research is a statistical technique that combines data from multiple independent studies on a particular topic to generate a summary or “meta” estimate of the effect of a particular intervention or exposure.
This type of analysis allows researchers to synthesise the results of multiple studies, potentially increasing the statistical power and providing more precise estimates of treatment effects. Meta-analyses are commonly used in clinical research to evaluate the effectiveness and safety of medical interventions and to inform clinical practice guidelines.
Is meta-analysis qualitative or quantitative?
Meta-analysis is a quantitative method used to combine and analyze data from multiple studies. It involves the statistical synthesis of results from individual studies to obtain a pooled estimate of the effect size of a particular intervention or treatment. Therefore, meta-analysis is considered a quantitative approach to research synthesis.
Systematic review Q & A
What is a systematic review?
A systematic review is a guided filtering and synthesis of all available evidence addressing a specific, focused research question, generally about a specific intervention or exposure. The use of standardized, systematic methods and preselected eligibility criteria reduces the risk of bias in identifying, selecting and analyzing relevant studies. A well-designed systematic review includes clear objectives, preselected criteria for identifying eligible studies, an explicit methodology, a thorough and reproducible search of the literature, an assessment of the validity or risk of bias of each included study, and a systematic synthesis, analysis and presentation of the findings of the included studies. A systematic review may include a meta-analysis.
For details about carrying out systematic reviews, see the Guides and Standards section of this guide.
Is my research topic appropriate for systematic review methods?
A systematic review is best deployed to test a specific hypothesis about a healthcare or public health intervention or exposure. By focusing on a single intervention or a few specific interventions for a particular condition, the investigator can ensure a manageable results set. Moreover, examining a single or small set of related interventions, exposures, or outcomes, will simplify the assessment of studies and the synthesis of the findings.
Systematic reviews are poor tools for hypothesis generation: for instance, to determine what interventions have been used to increase the awareness and acceptability of a vaccine, or to investigate the ways that predictive analytics have been used in health care management. In the first case, we don't know what interventions to search for, and so have to screen all the articles about awareness and acceptability. In the second, there is no agreed-upon set of methods that make up predictive analytics, and health care management is far too broad. The search will necessarily be incomplete, vague and very large all at the same time. In most cases, reviews without clearly and exactly specified populations, interventions, exposures, and outcomes will produce results sets that quickly outstrip the resources of a small team and offer no consistent way to assess and synthesize findings from the studies that are identified.
If not a systematic review, then what?
You might consider performing a scoping review. This framework allows iterative searching over a reduced number of data sources and imposes no requirement to assess individual studies for risk of bias. The framework includes built-in mechanisms to adjust the analysis as the work progresses and more is learned about the topic. A scoping review won't help you limit the number of records you'll need to screen (broad questions lead to large results sets) but may give you a means of dealing with a large set of results.
This tool can help you decide what kind of review is right for your question.
Can my student complete a systematic review during her summer project?
Probably not. Systematic reviews are a lot of work. Between creating the protocol, building and running a quality search, collecting all the papers, evaluating the studies that meet the inclusion criteria, and extracting and analyzing the summary data, a well-done review can require dozens to hundreds of hours of work spanning several months. Moreover, a systematic review requires subject expertise, statistical support and a librarian to help design and run the search. Be aware that librarians sometimes have queues for their search time; it may take several weeks to complete and run a search. Moreover, all guidelines for carrying out systematic reviews recommend that at least two subject experts screen the studies identified in the search. The first round of screening can consume 1 hour per screener for every 100-200 records. A systematic review is a labor-intensive team effort.
How can I know if my topic has been reviewed already?
Before starting out on a systematic review, check to see if someone has done it already. In PubMed you can use the systematic review subset to limit to a broad group of papers that is enriched for systematic reviews. You can invoke the subset by selecting it from the Article Types filters to the left of your PubMed results, or you can append AND systematic[sb] to your search. For example:
"neoadjuvant chemotherapy" AND systematic[sb]
The systematic review subset is very noisy, however. To quickly focus on systematic reviews (knowing that you may be missing some), simply search for the word systematic in the title:
"neoadjuvant chemotherapy" AND systematic[ti]
Any PRISMA-compliant systematic review will be captured by this method, since including the words "systematic review" in the title is a requirement of the PRISMA checklist. Cochrane systematic reviews, however, do not include 'systematic' in the title, so it's worth checking the Cochrane Database of Systematic Reviews independently.
You can also search for protocols that will indicate that another group has set out on a similar project. Many investigators will register their protocols in PROSPERO, a registry of review protocols. Other published protocols, as well as Cochrane Review protocols, appear in the Cochrane Methodology Register, a part of the Cochrane Library.
 Last Updated: Feb 26, 2024 3:17 PM
 URL: https://guides.library.harvard.edu/metaanalysis
  
Cochrane Training
Chapter 10: Analysing data and undertaking meta-analyses
Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group
Key Points:
 Meta-analysis is the statistical combination of results from two or more separate studies.
 Potential advantages of meta-analyses include an improvement in precision, the ability to answer questions not posed by individual studies, and the opportunity to settle controversies arising from conflicting claims. However, they also have the potential to mislead seriously, particularly if specific study designs, within-study biases, variation across studies, and reporting biases are not carefully considered.
 It is important to be familiar with the type of data (e.g. dichotomous, continuous) that result from measurement of an outcome in an individual study, and to choose suitable effect measures for comparing intervention groups.
 Most meta-analysis methods are variations on a weighted average of the effect estimates from the different studies.
 Studies with no events contribute no information about the risk ratio or odds ratio. For rare events, the Peto method has been observed to be less biased and more powerful than other methods.
 Variation across studies (heterogeneity) must be considered, although most Cochrane Reviews do not have enough studies to allow for the reliable investigation of its causes. Random-effects meta-analyses allow for heterogeneity by assuming that underlying effects follow a normal distribution, but they must be interpreted carefully. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation.
 Many judgements are required in the process of preparing a meta-analysis. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.
Cite this chapter as: Deeks JJ, Higgins JPT, Altman DG (editors). Chapter 10: Analysing data and undertaking meta-analyses. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook .
10.1 Do not start here!
It can be tempting to jump prematurely into a statistical analysis when undertaking a systematic review. The production of a diamond at the bottom of a plot is an exciting moment for many authors, but results of meta-analyses can be very misleading if suitable attention has not been given to formulating the review question; specifying eligibility criteria; identifying and selecting studies; collecting appropriate data; considering risk of bias; planning intervention comparisons; and deciding what data would be meaningful to analyse. Review authors should consult the chapters that precede this one before a meta-analysis is undertaken.
10.2 Introduction to meta-analysis
An important step in a systematic review is the thoughtful consideration of whether it is appropriate to combine the numerical results of all, or perhaps some, of the studies. Such a meta-analysis yields an overall statistic (together with its confidence interval) that summarizes the effectiveness of an experimental intervention compared with a comparator intervention. Potential advantages of meta-analyses include the following:
 To improve precision . Many studies are too small to provide convincing evidence about intervention effects in isolation. Estimation is usually improved when it is based on more information.
 To answer questions not posed by the individual studies . Primary studies often involve a specific type of participant and explicitly defined interventions. A selection of studies in which these characteristics differ can allow investigation of the consistency of effect across a wider range of populations and interventions. It may also, if relevant, allow reasons for differences in effect estimates to be investigated.
 To settle controversies arising from apparently conflicting studies or to generate new hypotheses . Statistical synthesis of findings allows the degree of conflict to be formally assessed, and reasons for different results to be explored and quantified.
Of course, the use of statistical synthesis methods does not guarantee that the results of a review are valid, any more than it does for a primary study. Moreover, like any tool, statistical methods can be misused.
This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. The use of network meta-analysis to compare more than two interventions is addressed in Chapter 11 . Formulae for most of the methods described are provided in the RevMan Web Knowledge Base under Statistical Algorithms and calculations used in Review Manager (documentation.cochrane.org/revmankb/statisticalmethods210600101.html), and a longer discussion of many of the issues is available ( Deeks et al 2001 ).
10.2.1 Principles of meta-analysis
The commonly used methods for meta-analysis follow these basic principles:
 Meta-analysis is typically a two-stage process. In the first stage, a summary statistic is calculated for each study, to describe the observed intervention effect in the same way for every study. For example, the summary statistic may be a risk ratio if the data are dichotomous, or a difference between means if the data are continuous (see Chapter 6 ).
 The combination of intervention effect estimates across studies may optionally incorporate an assumption that the studies are not all estimating the same intervention effect, but estimate intervention effects that follow a distribution across studies. This is the basis of a random-effects meta-analysis (see Section 10.10.4 ). Alternatively, if it is assumed that each study is estimating exactly the same quantity, then a fixed-effect meta-analysis is performed.
 The standard error of the summary intervention effect can be used to derive a confidence interval, which communicates the precision (or uncertainty) of the summary estimate; and to derive a P value, which communicates the strength of the evidence against the null hypothesis of no intervention effect.
 As well as yielding a summary quantification of the intervention effect, all methods of meta-analysis can incorporate an assessment of whether the variation among the results of the separate studies is compatible with random variation, or whether it is large enough to indicate inconsistency of intervention effects across studies (see Section 10.10 ).
 The problem of missing data is one of the numerous practical considerations that must be thought through when undertaking a meta-analysis. In particular, review authors should consider the implications of missing outcome data from individual participants (due to losses to follow-up or exclusions from analysis) (see Section 10.12 ).
Meta-analyses are usually illustrated using a forest plot . An example appears in Figure 10.2.a . A forest plot displays effect estimates and confidence intervals for both individual studies and meta-analyses (Lewis and Clarke 2001). Each study is represented by a block at the point estimate of intervention effect with a horizontal line extending either side of the block. The area of the block indicates the weight assigned to that study in the meta-analysis while the horizontal line depicts the confidence interval (usually with a 95% level of confidence). The area of the block and the confidence interval convey similar information, but both make different contributions to the graphic. The confidence interval depicts the range of intervention effects compatible with the study’s result. The size of the block draws the eye towards the studies with larger weight (usually those with narrower confidence intervals), which dominate the calculation of the summary result, presented as a diamond at the bottom.
Figure 10.2.a Example of a forest plot from a review of interventions to promote ownership of smoke alarms (DiGuiseppi and Higgins 2001). Reproduced with permission of John Wiley & Sons
10.3 A generic inverse-variance approach to meta-analysis
A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method . This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data.
The inverse-variance method is so named because the weight given to each study is chosen to be the inverse of the variance of the effect estimate (i.e. 1 over the square of its standard error). Thus, larger studies, which have smaller standard errors, are given more weight than smaller studies, which have larger standard errors. This choice of weights minimizes the imprecision (uncertainty) of the pooled effect estimate.
10.3.1 Fixed-effect method for meta-analysis
A fixed-effect meta-analysis using the inverse-variance method calculates a weighted average as:

$$\hat{\theta} = \frac{\sum_i Y_i / SE_i^2}{\sum_i 1 / SE_i^2}$$

where $Y_i$ is the intervention effect estimated in the $i$th study, $SE_i$ is the standard error of that estimate, and the summation is across all studies. The basic data required for the analysis are therefore an estimate of the intervention effect and its standard error from each study. A fixed-effect meta-analysis is valid under an assumption that all effect estimates are estimating the same underlying intervention effect, which is referred to variously as a ‘fixed-effect’ assumption, a ‘common-effect’ assumption or an ‘equal-effects’ assumption. However, the result of the meta-analysis can be interpreted without making such an assumption (Rice et al 2018).
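The weighted average described above is straightforward to compute directly. A minimal sketch, with invented log odds ratios and standard errors standing in for real study data:

```python
import math

def fixed_effect_pool(estimates, standard_errors):
    """Generic inverse-variance fixed-effect meta-analysis: weight each
    study estimate Y_i by 1/SE_i^2 and take the weighted average."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Invented log odds ratios and standard errors from three studies.
pooled, pooled_se = fixed_effect_pool([-0.6, -0.3, -0.4], [0.2, 0.4, 0.3])
ci_95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
```

Note that the pooled standard error is smaller than any single study's standard error, which is the precision gain the chapter describes.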
10.3.2 Random-effects methods for meta-analysis
A variation on the inverse-variance method is to incorporate an assumption that the different studies are estimating different, yet related, intervention effects (Higgins et al 2009). This produces a random-effects meta-analysis, and the simplest version is known as the DerSimonian and Laird method (DerSimonian and Laird 1986). Random-effects meta-analysis is discussed in detail in Section 10.10.4 .
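As an illustrative sketch of the DerSimonian and Laird idea (not RevMan's implementation, and with invented numbers): estimate the between-study variance tau² from Cochran's Q by the method of moments, then add it to each study's within-study variance before weighting:

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooling: method-of-moments estimate of the
    between-study variance tau^2, then weights 1/(v_i + tau^2)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    pooled_fe = sum(wi * y for wi, y in zip(w, effects)) / sw
    q = sum(wi * (y - pooled_fe) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)          # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(ws * y for ws, y in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# With homogeneous invented data, tau^2 collapses to 0 and the
# random-effects answer coincides with the fixed-effect one.
pooled, tau2 = dersimonian_laird([0.3, 0.3, 0.3], [0.05, 0.05, 0.05])
```

When tau² is large, the weights become more nearly equal across studies, so small studies influence a random-effects result more than a fixed-effect one.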
10.3.3 Performing inverse-variance meta-analyses
Most meta-analysis programs perform inverse-variance meta-analyses. Usually the user provides summary data from each intervention arm of each study, such as a 2×2 table when the outcome is dichotomous (see Chapter 6, Section 6.4 ), or means, standard deviations and sample sizes for each group when the outcome is continuous (see Chapter 6, Section 6.5 ). This avoids the need for the author to calculate effect estimates, and allows the use of methods targeted specifically at different types of data (see Sections 10.4 and 10.5 ).
When the data are conveniently available as summary statistics from each intervention group, the inverse-variance method can be implemented directly. For example, estimates and their standard errors may be entered directly into RevMan under the ‘Generic inverse variance’ outcome type. For ratio measures of intervention effect, the data must be entered into RevMan as natural logarithms (for example, as a log odds ratio and the standard error of the log odds ratio). However, it is straightforward to instruct the software to display results on the original (e.g. odds ratio) scale. It is possible to supplement or replace this with a column providing the sample sizes in the two groups. Note that the ability to enter estimates and standard errors creates a high degree of flexibility in meta-analysis. It facilitates the analysis of properly analysed crossover trials, cluster-randomized trials and non-randomized trials (see Chapter 23 ), as well as outcome data that are ordinal, time-to-event or rates (see Chapter 6 ).
10.4 Meta-analysis of dichotomous outcomes
There are four widely used methods of meta-analysis for dichotomous outcomes: three fixed-effect methods (Mantel-Haenszel, Peto and inverse variance) and one random-effects method (DerSimonian and Laird inverse variance). All of these methods are available as analysis options in RevMan. The Peto method can only combine odds ratios, whilst the other three methods can combine odds ratios, risk ratios or risk differences. Formulae for all of the meta-analysis methods are available elsewhere (Deeks et al 2001).
Note that having no events in one group (sometimes referred to as ‘zero cells’) causes problems with computation of estimates and standard errors with some methods: see Section 10.4.4 .
10.4.1 Mantel-Haenszel methods
When data are sparse, either in terms of event risks being low or study size being small, the estimates of the standard errors of the effect estimates that are used in the inverse-variance methods may be poor. Mantel-Haenszel methods are fixed-effect meta-analysis methods using a different weighting scheme that depends on which effect measure (e.g. risk ratio, odds ratio, risk difference) is being used (Mantel and Haenszel 1959, Greenland and Robins 1985). They have been shown to have better statistical properties when there are few events. As this is a common situation in Cochrane Reviews, the Mantel-Haenszel method is generally preferable to the inverse-variance method in fixed-effect meta-analyses. In other situations the two methods give similar estimates.
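For the odds ratio, the Mantel-Haenszel pooled estimate has a simple closed form that avoids per-study standard errors entirely. A minimal sketch with hypothetical 2×2 tables:

```python
def mantel_haenszel_or(tables):
    """Pooled Mantel-Haenszel odds ratio over 2x2 tables, each given as
    (a, b, c, d) = events/non-events in experimental and control arms.
    OR_MH = sum(a*d/n) / sum(b*c/n), n = table total."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two invented trials, both compatible with an odds ratio of 4/9.
pooled_or = mantel_haenszel_or([(10, 90, 20, 80), (20, 180, 40, 160)])
```

Because each table contributes raw counts rather than a variance estimate, the method stays stable when events are few, which is the property the text highlights.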
10.4.2 Peto odds ratio method
Peto’s method can only be used to combine odds ratios (Yusuf et al 1985). It uses an inverse-variance approach, but with an approximate method of estimating the log odds ratio and different weights. An alternative way of viewing the Peto method is as a sum of ‘O – E’ statistics. Here, O is the observed number of events and E is an expected number of events in the experimental intervention group of each study under the null hypothesis of no intervention effect.
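The ‘O – E’ view of Peto's method can be sketched as follows, using the standard hypergeometric expectation and variance for each 2×2 table; the pooled log odds ratio is sum(O – E) / sum(V). The function name and table layout are choices made for this illustration.

```python
import math

def peto_or(tables):
    """Peto one-step pooled odds ratio from 2x2 tables.

    tables: list of (a, b, c, d), where a/b are events/non-events in the
    experimental group and c/d are events/non-events in the comparator group.
    For each study: O = a (observed experimental events),
    E = expected experimental events under the null,
    V = hypergeometric variance of O.
    """
    sum_oe, sum_v = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        e = (a + b) * (a + c) / n                       # expected events in exp. group
        v = (a + b) * (c + d) * (a + c) * (b + d) / (n**2 * (n - 1))
        sum_oe += a - e
        sum_v += v
    return math.exp(sum_oe / sum_v)                     # pooled ln(OR) = sum(O-E)/sum(V)
```

When the experimental and comparator arms have identical event counts, O equals E in every study and the pooled odds ratio is exactly 1.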
The approximation used in the computation of the log odds ratio works well when intervention effects are small (odds ratios are close to 1), events are not particularly common and the studies have similar numbers in experimental and comparator groups. In other situations it has been shown to give biased answers. As these criteria are not always fulfilled, Peto’s method is not recommended as a default approach for meta-analysis.
Corrections for zero cell counts are not necessary when using Peto’s method. Perhaps for this reason, this method performs well when events are very rare (Bradburn et al 2007); see Section 10.4.4.1. Also, Peto’s method can be used to combine studies with dichotomous outcome data with studies using time-to-event analyses where log-rank tests have been used (see Section 10.9).
10.4.3 Which effect measure for dichotomous outcomes?
Effect measures for dichotomous data are described in Chapter 6, Section 6.4.1. The effect of an intervention can be expressed as either a relative or an absolute effect. The risk ratio (relative risk) and odds ratio are relative measures, while the risk difference and number needed to treat for an additional beneficial outcome are absolute measures. A further complication is that there are, in fact, two risk ratios: we can calculate the risk ratio of an event occurring or the risk ratio of no event occurring. These give different summary results in a meta-analysis, sometimes dramatically so.
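A two-line arithmetic example (with made-up counts) shows that the two risk ratios are genuinely different quantities, not reciprocals of one another:

```python
# One hypothetical 2x2 table:
# experimental group: 30 events out of 100; comparator group: 60 events out of 100.
rr_event = (30 / 100) / (60 / 100)       # risk ratio of the event occurring: 0.5
rr_nonevent = (70 / 100) / (40 / 100)    # risk ratio of no event occurring: 1.75
# Note 1 / rr_event = 2.0, which is not equal to rr_nonevent = 1.75,
# so the choice of which outcome state is the 'event' changes the summary.
```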
The selection of a summary statistic for use in meta-analysis depends on balancing three criteria (Deeks 2002). First, we desire a summary statistic that gives values that are similar for all the studies in the meta-analysis and subdivisions of the population to which the interventions will be applied. The more consistent the summary statistic, the greater is the justification for expressing the intervention effect as a single summary number. Second, the summary statistic must have the mathematical properties required to perform a valid meta-analysis. Third, the summary statistic would ideally be easily understood and applied by those using the review. The summary intervention effect should be presented in a way that helps readers to interpret and apply the results appropriately. Among effect measures for dichotomous data, no single measure is uniformly best, so the choice inevitably involves a compromise.
Consistency: Empirical evidence suggests that relative effect measures are, on average, more consistent than absolute measures (Engels et al 2000, Deeks 2002, Rücker et al 2009). For this reason, it is wise to avoid performing meta-analyses of risk differences, unless there is a clear reason to suspect that risk differences will be consistent in a particular clinical situation. On average there is little difference between the odds ratio and risk ratio in terms of consistency (Deeks 2002). When the study aims to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002). Selecting an effect measure based on what is the most consistent in a particular situation is not a generally recommended strategy, since it may lead to a selection that spuriously maximizes the precision of a meta-analysis estimate.
Mathematical properties: The most important mathematical criterion is the availability of a reliable variance estimate. The number needed to treat for an additional beneficial outcome does not have a simple variance estimator and cannot easily be used directly in meta-analysis, although it can be computed from the meta-analysis result afterwards (see Chapter 15, Section 15.4.2). There is no consensus regarding the importance of two other often-cited mathematical properties: the fact that the behaviour of the odds ratio and the risk difference does not rely on which of the two outcome states is coded as the event, and the odds ratio being the only statistic that is unbounded (see Chapter 6, Section 6.4.1).
Ease of interpretation: The odds ratio is the hardest summary statistic to understand and to apply in practice, and many practising clinicians report difficulties in using it. There are many published examples where authors have misinterpreted odds ratios from meta-analyses as risk ratios. Although odds ratios can be re-expressed for interpretation (see Chapter 15, Section 15.4), there must be some concern that routine presentation of the results of systematic reviews as odds ratios will lead to frequent overestimation of the benefits and harms of interventions when the results are applied in clinical practice. Absolute measures of effect are thought to be more easily interpreted by clinicians than relative effects (Sinclair and Bracken 1994), and allow trade-offs to be made between likely benefits and likely harms of interventions. However, they are less likely to be generalizable.
It is generally recommended that meta-analyses are undertaken using risk ratios (taking care to make a sensible choice over which category of outcome is classified as the event) or odds ratios. This is because it seems important to avoid using summary statistics for which there is empirical evidence that they are unlikely to give consistent estimates of intervention effects (the risk difference), and it is impossible to use statistics for which meta-analysis cannot be performed (the number needed to treat for an additional beneficial outcome). It may be wise to plan to undertake a sensitivity analysis to investigate whether choice of summary statistic (and selection of the event category) is critical to the conclusions of the meta-analysis (see Section 10.14).
It is often sensible to use one statistic for meta-analysis and to re-express the results using a second, more easily interpretable statistic. For example, often meta-analysis may be best performed using relative effect measures (risk ratios or odds ratios) and the results re-expressed using absolute effect measures (risk differences or numbers needed to treat for an additional beneficial outcome; see Chapter 15, Section 15.4. This is one of the key motivations for ‘Summary of findings’ tables in Cochrane Reviews: see Chapter 14). If odds ratios are used for meta-analysis they can also be re-expressed as risk ratios (see Chapter 15, Section 15.4). In all cases the same formulae can be used to convert upper and lower confidence limits. However, all of these transformations require specification of a value of baseline risk that indicates the likely risk of the outcome in the ‘control’ population to which the experimental intervention will be applied. Where the chosen value for this assumed comparator group risk is close to the typical observed comparator group risks across the studies, similar estimates of absolute effect will be obtained regardless of whether odds ratios or risk ratios are used for meta-analysis. Where the assumed comparator risk differs from the typical observed comparator group risk, the predictions of absolute benefit will differ according to which summary statistic was used for meta-analysis.
10.4.4 Meta-analysis of rare events
For rare outcomes, meta-analysis may be the only way to obtain reliable evidence of the effects of healthcare interventions. Individual studies are usually underpowered to detect differences in rare outcomes, but a meta-analysis of many studies may have adequate power to investigate whether interventions do have an impact on the incidence of the rare event. However, many methods of meta-analysis are based on large-sample approximations, and are unsuitable when events are rare. Thus authors must take care when selecting a method of meta-analysis (Efthimiou 2018).
There is no single risk at which events are classified as ‘rare’. Certainly risks of 1 in 1000 constitute rare events, and many would classify risks of 1 in 100 the same way. However, the performance of methods when risks are as high as 1 in 10 may also be affected by the issues discussed in this section. What is typical is that a high proportion of the studies in the meta-analysis observe no events in one or more study arms.
10.4.4.1 Studies with no events in one or more arms
Computational problems can occur when no events are observed in one or both groups in an individual study. Inverse-variance meta-analytical methods involve computing an intervention effect estimate and its standard error for each study. For studies where no events were observed in one or both arms, these computations often involve dividing by a zero count, which yields a computational error. Most meta-analytical software routines (including those in RevMan) automatically check for problematic zero counts, and add a fixed value (typically 0.5) to all cells of a 2×2 table where the problems occur. The Mantel-Haenszel methods require zero-cell corrections only if the same cell is zero in all the included studies, and hence need to use the correction less often. However, in many software applications the same correction rules are applied for Mantel-Haenszel methods as for the inverse-variance methods. Odds ratio and risk ratio methods require zero-cell corrections more often than difference methods, except for the Peto odds ratio method, which encounters computational problems only in the extreme situation of no events occurring in all arms of all studies.
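The fixed zero-cell correction described above can be sketched as a small helper; this mimics the typical automatic check rather than reproducing any particular package's rule (the trigger condition used here, any zero cell, is a simplification).

```python
def continuity_correct(a, b, c, d, add=0.5):
    """Fixed continuity correction for a 2x2 table (a, b, c, d).

    If any cell is zero (which would cause division-by-zero when
    computing an odds ratio and its standard error), add a fixed
    value, typically 0.5, to all four cells; otherwise leave the
    table unchanged.
    """
    if 0 in (a, b, c, d):
        return a + add, b + add, c + add, d + add
    return a, b, c, d
```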
Whilst the fixed correction meets the objective of avoiding computational errors, it usually has the undesirable effect of biasing study estimates towards no difference and overestimating the variances of study estimates (consequently down-weighting their contribution to the meta-analysis inappropriately). Where the sizes of the study arms are unequal (which occurs more commonly in non-randomized studies than in randomized trials), fixed corrections will introduce a directional bias in the treatment effect. Alternative non-fixed zero-cell corrections have been explored by Sweeting and colleagues, including a correction proportional to the reciprocal of the size of the contrasting study arm, which they found preferable to the fixed 0.5 correction when arm sizes were not balanced (Sweeting et al 2004).
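One reading of the reciprocal-of-the-contrasting-arm idea can be sketched as follows; the scaling (so the two corrections sum to 1, recovering 0.5 per arm when arms are balanced) is an assumption of this sketch, and readers should consult Sweeting et al (2004) for the exact formulation.

```python
def arm_size_corrections(n_exp, n_comp):
    """Zero-cell corrections proportional to the reciprocal of the
    *opposite* arm's size, scaled so the two corrections sum to 1.

    n_exp, n_comp: numbers of participants in the experimental and
    comparator arms. With balanced arms this returns (0.5, 0.5),
    matching the familiar fixed correction.
    """
    k_exp = (1 / n_comp) / (1 / n_exp + 1 / n_comp)
    k_comp = (1 / n_exp) / (1 / n_exp + 1 / n_comp)
    return k_exp, k_comp
```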
10.4.4.2 Studies with no events in either arm
The standard practice in meta-analysis of odds ratios and risk ratios is to exclude studies from the meta-analysis where there are no events in both arms. This is because such studies do not provide any indication of either the direction or magnitude of the relative treatment effect. Whilst it may be clear that events are very rare on both the experimental intervention and the comparator intervention, no information is provided as to which group is likely to have the higher risk, or on whether the risks are of the same or different orders of magnitude (when risks are very low, they are compatible with very large or very small ratios). Whilst one might be tempted to infer that the risk would be lowest in the group with the larger sample size (as the upper limit of the confidence interval would be lower), this is not justified as the sample size allocation was determined by the study investigators and is not a measure of the incidence of the event.
Risk difference methods superficially appear to have an advantage over odds ratio methods in that the risk difference is defined (as zero) when no events occur in either arm. Such studies are therefore included in the estimation process. Bradburn and colleagues undertook simulation studies which revealed that all risk difference methods yield confidence intervals that are too wide when events are rare, and have associated poor statistical power, which make them unsuitable for meta-analysis of rare events (Bradburn et al 2007). This is especially relevant when outcomes that focus on treatment safety are being studied, as the ability to identify correctly (or attempt to refute) serious adverse events is a key issue in drug development.
It is likely that outcomes for which no events occur in either arm may not be mentioned in reports of many randomized trials, precluding their inclusion in a meta-analysis. It is unclear, though, when working with published results, whether failure to mention a particular adverse event means there were no such events, or simply that such events were not included as a measured endpoint. Whilst the results of risk difference meta-analyses will be affected by non-reporting of outcomes with no events, odds ratio and risk ratio based methods naturally exclude these data whether or not they are published, and are therefore unaffected.
10.4.4.3 Validity of methods of meta-analysis for rare events
Simulation studies have revealed that many meta-analytical methods can give misleading results for rare events, which is unsurprising given their reliance on asymptotic statistical theory. Their performance has been judged suboptimal either through results being biased, confidence intervals being inappropriately wide, or statistical power being too low to detect substantial differences.
In the following we consider the choice of statistical method for meta-analyses of odds ratios. Appropriate choices appear to depend on the comparator group risk, the likely size of the treatment effect and consideration of balance in the numbers of experimental and comparator participants in the constituent studies. We are not aware of research that has evaluated risk ratio measures directly, but their performance is likely to be very similar to that of the corresponding odds ratio measures. When events are rare, estimates of odds and risks are nearly identical, and results of both can be interpreted as ratios of probabilities.
Bradburn and colleagues found that many of the most commonly used meta-analytical methods were biased when events were rare (Bradburn et al 2007). The bias was greatest in the inverse variance and DerSimonian and Laird odds ratio and risk difference methods, and the Mantel-Haenszel odds ratio method using a 0.5 zero-cell correction. As already noted, risk difference meta-analytical methods tended to show conservative confidence interval coverage and low statistical power when risks of events were low.
At event rates below 1% the Peto one-step odds ratio method was found to be the least biased and most powerful method, and provided the best confidence interval coverage, provided there was no substantial imbalance between treatment and comparator group sizes within studies, and treatment effects were not exceptionally large. This finding was consistently observed across three different meta-analytical scenarios, and was also observed by Sweeting and colleagues (Sweeting et al 2004).
This finding was noted despite the method producing only an approximation to the odds ratio. For very large effects (e.g. risk ratio=0.2) when the approximation is known to be poor, treatment effects were underestimated, but the Peto method still had the best performance of all the methods considered for event risks of 1 in 1000, and the bias was never more than 6% of the comparator group risk.
In other circumstances (i.e. event risks above 1%, very large effects at event risks around 1%, and meta-analyses where many studies were substantially imbalanced) the best performing methods were the Mantel-Haenszel odds ratio without zero-cell corrections, logistic regression and an exact method. None of these methods is available in RevMan.
Methods that should be avoided with rare events are the inverse-variance methods (including the DerSimonian and Laird random-effects method) (Efthimiou 2018). These directly incorporate each study’s variance in the estimation of its contribution to the meta-analysis, but such variances are usually based on a large-sample approximation that was not intended for use with rare events. We would suggest that incorporation of heterogeneity into an estimate of a treatment effect should be a secondary consideration when attempting to produce estimates of effects from sparse data: the primary concern is to discern whether there is any signal of an effect in the data.
10.5 Meta-analysis of continuous outcomes
An important assumption underlying standard methods for meta-analysis of continuous data is that the outcomes have a normal distribution in each intervention arm in each study. This assumption may not always be met, although it is unimportant in very large studies. It is useful to consider the possibility of skewed data (see Section 10.5.3).
10.5.1 Which effect measure for continuous outcomes?
The two summary statistics commonly used for meta-analysis of continuous data are the mean difference (MD) and the standardized mean difference (SMD). Other options are available, such as the ratio of means (see Chapter 6, Section 6.5.1). Selection of summary statistics for continuous data is principally determined by whether the studies all report the outcome using the same scale (when the mean difference can be used) or using different scales (when the standardized mean difference is usually used). The ratio of means can be used in either situation, but is appropriate only when outcome measurements are strictly greater than zero. Further considerations in deciding on an effect measure that will facilitate interpretation of the findings appear in Chapter 15, Section 15.5.
It is important to understand the different roles played in the MD and SMD approaches by the standard deviations (SDs) of outcomes observed in the two groups.
For the mean difference approach, the SDs are used together with the sample sizes to compute the weight given to each study. Studies with small SDs are given relatively higher weight whilst studies with larger SDs are given relatively smaller weights. This is appropriate if variation in SDs between studies reflects differences in the reliability of outcome measurements, but is probably not appropriate if the differences in SD reflect real differences in the variability of outcomes in the study populations.
For the standardized mean difference approach, the SDs are used to standardize the mean differences to a single scale, as well as in the computation of study weights. Thus, studies with small SDs lead to relatively higher estimates of SMD, whilst studies with larger SDs lead to relatively smaller estimates of SMD. For this to be appropriate, it must be assumed that betweenstudy variation in SDs reflects only differences in measurement scales and not differences in the reliability of outcome measures or variability among study populations, as discussed in Chapter 6, Section 6.5.1.2 .
These assumptions of the methods should be borne in mind when unexpected variation of SDs is observed across studies.
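The two roles of the SDs described above can be made concrete with a small sketch of the SMD computation. This uses the pooled-SD standardization (Cohen's d) with a commonly used approximate standard error; it is an illustration, not the Handbook's prescribed formula, and Hedges' small-sample correction is omitted.

```python
import math

def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Cohen's d) and its approximate SE.

    The pooled SD standardizes the mean difference to a single scale,
    so studies with smaller SDs yield larger SMDs, and the SE feeds
    into the inverse-variance study weights.
    """
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, se
```

Halving the SDs while keeping the means fixed doubles the SMD, which is exactly the sensitivity to between-study SD variation that the text warns about.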
10.5.2 Meta-analysis of change scores
In some circumstances an analysis based on changes from baseline will be more efficient and powerful than comparison of post-intervention values, as it removes a component of between-person variability from the analysis. However, calculation of a change score requires measurement of the outcome twice, and in practice may be less efficient for outcomes that are unstable or difficult to measure precisely, where the measurement error may be larger than true between-person baseline variability. Change-from-baseline outcomes may also be preferred if they have a less skewed distribution than post-intervention measurement outcomes. Although sometimes used as a device to ‘correct’ for unlucky randomization, this practice is not recommended.
The preferred statistical approach to accounting for baseline measurements of the outcome variable is to include the baseline outcome measurements as a covariate in a regression model or analysis of covariance (ANCOVA). These analyses produce an ‘adjusted’ estimate of the intervention effect together with its standard error. These analyses are the least frequently encountered, but as they give the most precise and least biased estimates of intervention effects they should be included in the analysis when they are available. However, they can only be included in a meta-analysis using the generic inverse-variance method, since means and SDs are not available for each intervention group separately.
In practice an author is likely to discover that the studies included in a review include a mixture of change-from-baseline and post-intervention value scores. However, mixing of outcomes is not a problem when it comes to meta-analysis of MDs. There is no statistical reason why studies with change-from-baseline outcomes should not be combined in a meta-analysis with studies with post-intervention measurement outcomes when using the (unstandardized) MD method. In a randomized study, MD based on changes from baseline can usually be assumed to be addressing exactly the same underlying intervention effects as analyses based on post-intervention measurements: that is to say, the difference in mean post-intervention values will on average be the same as the difference in mean change scores. If the use of change scores does increase precision, the studies presenting change scores will appropriately be given higher weights in the analysis than they would have received if post-intervention values had been used, as they will have smaller SDs.
When combining the data on the MD scale, authors must be careful to use the appropriate means and SDs (either of post-intervention measurements or of changes from baseline) for each study. Since the mean values and SDs for the two types of outcome may differ substantially, it may be advisable to place them in separate subgroups to avoid confusion for the reader, but the results of the subgroups can legitimately be pooled together.
In contrast, post-intervention value and change scores should not in principle be combined using standard meta-analysis approaches when the effect measure is an SMD. This is because the SDs used in the standardization reflect different things. The SD when standardizing post-intervention values reflects between-person variability at a single point in time. The SD when standardizing change scores reflects variation in between-person changes over time, so will depend on both within-person and between-person variability; within-person variability in turn is likely to depend on the length of time between measurements. Nevertheless, an empirical study of 21 meta-analyses in osteoarthritis did not find a difference between combined SMDs based on post-intervention values and combined SMDs based on change scores (da Costa et al 2013). One option is to standardize SMDs using post-intervention SDs rather than change score SDs. This would lead to valid synthesis of the two approaches, but we are not aware that an appropriate standard error for this has been derived.
A common practical problem associated with including change-from-baseline measures is that the SD of changes is not reported. Imputation of SDs is discussed in Chapter 6, Section 6.5.2.8.
10.5.3 Meta-analysis of skewed data
Analyses based on means are appropriate for data that are at least approximately normally distributed, and for data from very large trials. If the true distribution of outcomes is asymmetrical, then the data are said to be skewed. Review authors should consider the possibility and implications of skewed data when analysing continuous outcomes (see MECIR Box 10.5.a). Skew can sometimes be diagnosed from the means and SDs of the outcomes. A rough check is available, but it is only valid if a lowest or highest possible value for an outcome is known to exist. Thus, the check may be used for outcomes such as weight, volume and blood concentrations, which have lowest possible values of 0, or for scale outcomes with minimum or maximum scores, but it may not be appropriate for change-from-baseline measures. The check involves calculating the observed mean minus the lowest possible value (or the highest possible value minus the observed mean), and dividing this by the SD. A ratio less than 2 suggests skew (Altman and Bland 1996). If the ratio is less than 1, there is strong evidence of a skewed distribution.
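The rough check just described is a one-line calculation; a small sketch (the function name and thresholds-as-comments are conventions chosen here):

```python
def skew_check(mean, sd, lowest=0.0):
    """Altman-Bland rough check for skew, valid only when a lowest
    (or, symmetrically, highest) possible value is known to exist.

    Returns (mean - lowest) / sd. A ratio below 2 suggests skew;
    a ratio below 1 is strong evidence of a skewed distribution.
    """
    return (mean - lowest) / sd
```

For example, a hospital-stay outcome with mean 1.5 days and SD 1.0 days (lowest possible value 0) gives a ratio of 1.5, suggesting skew.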
Transformation of the original outcome data may reduce skew substantially. Reports of trials may present results on a transformed scale, usually a log scale. Collection of appropriate data summaries from the trialists, or acquisition of individual patient data, is currently the approach of choice. Appropriate data summaries and analysis strategies for the individual patient data will depend on the situation. Consultation with a knowledgeable statistician is advised.
Where data have been analysed on a log scale, results are commonly presented as geometric means and ratios of geometric means. A meta-analysis may then be performed on the scale of the log-transformed data; an example of the calculation of the required means and SD is given in Chapter 6, Section 6.5.2.4. This approach depends on being able to obtain transformed data for all studies; methods for transforming from one scale to the other are available (Higgins et al 2008b). Log-transformed and untransformed data should not be mixed in a meta-analysis.
MECIR Box 10.5.a Relevant expectations for conduct of intervention reviews
Addressing skewed data: Skewed data are sometimes not summarized usefully by means and standard deviations. While statistical methods are approximately valid for large sample sizes, skewed outcome data can lead to misleading results when studies are small.
10.6 Combining dichotomous and continuous outcomes
Occasionally authors encounter a situation where data for the same outcome are presented in some studies as dichotomous data and in other studies as continuous data. For example, scores on depression scales can be reported as means, or as the percentage of patients who were depressed at some point after an intervention (i.e. with a score above a specified cut-point). This type of information is often easier to understand, and more helpful, when it is dichotomized. However, deciding on a cut-point may be arbitrary, and information is lost when continuous data are transformed to dichotomous data.
There are several options for handling combinations of dichotomous and continuous data. Generally, it is useful to summarize results from all the relevant, valid studies in a similar way, but this is not always possible. It may be possible to collect missing data from investigators so that this can be done. If not, it may be useful to summarize the data in three ways: by entering the means and SDs as continuous outcomes, by entering the counts as dichotomous outcomes and by entering all of the data in text form as ‘Other data’ outcomes.
There are statistical approaches available that will re-express odds ratios as SMDs (and vice versa), allowing dichotomous and continuous data to be combined (Anzures-Cabrera et al 2011). A simple approach is as follows. Based on an assumption that the underlying continuous measurements in each intervention group follow a logistic distribution (which is a symmetrical distribution similar in shape to the normal distribution, but with more data in the distributional tails), and that the variability of the outcomes is the same in both experimental and comparator participants, the odds ratios can be re-expressed as an SMD according to the following simple formula (Chinn 2000):

SMD = (√3 / π) × ln(OR) ≈ 0.5513 × ln(OR)
The standard error of the log odds ratio can be converted to the standard error of an SMD by multiplying by the same constant (√3/π = 0.5513). Alternatively, SMDs can be re-expressed as log odds ratios by multiplying by π/√3 = 1.814. Once SMDs (or log odds ratios) and their standard errors have been computed for all studies in the meta-analysis, they can be combined using the generic inverse-variance method. Standard errors can be computed for all studies by entering the data as dichotomous and continuous outcome type data, as appropriate, and converting the confidence intervals for the resulting log odds ratios and SMDs into standard errors (see Chapter 6, Section 6.3).
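The Chinn (2000) conversion in both directions is simple enough to sketch directly; the function names are choices made for this illustration.

```python
import math

SQRT3_OVER_PI = math.sqrt(3) / math.pi   # ≈ 0.5513

def log_or_to_smd(log_or, se_log_or):
    """Re-express a log odds ratio and its SE as an SMD and its SE,
    multiplying both by sqrt(3)/pi (Chinn 2000)."""
    return log_or * SQRT3_OVER_PI, se_log_or * SQRT3_OVER_PI

def smd_to_log_or(smd, se_smd):
    """Re-express an SMD and its SE as a log odds ratio and its SE,
    multiplying both by pi/sqrt(3) (approximately 1.814)."""
    return smd / SQRT3_OVER_PI, se_smd / SQRT3_OVER_PI
```

Because the same constant scales both the estimate and its standard error, confidence limits can be converted with the same multiplication, as noted above.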
10.7 Meta-analysis of ordinal outcomes and measurement scales
Ordinal and measurement scale outcomes are most commonly meta-analysed as dichotomous data (if so, see Section 10.4) or continuous data (if so, see Section 10.5), depending on the way that the study authors performed the original analyses.
Occasionally it is possible to analyse the data using proportional odds models. This is the case when ordinal scales have a small number of categories, the numbers falling into each category for each intervention group can be obtained, and the same ordinal scale has been used in all studies. This approach may make more efficient use of all available data than dichotomization, but requires access to statistical software and results in a summary statistic for which it is challenging to find a clinical meaning.
The proportional odds model uses the proportional odds ratio as the measure of intervention effect (Agresti 1996) (see Chapter 6, Section 6.6), and can be used for conducting a meta-analysis in advanced statistical software packages (Whitehead and Jones 1994). Estimates of log odds ratios and their standard errors from a proportional odds model may be meta-analysed using the generic inverse-variance method (see Section 10.3.3). If the same ordinal scale has been used in all studies, but in some reports has been presented as a dichotomous outcome, it may still be possible to include all studies in the meta-analysis. In the context of the three-category model, this might mean that for some studies category 1 constitutes a success, while for others both categories 1 and 2 constitute a success. Methods are available for dealing with this, and for combining data from scales that are related but have different definitions for their categories (Whitehead and Jones 1994).
10.8 Meta-analysis of counts and rates
Results may be expressed as count data when each participant may experience an event, and may experience it more than once (see Chapter 6, Section 6.7). For example, ‘number of strokes’ or ‘number of hospital visits’ are counts. These events may not happen at all, but if they do happen there is no theoretical maximum number of occurrences for an individual. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual (see Section 10.4), continuous data (see Section 10.5) and time-to-event data (see Section 10.9), as well as being analysed as rate data.
Rate data occur if counts are measured for each participant along with the time over which they are observed. This is particularly appropriate when the events being counted are rare. For example, a woman may experience two strokes during a follow-up period of two years. Her rate of strokes is one per year of follow-up (or, equivalently, 0.083 per month of follow-up). Rates are conventionally summarized at the group level. For example, participants in the comparator group of a clinical trial may experience 85 strokes during a total of 2836 person-years of follow-up. An underlying assumption associated with the use of rates is that the risk of an event is constant across participants and over time. This assumption should be carefully considered for each situation. For example, in contraception studies, rates have been used (known as Pearl indices) to describe the number of pregnancies per 100 women-years of follow-up. This is now considered inappropriate since couples have different risks of conception, and the risk for each woman changes over time. Pregnancies are now analysed more often using life tables or time-to-event methods that investigate the time elapsing before the first pregnancy.
Analysing count data as rates is not always the most appropriate approach and is uncommon in practice. This is because:
 the assumption of a constant underlying risk may not be suitable; and
 the statistical methods are not as well developed as they are for other types of data.
The results of a study may be expressed as a rate ratio, that is, the ratio of the rate in the experimental intervention group to the rate in the comparator group. The (natural) logarithms of the rate ratios may be combined across studies using the generic inverse-variance method (see Section 10.3.3). Alternatively, Poisson regression approaches can be used (Spittal et al 2015).
In a randomized trial, rate ratios may often be very similar to risk ratios obtained after dichotomizing the participants, since the average period of follow-up should be similar in all intervention groups. Rate ratios and risk ratios will differ, however, if an intervention affects the likelihood of some participants experiencing multiple events.
It is possible also to focus attention on the rate difference (see Chapter 6, Section 6.7.1). The analysis again can be performed using the generic inverse-variance method (Hasselblad and McCrory 1995, Guevara et al 2004).
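The rate-ratio calculation and inverse-variance pooling described above can be sketched as follows. The event counts and person-years are invented for illustration, and the standard error formula √(1/e₁ + 1/e₀) is the usual approximation for a log rate ratio:

```python
import math

# Hypothetical counts: (events, person-years) per group for two studies.
studies = [
    {"e_exp": 12, "t_exp": 1500.0, "e_comp": 85, "t_comp": 2836.0},
    {"e_exp": 20, "t_exp": 2100.0, "e_comp": 60, "t_comp": 1900.0},
]

log_rrs, ses = [], []
for s in studies:
    # Log rate ratio and its approximate standard error sqrt(1/e1 + 1/e0).
    log_rr = math.log((s["e_exp"] / s["t_exp"]) / (s["e_comp"] / s["t_comp"]))
    se = math.sqrt(1 / s["e_exp"] + 1 / s["e_comp"])
    log_rrs.append(log_rr)
    ses.append(se)

# Generic inverse-variance fixed-effect pooling of the log rate ratios.
weights = [1 / se**2 for se in ses]
pooled_log_rr = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_rr = math.exp(pooled_log_rr)
```

The pooled result is exponentiated at the end to return to the rate-ratio scale, mirroring how forest plots display combined log-scale estimates.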
10.9 Meta-analysis of time-to-event outcomes
Two approaches to meta-analysis of time-to-event outcomes are readily available to Cochrane Review authors. The choice of which to use will depend on the type of data that have been extracted from the primary studies, or obtained from re-analysis of individual participant data.
If ‘O – E’ and ‘V’ statistics have been obtained (see Chapter 6, Section 6.8.2), either through re-analysis of individual participant data or from aggregate statistics presented in the study reports, then these statistics may be entered directly into RevMan using the ‘O – E and Variance’ outcome type. There are several ways to calculate these ‘O – E’ and ‘V’ statistics. Peto’s method applied to dichotomous data (Section 10.4.2) gives rise to an odds ratio; a log-rank approach gives rise to a hazard ratio; and a variation of the Peto method for analysing time-to-event data gives rise to something in between (Simmonds et al 2011). The appropriate effect measure should be specified. Only fixed-effect meta-analysis methods are available in RevMan for ‘O – E and Variance’ outcomes.
Alternatively, if estimates of log hazard ratios and standard errors have been obtained from results of Cox proportional hazards regression models, study results can be combined using generic inverse-variance methods (see Section 10.3.3).
If a mixture of log-rank and Cox model estimates are obtained from the studies, all results can be combined using the generic inverse-variance method, as the log-rank estimates can be converted into log hazard ratios and standard errors using the approaches discussed in Chapter 6, Section 6.8.
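As a sketch of how ‘O – E’ and ‘V’ statistics relate to hazard ratios, the following uses the standard approximations lnHR ≈ (O − E)/V and SE(lnHR) = 1/√V; the statistics below are invented:

```python
import math

# Hypothetical log-rank statistics (O minus E, and V) from three trials.
o_minus_e = [-4.2, -1.5, -6.0]
variance = [10.1, 4.8, 14.3]

# Per-study log hazard ratio and standard error: lnHR ≈ (O−E)/V, SE = 1/sqrt(V).
log_hrs = [oe / v for oe, v in zip(o_minus_e, variance)]
ses = [1 / math.sqrt(v) for v in variance]

# Fixed-effect pooling; with these statistics the inverse-variance weights are
# simply the V values, so the pooled lnHR reduces to sum(O−E) / sum(V).
pooled_log_hr = sum(o_minus_e) / sum(variance)
pooled_hr = math.exp(pooled_log_hr)
```

This is why log-rank estimates can be fed into the generic inverse-variance method alongside Cox model estimates: each study contributes a log hazard ratio and a standard error on the same scale.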
10.10 Heterogeneity
10.10.1 What is heterogeneity?
Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity. It can be helpful to distinguish between different types of heterogeneity. Variability in the participants, interventions and outcomes studied may be described as clinical diversity (sometimes called clinical heterogeneity), and variability in study design, outcome measurement tools and risk of bias may be described as methodological diversity (sometimes called methodological heterogeneity). Variability in the intervention effects being evaluated in the different studies is known as statistical heterogeneity, and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects being more different from each other than one would expect due to random error (chance) alone. We will follow convention and refer to statistical heterogeneity simply as heterogeneity.
Clinical variation will lead to heterogeneity if the intervention effect is affected by the factors that vary across studies; most obviously, the specific interventions or patient characteristics. In other words, the true intervention effect will be different in different studies.
Differences between studies in terms of methodological factors, such as use of blinding and concealment of allocation sequence, or if there are differences between studies in the way the outcomes are defined and measured, may be expected to lead to differences in the observed intervention effects. Significant statistical heterogeneity arising from methodological diversity or differences in outcome assessments suggests that the studies are not all estimating the same quantity, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity would indicate that the studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this is not always the case. Further discussion appears in Chapter 7 and Chapter 8 .
The scope of a review will largely determine the extent to which studies included in a review are diverse. Sometimes a review will include studies addressing a variety of questions, for example when several different interventions for the same condition are of interest (see also Chapter 11) or when the differential effects of an intervention in different populations are of interest. Meta-analysis should only be considered when a group of studies is sufficiently homogeneous in terms of participants, interventions and outcomes to provide a meaningful summary (see MECIR Box 10.10.a). It is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. A common analogy is that systematic reviews bring together apples and oranges, and that combining these can yield a meaningless result. This is true if apples and oranges are of intrinsic interest on their own, but may not be if they are used to contribute to a wider question about fruit. For example, a meta-analysis may reasonably evaluate the average effect of a class of drugs by combining results from trials where each evaluates the effect of a different drug from the class.
MECIR Box 10.10.a Relevant expectations for conduct of intervention reviews
 Meta-analyses of very diverse studies can be misleading, for example where studies use different forms of control. Clinical diversity does not necessarily indicate that a meta-analysis should not be performed. However, authors must be clear about the underlying question that all studies are addressing. 
There may be specific interest in a review in investigating how clinical and methodological aspects of studies relate to their results. Where possible these investigations should be specified a priori (i.e. in the protocol for the systematic review). It is legitimate for a systematic review to focus on examining the relationship between some clinical characteristic(s) of the studies and the size of intervention effect, rather than on obtaining a summary effect estimate across a series of studies (see Section 10.11). Meta-regression may best be used for this purpose, although it is not implemented in RevMan (see Section 10.11.4).
10.10.2 Identifying and measuring heterogeneity
It is essential to consider the extent to which the results of studies are consistent with each other (see MECIR Box 10.10.b). If confidence intervals for the results of individual studies (generally depicted graphically using horizontal lines) have poor overlap, this generally indicates the presence of statistical heterogeneity. More formally, a statistical test for heterogeneity is available. This Chi² (χ², or chi-squared) test is included in the forest plots in Cochrane Reviews. It assesses whether observed differences in results are compatible with chance alone. A low P value (or a large Chi² statistic relative to its degrees of freedom) provides evidence of heterogeneity of intervention effects (variation in effect estimates beyond chance).
MECIR Box 10.10.b Relevant expectations for conduct of intervention reviews
Assessing statistical heterogeneity
 The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. It is important to identify heterogeneity in case there is sufficient information to explain it and offer new insights. Authors should recognize that there is much uncertainty in measures such as I² and Tau when there are few studies. Thus, use of simple thresholds to diagnose heterogeneity should be avoided. 
Care must be taken in the interpretation of the Chi² test, since it has low power in the (common) situation of a meta-analysis when studies have small sample size or are few in number. This means that while a statistically significant result may indicate a problem with heterogeneity, a non-significant result must not be taken as evidence of no heterogeneity. This is also why a P value of 0.10, rather than the conventional level of 0.05, is sometimes used to determine statistical significance. A further problem with the test, which seldom occurs in Cochrane Reviews, is that when there are many studies in a meta-analysis, the test has high power to detect a small amount of heterogeneity that may be clinically unimportant.
Some argue that, since clinical and methodological diversity always occur in a meta-analysis, statistical heterogeneity is inevitable (Higgins et al 2003). Thus, the test for heterogeneity is irrelevant to the choice of analysis; heterogeneity will always exist whether or not we happen to be able to detect it using a statistical test. Methods have been developed for quantifying inconsistency across studies that move the focus away from testing whether heterogeneity is present to assessing its impact on the meta-analysis. A useful statistic for quantifying inconsistency is:

I² = ((Q − df) / Q) × 100%

In this equation, Q is the Chi² statistic and df is its degrees of freedom (Higgins and Thompson 2002, Higgins et al 2003). I² describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance).
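A minimal sketch of the Q and I² calculation, using invented effect estimates and standard errors and a fixed-effect inverse-variance pooled estimate:

```python
# Invented study-level effect estimates and standard errors.
effects = [0.30, 0.55, 0.10, 0.62]
ses = [0.12, 0.15, 0.11, 0.20]

# Fixed-effect inverse-variance pooled estimate.
weights = [1 / se**2 for se in ses]
pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

# Cochran's Q: weighted sum of squared deviations from the pooled estimate.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
df = len(effects) - 1

# I² = (Q − df)/Q, truncated at zero, expressed as a percentage.
i_squared = max(0.0, (q - df) / q) * 100
```

The truncation at zero reflects that Q can fall below its degrees of freedom by chance, in which case I² is reported as 0%.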
Thresholds for the interpretation of the I² statistic can be misleading, since the importance of inconsistency depends on several factors. A rough guide to interpretation in the context of meta-analyses of randomized trials is as follows:
 0% to 40%: might not be important;
 30% to 60%: may represent moderate heterogeneity*;
 50% to 90%: may represent substantial heterogeneity*;
 75% to 100%: considerable heterogeneity*.
*The importance of the observed value of I² depends on (1) magnitude and direction of effects, and (2) strength of evidence for heterogeneity (e.g. P value from the Chi² test, or a confidence interval for I²: uncertainty in the value of I² is substantial when the number of studies is small).
10.10.3 Strategies for addressing heterogeneity
Review authors must take into account any statistical heterogeneity when interpreting results, particularly when there is variation in the direction of effect (see MECIR Box 10.10.c). A number of options are available if heterogeneity is identified among a group of studies that would otherwise be considered suitable for a meta-analysis.
MECIR Box 10.10.c Relevant expectations for conduct of intervention reviews
Considering statistical heterogeneity when interpreting the results
 The presence of heterogeneity affects the extent to which generalizable conclusions can be formed. If a fixed-effect analysis is used, the confidence intervals ignore the extent of heterogeneity. If a random-effects analysis is used, the result pertains to the mean effect across studies. In both cases, the implications of notable heterogeneity should be addressed. It may be possible to understand the reasons for the heterogeneity if there are sufficient studies. 
 Check again that the data are correct. Severe apparent heterogeneity can indicate that data have been incorrectly extracted or entered into meta-analysis software. For example, if standard errors have mistakenly been entered as SDs for continuous outcomes, this could manifest itself in overly narrow confidence intervals with poor overlap and hence substantial heterogeneity. Unit-of-analysis errors may also be causes of heterogeneity (see Chapter 6, Section 6.2).
 Do not do a meta-analysis. A systematic review need not contain any meta-analyses. If there is considerable variation in results, and particularly if there is inconsistency in the direction of effect, it may be misleading to quote an average value for the intervention effect.
 Explore heterogeneity. It is clearly of interest to determine the causes of heterogeneity among results of studies. This process is problematic since there are often many characteristics that vary across studies from which one may choose. Heterogeneity may be explored by conducting subgroup analyses (see Section 10.11.3) or meta-regression (see Section 10.11.4). Reliable conclusions can only be drawn from analyses that are truly prespecified before inspecting the studies’ results, and even these conclusions should be interpreted with caution. Explorations of heterogeneity that are devised after heterogeneity is identified can at best lead to the generation of hypotheses. They should be interpreted with even more caution and should generally not be listed among the conclusions of a review. Also, investigations of heterogeneity when there are very few studies are of questionable value.
 Ignore heterogeneity. Fixed-effect meta-analyses ignore heterogeneity. The summary effect estimate from a fixed-effect meta-analysis is normally interpreted as being the best estimate of the intervention effect. However, the existence of heterogeneity suggests that there may not be a single intervention effect but a variety of intervention effects. Thus, the summary fixed-effect estimate may be an intervention effect that does not actually exist in any population, and therefore have a confidence interval that is meaningless as well as being too narrow (see Section 10.10.4).
 Perform a random-effects meta-analysis. A random-effects meta-analysis may be used to incorporate heterogeneity among studies. This is not a substitute for a thorough investigation of heterogeneity. It is intended primarily for heterogeneity that cannot be explained. An extended discussion of this option appears in Section 10.10.4.
 Reconsider the effect measure. Heterogeneity may be an artificial consequence of an inappropriate choice of effect measure. For example, when studies collect continuous outcome data using different scales or different units, extreme heterogeneity may be apparent when using the mean difference but not when the more appropriate standardized mean difference is used. Furthermore, choice of effect measure for dichotomous outcomes (odds ratio, risk ratio, or risk difference) may affect the degree of heterogeneity among results. In particular, when comparator group risks vary, homogeneous odds ratios or risk ratios will necessarily lead to heterogeneous risk differences, and vice versa. However, it remains unclear whether homogeneity of intervention effect in a particular meta-analysis is a suitable criterion for choosing between these measures (see also Section 10.4.3).
 Exclude studies. Heterogeneity may be due to the presence of one or two outlying studies with results that conflict with the rest of the studies. In general it is unwise to exclude studies from a meta-analysis on the basis of their results as this may introduce bias. However, if an obvious reason for the outlying result is apparent, the study might be removed with more confidence. Since usually at least one characteristic can be found for any study in any meta-analysis which makes it different from the others, this criterion is unreliable because it is all too easy to fulfil. It is advisable to perform analyses both with and without outlying studies as part of a sensitivity analysis (see Section 10.14). Whenever possible, potential sources of clinical diversity that might lead to such situations should be specified in the protocol.
10.10.4 Incorporating heterogeneity into random-effects models
The random-effects meta-analysis approach incorporates an assumption that the different studies are estimating different, yet related, intervention effects (DerSimonian and Laird 1986, Borenstein et al 2010). The approach allows us to address heterogeneity that cannot readily be explained by other factors. A random-effects meta-analysis model involves an assumption that the effects being estimated in the different studies follow some distribution. The model represents our lack of knowledge about why real, or apparent, intervention effects differ, by considering the differences as if they were random. The centre of the assumed distribution describes the average of the effects, while its width describes the degree of heterogeneity. The conventional choice of distribution is a normal distribution. It is difficult to establish the validity of any particular distributional assumption, and this is a common criticism of random-effects meta-analyses. The importance of the assumed shape for this distribution has not been widely studied.
To undertake a random-effects meta-analysis, the standard errors of the study-specific estimates (SEi in Section 10.3.1) are adjusted to incorporate a measure of the extent of variation, or heterogeneity, among the intervention effects observed in different studies (this variation is often referred to as Tau-squared, τ², or Tau²). The amount of variation, and hence the adjustment, can be estimated from the intervention effects and standard errors of the studies included in the meta-analysis.
In a heterogeneous set of studies, a random-effects meta-analysis will award relatively more weight to smaller studies than such studies would receive in a fixed-effect meta-analysis. This is because small studies are more informative for learning about the distribution of effects across studies than for learning about an assumed common intervention effect.
Note that a random-effects model does not ‘take account’ of the heterogeneity, in the sense that it is no longer an issue. It is always preferable to explore possible causes of heterogeneity, although there may be too few studies to do this adequately (see Section 10.11).
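The adjustment of study weights can be sketched with the moment-based (DerSimonian and Laird) estimate of Tau², again using invented study results:

```python
import math

# Invented study estimates (e.g. log odds ratios) and standard errors.
effects = [0.30, 0.55, 0.10, 0.62]
ses = [0.12, 0.15, 0.11, 0.20]

# Fixed-effect weights, pooled estimate, and Cochran's Q.
w = [1 / se**2 for se in ses]
fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
df = len(effects) - 1

# DerSimonian-Laird moment estimate of the between-study variance Tau².
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights add Tau² to each study's within-study variance,
# which flattens the weights and widens the pooled confidence interval.
w_re = [1 / (se**2 + tau2) for se in ses]
random_mean = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
se_random = math.sqrt(1 / sum(w_re))
```

Because Tau² is added to every study's variance, the relative advantage of large studies shrinks, which is why smaller studies receive comparatively more weight under the random-effects model.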
10.10.4.1 Fixed or random effects?
A fixed-effect meta-analysis provides a result that may be viewed as a ‘typical intervention effect’ from the studies included in the analysis. In order to calculate a confidence interval for a fixed-effect meta-analysis the assumption is usually made that the true effect of intervention (in both magnitude and direction) is the same value in every study (i.e. fixed across studies). This assumption implies that the observed differences among study results are due solely to the play of chance (i.e. that there is no statistical heterogeneity).
A random-effects model provides a result that may be viewed as an ‘average intervention effect’, where this average is explicitly defined according to an assumed distribution of effects across studies. Instead of assuming that the intervention effects are the same, we assume that they follow (usually) a normal distribution. The assumption implies that the observed differences among study results are due to a combination of the play of chance and some genuine variation in the intervention effects.
The random-effects method and the fixed-effect method will give identical results when there is no heterogeneity among the studies.
When heterogeneity is present, a confidence interval around the random-effects summary estimate is wider than a confidence interval around a fixed-effect summary estimate. This will happen whenever the I² statistic is greater than zero, even if the heterogeneity is not detected by the Chi² test for heterogeneity (see Section 10.10.2).
Sometimes the central estimate of the intervention effect is different between fixed-effect and random-effects analyses. In particular, if results of smaller studies are systematically different from results of larger ones, which can happen as a result of publication bias or within-study bias in smaller studies (Egger et al 1997, Poole and Greenland 1999, Kjaergard et al 2001), then a random-effects meta-analysis will exacerbate the effects of the bias (see also Chapter 13, Section 13.3.5.6). A fixed-effect analysis will be affected less, although strictly it will also be inappropriate.
The decision between fixed- and random-effects meta-analyses has been the subject of much debate, and we do not provide a universal recommendation. Some considerations in making this choice are as follows:
 Many have argued that the decision should be based on an expectation of whether the intervention effects are truly identical, preferring the fixed-effect model if this is likely and a random-effects model if this is unlikely (Borenstein et al 2010). Since it is generally considered to be implausible that intervention effects across studies are identical (unless the intervention has no effect at all), this leads many to advocate use of the random-effects model.
 Others have argued that a fixed-effect analysis can be interpreted in the presence of heterogeneity, and that it makes fewer assumptions than a random-effects meta-analysis. They then refer to it as a ‘fixed-effects’ meta-analysis (Peto et al 1995, Rice et al 2018).
 Under any interpretation, a fixed-effect meta-analysis ignores heterogeneity. If the method is used, it is therefore important to supplement it with a statistical investigation of the extent of heterogeneity (see Section 10.10.2).
 In the presence of heterogeneity, a random-effects analysis gives relatively more weight to smaller studies and relatively less weight to larger studies. If there is additionally some funnel plot asymmetry (i.e. a relationship between intervention effect magnitude and study size), then this will push the results of the random-effects analysis towards the findings in the smaller studies. In the context of randomized trials, this is generally regarded as an unfortunate consequence of the model.
 A pragmatic approach is to plan to undertake both a fixed-effect and a random-effects meta-analysis, with an intention to present the random-effects result if there is no indication of funnel plot asymmetry. If there is an indication of funnel plot asymmetry, then both methods are problematic. It may be reasonable to present both analyses or neither, or to perform a sensitivity analysis in which small studies are excluded or addressed directly using meta-regression (see Chapter 13, Section 13.3.5.6).
 The choice between a fixed-effect and a random-effects meta-analysis should never be made on the basis of a statistical test for heterogeneity.
10.10.4.2 Interpretation of random-effects meta-analyses
The summary estimate and confidence interval from a random-effects meta-analysis refer to the centre of the distribution of intervention effects, but do not describe the width of the distribution. Often the summary estimate and its confidence interval are quoted in isolation and portrayed as a sufficient summary of the meta-analysis. This is inappropriate. The confidence interval from a random-effects meta-analysis describes uncertainty in the location of the mean of systematically different effects in the different studies. It does not describe the degree of heterogeneity among studies, as may be commonly believed. For example, when there are many studies in a meta-analysis, we may obtain a very tight confidence interval around the random-effects estimate of the mean effect even when there is a large amount of heterogeneity. A solution to this problem is to consider a prediction interval (see Section 10.10.4.3).
Methodological diversity creates heterogeneity through biases variably affecting the results of different studies. The random-effects summary estimate will only correctly estimate the average intervention effect if the biases are symmetrically distributed, leading to a mixture of overestimates and underestimates of effect, which is unlikely to be the case. In practice it can be very difficult to distinguish whether heterogeneity results from clinical or methodological diversity, and in most cases it is likely to be due to both, so these distinctions are hard to draw in the interpretation.
When there is little information, either because there are few studies or if the studies are small with few events, a random-effects analysis will provide poor estimates of the amount of heterogeneity (i.e. of the width of the distribution of intervention effects). Fixed-effect methods such as the Mantel-Haenszel method will provide more robust estimates of the average intervention effect, but at the cost of ignoring any heterogeneity.
10.10.4.3 Prediction intervals from a random-effects meta-analysis
An estimate of the between-study variance in a random-effects meta-analysis is typically presented as part of its results. The square root of this number (i.e. Tau) is the estimated standard deviation of underlying effects across studies. Prediction intervals are a way of expressing this value in an interpretable way.
To motivate the idea of a prediction interval, note that for absolute measures of effect (e.g. risk difference, mean difference, standardized mean difference), an approximate 95% range of normally distributed underlying effects can be obtained by creating an interval from 1.96 × Tau below the random-effects mean, to 1.96 × Tau above it. (For relative measures such as the odds ratio and risk ratio, an equivalent interval needs to be based on the natural logarithm of the summary estimate.) In reality, both the summary estimate and the value of Tau are associated with uncertainty. A prediction interval seeks to present the range of effects in a way that acknowledges this uncertainty (Higgins et al 2009). A simple 95% prediction interval can be calculated as:

M ± t(k−2) × √(Tau² + SE(M)²)

where M is the summary mean from the random-effects meta-analysis, t(k−2) is the 95% percentile of a t distribution with k − 2 degrees of freedom, k is the number of studies, Tau² is the estimated amount of heterogeneity and SE(M) is the standard error of the summary mean.
The term ‘prediction interval’ relates to the use of this interval to predict the possible underlying effect in a new study that is similar to the studies in the meta-analysis. A more useful interpretation of the interval is as a summary of the spread of underlying effects in the studies included in the random-effects meta-analysis.
Prediction intervals have proved a popular way of expressing the amount of heterogeneity in a meta-analysis (Riley et al 2011). They are, however, strongly based on the assumption of a normal distribution for the effects across studies, and can be very problematic when the number of studies is small, in which case they can appear spuriously wide or spuriously narrow. Nevertheless, we encourage their use when the number of studies is reasonable (e.g. more than ten) and there is no clear funnel plot asymmetry.
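The prediction interval calculation can be sketched as follows. The values of M, SE(M) and Tau² are invented, and the t quantile is hard-coded for k = 7 studies to keep the sketch dependency-free:

```python
import math

# Invented random-effects results: summary mean M, its standard error SE(M),
# and the between-study heterogeneity estimate Tau².
m = 0.36
se_m = 0.12
tau2 = 0.036
k = 7  # number of studies

# 97.5th percentile of the t distribution with k − 2 = 5 degrees of freedom
# (hard-coded; a stats library quantile function could supply this instead).
t_crit = 2.571

# Prediction interval: M ± t(k−2) × sqrt(Tau² + SE(M)²).
half_width = t_crit * math.sqrt(tau2 + se_m**2)
prediction_interval = (m - half_width, m + half_width)
```

Note that the interval is wider than the 95% confidence interval for M (± 1.96 × SE(M)) whenever Tau² is positive, which illustrates how the prediction interval, unlike the confidence interval, reflects the spread of underlying effects.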
10.10.4.4 Implementing random-effects meta-analyses
As introduced in Section 10.3.2, the random-effects model can be implemented using an inverse-variance approach, incorporating a measure of the extent of heterogeneity into the study weights. RevMan implements a version of random-effects meta-analysis that is described by DerSimonian and Laird, making use of a ‘moment-based’ estimate of the between-study variance (DerSimonian and Laird 1986). The attraction of this method is that the calculations are straightforward, but it has a theoretical disadvantage in that the confidence intervals are slightly too narrow to encompass full uncertainty resulting from having estimated the degree of heterogeneity.
For many years, RevMan has implemented two random-effects methods for dichotomous data: a Mantel-Haenszel method and an inverse-variance method. Both use the moment-based approach to estimating the amount of between-study variation. The difference between the two is subtle: the former estimates the between-study variation by comparing each study’s result with a Mantel-Haenszel fixed-effect meta-analysis result, whereas the latter estimates it by comparing each study’s result with an inverse-variance fixed-effect meta-analysis result. In practice, the difference is likely to be trivial.
There are alternative methods for performing random-effects meta-analyses that have better technical properties than the DerSimonian and Laird approach with a moment-based estimate (Veroniki et al 2016). Most notable among these is an adjustment to the confidence interval proposed by Hartung and Knapp and by Sidik and Jonkman (Hartung and Knapp 2001, Sidik and Jonkman 2002). This adjustment widens the confidence interval to reflect uncertainty in the estimation of between-study heterogeneity, and review authors should use it if it is available to them. An alternative option to encompass full uncertainty in the degree of heterogeneity is to take a Bayesian approach (see Section 10.13).
An empirical comparison of different ways to estimate between-study variation in Cochrane meta-analyses has shown that they can lead to substantial differences in estimates of heterogeneity, but seldom have major implications for estimating summary effects (Langan et al 2015). Several simulation studies have concluded that an approach proposed by Paule and Mandel should be recommended (Langan et al 2017); whereas a comprehensive recent simulation study recommended a restricted maximum likelihood approach, although noted that no single approach is universally preferable (Langan et al 2019). Review authors are encouraged to select one of these options if it is available to them.
10.11 Investigating heterogeneity
10.11.1 Interaction and effect modification
Does the intervention effect vary with different populations or intervention characteristics (such as dose or duration)? Such variation is known as interaction by statisticians and as effect modification by epidemiologists. Methods to search for such interactions include subgroup analyses and meta-regression. All methods have considerable pitfalls.
10.11.2 What are subgroup analyses?
Subgroup analyses involve splitting all the participant data into subgroups, often in order to make comparisons between them. Subgroup analyses may be done for subsets of participants (such as males and females), or for subsets of studies (such as different geographical locations). Subgroup analyses may be done as a means of investigating heterogeneous results, or to answer specific questions about particular patient groups, types of intervention or types of study.
Subgroup analyses of subsets of participants within studies are uncommon in systematic reviews based on published literature because sufficient details to extract data about separate participant types are seldom published in reports. By contrast, such subsets of participants are easily analysed when individual participant data have been collected (see Chapter 26 ). The methods we describe in the remainder of this chapter are for subgroups of studies.
Findings from multiple subgroup analyses may be misleading. Subgroup analyses are observational by nature and are not based on randomized comparisons. False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of people being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations about directions for future research that, if followed, would waste scarce resources.
It is useful to distinguish between the notions of ‘qualitative interaction’ and ‘quantitative interaction’ (Yusuf et al 1991). Qualitative interaction exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups.
10.11.3 Undertaking subgroup analyses
Meta-analyses can be undertaken in RevMan both within subgroups of studies as well as across all studies irrespective of their subgroup membership. It is tempting to compare effect estimates in different subgroups by considering the meta-analysis results from each subgroup separately. This should only be done informally by comparing the magnitudes of effect. Noting that either the effect or the test for heterogeneity in one subgroup is statistically significant whilst that in the other subgroup is not statistically significant does not indicate that the subgroup factor explains heterogeneity. Since different subgroups are likely to contain different amounts of information and thus have different abilities to detect effects, it is extremely misleading simply to compare the statistical significance of the results.
10.11.3.1 Is the effect different in different subgroups?
Valid investigations of whether an intervention works differently in different subgroups involve comparing the subgroups with each other. It is a mistake to compare within-subgroup inferences such as P values. If one subgroup analysis is statistically significant and another is not, then the latter may simply reflect a lack of information rather than a smaller (or absent) effect. When there are only two subgroups, non-overlap of the confidence intervals indicates statistical significance, but note that the confidence intervals can overlap to a small degree and the difference still be statistically significant.
A formal statistical approach should be used to examine differences among subgroups (see MECIR Box 10.11.a). A simple significance test to investigate differences between two or more subgroups can be performed (Borenstein and Higgins 2013). This procedure consists of undertaking a standard test for heterogeneity across subgroup results rather than across individual study results. When the meta-analysis uses a fixed-effect inverse-variance weighted average approach, the method is exactly equivalent to the test described by Deeks and colleagues (Deeks et al 2001). An I² statistic is also computed for subgroup differences. This describes the percentage of the variability in effect estimates from the different subgroups that is due to genuine subgroup differences rather than sampling error (chance). Note that these methods for examining subgroup differences should be used only when the data in the subgroups are independent (i.e. they should not be used if the same study participants contribute to more than one of the subgroups in the forest plot).
If fixed-effect models are used for the analysis within each subgroup, then these statistics relate to differences in typical effects across different subgroups. If random-effects models are used for the analysis within each subgroup, then the statistics relate to variation in the mean effects in the different subgroups.
An alternative method for testing for differences between subgroups is to use meta-regression techniques, in which case a random-effects model is generally preferred (see Section 10.11.4). Tests for subgroup differences based on random-effects models may be regarded as preferable to those based on fixed-effect models, due to the high risk of false-positive results when a fixed-effect model is used to compare subgroups (Higgins and Thompson 2004).
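The test described above can be sketched in a few lines of Python. All numbers are hypothetical and the function names are illustrative rather than taken from any package; the statistic is simply a heterogeneity Q computed across subgroup summary results instead of across individual study results.

```python
def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted average and its variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)

def subgroup_difference_test(subgroups):
    """Q statistic for differences across subgroup summary effects.

    `subgroups` maps a label to (pooled_effect, pooled_variance).
    Q is compared with a chi-squared distribution on (S - 1) degrees of
    freedom, exactly as a heterogeneity test applied to subgroup results.
    """
    effects = [e for e, v in subgroups.values()]
    variances = [v for e, v in subgroups.values()]
    overall, _ = fixed_effect_pool(effects, variances)
    q = sum((e - overall) ** 2 / v for e, v in zip(effects, variances))
    df = len(subgroups) - 1
    # I² for subgroup differences: share of the between-subgroup
    # variability not attributable to chance (truncated at zero).
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2

# Hypothetical log odds ratios pooled within two subgroups
subs = {"adults": (-0.35, 0.02), "children": (-0.10, 0.05)}
q, df, i2 = subgroup_difference_test(subs)
```

Here the small Q relative to its degrees of freedom gives an I² of zero, i.e. no evidence that the effect differs between the two subgroups.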
MECIR Box 10.11.a Relevant expectations for conduct of intervention reviews
Comparing subgroups: Concluding that there is a difference in effect in different subgroups on the basis of differences in the level of statistical significance within subgroups can be very misleading.
10.11.4 Meta-regression
If studies are divided into subgroups (see Section 10.11.2), this may be viewed as an investigation of how a categorical study characteristic is associated with the intervention effects in the meta-analysis. For example, studies in which allocation sequence concealment was adequate may yield different results from those in which it was inadequate. Here, allocation sequence concealment, being either adequate or inadequate, is a categorical characteristic at the study level. Meta-regression is an extension to subgroup analyses that allows the effect of continuous, as well as categorical, characteristics to be investigated, and in principle allows the effects of multiple factors to be investigated simultaneously (although this is rarely possible due to inadequate numbers of studies) (Thompson and Higgins 2002). Meta-regression should generally not be considered when there are fewer than ten studies in a meta-analysis.
Meta-regressions are similar in essence to simple regressions, in which an outcome variable is predicted according to the values of one or more explanatory variables. In meta-regression, the outcome variable is the effect estimate (for example, a mean difference, a risk difference, a log odds ratio or a log risk ratio). The explanatory variables are characteristics of studies that might influence the size of intervention effect. These are often called ‘potential effect modifiers’ or covariates. Meta-regressions usually differ from simple regressions in two ways. First, larger studies have more influence on the relationship than smaller studies, since studies are weighted by the precision of their respective effect estimate. Second, it is wise to allow for the residual heterogeneity among intervention effects not modelled by the explanatory variables. This gives rise to the term ‘random-effects meta-regression’, since the extra variability is incorporated in the same way as in a random-effects meta-analysis (Thompson and Sharp 1999).
The regression coefficient obtained from a meta-regression analysis will describe how the outcome variable (the intervention effect) changes with a unit increase in the explanatory variable (the potential effect modifier). The statistical significance of the regression coefficient is a test of whether there is a linear relationship between intervention effect and the explanatory variable. If the intervention effect is a ratio measure, the log-transformed value of the intervention effect should always be used in the regression model (see Chapter 6, Section 6.1.2.1), and the exponential of the regression coefficient will give an estimate of the relative change in intervention effect with a unit increase in the explanatory variable.
Meta-regression can also be used to investigate differences for categorical explanatory variables as done in subgroup analyses. If there are J subgroups, membership of particular subgroups is indicated by using J − 1 dummy variables (which can only take values of zero or one) in the meta-regression model (as in standard linear regression modelling). The regression coefficients will estimate how the intervention effect in each subgroup differs from a nominated reference subgroup. The P value of each regression coefficient will indicate the strength of evidence against the null hypothesis that the characteristic is not associated with the intervention effect.
Meta-regression may be performed using the ‘metareg’ macro available for the Stata statistical package, or using the ‘metafor’ package for R, as well as other packages.
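As a rough illustration of these ideas, the sketch below fits a fixed-effect, inverse-variance weighted regression of hypothetical log risk ratios on a single continuous covariate. All numbers and names are invented for illustration; a real analysis would use dedicated routines such as Stata’s ‘metareg’ or R’s ‘metafor’, and a random-effects version would first add a between-study variance component to each study’s variance.

```python
import math

def meta_regression_slope(effects, variances, covariate):
    """Weighted least-squares slope of effect estimates on one covariate.

    Studies are weighted by inverse variance, so larger (more precise)
    studies have more influence on the fitted relationship, as in
    fixed-effect meta-regression.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    num = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, covariate, effects))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, covariate))
    return num / den

# Hypothetical log risk ratios regressed on mean dose (mg)
log_rr = [-0.10, -0.25, -0.40]
var = [0.04, 0.03, 0.05]
dose = [10.0, 20.0, 30.0]
slope = meta_regression_slope(log_rr, var, dose)
# Exponentiating the coefficient gives the relative change in the risk
# ratio per 1 mg increase in dose, as described in the text.
rr_per_mg = math.exp(slope)
```

The negative slope indicates a larger apparent benefit at higher doses in these made-up data; its statistical significance would still need to be assessed before drawing any conclusion.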
10.11.5 Selection of study characteristics for subgroup analyses and meta-regression
Authors need to be cautious about undertaking subgroup analyses, and interpreting any that they do. Some considerations are outlined here for selecting characteristics (also called explanatory variables, potential effect modifiers or covariates) that will be investigated for their possible influence on the size of the intervention effect. These considerations apply similarly to subgroup analyses and to meta-regressions. Further details may be obtained elsewhere (Oxman and Guyatt 1992, Berlin and Antman 1994).
10.11.5.1 Ensure that there are adequate studies to justify subgroup analyses and meta-regressions
It is very unlikely that an investigation of heterogeneity will produce useful findings unless there is a substantial number of studies. Typical advice for simple regression analyses is that at least ten observations (i.e. ten studies in a meta-analysis) should be available for each characteristic modelled. However, even this will be too few when the covariates are unevenly distributed across studies.
10.11.5.2 Specify characteristics in advance
Authors should, whenever possible, pre-specify characteristics in the protocol that later will be subject to subgroup analyses or meta-regression. The plan specified in the protocol should then be followed (data permitting), without undue emphasis on any particular findings (see MECIR Box 10.11.b). Pre-specifying characteristics reduces the likelihood of spurious findings, first by limiting the number of subgroups investigated, and second by preventing knowledge of the studies’ results influencing which subgroups are analysed. True pre-specification is difficult in systematic reviews, because the results of some of the relevant studies are often known when the protocol is drafted. If a characteristic was overlooked in the protocol, but is clearly of major importance and justified by external evidence, then authors should not be reluctant to explore it. However, such post-hoc analyses should be identified as such.
MECIR Box 10.11.b Relevant expectations for conduct of intervention reviews
Interpreting subgroup analyses: If subgroup analyses are conducted, selective reporting, or over-interpretation, of particular subgroups or particular subgroup analyses should be avoided. This is a problem especially when multiple subgroup analyses are performed. This does not preclude the use of sensible and honest post-hoc subgroup analyses.
10.11.5.3 Select a small number of characteristics
The likelihood of a false-positive result among subgroup analyses and meta-regression increases with the number of characteristics investigated. It is difficult to suggest a maximum number of characteristics to look at, especially since the number of available studies is unknown in advance. If more than one or two characteristics are investigated it may be sensible to adjust the level of significance to account for making multiple comparisons.
10.11.5.4 Ensure there is scientific rationale for investigating each characteristic
Selection of characteristics should be motivated by biological and clinical hypotheses, ideally supported by evidence from sources other than the included studies. Subgroup analyses using characteristics that are implausible or clinically irrelevant are not likely to be useful and should be avoided. For example, a relationship between intervention effect and year of publication is seldom in itself clinically informative, and if identified runs the risk of initiating a post-hoc data dredge of factors that may have changed over time.
Prognostic factors are those that predict the outcome of a disease or condition, whereas effect modifiers are factors that influence how well an intervention works in affecting the outcome. Confusion between prognostic factors and effect modifiers is common in planning subgroup analyses, especially at the protocol stage. Prognostic factors are not good candidates for subgroup analyses unless they are also believed to modify the effect of intervention. For example, being a smoker may be a strong predictor of mortality within the next ten years, but there may not be reason for it to influence the effect of a drug therapy on mortality (Deeks 1998). Potential effect modifiers may include participant characteristics (age, setting), the precise interventions (dose of active intervention, choice of comparison intervention), how the study was done (length of follow-up) or methodology (design and quality).
10.11.5.5 Be aware that the effect of a characteristic may not always be identified
Many characteristics that might have important effects on how well an intervention works cannot be investigated using subgroup analysis or meta-regression. These are characteristics of participants that might vary substantially within studies, but that can only be summarized at the level of the study. An example is age. Consider a collection of clinical trials involving adults ranging from 18 to 60 years old. There may be a strong relationship between age and intervention effect that is apparent within each study. However, if the mean ages for the trials are similar, then no relationship will be apparent by looking at trial mean ages and trial-level effect estimates. The problem is one of aggregating individuals’ results and is variously known as aggregation bias, ecological bias or the ecological fallacy (Morgenstern 1982, Greenland 1987, Berlin et al 2002). It is even possible for the direction of the relationship across studies to be the opposite of the direction of the relationship observed within each study.
10.11.5.6 Think about whether the characteristic is closely related to another characteristic (confounded)
The problem of ‘confounding’ complicates interpretation of subgroup analyses and meta-regressions and can lead to incorrect conclusions. Two characteristics are confounded if their influences on the intervention effect cannot be disentangled. For example, if those studies implementing an intensive version of a therapy happened to be the studies that involved patients with more severe disease, then one cannot tell which aspect is the cause of any difference in effect estimates between these studies and others. In meta-regression, collinearity between potential effect modifiers leads to similar difficulties (Berlin and Antman 1994). Computing correlations between study characteristics will give some information about which study characteristics may be confounded with each other.
10.11.6 Interpretation of subgroup analyses and meta-regressions
Appropriate interpretation of subgroup analyses and meta-regressions requires caution (Oxman and Guyatt 1992).
 Subgroup comparisons are observational. It must be remembered that subgroup analyses and meta-regressions are entirely observational in their nature. These analyses investigate differences between studies. Even if individuals are randomized to one group or other within a clinical trial, they are not randomized to go in one trial or another. Hence, subgroup analyses suffer the limitations of any observational investigation, including possible bias through confounding by other study-level characteristics. Furthermore, even a genuine difference between subgroups is not necessarily due to the classification of the subgroups. As an example, a subgroup analysis of bone marrow transplantation for treating leukaemia might show a strong association between the age of a sibling donor and the success of the transplant. However, this probably does not mean that the age of donor is important. In fact, the age of the recipient is probably a key factor and the subgroup finding would simply be due to the strong association between the age of the recipient and the age of their sibling.
 Was the analysis pre-specified or post hoc? Authors should state whether subgroup analyses were pre-specified or undertaken after the results of the studies had been compiled (post hoc). More reliance may be placed on a subgroup analysis if it was one of a small number of pre-specified analyses. Performing numerous post-hoc subgroup analyses to explain heterogeneity is a form of data dredging. Data dredging is condemned because it is usually possible to find an apparent, but false, explanation for heterogeneity by considering lots of different characteristics.
 Is there indirect evidence in support of the findings? Differences between subgroups should be clinically plausible and supported by other external or indirect evidence, if they are to be convincing.
 Is the magnitude of the difference practically important? If the magnitude of a difference between subgroups will not result in different recommendations for different subgroups, then it may be better to present only the overall analysis results.
 Is there a statistically significant difference between subgroups? To establish whether there is a different effect of an intervention in different situations, the magnitudes of effects in different subgroups should be compared directly with each other. In particular, statistical significance of the results within separate subgroup analyses should not be compared (see Section 10.11.3.1 ).
 Are analyses looking at within-study or between-study relationships? For patient and intervention characteristics, differences in subgroups that are observed within studies are more reliable than analyses of subsets of studies. If such within-study relationships are replicated across studies then this adds confidence to the findings.
10.11.7 Investigating the effect of underlying risk
One potentially important source of heterogeneity among a series of studies is when the underlying average risk of the outcome event varies between the studies. The underlying risk of a particular event may be viewed as an aggregate measure of case-mix factors such as age or disease severity. It is generally measured as the observed risk of the event in the comparator group of each study (the comparator group risk, or CGR). The notion is controversial in its relevance to clinical practice since underlying risk represents a summary of both known and unknown risk factors. Problems also arise because comparator group risk will depend on the length of follow-up, which often varies across studies. However, underlying risk has received particular attention in meta-analysis because the information is readily available once dichotomous data have been prepared for use in meta-analyses. Sharp provides a full discussion of the topic (Sharp 2001).
Intuition would suggest that participants are more or less likely to benefit from an effective intervention according to their risk status. However, the relationship between underlying risk and intervention effect is a complicated issue. For example, suppose an intervention is equally beneficial in the sense that for all patients it reduces the risk of an event, say a stroke, to 80% of the underlying risk. Then it is not equally beneficial in terms of absolute differences in risk in the sense that it reduces a 50% stroke rate by 10 percentage points to 40% (number needed to treat=10), but a 20% stroke rate by 4 percentage points to 16% (number needed to treat=25).
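The arithmetic of this example can be checked directly; the following minimal sketch (illustrative names only) computes the number needed to treat from a baseline risk and a constant risk ratio of 0.8, i.e. the ‘reduces risk to 80% of the underlying risk’ scenario above.

```python
def nnt(control_group_risk, risk_ratio):
    """Number needed to treat, from a baseline risk and a constant risk ratio."""
    absolute_risk_reduction = control_group_risk * (1.0 - risk_ratio)
    return 1.0 / absolute_risk_reduction

# The same relative effect (risk ratio 0.8) at two different underlying risks:
high_risk_nnt = nnt(0.50, 0.8)   # 50% -> 40%: ARR of 10 points, NNT = 10
low_risk_nnt = nnt(0.20, 0.8)    # 20% -> 16%: ARR of 4 points, NNT = 25
```

The constant relative effect thus translates into very different absolute benefits depending on the underlying risk, which is exactly why the choice of summary statistic matters in the next paragraph.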
Use of different summary statistics (risk ratio, odds ratio and risk difference) will demonstrate different relationships with underlying risk. Summary statistics that show close to no relationship with underlying risk are generally preferred for use in meta-analysis (see Section 10.4.3).
Investigating any relationship between effect estimates and the comparator group risk is also complicated by a technical phenomenon known as regression to the mean. This arises because the comparator group risk forms an integral part of the effect estimate. A high risk in a comparator group, observed entirely by chance, will on average give rise to a higher than expected effect estimate, and vice versa. This phenomenon results in a false correlation between effect estimates and comparator group risks. There are methods, which require sophisticated software, that correct for regression to the mean (McIntosh 1996, Thompson et al 1997). These should be used for such analyses, and statistical expertise is recommended.
10.11.8 Dose-response analyses
The principles of meta-regression can be applied to the relationships between intervention effect and dose (commonly termed dose-response), treatment intensity or treatment duration (Greenland and Longnecker 1992, Berlin et al 1993). Conclusions about differences in effect due to differences in dose (or similar factors) are on stronger ground if participants are randomized to one dose or another within a study and a consistent relationship is found across similar studies. While authors should consider these effects, particularly as a possible explanation for heterogeneity, they should be cautious about drawing conclusions based on between-study differences. Authors should be particularly cautious about claiming that a dose-response relationship does not exist, given the low power of many meta-regression analyses to detect genuine relationships.
10.12 Missing data
10.12.1 Types of missing data
There are many potential sources of missing data in a systematic review or meta-analysis (see Table 10.12.a). For example, a whole study may be missing from the review, an outcome may be missing from a study, summary data may be missing for an outcome, and individual participants may be missing from the summary data. Here we discuss a variety of potential sources of missing data, highlighting where more detailed discussions are available elsewhere in the Handbook.
Whole studies may be missing from a review because they are never published, are published in obscure places, are rarely cited, or are inappropriately indexed in databases. Thus, review authors should always be aware of the possibility that they have failed to identify relevant studies. There is a strong possibility that such studies are missing because of their ‘uninteresting’ or ‘unwelcome’ findings (that is, in the presence of publication bias). This problem is discussed at length in Chapter 13 . Details of comprehensive search methods are provided in Chapter 4 .
Some studies might not report any information on outcomes of interest to the review. For example, there may be no information on quality of life, or on serious adverse effects. It is often difficult to determine whether this is because the outcome was not measured or because the outcome was not reported. Furthermore, failure to report that outcomes were measured may be dependent on the unreported results (selective outcome reporting bias; see Chapter 7, Section 7.2.3.3). Similarly, summary data for an outcome, in a form that can be included in a meta-analysis, may be missing. A common example is missing standard deviations (SDs) for continuous outcomes. This is often a problem when change-from-baseline outcomes are sought. We discuss imputation of missing SDs in Chapter 6, Section 6.5.2.8. Other examples of missing summary data are missing sample sizes (particularly those for each intervention group separately), numbers of events, standard errors, follow-up times for calculating rates, and sufficient details of time-to-event outcomes. Inappropriate analyses of studies, for example of cluster-randomized and crossover trials, can lead to missing summary data. It is sometimes possible to approximate the correct analyses of such studies, for example by imputing correlation coefficients or SDs, as discussed in Chapter 23, Section 23.1, for cluster-randomized studies and Chapter 23, Section 23.2, for crossover trials. As a general rule, most methodologists believe that missing summary data (e.g. ‘no usable data’) should not be used as a reason to exclude a study from a systematic review. It is more appropriate to include the study in the review, and to discuss the potential implications of its absence from a meta-analysis.
It is likely that in some, if not all, included studies, there will be individuals missing from the reported results. Review authors are encouraged to consider this problem carefully (see MECIR Box 10.12.a ). We provide further discussion of this problem in Section 10.12.3 ; see also Chapter 8, Section 8.5 .
Missing data can also affect subgroup analyses. If subgroup analyses or meta-regressions are planned (see Section 10.11), they require details of the study-level characteristics that distinguish studies from one another. If these are not available for all studies, review authors should consider asking the study authors for more information.
Table 10.12.a Types of missing data in a meta-analysis
Type of missing data, with some possible reasons for the data being missing:
Missing studies: publication bias; search not sufficiently comprehensive.
Missing outcomes: outcome not measured; selective reporting bias.
Missing summary data: selective reporting bias; incomplete reporting.
Missing individuals: lack of intention-to-treat analysis; attrition from the study; selective reporting bias.
Missing study-level characteristics (for subgroup analysis or meta-regression): characteristic not measured; incomplete reporting.
MECIR Box 10.12.a Relevant expectations for conduct of intervention reviews
Addressing missing outcome data: Incomplete outcome data can introduce bias. In most circumstances, authors should follow the principles of intention-to-treat analyses as far as possible (this may not be appropriate for adverse effects or if trying to demonstrate equivalence). Risk of bias due to incomplete outcome data is addressed in the Cochrane risk-of-bias tool. However, statistical analyses and careful interpretation of results are additional ways in which the issue can be addressed by review authors. Imputation methods can be considered (accompanied by, or in the form of, sensitivity analyses).
10.12.2 General principles for dealing with missing data
There is a large literature of statistical methods for dealing with missing data. Here we briefly review some key concepts and make some general recommendations for Cochrane Review authors. It is important to think about why data may be missing. Statisticians often use the terms ‘missing at random’ and ‘not missing at random’ to represent different scenarios.
Data are said to be ‘missing at random’ if the fact that they are missing is unrelated to actual values of the missing data. For instance, if some quality-of-life questionnaires were lost in the postal system, this would be unlikely to be related to the quality of life of the trial participants who completed the forms. In some circumstances, statisticians distinguish between data ‘missing at random’ and data ‘missing completely at random’, although in the context of a systematic review the distinction is unlikely to be important. Data that are missing at random may not be important. Analyses based on the available data will often be unbiased, although based on a smaller sample size than the original data set.
Data are said to be ‘not missing at random’ if the fact that they are missing is related to the actual missing data. For instance, in a depression trial, participants who had a relapse of depression might be less likely to attend the final follow-up interview, and more likely to have missing outcome data. Such data are ‘non-ignorable’ in the sense that an analysis of the available data alone will typically be biased. Publication bias and selective reporting bias lead by definition to data that are ‘not missing at random’, and attrition and exclusions of individuals within studies often do as well.
The principal options for dealing with missing data are:
 analysing only the available data (i.e. ignoring the missing data);
 imputing the missing data with replacement values, and treating these as if they were observed (e.g. last observation carried forward, imputing an assumed outcome such as assuming all were poor outcomes, imputing the mean, imputing based on predicted values from a regression analysis);
 imputing the missing data and accounting for the fact that these were imputed with uncertainty (e.g. multiple imputation, simple imputation methods (as point 2) with adjustment to the standard error); and
 using statistical models to allow for missing data, making assumptions about their relationships with the available data.
Option 2 is practical in most circumstances and very commonly used in systematic reviews. However, it fails to acknowledge uncertainty in the imputed values and typically results in confidence intervals that are too narrow. Options 3 and 4 would require involvement of a knowledgeable statistician.
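To make options 1 and 2 concrete, here is a minimal sketch, with made-up numbers and illustrative function names, contrasting an available-case analysis with one simple worst-case imputation for a single arm of a trial with a dichotomous outcome:

```python
def available_case_risk(events, observed):
    """Option 1: ignore missing participants and analyse only observed data."""
    return events / observed

def worst_case_risk(events, observed, randomized):
    """Option 2 (one variant): impute every missing participant as having had
    the (poor) outcome, then analyse the imputed values as if observed.
    Note this understates uncertainty, as discussed in the text."""
    missing = randomized - observed
    return (events + missing) / randomized

# Hypothetical arm: 100 randomized, 80 with outcome data, 20 events observed
available = available_case_risk(20, 80)   # risk among those observed
worst = worst_case_risk(20, 80, 100)      # risk if all 20 missing had the event
```

The gap between the two estimates (0.25 versus 0.40 here) gives a crude sense of how much the conclusions could depend on the missing participants, which is the motivation for the sensitivity analyses recommended below.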
Five general recommendations for dealing with missing data in Cochrane Reviews are as follows:
 Whenever possible, contact the original investigators to request missing data.
 Make explicit the assumptions of any methods used to address missing data: for example, that the data are assumed missing at random, or that missing values were assumed to have a particular value such as a poor outcome.
 Follow the guidance in Chapter 8 to assess risk of bias due to missing outcome data in randomized trials.
 Perform sensitivity analyses to assess how sensitive results are to reasonable changes in the assumptions that are made (see Section 10.14 ).
 Address the potential impact of missing data on the findings of the review in the Discussion section.
10.12.3 Dealing with missing outcome data from individual participants
Review authors may undertake sensitivity analyses to assess the potential impact of missing outcome data, based on assumptions about the relationship between missingness in the outcome and its true value. Several methods are available (Akl et al 2015). For dichotomous outcomes, Higgins and colleagues propose a strategy involving different assumptions about how the risk of the event among the missing participants differs from the risk of the event among the observed participants, taking account of uncertainty introduced by the assumptions (Higgins et al 2008a). Akl and colleagues propose a suite of simple imputation methods, including a similar approach to that of Higgins and colleagues based on relative risks of the event in missing versus observed participants. Similar ideas can be applied to continuous outcome data (Ebrahim et al 2013, Ebrahim et al 2014). Particular care is required to avoid double counting events, since it can be unclear whether reported numbers of events in trial reports apply to the full randomized sample or only to those who did not drop out (Akl et al 2016).
Although there is a tradition of implementing ‘worst case’ and ‘best case’ analyses clarifying the extreme boundaries of what is theoretically possible, such analyses may not be informative for the most plausible scenarios (Higgins et al 2008a).
10.13 Bayesian approaches to meta-analysis
Bayesian statistics is an approach to statistics based on a different philosophy from that which underlies significance tests and confidence intervals. It is essentially about updating of evidence. In a Bayesian analysis, initial uncertainty is expressed through a prior distribution about the quantities of interest. Current data and assumptions concerning how they were generated are summarized in the likelihood. The posterior distribution for the quantities of interest can then be obtained by combining the prior distribution and the likelihood. The likelihood summarizes both the data from studies included in the meta-analysis (for example, 2×2 tables from randomized trials) and the meta-analysis model (for example, assuming a fixed effect or random effects). The result of the analysis is usually presented as a point estimate and 95% credible interval from the posterior distribution for each quantity of interest, which look much like classical estimates and confidence intervals. Potential advantages of Bayesian analyses are summarized in Box 10.13.a. Bayesian analysis may be performed using WinBUGS software (Smith et al 1995, Lunn et al 2000), within R (Röver 2017), or – for some applications – using standard meta-regression software with a simple trick (Rhodes et al 2016).
A difference between Bayesian analysis and classical meta-analysis is that the interpretation is directly in terms of belief: a 95% credible interval for an odds ratio is that region in which we believe the odds ratio to lie with probability 95%. This is how many practitioners actually interpret a classical confidence interval, but strictly in the classical framework the 95% refers to the long-term frequency with which 95% intervals contain the true value. The Bayesian framework also allows a review author to calculate the probability that the odds ratio has a particular range of values, which cannot be done in the classical framework. For example, we can determine the probability that the odds ratio is less than 1 (which might indicate a beneficial effect of an experimental intervention), or that it is no larger than 0.8 (which might indicate a clinically important effect). It should be noted that these probabilities are specific to the choice of the prior distribution. Different meta-analysts may analyse the same data using different prior distributions and obtain different results. It is therefore important to carry out sensitivity analyses to investigate how the results depend on any assumptions made.
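As an illustration of such direct probability statements, the sketch below performs a conjugate normal update on the log odds ratio scale (a common approximation, not the only way to do this) using invented numbers: a vague prior is combined with a hypothetical pooled estimate, and posterior probabilities for the odds ratio are then read off the posterior distribution.

```python
import math

def posterior_normal(prior_mean, prior_var, estimate, estimate_var):
    """Conjugate normal-normal update on the log odds ratio scale."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / estimate_var)
    post_mean = post_var * (prior_mean / prior_var + estimate / estimate_var)
    return post_mean, post_var

def prob_below(threshold, mean, var):
    """Posterior probability that the quantity lies below `threshold`."""
    z = (threshold - mean) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Vague prior on the log odds ratio; hypothetical pooled log OR of -0.3
# with variance 0.01 (standard error 0.1)
post_mean, post_var = posterior_normal(0.0, 100.0, -0.3, 0.01)
p_or_below_1 = prob_below(0.0, post_mean, post_var)              # P(OR < 1)
p_or_below_08 = prob_below(math.log(0.8), post_mean, post_var)   # P(OR < 0.8)
```

With the vague prior, the posterior is dominated by the data, and the two probabilities correspond directly to the ‘beneficial effect’ and ‘clinically important effect’ statements in the text; repeating the calculation with a different prior would change them, which is why sensitivity analyses are advised.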
In the context of a meta-analysis, prior distributions are needed for the particular intervention effect being analysed (such as the odds ratio or the mean difference) and – in the context of a random-effects meta-analysis – on the amount of heterogeneity among intervention effects across studies. Prior distributions may represent subjective belief about the size of the effect, or may be derived from sources of evidence not included in the meta-analysis, such as information from non-randomized studies of the same intervention or from randomized trials of other interventions. The width of the prior distribution reflects the degree of uncertainty about the quantity. When there is little or no information, a ‘non-informative’ prior can be used, in which all values across the possible range are equally likely.
Most Bayesian meta-analyses use non-informative (or very weakly informative) prior distributions to represent beliefs about intervention effects, since many regard it as controversial to combine objective trial data with subjective opinion. However, prior distributions are increasingly used for the extent of among-study variation in a random-effects analysis. This is particularly advantageous when the number of studies in the meta-analysis is small, say fewer than five or ten. Libraries of data-based prior distributions are available that have been derived from re-analyses of many thousands of meta-analyses in the Cochrane Database of Systematic Reviews (Turner et al 2012).
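The prior-to-posterior updating described above can be made concrete with a small sketch. This is not any particular package's implementation: it is a grid approximation to a Bayesian random-effects meta-analysis of hypothetical log odds ratios, with weakly informative priors (mu ~ N(0, 10²) for the pooled effect, tau ~ Half-Normal(1) for the among-study SD) chosen purely for illustration.

```python
import numpy as np

# Hypothetical study data: log odds ratios and their standard errors.
y = np.array([-0.35, -0.10, -0.52, 0.08, -0.28])
se = np.array([0.22, 0.30, 0.35, 0.25, 0.18])

# Grids for the pooled effect mu and the between-study SD tau.
mu_grid = np.linspace(-2.0, 2.0, 401)
tau_grid = np.linspace(0.0, 1.5, 151)

# Weakly informative priors: mu ~ N(0, 10^2), tau ~ Half-Normal(1).
log_prior_mu = -0.5 * (mu_grid / 10.0) ** 2
log_prior_tau = -0.5 * (tau_grid / 1.0) ** 2

# Marginal likelihood of each study under the model: y_i ~ N(mu, se_i^2 + tau^2).
mu = mu_grid[:, None, None]      # shape (n_mu, 1, 1)
tau = tau_grid[None, :, None]    # shape (1, n_tau, 1)
var = se[None, None, :] ** 2 + tau ** 2
log_lik = np.sum(-0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (y[None, None, :] - mu) ** 2 / var, axis=2)

# Posterior on the grid: prior x likelihood, then normalize.
log_post = log_lik + log_prior_mu[:, None] + log_prior_tau[None, :]
post = np.exp(log_post - log_post.max())
post /= post.sum()

# Marginal posterior for mu; probability that OR < 1 (i.e. mu < 0).
post_mu = post.sum(axis=1)
p_benefit = post_mu[mu_grid < 0].sum()

# 95% credible interval for mu from the posterior CDF.
cdf = np.cumsum(post_mu)
lo = mu_grid[np.searchsorted(cdf, 0.025)]
hi = mu_grid[np.searchsorted(cdf, 0.975)]
print(f"P(OR < 1) = {p_benefit:.3f}")
print(f"95% credible interval for OR: ({np.exp(lo):.2f}, {np.exp(hi):.2f})")
```

The same probability statement, P(OR < 1), is exactly the kind of quantity the classical framework cannot provide; rerunning with a different prior illustrates the sensitivity-analysis point made above.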
Box 10.13.a Some potential advantages of Bayesian meta-analysis
Some potential advantages of Bayesian approaches over classical methods for meta-analyses are that they: allow relevant external evidence to be incorporated through the prior distribution; allow direct probability statements to be made about quantities of interest (for example, the probability that an odds ratio is less than 1); naturally allow for uncertainty in all parameters, including the extent of among-study variation; and can extend a meta-analysis to decision-making contexts by incorporating the utilities of various clinical outcome states.
Statistical expertise is strongly recommended for review authors who wish to carry out Bayesian analyses. There are several good texts (Sutton et al 2000, Sutton and Abrams 2001, Spiegelhalter et al 2004).
10.14 Sensitivity analyses
The process of undertaking a systematic review involves a sequence of decisions. Whilst many of these decisions are clearly objective and non-contentious, some will be somewhat arbitrary or unclear. For instance, if eligibility criteria involve a numerical value, the choice of value is usually arbitrary: for example, defining groups of older people may reasonably have lower limits of 60, 65, 70 or 75 years, or any value in between. Other decisions may be unclear because a study report fails to include the required information. Some decisions are unclear because the included studies themselves never obtained the information required: for example, the outcomes of those who were lost to follow-up. Further decisions are unclear because there is no consensus on the best statistical method to use for a particular problem.
It is highly desirable to prove that the findings from a systematic review are not dependent on such arbitrary or unclear decisions by using sensitivity analysis (see MECIR Box 10.14.a). A sensitivity analysis is a repeat of the primary analysis or meta-analysis in which alternative decisions or ranges of values are substituted for decisions that were arbitrary or unclear. For example, if the eligibility of some studies in the meta-analysis is dubious because they do not contain full details, sensitivity analysis may involve undertaking the meta-analysis twice: the first time including all studies and, second, including only those that are definitely known to be eligible. A sensitivity analysis asks the question, ‘Are the findings robust to the decisions made in the process of obtaining them?’
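The all-studies-versus-definitely-eligible comparison just described can be sketched with a simple fixed-effect inverse-variance meta-analysis; the effect sizes and eligibility flags below are hypothetical.

```python
import numpy as np

def pooled_estimate(effects, ses):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    w = 1.0 / np.asarray(ses) ** 2
    est = np.sum(w * np.asarray(effects)) / np.sum(w)
    return est, np.sqrt(1.0 / np.sum(w))

# Hypothetical log risk ratios; the last two studies are of dubious eligibility.
effects = [-0.30, -0.15, -0.45, -0.05, 0.10]
ses     = [ 0.20,  0.25,  0.30,  0.22, 0.28]
definite = [True, True, True, False, False]

# Primary analysis: all studies.
all_est, all_se = pooled_estimate(effects, ses)

# Sensitivity analysis: definitely eligible studies only.
keep = [i for i, ok in enumerate(definite) if ok]
sub_est, sub_se = pooled_estimate([effects[i] for i in keep],
                                  [ses[i] for i in keep])

print(f"All studies:   {all_est:.3f} (SE {all_se:.3f})")
print(f"Definite only: {sub_est:.3f} (SE {sub_se:.3f})")
# If both pooled estimates support the same conclusion, the finding is
# robust to the eligibility decision.
```

Note that the restricted analysis necessarily has a larger standard error; the question the sensitivity analysis answers is whether the direction and approximate size of the effect change, not whether precision falls.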
MECIR Box 10.14.a Relevant expectations for conduct of intervention reviews
Sensitivity analysis: use sensitivity analyses to assess whether the findings of the review are robust to the arbitrary or unclear decisions made in the review process.
 It is important to know whether results are robust, since the strength of the conclusion may be correspondingly strengthened or weakened.
There are many decision nodes within the systematic review process that can generate a need for a sensitivity analysis. Examples include:
Searching for studies:
 Should abstracts whose results cannot be confirmed in subsequent publications be included in the review?
Eligibility criteria:
 Characteristics of participants: where a majority but not all people in a study meet an age range, should the study be included?
 Characteristics of the intervention: what range of doses should be included in the metaanalysis?
 Characteristics of the comparator: what criteria are required to define usual care to be used as a comparator group?
 Characteristics of the outcome: what time point or range of time points are eligible for inclusion?
 Study design: should blinded and unblinded outcome assessment be included, or should study inclusion be restricted by other aspects of methodological criteria?
What data should be analysed?
 Time-to-event data: what assumptions about the distribution of censored data should be made?
 Continuous data: where standard deviations are missing, when and how should they be imputed? Should analyses be based on change scores or on post-intervention values?
 Ordinal scales: what cut-point should be used to dichotomize short ordinal scales into two groups?
 Cluster-randomized trials: what values of the intraclass correlation coefficient should be used when trial analyses have not been adjusted for clustering?
 Crossover trials: what values of the within-subject correlation coefficient should be used when this is not available in primary reports?
 All analyses: what assumptions should be made about missing outcomes? Should adjusted or unadjusted estimates of intervention effects be used?
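As an illustration of the cluster-randomization item above: when a trial's analysis was not adjusted for clustering, a common approximate correction inflates the standard error by the square root of the design effect, 1 + (m - 1) × ICC, where m is the average cluster size; a sensitivity analysis can simply repeat this over a range of plausible ICC values. A minimal sketch with hypothetical numbers:

```python
import math

def effective_se(se_unadjusted, avg_cluster_size, icc):
    """Inflate a standard error from a cluster-randomized trial that was
    analysed as if individually randomized, using the design effect
    1 + (m - 1) * ICC."""
    deff = 1.0 + (avg_cluster_size - 1.0) * icc
    return se_unadjusted * math.sqrt(deff)

# Hypothetical trial: unadjusted SE 0.15, average cluster size 20.
for icc in (0.01, 0.05, 0.10):
    print(f"ICC = {icc:.2f} -> effective SE = {effective_se(0.15, 20, icc):.3f}")
```

If the meta-analytic conclusion is unchanged across the whole ICC range considered plausible, the clustering decision is not driving the result.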
Analysis methods:
 Should fixed-effect or random-effects methods be used for the analysis?
 For dichotomous outcomes, should odds ratios, risk ratios or risk differences be used?
 For continuous outcomes, where several scales have assessed the same dimension, should results be analysed as a standardized mean difference across all scales or as mean differences individually for each scale?
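The fixed-effect versus random-effects choice can itself be run as a sensitivity analysis. A minimal sketch, using the widely used DerSimonian-Laird estimator of the between-study variance on hypothetical log risk ratios (this is only one of several tau² estimators; see Veroniki et al 2016 in the references):

```python
import numpy as np

def fixed_effect(y, se):
    """Inverse-variance fixed-effect pooled estimate and standard error."""
    w = 1.0 / se ** 2
    return np.sum(w * y) / np.sum(w), np.sqrt(1.0 / np.sum(w))

def dersimonian_laird(y, se):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2."""
    w = 1.0 / se ** 2
    fe, _ = fixed_effect(y, se)
    q = np.sum(w * (y - fe) ** 2)                  # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)        # truncated at zero
    w_star = 1.0 / (se ** 2 + tau2)
    return (np.sum(w_star * y) / np.sum(w_star),
            np.sqrt(1.0 / np.sum(w_star)), tau2)

# Hypothetical log risk ratios and standard errors.
y = np.array([-0.40, -0.05, -0.60, 0.15, -0.25])
se = np.array([0.20, 0.25, 0.30, 0.22, 0.18])

fe_est, fe_se = fixed_effect(y, se)
re_est, re_se, tau2 = dersimonian_laird(y, se)
print(f"Fixed effect:   {fe_est:.3f} (SE {fe_se:.3f})")
print(f"Random effects: {re_est:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.3f}")
```

The random-effects interval is at least as wide as the fixed-effect one; if the two models point to different conclusions, that discrepancy itself is a finding the review should report.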
Some sensitivity analyses can be prespecified in the study protocol, but many issues suitable for sensitivity analysis are only identified during the review process, as the individual peculiarities of the studies under investigation emerge. When sensitivity analyses show that the overall result and conclusions are not affected by the different decisions that could be made during the review process, the results of the review can be regarded with a higher degree of certainty. Where sensitivity analyses identify particular decisions or missing information that greatly influence the findings of the review, greater resources can be deployed to try to resolve uncertainties and obtain extra information, possibly through contacting trial authors and obtaining individual participant data. If this cannot be achieved, the results must be interpreted with an appropriate degree of caution. Such findings may generate proposals for further investigations and future research.
Reporting of sensitivity analyses in a systematic review may best be done by producing a summary table. Rarely is it informative to produce individual forest plots for each sensitivity analysis undertaken.
Sensitivity analyses are sometimes confused with subgroup analysis. Although some sensitivity analyses involve restricting the analysis to a subset of the totality of studies, the two methods differ in two ways. First, sensitivity analyses do not attempt to estimate the effect of the intervention in the group of studies removed from the analysis, whereas in subgroup analyses, estimates are produced for each subgroup. Second, in sensitivity analyses, informal comparisons are made between different ways of estimating the same thing, whereas in subgroup analyses, formal statistical comparisons are made across the subgroups.
10.15 Chapter information
Editors: Jonathan J Deeks, Julian PT Higgins, Douglas G Altman; on behalf of the Cochrane Statistical Methods Group
Contributing authors: Douglas Altman, Deborah Ashby, Jacqueline Birks, Michael Borenstein, Marion Campbell, Jonathan Deeks, Matthias Egger, Julian Higgins, Joseph Lau, Keith O’Rourke, Gerta Rücker, Rob Scholten, Jonathan Sterne, Simon Thompson, Anne Whitehead
Acknowledgements: We are grateful to the following for commenting helpfully on earlier drafts: Bodil Als-Nielsen, Deborah Ashby, Jesse Berlin, Joseph Beyene, Jacqueline Birks, Michael Bracken, Marion Campbell, Chris Cates, Wendong Chen, Mike Clarke, Albert Cobos, Esther Coren, Francois Curtin, Roberto D’Amico, Keith Dear, Heather Dickinson, Diana Elbourne, Simon Gates, Paul Glasziou, Christian Gluud, Peter Herbison, Sally Hollis, David Jones, Steff Lewis, Tianjing Li, Joanne McKenzie, Philippa Middleton, Nathan Pace, Craig Ramsey, Keith O’Rourke, Rob Scholten, Guido Schwarzer, Jack Sinclair, Jonathan Sterne, Simon Thompson, Andy Vail, Clarine van Oel, Paula Williamson and Fred Wolf.
Funding: JJD received support from the National Institute for Health Research (NIHR) Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH is a member of the NIHR Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JPTH received funding from National Institute for Health Research Senior Investigator award NFSI061710145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
10.16 References
Agresti A. An Introduction to Categorical Data Analysis. New York (NY): John Wiley & Sons; 1996.
Akl EA, Kahale LA, Agoritsas T, Brignardello-Petersen R, Busse JW, Carrasco-Labra A, Ebrahim S, Johnston BC, Neumann I, Sola I, Sun X, Vandvik P, Zhang Y, Alonso-Coello P, Guyatt G. Handling trial participants with missing outcome data when conducting a meta-analysis: a systematic survey of proposed approaches. Systematic Reviews 2015; 4: 98.
Akl EA, Kahale LA, Ebrahim S, Alonso-Coello P, Schünemann HJ, Guyatt GH. Three challenges described for identifying participants with missing data in trials reports, and potential solutions suggested to systematic reviewers. Journal of Clinical Epidemiology 2016; 76: 147-154.
Altman DG, Bland JM. Detecting skewness from summary information. BMJ 1996; 313: 1200.
Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30: 2967-2985.
Berlin JA, Longnecker MP, Greenland S. Meta-analysis of epidemiologic dose-response data. Epidemiology 1993; 4: 218-228.
Berlin JA, Antman EM. Advantages and limitations of meta-analytic regressions of clinical trials data. Online Journal of Current Clinical Trials 1994; Doc No 134.
Berlin JA, Santanna J, Schmid CH, Szczech LA, Feldman KA, Group ALAITS. Individual patient versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Statistics in Medicine 2002; 21: 371-387.
Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1: 97-111.
Borenstein M, Higgins JPT. Meta-analysis and subgroups. Prevention Science 2013; 14: 134-143.
Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Statistics in Medicine 2007; 26: 53-77.
Chinn S. A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine 2000; 19: 3127-3131.
da Costa BR, Nuesch E, Rutjes AW, Johnston BC, Reichenbach S, Trelle S, Guyatt GH, Jüni P. Combining follow-up and change data is valid in meta-analyses of continuous outcomes: a meta-epidemiological study. Journal of Clinical Epidemiology 2013; 66: 847-855.
Deeks JJ. Systematic reviews of published evidence: Miracles or minefields? Annals of Oncology 1998; 9: 703-709.
Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd edition. London (UK): BMJ Publication Group; 2001. p. 285-312.
Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21: 1575-1600.
DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7: 177-188.
DiGuiseppi C, Higgins JPT. Interventions for promoting smoke alarm ownership and function. Cochrane Database of Systematic Reviews 2001; 2: CD002246.
Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Johnston BC, Guyatt GH. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2013; 66: 1014-1021 e1011.
Ebrahim S, Johnston BC, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, Alonso-Coello P, Guyatt GH. Addressing continuous data measured with different instruments for participants excluded from trial analysis: a guide for systematic reviewers. Journal of Clinical Epidemiology 2014; 67: 560-570.
Efthimiou O. Practical guide to the meta-analysis of rare events. Evidence-Based Mental Health 2018; 21: 72-76.
Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. BMJ 1997; 315: 629-634.
Engels EA, Schmid CH, Terrin N, Olkin I, Lau J. Heterogeneity and statistical significance in meta-analysis: an empirical study of 125 meta-analyses. Statistics in Medicine 2000; 19: 1707-1728.
Greenland S, Robins JM. Estimation of a common effect parameter from sparse follow-up data. Biometrics 1985; 41: 55-68.
Greenland S. Quantitative methods in the review of epidemiologic literature. Epidemiologic Reviews 1987; 9: 1-30.
Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. American Journal of Epidemiology 1992; 135: 1301-1309.
Guevara JP, Berlin JA, Wolf FM. Meta-analytic methods for pooling rates when follow-up duration varies: a case study. BMC Medical Research Methodology 2004; 4: 17.
Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Statistics in Medicine 2001; 20: 3875-3889.
Hasselblad V, McCrory DC. Meta-analytic tools for medical decision making: A practical guide. Medical Decision Making 1995; 15: 81-96.
Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 2002; 21: 1539-1558.
Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ 2003; 327: 557-560.
Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Statistics in Medicine 2004; 23: 1663-1682.
Higgins JPT, White IR, Wood AM. Imputation methods for missing outcome data in meta-analysis of clinical trials. Clinical Trials 2008a; 5: 225-239.
Higgins JPT, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Statistics in Medicine 2008b; 27: 6072-6092.
Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172: 137-159.
Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Annals of Internal Medicine 2001; 135: 982-989.
Langan D, Higgins JPT, Simmonds M. An empirical comparison of heterogeneity variance estimators in 12 894 meta-analyses. Research Synthesis Methods 2015; 6: 195-205.
Langan D, Higgins JPT, Simmonds M. Comparative performance of heterogeneity variance estimators in meta-analysis: a review of simulation studies. Research Synthesis Methods 2017; 8: 181-198.
Langan D, Higgins JPT, Jackson D, Bowden J, Veroniki AA, Kontopantelis E, Viechtbauer W, Simmonds M. A comparison of heterogeneity variance estimators in simulated random-effects meta-analyses. Research Synthesis Methods 2019; 10: 83-98.
Lewis S, Clarke M. Forest plots: trying to see the wood and the trees. BMJ 2001; 322: 1479-1480.
Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing 2000; 10: 325-337.
Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 1959; 22: 719-748.
McIntosh MW. The population risk as an explanatory variable in research synthesis of clinical trials. Statistics in Medicine 1996; 15: 1713-1728.
Morgenstern H. Uses of ecologic analysis in epidemiologic research. American Journal of Public Health 1982; 72: 1336-1344.
Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Annals of Internal Medicine 1992; 116: 78-84.
Peto R, Collins R, Gray R. Large-scale randomized evidence: large, simple trials and overviews of trials. Journal of Clinical Epidemiology 1995; 48: 23-40.
Poole C, Greenland S. Random-effects meta-analyses are not always conservative. American Journal of Epidemiology 1999; 150: 469-475.
Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ, Higgins JPT. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data. Statistics in Medicine 2016; 35: 5495-5511.
Rice K, Higgins JPT, Lumley T. A re-evaluation of fixed effect(s) meta-analysis. Journal of the Royal Statistical Society Series A (Statistics in Society) 2018; 181: 205-227.
Riley RD, Higgins JPT, Deeks JJ. Interpretation of random effects meta-analyses. BMJ 2011; 342: d549.
Röver C. Bayesian random-effects meta-analysis using the bayesmeta R package. 2017. https://arxiv.org/abs/1711.08683.
Rücker G, Schwarzer G, Carpenter J, Olkin I. Why add anything to nothing? The arcsine difference as a measure of treatment effect in meta-analysis with zero cells. Statistics in Medicine 2009; 28: 721-738.
Sharp SJ. Analysing the relationship between treatment benefit and underlying risk: precautions and practical recommendations. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd edition. London (UK): BMJ Publication Group; 2001. p. 176-188.
Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Statistics in Medicine 2002; 21: 3153-3159.
Simmonds MC, Tierney J, Bowden J, Higgins JPT. Meta-analysis of time-to-event data: a comparison of two-stage methods. Research Synthesis Methods 2011; 2: 139-149.
Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology 1994; 47: 881-889.
Smith TC, Spiegelhalter DJ, Thomas A. Bayesian approaches to random-effects meta-analysis: a comparative study. Statistics in Medicine 1995; 14: 2685-2699.
Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester (UK): John Wiley & Sons; 2004.
Spittal MJ, Pirkis J, Gurrin LC. Meta-analysis of incidence rate data in the presence of zero events. BMC Medical Research Methodology 2015; 15: 42.
Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-analysis in Medical Research. Chichester (UK): John Wiley & Sons; 2000.
Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research 2001; 10: 277-303.
Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23: 1351-1375.
Thompson SG, Smith TC, Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 1997; 16: 2741-2758.
Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 1999; 18: 2693-2708.
Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 2002; 21: 1559-1574.
Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. International Journal of Epidemiology 2012; 41: 818-827.
Veroniki AA, Jackson D, Viechtbauer W, Bender R, Bowden J, Knapp G, Kuss O, Higgins JPT, Langan D, Salanti G. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Research Synthesis Methods 2016; 7: 55-79.
Whitehead A, Jones NMB. A meta-analysis of clinical trials involving different classifications of response into ordered categories. Statistics in Medicine 1994; 13: 2503-2515.
Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Progress in Cardiovascular Diseases 1985; 27: 335-371.
Yusuf S, Wittes J, Probstfield J, Tyroler HA. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991; 266: 93-98.
Systematic Review vs Meta-Analysis
How you organize your research is incredibly important, whether you’re preparing a report, research review, thesis or an article to be published. The methodology you choose can make or break your work’s chances of getting out into the world, so let’s take a look at two main types: the systematic review and the meta-analysis.
Let’s start with what they have in common: essentially, both are based on high-quality, filtered evidence related to a specific research topic. Both are highly regarded as generally producing reliable findings, though there are differences, which we’ll discuss below. Additionally, both support conclusions based on expert reviews, case-control studies, data analysis and so on, rather than mere opinions and musings.
What is a Systematic Review?
A systematic review is a form of research that collects, appraises and synthesizes evidence to answer a particular question in a transparent and systematic way. The data (or evidence) used in systematic reviews originate in the scholarly literature, published or unpublished, so findings are typically very reliable. In addition, they are normally collated and appraised by an independent panel of experts in the field. Unlike traditional reviews, systematic reviews are very comprehensive and don’t rely on a single author’s point of view, thus avoiding bias.
Systematic reviews are especially important in the medical field, where health practitioners need to be constantly up to date with new, high-quality information to guide their daily decisions. Since systematic reviews, by definition, collect information from previous research, the pitfalls of new primary studies are avoided. In fact, they often identify a lack of evidence or limitations in current knowledge, and consequently recommend further study where needed.
Why are systematic reviews important?
 They combine and synthesize various studies and their findings.
 Systematic reviews appraise the validity of the results and findings of the collected studies in an impartial way.
 They define clear objectives and reproducible methodologies.
What is a Meta-analysis?
This form of research relies on combining statistical results from two or more existing studies. When multiple studies address the same problem or question, some potential for error is to be expected, and most studies account for this within their results. A meta-analysis can help iron out inconsistencies in the data, as long as the studies are similar.
For instance, suppose your research concerns the influence of the Mediterranean diet on people with diabetes aged between 30 and 45, but you can only find one study about the Mediterranean diet in healthy people and another in diabetic teenagers. In this case, undertaking a meta-analysis would probably be a poor choice: you could pursue the comparison of such different material, at the risk of findings that don’t really answer the review question, or you could explore a different (perhaps more qualitative) research method.
Why are meta-analyses important?
 They improve the precision of the evidence, since many individual studies are too small to provide convincing data.
 Meta-analyses can settle divergences between conflicting studies. By formally assessing the conflicting results, it is possible to reach new hypotheses and explore the reasons for the controversy.
 They can also answer questions with a broader scope than individual studies, for example the effect of a disease on several populations across the world, by combining modest studies conducted in specific countries or continents.
Undertaking research approaches like systematic reviews and/or meta-analyses involves great responsibility: they provide reliable information that has a real impact on society. Elsevier offers a number of services that aim to help researchers achieve excellence in written text, suggesting the amendments necessary to fit a targeted format. A well-written text, whether translated or edited from a manuscript, is key to being respected within the scientific community, and can lead to increasingly important positions, such as being part of an expert panel leading a systematic review or a widely acknowledged meta-analysis.
Study Design 101: Meta-Analysis
A subset of systematic reviews; a method for systematically combining pertinent qualitative and quantitative study data from several selected studies to develop a single conclusion that has greater statistical power. This conclusion is statistically stronger than the analysis of any single study, due to increased numbers of subjects, greater diversity among subjects, or accumulated effects and results.
A meta-analysis would be used for the following purposes:
 To establish statistical significance with studies that have conflicting results
 To develop a more correct estimate of effect magnitude
 To provide a more complex analysis of harms, safety data, and benefits
 To examine subgroups with individual numbers that are not statistically significant
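The "greater statistical power" claim above can be made concrete: under a fixed-effect model, pooling n equally precise studies divides the standard error of the combined estimate by the square root of n. A minimal sketch (the numbers are purely illustrative):

```python
import math

def pooled_se(se_per_study, n_studies):
    """Standard error of a fixed-effect pooled estimate of n equally
    precise studies: se / sqrt(n)."""
    return se_per_study / math.sqrt(n_studies)

single = 0.30  # hypothetical SE of one study's effect estimate
for n in (1, 4, 9):
    print(f"{n} studies -> pooled SE = {pooled_se(single, n):.3f}")
```

So nine studies of equal precision yield an estimate three times as precise as any one of them, which is exactly why a pooled conclusion can reach statistical significance when the individual studies cannot.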
If the individual studies utilized randomized controlled trials (RCTs), combining several selected RCT results would provide the highest level of evidence on the evidence hierarchy, followed by systematic reviews, which analyze all available studies on a topic.
Advantages
 Greater statistical power
 Confirmatory data analysis
 Greater ability to extrapolate to general population affected
 Considered an evidencebased resource
Disadvantages
 Difficult and time consuming to identify appropriate studies
 Not all studies provide adequate data for inclusion and analysis
 Requires advanced statistical techniques
 Heterogeneity of study populations
Design pitfalls to look out for
The studies pooled for review should be similar in type (i.e. all randomized controlled trials).
Are the studies being reviewed all the same type of study or are they a mixture of different types?
The analysis should include published and unpublished results to avoid publication bias.
Does the metaanalysis include any appropriate relevant studies that may have had negative outcomes?
Fictitious Example
Do individuals who wear sunscreen have fewer cases of melanoma than those who do not wear sunscreen? A MEDLINE search was conducted using the terms melanoma, sunscreening agents, and zinc oxide, resulting in 8 randomized controlled studies, each with between 100 and 120 subjects. All of the studies showed a positive effect between wearing sunscreen and reducing the likelihood of melanoma. The subjects from all eight studies (total: 860 subjects) were pooled and statistically analyzed to determine the effect of the relationship between wearing sunscreen and melanoma. This metaanalysis showed a 50% reduction in melanoma diagnosis among sunscreenwearers.
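The arithmetic behind such a pooled estimate can be sketched with the classic Mantel-Haenszel method for risk ratios. The per-study counts below are entirely made up to match the fictitious scenario (eight trials of roughly 100-120 subjects each); they are not data from any real study.

```python
def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio from (a, n1, c, n2) per study:
    a/n1 = melanoma risk in the sunscreen group, c/n2 = risk in controls."""
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in tables)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in tables)
    return num / den

# Hypothetical per-study counts:
# (cases in sunscreen group, group size, cases in control group, group size).
tables = [(3, 55, 7, 55), (2, 50, 5, 52), (4, 60, 8, 58),
          (3, 52, 6, 50), (2, 58, 4, 54), (3, 50, 6, 55),
          (2, 54, 5, 53), (3, 56, 6, 58)]

rr = mantel_haenszel_rr(tables)
print(f"Pooled risk ratio: {rr:.2f}")  # values below 1 favour sunscreen
```

A pooled risk ratio of about 0.5 would correspond to the roughly 50% reduction described in the fictitious example.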
Real-life Examples
Goyal, A., Elminawy, M., Kerezoudis, P., Lu, V., Yolcu, Y., Alvi, M., & Bydon, M. (2019). Impact of obesity on outcomes following lumbar spine surgery: A systematic review and meta-analysis. Clinical Neurology and Neurosurgery, 177, 27-36. https://doi.org/10.1016/j.clineuro.2018.12.012
This meta-analysis was interested in determining whether obesity affects the outcome of spinal surgery. Some previous studies have shown higher perioperative morbidity in patients with obesity while other studies have not shown this effect. This study looked at surgical outcomes including "blood loss, operative time, length of stay, complication and reoperation rates and functional outcomes" between patients with and without obesity. A meta-analysis of 32 studies (23,415 patients) was conducted. There were no significant differences for patients undergoing minimally invasive surgery, but patients with obesity who had open surgery had experienced higher blood loss and longer operative times (not clinically meaningful) as well as higher complication and reoperation rates. Further research is needed to explore this issue in patients with morbid obesity.
Nakamura, A., van Der Waerden, J., Melchior, M., Bolze, C., El-Khoury, F., & Pryor, L. (2019). Physical activity during pregnancy and postpartum depression: Systematic review and meta-analysis. Journal of Affective Disorders, 246, 29-41. https://doi.org/10.1016/j.jad.2018.12.009
This meta-analysis explored whether physical activity during pregnancy prevents postpartum depression. Seventeen studies were included (93,676 women) and analysis showed a "significant reduction in postpartum depression scores in women who were physically active during their pregnancies when compared with inactive women." Possible limitations or moderators of this effect include intensity and frequency of physical activity, type of physical activity, and timepoint in pregnancy (e.g. trimester).
Related Terms
A document often written by a panel that provides a comprehensive review of all relevant studies on a particular clinical or healthrelated topic/question.
Publication Bias
A phenomenon in which studies with positive results have a better chance of being published, are published earlier, and are published in journals with higher impact factors. Therefore, conclusions based exclusively on published studies can be misleading.
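One common quantitative check for the funnel-plot asymmetry that publication bias produces is Egger's regression test (Egger et al 1997): the standardized effect of each study is regressed against its precision, and an intercept away from zero suggests that small, imprecise studies report systematically different effects. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical study effects (log odds ratios) and standard errors,
# constructed so that smaller studies show larger effects.
y = np.array([-0.50, -0.35, -0.40, -0.20, -0.15, -0.05])
se = np.array([0.40, 0.35, 0.30, 0.20, 0.15, 0.10])

# Egger's regression: standardized effect against precision.
precision = 1.0 / se
standardized = y / se
slope, intercept = np.polyfit(precision, standardized, 1)
print(f"Egger intercept: {intercept:.2f}")  # far from zero suggests asymmetry
```

In practice a significance test on the intercept (and a visual funnel plot) would accompany this, and the test has low power with few studies.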
Now test yourself!
1. A meta-analysis pools together the sample populations from different studies, such as randomized controlled trials, into one statistical analysis and treats them as one large sample population with one conclusion.
a) True b) False
2. One potential design pitfall of meta-analyses that is important to pay attention to is:
a) Whether it is evidencebased. b) If the authors combined studies with conflicting results. c) If the authors appropriately combined studies so they did not compare apples and oranges. d) If the authors used only quantitative data.
Source: Study Design 101, Himmelfarb Health Sciences Library, George Washington University, https://guides.himmelfarb.gwu.edu/studydesign101 (last updated Sep 25, 2023).
Meta-Analysis and Meta-Synthesis Methodologies: Rigorously Piecing Together Research
Heather Leary (Brigham Young University) & Andrew Walker (Utah State University)
Original Paper | Published: 18 June 2018 | TechTrends, Volume 62, pages 525–534 (2018)
For a variety of reasons, education research can be difficult to summarize. Varying contexts, designs, levels of quality, measurement challenges, definitions of underlying constructs, and treatments, as well as the complexity of research subjects themselves, can result in variability. Education research is voluminous and draws on multiple methods, including quantitative as well as qualitative approaches, to answer key research questions. With the increasing amount of empirical research in Instructional Design and Technology (IDT), synthesis methods can provide a means to more deeply understand trends and patterns in research findings across multiple studies. The purpose of this article is to illustrate structured review or meta-synthesis procedures for qualitative research, as well as novel meta-analysis procedures for the kinds of multiple-treatment designs common in IDT settings. Sample analyses are used to discuss key methodological ideas as a way to introduce researchers to these techniques.
Leary, H., & Walker, A. (2018). Meta-Analysis and Meta-Synthesis Methodologies: Rigorously Piecing Together Research. TechTrends, 62, 525–534. https://doi.org/10.1007/s11528-018-0312-7
 Meta-analysis
 Meta-synthesis
 Synthesis methods
 Education research
Cochrane UK
Meta-analysis: what, why, and how
This is an excerpt from a blog originally published on Students 4 Best Evidence
What is a meta-analysis?
Meta-analysis is a statistical technique for combining data from multiple studies on a particular topic.
Meta-analyses play a fundamental role in evidence-based healthcare. Compared to other study designs (such as randomized controlled trials or cohort studies), the meta-analysis sits at the top of the evidence-based medicine pyramid. This is a pyramid which enables us to weigh up the different levels of evidence available to us. As we go up the pyramid, each level of evidence is less subject to bias than the level below it. Meta-analyses can therefore be seen as the pinnacle of evidence-based medicine (1).
Meta-analyses began to appear as a leading part of research in the late 1970s. Since then, they have become a common way of synthesizing evidence and summarizing the results of individual studies (2).
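As a minimal sketch of the combining step described above, a fixed-effect meta-analysis weights each study's effect estimate by the inverse of its variance, so that larger, more precise studies count for more. The numbers below are toy values, not from any real trial:

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate.

    effects: per-study effect sizes (e.g. log odds ratios)
    std_errors: their standard errors
    Returns (pooled_effect, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical trials reporting log odds ratios
effects = [-0.35, -0.20, -0.50]
ses = [0.12, 0.20, 0.25]
est, se = fixed_effect_pool(effects, ses)
lo, hi = est - 1.96 * se, est + 1.96 * se  # 95% confidence interval
print(f"pooled = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Note that the pooled standard error is smaller than that of any single study, which is exactly the gain in precision the text describes; a forest plot is essentially a picture of these per-study estimates alongside the pooled one.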
Read the full article here
Hippokratia, 2010 Dec; 14(Suppl 1)
Meta-analysis in medical research
The objectives of this paper are to provide an introduction to meta-analysis and to discuss the rationale for this type of research and other general considerations. Methods used to produce a rigorous meta-analysis are highlighted, and some aspects of the presentation and interpretation of meta-analysis are discussed.
Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess previous research studies and derive conclusions about that body of research. Outcomes from a meta-analysis may include a more precise estimate of the effect of a treatment or risk factor for disease, or of other outcomes, than any individual study contributing to the pooled analysis. The examination of variability, or heterogeneity, in study results is also a critical outcome. The benefits of meta-analysis include a consolidated and quantitative review of a large, often complex, and sometimes apparently conflicting body of literature. The specification of the outcome and of the hypotheses to be tested is critical to the conduct of meta-analyses, as is a sensitive literature search. A failure to identify the majority of existing studies can lead to erroneous conclusions; however, there are methods of examining data to identify the potential for studies to be missing, for example, the use of funnel plots. Rigorously conducted meta-analyses are useful tools in evidence-based medicine. The need to integrate findings from many studies ensures that meta-analytic research is desirable, and the large body of research now generated makes the conduct of this research feasible.
Important medical questions are typically studied more than once, often by different research teams in different locations. In many instances, the results of these multiple small studies of an issue are diverse and conflicting, which makes clinical decision-making difficult. The need to arrive at decisions affecting clinical practice fostered the momentum toward "evidence-based medicine" 1–2. Evidence-based medicine may be defined as the systematic, quantitative, preferentially experimental approach to obtaining and using medical information. Meta-analysis, a statistical procedure that integrates the results of several independent studies, therefore plays a central role in evidence-based medicine. In fact, in the hierarchy of evidence (Figure 1), where clinical evidence is ranked according to its freedom from the various biases that beset medical research, meta-analyses are at the top. In contrast, animal research, laboratory studies, case series and case reports have little clinical value as proof, hence their place at the bottom.
Meta-analysis did not begin to appear regularly in the medical literature until the late 1970s, but since then a plethora of meta-analyses have emerged, and their growth has been exponential over time (Figure 2) 3. Moreover, it has been shown that meta-analyses are the most frequently cited form of clinical research 4. The merits and perils of the somewhat mysterious procedure of meta-analysis, however, continue to be debated in the medical community 5–8. The objectives of this paper are to introduce meta-analysis and to discuss the rationale for this type of research and other general considerations.
Meta-Analysis and Systematic Review
Glass first defined meta-analysis in the social science literature as "the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" 9. Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess the results of previous research and derive conclusions about that body of research. Typically, but not necessarily, the study is based on randomized, controlled clinical trials. Outcomes from a meta-analysis may include a more precise estimate of the effect of a treatment or risk factor for disease, or of other outcomes, than any individual study contributing to the pooled analysis. Identifying sources of variation in responses, that is, examining the heterogeneity of a group of studies, and the generalizability of responses, can lead to more effective treatments or modifications of management. Examination of heterogeneity is perhaps the most important task in meta-analysis. The Cochrane Collaboration has been a long-standing, rigorous, and innovative leader in developing methods in the field 10. Major contributions include the development of protocols that provide structure for literature search methods, and new and extended analytic and diagnostic methods for evaluating the output of meta-analyses. Use of the methods outlined in the handbook should provide a consistent approach to the conduct of meta-analysis. Moreover, a useful guide to improving the reporting of systematic reviews and meta-analyses is the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) statement, which replaced the QUOROM (QUality Of Reporting Of Meta-analyses) statement 11–13.
Meta-analyses are a subset of systematic reviews. A systematic review attempts to collate empirical evidence that fits pre-specified eligibility criteria to answer a specific research question. The key characteristics of a systematic review are a clearly stated set of objectives with predefined eligibility criteria for studies; an explicit, reproducible methodology; a systematic search that attempts to identify all studies that meet the eligibility criteria; an assessment of the validity of the findings of the included studies (e.g., through the assessment of risk of bias); and a systematic presentation and synthesis of the attributes and findings of the studies used. Systematic methods are used to minimize bias, thus providing more reliable findings from which conclusions can be drawn and decisions made than traditional review methods 14, 15. Systematic reviews need not contain a meta-analysis; there are times when it is not appropriate or possible. However, many systematic reviews contain meta-analyses 16.
The inclusion of observational medical studies in meta-analyses led to considerable debate over the validity of meta-analytical approaches, as there was necessarily a concern that the observational studies were likely to be subject to unidentified sources of confounding and risk modification 17. Pooling such findings may not lead to more certain outcomes. Moreover, an empirical study showed that in meta-analyses where both randomized and nonrandomized studies were included, the nonrandomized studies tended to show larger treatment effects 18.
Meta-analyses are conducted to assess the strength of the evidence available on a disease and treatment. One aim is to determine whether an effect exists; another is to determine whether the effect is positive or negative and, ideally, to obtain a single summary estimate of it. The results of a meta-analysis can improve the precision of estimates of effect, answer questions not posed by the individual studies, settle controversies arising from apparently conflicting studies, and generate new hypotheses. In particular, the examination of heterogeneity is vital to the development of new hypotheses.
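The heterogeneity examination stressed above is commonly quantified with Cochran's Q and the derived I² statistic, where I² estimates the share of between-study variability beyond what chance alone would produce. A small sketch with made-up numbers (not from any published trials):

```python
def heterogeneity(effects, std_errors):
    """Cochran's Q and I^2 (%) for a set of study effects.

    Q sums weighted squared deviations of each study's effect from
    the inverse-variance pooled mean; I^2 = max(0, (Q - df) / Q).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Studies that broadly agree -> little observed heterogeneity
q_low, i2_low = heterogeneity([-0.35, -0.20, -0.50], [0.12, 0.20, 0.25])
# Studies that conflict -> most of the variability is real, not chance
q_high, i2_high = heterogeneity([0.8, -0.6, 0.1], [0.1, 0.1, 0.1])
```

A high I² is precisely the signal that a single pooled number may be misleading and that sources of variation (populations, designs, outcome definitions) deserve investigation before, or instead of, pooling.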
Individual or Aggregated Data
The majority of meta-analyses are based on a series of studies to produce a point estimate of an effect and measures of the precision of that estimate. However, methods have been developed for meta-analyses to be conducted on data obtained from the original trials 19, 20. This approach may be considered the "gold standard" in meta-analysis because it offers advantages over analyses using aggregated data, including a greater ability to validate the quality of the data and to conduct appropriate statistical analysis. Further, it is easier to explore differences in effect across subgroups within the study population than with aggregated data. The use of standardized individual-level information may help to avoid the problems encountered in meta-analyses of prognostic factors 21, 22. It is the best way to obtain a more global picture of the natural history and predictors of risk for major outcomes, such as in scleroderma 23–26. This approach relies on cooperation between the researchers who conducted the relevant studies. Researchers who are aware of the potential to contribute to or conduct these studies will provide and obtain additional benefits by carefully maintaining their original databases and making them available for future studies.
Literature Search
A sound meta-analysis is characterized by a thorough and disciplined literature search. A clear definition of the hypotheses to be investigated provides the framework for such an investigation. According to the PRISMA statement, an explicit statement of the questions being addressed, with reference to participants, interventions, comparisons, outcomes and study design (PICOS), should be provided 11, 12. It is important to obtain all relevant studies, because the loss of studies can lead to bias. Typically, published papers and abstracts are identified by a computerized literature search of electronic databases that can include PubMed ( www.ncbi.nlm.nih.gov/entrez/query.fcgi ), ScienceDirect ( www.sciencedirect.com ), Scirus ( www.scirus.com/srsapp ), ISI Web of Knowledge ( http://www.isiwebofknowledge.com ), Google Scholar ( http://scholar.google.com ) and CENTRAL (Cochrane Central Register of Controlled Trials, http://www.mrw.interscience.wiley.com/cochrane/cochrane_clcentral_articles_fs.htm ). The PRISMA statement recommends that a full electronic search strategy for at least one major database be presented 12. Database searches should be augmented with hand searches of library resources for relevant papers, books, abstracts, and conference proceedings. Cross-checking of references and of citations in review papers, and communication with scientists who have been working in the relevant field, are important methods used to provide a comprehensive search. Communication with pharmaceutical companies manufacturing and distributing test products can be appropriate for studies examining the use of pharmaceutical interventions.
It is not feasible to find absolutely every relevant study on a subject. Some, or even many, studies may not be published, and those that are might not be indexed in computer-searchable databases. Useful sources for unpublished trials are the clinical trials registers, such as the National Library of Medicine's ClinicalTrials.gov website. Reviews should attempt to be sensitive, that is, to find as many studies as possible, in order to minimize bias, and to be efficient. It may be appropriate to frame a hypothesis that considers the time over which a study is conducted or to target a particular subpopulation. The decision whether to include unpublished studies is difficult. Although the language of publication can present a difficulty, it is important to overcome it, provided that the populations studied are relevant to the hypothesis being tested.
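In practice, a sensitive query is assembled by OR-ing synonyms within each PICOS concept and AND-ing the concepts together. A hedged sketch of that assembly step (the terms are illustrative, and real database syntax, such as MeSH field tags, will differ by database):

```python
def build_search_string(concept_groups):
    """OR synonyms within each concept; AND the concepts together."""
    clauses = []
    for terms in concept_groups:
        clauses.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    return " AND ".join(clauses)

# Illustrative PICOS concepts for a pregnancy/exercise/depression question
picos = [
    ["pregnancy", "antenatal", "prenatal"],              # Population
    ["exercise", "physical activity"],                   # Intervention/Exposure
    ["postpartum depression", "postnatal depression"],   # Outcome
]
query = build_search_string(picos)
print(query)
```

Widening any synonym list raises sensitivity (more studies retrieved) at the cost of precision, which is the trade-off the review team must manage when screening results.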
Inclusion or Exclusion Criteria and Potential for Bias
Studies are chosen for metaanalysis based on inclusion criteria. If there is more than one hypothesis to be tested, separate selection criteria should be defined for each hypothesis. Inclusion criteria are ideally defined at the stage of initial development of the study protocol. The rationale for the criteria for study selection used should be clearly stated.
One important potential source of bias in meta-analysis is the loss of trials and subjects. Ideally, all randomized subjects in all studies satisfy all of the trial selection criteria, comply with all the trial procedures, and provide complete data. Under these conditions, an "intention-to-treat" analysis is straightforward to implement; that is, statistical analysis is conducted on all subjects enrolled in a study rather than only those that complete all stages of the study considered desirable. Empirical studies have shown that certain methodological characteristics, such as poor concealment of treatment allocation or a lack of blinding, exaggerate treatment effects 27. Therefore, it is important to critically appraise the quality of studies in order to assess the risk of bias.
The study design, including details of the method of randomization of subjects to treatment groups, the criteria for eligibility in the study, blinding, the method of assessing the outcome, and the handling of protocol deviations, are important features defining study quality. When studies are excluded from a meta-analysis, the reasons for exclusion should be provided for each excluded study. Usually, more than one assessor decides independently which studies to include or exclude, working with a well-defined checklist and a procedure to follow when the assessors disagree. Two people familiar with the study topic perform the quality assessment for each study independently, followed by a consensus meeting to discuss the studies excluded or included. In practice, blinding reviewers to details of a study such as authorship and journal source is difficult.
Before assessing study quality, a quality assessment protocol and data forms should be developed. The goal of this process is to reduce the risk of bias in the estimate of effect. Quality scores that summarize multiple components into a single number exist but are misleading and unhelpful 28. Rather, investigators should use individual components of quality assessment, describe trials that do not meet the specified quality standards, and assess the effect on the overall results of excluding them as part of the sensitivity analyses.
Further, not all studies are completed, because of protocol failure, treatment failure, or other factors. Nonetheless, missing subjects and studies can provide important evidence. It is desirable to obtain data from all relevant randomized trials so that the most appropriate analysis can be undertaken. Previous studies have discussed the significance of missing trials to the interpretation of intervention studies in medicine 29, 30. Journal editors and reviewers need to be aware of the existing bias toward publishing positive findings and ensure that papers reporting negative or even failed trials are published, as long as these meet the quality guidelines for publication.
There are occasions when the authors of the selected papers have chosen different outcome criteria for their main analysis. In practice, it may be necessary to revise the inclusion criteria for a meta-analysis after reviewing all of the studies found through the search strategy. Variation among studies reflects the type of study design used, the type and application of experimental and control therapies, whether or not the study was published and, if published, subjected to peer review, and the definition used for the outcome of interest. There are no standardized criteria for the inclusion of studies in meta-analysis; universal criteria are not appropriate because meta-analysis can be applied to a broad spectrum of topics. Published data in journal papers should also be cross-checked against conference papers to avoid duplication of presented data.
Clearly, unpublished studies are not found by searching the literature. It is possible that published studies are systematically different from unpublished studies; for example, positive trial findings may be more likely to be published. Therefore, a meta-analysis based on a literature search alone may suffer from publication bias.
Efforts to minimize this potential bias include working from the references in published studies, searching computerized databases of unpublished material, and investigating other sources of information including conference proceedings, graduate dissertations and clinical trial registers.
Statistical analysis
The most common measures of effect used for dichotomous data are the risk ratio (also called relative risk) and the odds ratio. The dominant method used for continuous data is standardized mean difference (SMD) estimation. Methods used in meta-analysis for post hoc analysis of findings are relatively specific to meta-analysis and include heterogeneity analysis, sensitivity analysis, and evaluation of publication bias.
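As a minimal illustration of the dichotomous effect measures, the risk ratio and odds ratio can be computed directly from a 2×2 table, along with the log-scale standard errors that are later used for pooling and confidence intervals. The counts below are hypothetical:

```python
import math

def effect_measures(a, b, c, d):
    """Risk ratio and odds ratio from a 2x2 table:
    a/b = events/non-events in the treatment group,
    c/d = events/non-events in the control group."""
    rr = (a / (a + b)) / (c / (c + d))   # risk ratio (relative risk)
    odds = (a * d) / (b * c)             # odds ratio
    # Standard errors on the log scale (standard large-sample formulas)
    se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    return rr, odds, se_log_rr, se_log_or

# Hypothetical trial: 15/100 events on treatment, 30/100 on control
rr, odds, se_rr, se_or = effect_measures(15, 85, 30, 70)
# rr = 0.5, odds ≈ 0.41
```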
All methods used should allow for the weighting of studies. The concept of weighting reflects the value of the evidence of any particular study. Usually, studies are weighted according to the inverse of their variance 31. It is important to recognize that smaller studies, therefore, usually contribute less to the estimates of overall effect. However, a well-conducted study with tight control of measurement variation and sources of confounding contributes more to the estimate of overall effect than an identically sized but less well-conducted study.
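Inverse-variance weighting can be sketched in a few lines. Each study's weight is the reciprocal of its variance, so precise studies dominate the pooled estimate; the effects and variances below are hypothetical log odds ratios:

```python
def fixed_effect_pool(effects, variances):
    """Fixed-effect (inverse-variance) pooling: weight each study
    by 1/variance and take the weighted average."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_var = 1.0 / sum(weights)  # variance of the pooled estimate
    return pooled, pooled_var

# Three hypothetical studies (log odds ratios and their variances)
effects = [-0.4, -0.2, -0.3]
variances = [0.04, 0.01, 0.02]
pooled, pooled_var = fixed_effect_pool(effects, variances)
# pooled ≈ -0.257: the middle study, with the smallest variance, pulls hardest
```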
One of the foremost decisions to be made when conducting a meta-analysis is whether to use a fixed-effects or a random-effects model. A fixed-effects model is based on the assumption that the sole source of variation in observed outcomes is that occurring within the study; that is, the effect expected from each study is the same. Consequently, it is assumed that the studies are homogeneous: there are no differences in the underlying study population, no differences in subject selection criteria, and treatments are applied in the same way 32. The fixed-effects methods used most often for dichotomous data are the Mantel-Haenszel method 33 and the Peto method 34 (the latter only for odds ratios).
Random-effects models assume that a distribution of effects exists, resulting in heterogeneity among study results, quantified by τ². Consequently, as software has improved, random-effects models, which require greater computing power, have been conducted more frequently. This is desirable because the strong assumption that the effect of interest is the same in all studies is frequently untenable. Moreover, the fixed-effects model is not appropriate when statistical heterogeneity (τ²) is present in the results of the studies in the meta-analysis. In the random-effects model, studies are weighted by the inverse of their variance plus the heterogeneity parameter; it is therefore usually a more conservative approach, with wider confidence intervals than the fixed-effects model, where studies are weighted only by the inverse of their variance. The most commonly used random-effects method is the DerSimonian and Laird method 35. Furthermore, it is suggested that the fitted fixed-effects and random-effects models be compared, as this process can yield insights into the data 36.
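A sketch of the DerSimonian and Laird method-of-moments estimator, with hypothetical study data: Cochran's Q from a fixed-effects fit yields τ², which is then added to each study's variance before re-weighting, widening the confidence interval:

```python
def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling: estimate tau^2 by the
    method of moments, then weight each study by 1/(variance + tau^2)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    # Cochran's Q around the fixed-effect estimate
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    k = len(effects)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)   # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2

# Three hypothetical, visibly heterogeneous studies (log odds ratios)
pooled, se, tau2 = dersimonian_laird([-0.6, -0.1, 0.2], [0.04, 0.01, 0.02])
# tau2 > 0 here, so the random-effects SE exceeds the fixed-effects SE
```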
Heterogeneity
Arguably, the greatest benefit of conducting a meta-analysis is the ability to examine sources of heterogeneity, if present, among studies. If heterogeneity is present, the summary measure must be interpreted with caution 37. When heterogeneity is present, one should question whether and how to generalize the results. Understanding sources of heterogeneity will lead to more effective targeting of prevention and treatment strategies and will result in new research topics being identified. Part of the strategy in conducting a meta-analysis is to identify factors that may be significant determinants for subpopulation analysis or covariates that may be appropriate to explore in all studies.
To understand the nature of variability in studies, it is important to distinguish between different sources of heterogeneity. Variability in the participants, interventions, and outcomes studied has been described as clinical diversity, and variability in study design and risk of bias has been described as methodological diversity 10. Variability in the intervention effects being evaluated among the different studies is known as statistical heterogeneity and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in observed intervention effects varying by more than the differences expected among studies attributable to random error alone. In the literature, statistical heterogeneity is usually referred to simply as heterogeneity.
Clinical variation will cause heterogeneity if the intervention effect is modified by the factors that vary across studies; most obviously, the specific interventions or participant characteristics that are often reflected in different levels of risk in the control group when the outcome is dichotomous. In other words, the true intervention effect will differ for different studies. Differences between studies in terms of methods used, such as use of blinding or differences between studies in the definition or measurement of outcomes, may lead to differences in observed effects. Significant statistical heterogeneity arising from differences in methods used or differences in outcome assessments suggests that the studies are not all estimating the same effect, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity indicates that studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the result of clinical trials, although this may not always be the case.
The scope of a meta-analysis will largely determine the extent to which the studies included in a review are diverse. A meta-analysis should be conducted when a group of studies is sufficiently homogeneous in terms of subjects, interventions, and outcomes to provide a meaningful summary. However, it is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. Combining studies that differ substantially in design and other factors can yield a meaningless summary result, but evaluating the reasons for the heterogeneity among studies can be very insightful. It may be argued that such studies are of intrinsic interest on their own, even though it is not appropriate to produce a single summary estimate of effect.
Variation among k trials is usually assessed using Cochran's Q statistic, a chi-squared (χ²) test of heterogeneity with k − 1 degrees of freedom. This test has relatively poor power to detect heterogeneity among small numbers of trials; consequently, an α-level of 0.10 is used to test hypotheses 38, 39.
Heterogeneity of results among trials is better quantified using the inconsistency index I², which describes the percentage of total variation across studies that is due to heterogeneity rather than chance 40. Uncertainty intervals for I² (dependent on Q and k) are calculated using the method described by Higgins and Thompson 41. Negative values of I² are set to zero, so that I² lies between 0% and 100%. A value >75% may be considered substantial heterogeneity 41. This statistic is less influenced by the number of trials than other methods used to estimate heterogeneity and provides a logical and readily interpretable metric, but it can still be unstable when only a few studies are combined 42.
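The two statistics can be computed together. With three hypothetical studies whose effects clearly diverge, Q exceeds its k − 1 = 2 degrees of freedom and I² is large:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the inconsistency index I^2.
    I^2 = max(0, (Q - df) / Q) expresses the fraction of total
    variation attributable to heterogeneity rather than chance."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical log odds ratios with visibly discordant directions
q, i2 = heterogeneity([-0.6, -0.1, 0.2], [0.04, 0.01, 0.02])
# Q ≈ 10.71 on 2 df, I² ≈ 81.3% — substantial heterogeneity by the >75% rule
```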
Given that there are several potential sources of heterogeneity in the data, several steps should be considered in investigating their causes. Although random-effects models are appropriate, it may still be very desirable to examine the data to identify sources of heterogeneity and to take steps to produce models with a lower level of heterogeneity, if appropriate. Further, if the studies examined are highly heterogeneous, it may not be appropriate to present an overall summary estimate, even when random-effects models are used. As Petitti notes 43, statistical analysis alone will not make contradictory studies agree; critically, one should use common sense in decision-making. Despite heterogeneity in responses, if all studies had effects in the positive direction and the pooled confidence interval did not include zero, it would not be logical to conclude that there was no positive effect, provided that sufficient studies and subject numbers were present. The appropriateness of the point estimate of the effect is much more in question.
Among the ways to investigate the reasons for heterogeneity are subgroup analysis and meta-regression. The subgroup analysis approach, a variation on those described above, groups categories of subjects (e.g., by age or sex) to compare effect sizes. The meta-regression approach uses regression analysis to determine the influence of selected variables (the independent variables) on the effect size (the dependent variable). In a meta-regression, studies are regarded as if they were individual patients, but their effects are properly weighted to account for their different variances 44.
Sensitivity analyses have also been used to examine the effects of studies identified as being aberrant concerning conduct or result, or as being highly influential in the analysis. Recently, another method has been proposed that reduces the weight of studies that are outliers in meta-analyses 45. All of these methods for examining heterogeneity have merit, and the variety of methods available reflects the importance of this activity.
Presentation of results
A useful graph, presented in the PRISMA statement 11, is the four-phase flow diagram (Figure 3).
This flow diagram depicts the flow of information through the different phases of a systematic review or meta-analysis. It maps out the number of records identified, included, and excluded, and the reasons for exclusions. The results of meta-analyses are often presented in a forest plot, where each study is shown with its effect size and the corresponding 95% confidence interval (Figure 4).
The pooled effect and its 95% confidence interval are shown at the bottom, on the line labeled "Overall". In the right panel of Figure 4, the cumulative meta-analysis is graphically displayed, where data are entered successively, typically in the order of their chronological appearance 46, 47. Such cumulative meta-analysis can retrospectively identify the point in time when a treatment effect first reached conventional levels of significance. Cumulative meta-analysis is a compelling way to examine trends in the evolution of the summary effect size and to assess the impact of a specific study on the overall conclusions 46. The figure shows that many studies were performed long after cumulative meta-analysis would have shown a significant beneficial effect of antibiotic prophylaxis in colon surgery.
Biases in meta-analysis
Although the intent of a meta-analysis is to find and assess all studies meeting the inclusion criteria, it is not always possible to obtain these. A critical concern is the papers that may have been missed. There is good reason to be concerned about this potential loss, because studies with significant, positive results ("positive" studies) are more likely to be published and, in the case of interventions with a commercial value, to be promoted, than studies with nonsignificant or "negative" results. Studies that produce a positive result, especially large studies, are more likely to have been published, and, conversely, there has been a reluctance to publish small studies with nonsignificant results. Further, publication bias is not solely the responsibility of editorial policy, as there is reluctance among researchers to publish results that were either uninteresting or not from randomized studies 48. There are, however, problems with simply including all studies that have failed to meet peer-review standards. All methods of retrospectively dealing with bias in studies are imperfect.
It is important to examine the results of each meta-analysis for evidence of publication bias. An estimation of the likely size of the publication bias in the review, and an approach to dealing with it, is inherent to the conduct of many meta-analyses. Several methods have been developed to assess publication bias; the most commonly used is the funnel plot. The funnel plot provides a graphical evaluation of the potential for bias; it was developed by Light and Pillemer 49 and discussed in detail by Egger and colleagues 50, 51. A funnel plot is a scatterplot of treatment effect against a measure of study size. If publication bias is not present, the plot is expected to have a symmetric inverted-funnel shape, as shown in Figure 5A.
In the absence of publication bias, larger studies (i.e., those with lower standard error) tend to cluster closely around the point estimate. As studies become less precise, as in smaller trials (i.e., those with higher standard error), their results can be expected to be more variable, scattering to both sides of the more precise larger studies. Figure 5A shows that the smaller, less precise studies are indeed scattered to both sides of the point estimate of effect, and that the pattern is symmetrical, forming an inverted funnel and showing no evidence of publication bias. In contrast, Figure 5B shows evidence of publication bias: smaller studies showing a decrease in effect size (lower odds ratio) appear not to have been published.
Asymmetry of funnel plots is not solely attributable to publication bias; it may also result from clinical heterogeneity among studies (for example, differences in the control of confounders or in the exposure of subjects to effect modifiers) or from methodological heterogeneity between studies (for example, a failure to conceal treatment allocation). There are several statistical tests for detecting funnel plot asymmetry, such as Egger's linear regression test 50 and Begg's rank correlation test 52, but these do not have considerable power and are rarely used. Moreover, the funnel plot is not without problems. If high-precision studies really do differ from low-precision studies with respect to effect size (e.g., because different populations were examined), a funnel plot may give a false impression of publication bias 53. The appearance of the funnel plot can change quite dramatically depending on the scale of the y-axis, whether it is the inverse square error or the trial size 54.
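Egger's test amounts to a simple linear regression of the standardized effect (effect/SE) on precision (1/SE), with an intercept far from zero suggesting asymmetry. A sketch with hypothetical studies sharing one true effect, so no asymmetry and a zero intercept:

```python
def egger_regression(effects, ses):
    """Egger's regression: ordinary least squares of the standardized
    effect (effect/SE) on precision (1/SE). A non-zero intercept
    suggests funnel-plot asymmetry (small-study effects)."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1.0 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                 # estimates the underlying effect
    intercept = my - slope * mx       # the asymmetry (bias) indicator
    return intercept, slope

# Hypothetical symmetric case: four studies, identical effect, varying precision
intercept, slope = egger_regression([0.3, 0.3, 0.3, 0.3], [0.1, 0.2, 0.3, 0.4])
# intercept ≈ 0 (no asymmetry), slope ≈ 0.3 (the common effect)
```

(A full test would also compute the t-statistic and p-value for the intercept; this sketch returns only the coefficients.)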
Other types of biases in meta-analysis include time lag bias, selective reporting bias, and language bias. Time lag bias arises when studies with striking results are published earlier than those with nonsignificant findings 55. Moreover, it has been shown that positive studies with high early accrual of patients are published sooner than negative trials with low early accrual 56. However, missing studies, whether due to publication bias or time lag bias, may increasingly be identified from trial registries.
Selective reporting bias exists when published articles have incomplete or inadequate reporting. Empirical studies have shown that this bias is widespread and of considerable importance when published studies are compared with their protocols 29, 30. Furthermore, recent evidence suggests that selective reporting might be an issue for safety outcomes, and the reporting of harms in clinical trials is still suboptimal 57. Therefore, it might not be possible to use quantitative objective evidence for harms in performing meta-analyses and making therapeutic decisions.
Excluding clinical trials reported in languages other than English from meta-analyses may introduce language bias and reduce the precision of combined estimates of treatment effects. Trials with statistically significant results have been shown to be more likely to be published in English 58. In contrast, a later, more extensive investigation showed that trials published in languages other than English tend to be of lower quality and to produce more favourable treatment effects than trials published in English, and concluded that excluding non-English-language trials generally has only modest effects on summary treatment effect estimates, although the effect is difficult to predict for individual meta-analyses 59.
Evolution of meta-analyses
The classical meta-analysis compares two treatments, while network meta-analysis (or multiple-treatment meta-analysis) can provide estimates of treatment efficacy for multiple treatment regimens through indirect comparisons, even when direct comparisons are unavailable 60. An example of a network analysis would be the following. An initial trial compares drug A to drug B. A different trial studying the same patient population compares drug B to drug C. Assume that drug A is found to be superior to drug B in the first trial, and that drug B is found to be equivalent to drug C in the second trial. Network analysis then allows one to say, statistically, that drug A is also superior to drug C for this particular patient population: since drug A is better than drug B, and drug B is equivalent to drug C, drug A is also better than drug C, even though it was never directly tested against drug C.
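The indirect comparison in this example can be made quantitative. On a log odds-ratio scale, the A-versus-C estimate is the sum of the A-versus-B and B-versus-C estimates, and their variances add; all numbers below are hypothetical:

```python
def indirect_comparison(d_ab, se_ab, d_bc, se_bc):
    """Bucher-style indirect comparison via a common comparator B:
    d_AC = d_AB + d_BC on the log scale, with variances adding,
    so the indirect estimate is always less precise than either input."""
    d_ac = d_ab + d_bc
    se_ac = (se_ab ** 2 + se_bc ** 2) ** 0.5
    return d_ac, se_ac

# Hypothetical log odds ratios: A vs B = -0.5 (A better),
# B vs C = 0.0 (equivalent), with their standard errors
d_ac, se_ac = indirect_comparison(-0.5, 0.15, 0.0, 0.20)
# d_ac = -0.5, se_ac = 0.25 — larger than either component SE
```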
Meta-analysis can also be used to summarize the performance of diagnostic and prognostic tests. However, studies that evaluate the accuracy of tests have a unique design requiring different criteria to appropriately assess the quality of studies and the potential for bias. Additionally, each study reports a pair of related summary statistics (for example, sensitivity and specificity) rather than a single statistic (such as a risk ratio), and hence requires different statistical methods to pool the results of the studies 61. Various techniques to summarize results from diagnostic and prognostic tests have been proposed 62-64. Furthermore, many methodologies for advanced meta-analysis have been developed to address specific concerns, such as multivariate meta-analysis 65-67 and special types of meta-analysis in genetics 68, but these will not be discussed here.
Meta-analysis is no longer a novelty in medicine. Numerous meta-analyses have been conducted on the same medical topic by different researchers. Recently, there has been a trend toward combining the results of different meta-analyses to assess the risk of bias, known as a meta-epidemiological study 69, 70.
Conclusions
The traditional basis of medical practice has been changed by the use of randomized, blinded, multicenter clinical trials and meta-analysis, leading to the widely used term "evidence-based medicine". A leader in initiating this change has been the Cochrane Collaboration, which has produced guidelines for conducting systematic reviews and meta-analyses 10; more recently, the PRISMA statement, a helpful resource for improving the reporting of systematic reviews and meta-analyses, has been released 11. Moreover, standards by which to conduct and report meta-analyses of observational studies have been published to improve the quality of reporting 71.
Meta-analysis of randomized clinical trials is not an infallible tool, however, and several examples exist of meta-analyses that were later contradicted by single large randomized controlled trials, and of meta-analyses addressing the same issue that reached opposite conclusions 72. A recent example was the controversy between a meta-analysis of 42 studies 73 and the subsequent publication of a large-scale trial (the RECORD trial) that did not support the cardiovascular risk of rosiglitazone 74. However, the reason for this controversy was explained by the numerous methodological flaws found in both the meta-analysis and the large clinical trial 75.
No single study, whether meta-analytic or not, will provide the definitive understanding of responses to treatment, diagnostic tests, or risk factors influencing disease. Despite this limitation, meta-analytic approaches have demonstrable benefits in addressing the limitations of study size, can include diverse populations, provide the opportunity to evaluate new hypotheses, and are more valuable than any single study contributing to the analysis. The conduct of the included studies is critical to the value of a meta-analysis, and the methods used need to be as rigorous as those of any other study.