Standards for evidence in policy decision-making

A multidisciplinary team of scientists, policymakers, government officials, and academics presents a framework for classifying evidence used in policy.

Kai Ruggeri, Assistant Professor, Columbia University
Sander van der Linden, Director, Cambridge Social Decision-making Lab
Y. Claire Wang, Vice President, New York Academy of Medicine
Francesca Papa, Junior Policy Analyst, OECD
Zeina Afif, Senior Social Scientist, World Bank
Johann Riesch, Principal Research Scientist, Max-Planck-Institut für Plasmaphysik
James Green, Chief Scientist, NASA      

https://socialsciences.nature.com/users/399005-kai-ruggeri/posts/standards-for-evidence-in-policy-decision-making?fbclid=IwAR1sceIUPdeRv-_7fer6eNQaqqFNXAWnc6CBcjp0eqAkwwCN2A537qnt2yI


Abstract: Benefits from applying scientific evidence to policy have long been recognized by experts on both sides of the science-policy interface. The COVID-19 pandemic, declared in March 2020, urgently demands robust inputs for policymaking, whether biomedical, behavioral, epidemiological, or logistical. Unfortunately, this need arises at a time of growing misinformation and poorly vetted facts repeated by influential sources, meaning there has never been a more critical time to implement standards for evidence. In this piece, we present THEARI, a new framework to help set standards for the quality of evidence used in policymaking. This framework is intended to help manage risks while also providing a reasonable pathway for applying breakthroughs in treatments and policy solutions, in an attempt to stem the harm already affecting the well-being of populations around the world.

“For emphasis, I run some risk of overstatement.” – Charles Lindblom, 1959

Introduction

There is growing demand for scientists to improve how they communicate evidence to decision-makers and the public (National Academies of Sciences, Engineering, and Medicine, 2017). While finding common ground across scientific disciplines is often challenging (Johnson, 2013), effective science communication is crucial in helping policymakers design evidence-based interventions that benefit entire populations. Investment in evidence-based practices is expanding. However, standards for defining evidence remain highly heterogeneous across scientific fields and policy domains, which is an especially heavy burden during crises such as the COVID-19 pandemic, for which warnings had long been raised but were not fully heeded (Cheng et al., 2007). In this paper, we propose standard guidelines to support communicating evidence to policymakers. Such standards benefit both scientific progress and policymakers while encouraging wider appreciation of empirical evidence.[1]

Evidence in policy

As of early 2017, all 50 US states and the District of Columbia demonstrated at least a modest level of integrating evidence into one or more policy domains (Pew-MacArthur, 2017). However, the absence of a common standard for identifying, defining, or integrating evidence into policy decisions has resulted in substantial variability in how advanced these processes are.

With the Foundations for Evidence-Based Policymaking Act of 2018 now law in the United States, establishing such standards has immediate value. The “Evidence Act” includes a number of guidelines, notably influencing which policy areas are given priority, how information is disseminated, how agencies should aim to learn from evidence, and how to evaluate a range of policy actions. Yet, given the wide spectrum of content that can be treated as evidence, identifying reliable and appropriate sources will remain a challenge for government institutions. Despite these challenges, the increased emphasis on using scientific insights presents a clear opportunity to improve standards for applying evidence in policy.

Six decades ago, Lindblom (1959) outlined the opportunities and challenges of linking scientific insights to policy applications, which are best captured by the quote at the start of this manuscript. As outlined here, these challenges remain today and, due to COVID-19, have suddenly returned to the fore.

Formulating evidence-based policy for COVID-19

The COVID-19 pandemic poignantly illustrates the need for robust evidence-based policy. Some of the most critical questions of a generation now confront leaders around the world. How should countries respond to effectively limit the spread of the coronavirus? Why have some interventions, in certain countries, been more successful than others? What information can be trusted enough to implement at scale?

Answering these questions is now particularly taxing, due to the conjunction of several factors unfolding on a global scale:

  1. Over-supply of scientific evidence
  2. Increasingly complex political processes
  3. Rapid diffusion of information and misinformation (often through social media)
  4. High levels of uncertainty about many aspects, including the reliability of available data and the extent of cross-country comparability.

Behavioral science suggests that the policy interpretation of existing information can be particularly prone to biases in this context of scarce time and resources (Mullainathan and Shafir, 2013). For COVID-19 specifically, the problem is not just a gap in evidence on ‘what works’ but a multiplication of uncertainties, because decision-makers do not have sufficient time to find out. As a result, formulating evidence-informed policy appears to be most challenging precisely when we most need it, and countries are approaching the issue very differently.

Public policies to mitigate the spread of COVID-19 can also draw on behavioral insights, such as how to effectively encourage frequent hand washing, motivate individuals to physically distance themselves from others, ensure widespread compliance with medical advice, and evaluate the mental health effects of long-term isolation. This requires that all forms of evidence be classified systematically, separating what is viable from what is merely plausible, aesthetic, or novel (Smaldino & McElreath, 2016).

Without authoritarian intervention, South Korea drastically slowed the spread of the virus through unprecedented testing regimens, early physical isolation, and rapid tracing to quarantine the infected (Zastrow, 2020). In Italy, the government convened an ad hoc Technical Scientific Committee to refine lockdown measures on the basis of scientific recommendations (Protezione Civile, 2020). By contrast, the United Kingdom saw widespread controversy over its initial decision to delay action, a decision that threatened hundreds of thousands of lives and was based in part on fears of “behavioral fatigue” spreading throughout the population. This prompted over 600 behavioral scientists in the UK to sign a public letter urging the government to reconsider, given the lack of sufficient evidence supporting the concept (Chater, 2020).

If the British government had applied a systematic framework such as the one described here, it would have become clearer that the evidence on “fatigues” (behavioral, media, isolation) is disparate at best, of mixed quality, and notably lacking in supporting randomized controlled trials. Using our proposed THEARI rating system (introduced below), experts would likely have placed the evidence between the “empirical” and “applicable” stages, far from “replicable” or “impactful”. While innovation and new approaches to large-scale interventions will likely be necessary to combat the pandemic, the survival of entire populations in the face of such crises should not rely on such limited information when better information is clearly available. This example illustrates how the lack of systematic assessment of evidence can impede optimal policymaking.

Of course, these concerns are not unique to the COVID-19 pandemic, so applications are possible on both immediate and broader fronts.

THEARI - A simple framework for standards of evidence in policymaking

To establish standards for evaluating evidence in policy contexts, we developed the Theoretical, Empirical, Applicable, and Replicable Impact rating system (THEARI; Fig. 1). This five-tier system ranges from one full star (theory only) to five full stars (impact validated). Its purpose is to guide scientists and policymakers in classifying what qualifies as evidence and how appropriate it may be for application.
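As a minimal sketch, assuming the five tiers map onto the acronym in order (1 Theoretical, 2 Empirical, 3 Applicable, 4 Replicable, 5 Impact), the scale can be encoded as an ordered enumeration so that ratings compare naturally. The names `Theari` and `stars` are hypothetical and only illustrate the structure; they are not part of any published THEARI tooling.

```python
from enum import IntEnum


class Theari(IntEnum):
    """Hypothetical encoding of the five THEARI tiers (order assumed from the acronym)."""
    THEORETICAL = 1  # theory only
    EMPIRICAL = 2
    APPLICABLE = 3
    REPLICABLE = 4
    IMPACT = 5       # impact validated


def stars(level: Theari, total: int = 5) -> str:
    """Render a tier as filled and empty stars, e.g. tier 2 -> '★★☆☆☆'."""
    return "★" * int(level) + "☆" * (total - int(level))


# Print the whole scale, one tier per line.
for level in Theari:
    print(f"{level.value} {level.name.title():<12} {stars(level)}")
```

Because `IntEnum` members behave like integers, comparisons such as `Theari.REPLICABLE > Theari.APPLICABLE` hold without extra code.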

THEARI rates a given insight by determining what evidence underlies it. Rather than requiring policymakers to assess the evidence subjectively, or researchers to champion their own work, the rating centers and standardizes the assessment of evidence. We recommend using the standard to inform decision-making, either by displaying ratings as badges in journals (Nosek et al., 2015) or by having external raters apply them retrospectively, such as those conducting systematic reviews or preparing policy briefs. It may be even more critical to display ratings on preprint registries, given their increasing visibility in mainstream media.

The scale also aims to provide conceptual clarity where standards are heterogeneous (or absent) across locations, policy domains, and scientific disciplines. Where little information is available but a decision is necessary, it can be used to align related debates. Where an entire body of evidence, including effective interventions, is available, it can be used to identify the most robust insights. Here, we use ‘evidence’ to mean scientifically produced insights or conclusions reported through peer review or other recognized specialist dissemination channels, though other forms certainly exist.
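When a whole body of evidence has been rated, identifying the most robust insights amounts, as a first pass, to ranking by rating. The sketch below assumes each insight carries a single integer THEARI rating from 1 to 5; `RatedInsight` and `most_robust` are hypothetical names, and the example claims and ratings are placeholders, not real assessments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RatedInsight:
    claim: str
    theari: int  # assumed integer rating: 1 (theory only) to 5 (impact validated)


def most_robust(insights: List[RatedInsight], top: int = 3) -> List[RatedInsight]:
    """Return the `top` highest-rated insights; ties keep their input order
    because Python's sort is stable, including with reverse=True."""
    for item in insights:
        if not 1 <= item.theari <= 5:
            raise ValueError(f"rating out of range for {item.claim!r}: {item.theari}")
    return sorted(insights, key=lambda item: item.theari, reverse=True)[:top]


# Placeholder body of evidence (hypothetical claims and ratings).
body = [
    RatedInsight("claim A", 2),
    RatedInsight("claim B", 4),
    RatedInsight("claim C", 4),
    RatedInsight("claim D", 1),
]
best = most_robust(body, top=2)
print([item.claim for item in best])  # ['claim B', 'claim C']
```

A ranking like this is only a starting point: a high rating describes the evidence available, not certainty that an insight will transfer to a new setting.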


Figure 1. The THEARI rating system with an example from behavioral science.

The THEARI rating system

The system is intended to provide visible ratings of individual studies that can be compiled as inputs to policy decisions. In practice, the rating would be attached to any published work as a header, footnote, or badge. Awarding five shaded stars is discouraged: there should always be room for further research, even when, or perhaps especially when, validated impact has been achieved. Notably, there is no rating for opinions, commentaries, or editorials.
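The two display rules in this paragraph, no rating for opinion pieces and discouraging five shaded stars, can be made explicit in whatever renders the badge. This is a minimal sketch under assumptions: the article-type labels, the badge text, and the choice to warn rather than refuse at five stars are illustrative choices, not part of the framework as published.

```python
import warnings
from typing import Optional

# Article types that receive no THEARI rating (per the text: opinions, commentaries, editorials).
UNRATED_TYPES = {"opinion", "commentary", "editorial"}


def theari_badge(article_type: str, level: int) -> Optional[str]:
    """Return a hypothetical badge string, or None for article types THEARI does not rate."""
    if article_type.strip().lower() in UNRATED_TYPES:
        return None
    if not 1 <= level <= 5:
        raise ValueError("THEARI levels run from 1 to 5")
    if level == 5:
        # Five shaded stars are discouraged rather than forbidden, so warn instead of failing.
        warnings.warn("five shaded stars are discouraged; leave room for further research")
    return f"THEARI {'★' * level}{'☆' * (5 - level)} ({level}/5)"


print(theari_badge("preprint", 3))   # THEARI ★★★☆☆ (3/5)
print(theari_badge("Editorial", 4))  # None
```

Warning instead of raising keeps the rule advisory, matching the text's "discouraged" rather than "prohibited".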

Consider the increasing use of social norms in behavioral policy, as outlined in Figure 1. Initial papers defined a specific issue (suboptimal behaviors). Additional studies went further by identifying clear behavioral roots (observing group behavior influences individual choice). Interventions were then proposed and tested, followed by replications. Further validation through successful trials across a number of domains and locations then facilitated systematic study of real-world impacts. It is not mandatory that each step be explicitly and discretely fulfilled before proceeding to the next level; higher levels would, however, help ensure that lower levels are met. The amount and quality of evidence we have today on the effectiveness of social norms allow informed application of such interventions to the COVID-19 crisis. In particular, through decades of applications and replications (Cialdini, 2012), we have built a sophisticated understanding of the respective impacts of descriptive social norms (what most people do) and injunctive social norms (what most people think is the right thing to do). Evidence suggests that policymakers should not try to mobilize action against socially disapproved behavior by depicting it as frequent, as this might backfire by inadvertently installing a counterproductive descriptive norm in the minds of the public (Smerdon et al., 2019), for example by signaling that hoarding toilet paper is common rather than undesirable.

Importantly, neither THEARI nor any other framework can guarantee that evidence applies equally in all situations, so a rating is not an absolute confirmation of outcome. It is to be used as an assessment of the evidence available, not as an indicator of certainty. This is particularly important to note in times of COVID-19, when substantial evidence exists on certain interventions (e.g., hydroxychloroquine or face masks), but that evidence does not necessarily speak to their potential application in this context.

A comprehensive articulation of all THEARI levels and limitations is provided in the endnote following this article.

Common definitions of evidence are good for science

Standards for evidence are also important within scientific circles. In 1952, when Owen Storey proposed that the whistling noise heard in radio communications was due to plasma in the Earth’s atmosphere, his argument was so heavily disputed that even his academic advisor suggested he drop the idea or risk ridicule as a scholar. When he suggested supersonic solar winds, Eugene Parker was similarly rebuffed, and his initial manuscript was published only after an eventual Nobel laureate came to his defense. Fortunately, as converging evidence validated these theories over time, they became canon in science and practice. Although these unfortunate trajectories ultimately had positive outcomes, they raise two major concerns. First, what valuable evidence has not been mobilized due to subjective treatment? Second, what inefficiencies have resulted from the same circumstances? Clearer standards for evidence would offer one possible route to improvement on both fronts.

Consider also the current debate on the imminent threat of climate change: while substantial evidence has led to near-consensus in the scientific community, unsubstantiated denials of its causes and impacts receive disproportionate attention (Cook et al., 2016). This imbalance harms scientific progress and stalls action to address climate change (Lewandowsky et al., 2015). Similarly, failure to act on correct information in the context of COVID-19 will inevitably have implications for scientific progress, national security, and human survival.

Improving the application and replicability of evidence increases public trust in researchers’ discovery process (Nosek et al., 2015; Wingen et al., 2019). It also makes policymaking more efficient by limiting reliance on arbitrary, competing opinions, which are unfortunately common in science and policy debates (Head, 2010; Howlett & Mukherjee, 2017). Standardizing evidence ratings of insights for policymaking helps counter false media balance and science denial by providing a common framework for using, rating, and referring to the weight of scientific evidence. Behavioral research finds that motivated reasoning is less likely to occur when people have to “give reasons” for why they support a particular position (Ballarini and Sloman, 2017), which rating systems such as THEARI facilitate.

Standards for ranking the progression of available information exist in many applied domains, irrespective of policy relevance. These primarily serve to draw clear distinctions between what should and should not inform critical decision-making. Examples include the Daubert standard for evidence in legal proceedings and NASA’s Technology Readiness Levels (TRLs), which establish thresholds for when innovative tools are ready for widespread implementation. TRLs were developed in the mid-1970s as a discipline-independent metric for assessing the maturity of new technologies more effectively, with detailed definitions first published in 1995. The abstraction within TRL allows a clear definition of the level of development that is relevant to many fields. Institutional adaptations now range from technology investment at the European Commission to the development of fusion reactor materials (Riesch et al., 2016).

While TRL inherently emphasizes that the highest rating should be the goal, it encourages progress by showing room for improvement when only lower levels have been validated. However, TRL is specifically designed to describe the maturity of innovative technology during implementation. The same mindset may be useful for assessing evidence, yet a different framework is necessary for two reasons. First, new technologies cannot be used until they meet certain thresholds, whereas policies sometimes have to be developed on limited evidence. Second, new evidence can downgrade existing evidence to a lower level, whereas a new technology never moves down the TRL scale; it only fails to advance.
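The second difference can be sketched as two update rules. This is a simplified model under stated assumptions: TRL is treated as a level that only ratchets upward, while a THEARI rating is recomputed from whichever levels of evidence still stand, so a failed replication can lower it. The function names, the dictionary representation, and the value 0 for "unrated" are hypothetical.

```python
from typing import Dict, Optional


def next_trl(current: int, newly_demonstrated: Optional[int]) -> int:
    """TRL-style update: a technology advances or stays put; it never moves down."""
    if newly_demonstrated is None:  # a failed test demonstrates nothing new
        return current
    return max(current, newly_demonstrated)


def theari_rating(standing: Dict[int, bool]) -> int:
    """THEARI-style update (simplified): the rating is the highest level whose
    supporting evidence still stands, so losing support lowers it.
    Returns 0 if no level is supported (an assumed 'unrated' value)."""
    return max((level for level, holds in standing.items() if holds), default=0)


# Hypothetical history for a single insight.
evidence = {1: True, 2: True, 3: True, 4: True}
before = theari_rating(evidence)  # 4
evidence[4] = False               # a failed replication undermines level 4
after = theari_rating(evidence)   # 3

trl = next_trl(6, None)           # a failed test leaves the technology at TRL 6
print(before, after, trl)         # 4 3 6
```

Recomputing the rating from the current body of evidence, rather than storing a running maximum, is what lets the THEARI side move down.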

In medical contexts, standards are more common, as comparisons among multiple interventions or treatments are fundamental to decision-making. For example, GRADE (Grading of Recommendations Assessment, Development and Evaluation) is a framework for rating the quality of scientific evidence in systematic reviews (from very low to high) to help inform evidence-based clinical guidelines. When evaluating clinical interventions, GRADE starts evidence from RCTs as high quality and evidence from observational research as low quality, then rates it up or down based on features of the underlying evidence (e.g., risk of bias, effect size, confounders). However, systematic reviews themselves have a number of limitations, not least that they cannot correct for errors in the original studies, forcing an ‘old dog, wrong trick’ approach to policy choices (Ruggeri et al., 2016): aggregating poor-quality data does not correct for poor quality. There is also recent evidence that collaborative replications may be more reliable for producing valid insights (Kvarven et al., 2019). THEARI takes this directly into account by highlighting that the best evidence requires multiple lines of investigation and a plurality of robust methods, not necessarily one method over another.
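The GRADE logic described here, a starting level set by study design and then moved up or down, can be sketched in a few lines. This is a deliberate simplification: actual GRADE assessments are structured judgments across named domains (risk of bias, inconsistency, indirectness, imprecision, and publication bias for rating down; large effects, dose-response gradients, and plausible residual confounding for rating up), not arithmetic on counts. The function name and design labels are hypothetical.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]
STARTING_INDEX = {"rct": 3, "observational": 1}  # RCTs start high, observational studies start low


def grade_certainty(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Simplified GRADE sketch: start from the study design, then move the
    certainty down or up one step per concern or strength, clamped to the scale."""
    if design not in STARTING_INDEX:
        raise ValueError(f"unknown design: {design!r}")
    if downgrades < 0 or upgrades < 0:
        raise ValueError("step counts must be non-negative")
    index = STARTING_INDEX[design] - downgrades + upgrades
    index = max(0, min(len(GRADE_LEVELS) - 1, index))
    return GRADE_LEVELS[index]


print(grade_certainty("rct"))                          # high
print(grade_certainty("rct", downgrades=2))            # low
print(grade_certainty("observational", upgrades=1))    # moderate
print(grade_certainty("observational", downgrades=3))  # very low
```

The clamp reflects that the scale has fixed ends: no body of evidence rates above "high" or below "very low".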


Figure 2. Prior frameworks of standards specifically for behavioral science, such as the Index for Evidence in Policy (INDEP), influenced much of this work.

Index for Evidence in Policy (INDEP), Ruggeri et al. (2019)

The predecessor INDEP framework for evidence from behavioral science comes from Ruggeri, K., Stuhlreyer, J., Immonen, J., Mareva, S., Paul, A., Robbiani, A., Thielen, F., Gelashvili, A., Cavassini, F., & Nairu, F. (2019). In K. Ruggeri (Ed.), Behavioral insights for public policy: Concepts and cases. Routledge.


Drawing on different methodologies, ranging from theoretical models to randomized controlled trials (RCTs), quasi-experiments, and laboratory research, can enable public bodies to leverage the complementary strengths of these techniques (OECD, 2019). Policymakers rely on information from many sources to make decisions, which makes the communication of evidence as critical for them as it is for the general public (Doubleday & Wilsdon, 2012).

THEARI ratings provide a common language to help all sides understand the level of evidence developed on a topic or from a single study, without oversimplifying critical nuance. Applying THEARI ensures that opinions are not equated with empirical findings across scientific and policy domains, while a variety of perspectives is still considered alongside a consistent metric for the evidence available. The framework aims to highlight that not all scientific contributions are created equal. While there is value in appealing to different lines of evidence, it is crucial to distinguish what specific additions each level of evidence brings to policymakers’ toolkits, specifically at the five levels proposed.

Oversimplification, particularly given the many types of evidence and other influences in policy, risks eroding trust between the public, researchers, and decision-makers (Head, 2010), as do failures in replication (Wingen et al., 2019). Each rating aims to bring structure to those discussions without overstating the weight of a single finding, instead providing a reference for categorizing relevant, available evidence. We present an overview of additional strengths, weaknesses, and risks in Table 1.

Table 1. Strengths, weaknesses, and potential risks for applying standards for evidence.

| Actionable strengths | Practical limitations | Dangers to avoid |
|---|---|---|
| Gives a standard for comparing evidence, regardless of its current state | Does not specify a point at which evidence is sufficient for a decision | Absolute thresholds that undermine open science or set unrealistic minimums, particularly where a decision is urgent or risks would become imminent |
| Accessible scale for expert and lay audiences | A simplified tool referring to likely complex topics cannot always result in policies backed by robust studies from top academic journals | Using standards to mask misinformation or poorly designed studies |
| Anyone can reassess, even if a score has been proposed | The 'amount' of evidence may vary depending on the context of application, such as urgency of need or the disposition or bias of those evaluating; it is not always possible to have all the desired information | Static evaluations of evidence that do not acknowledge replication failures or adapt to new evidence |
| Systematic but practical ratings that can be updated over time | It is likely that many effective interventions were trialed before substantial evidence on the issue was available, which creates ambiguity in rating | Ignoring conflicts of interest in funded studies, whose findings may be presented in especially strong terms and so appear to be objectively strong evidence if bias is not considered |
| Unbiased by arbitrary thresholds | Does not consider participation or bias in policy or research, meaning inter-rater reliability is critical | Assuming scientific evidence is the only factor in policy decisions |
| No mandate for when to use, especially if a decision is urgent | Difficult to compare high-impact, low-evidence cases with low-impact, high-evidence cases | Purpose-driven research that lowers standards for discovery |
| Possible for retrospective, ex ante, and ex post assessment in application | Quality assessment of specific study rigor, especially analysis (Johnson, 2013), is separate but necessary and should emphasize quality, not volume | Interpreting a single rating as reflective of all features of a particular study; the rating should explicitly apply only to the primary insight |
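Table 1 notes that reliability between raters is critical, because two evaluators may place the same evidence at different levels. One common way to quantify agreement between two raters is Cohen's kappa. Kappa corrects the raw share of matching ratings for the agreement expected by chance alone.

The sketch below is an illustration under stated assumptions, not part of THEARI. The article does not prescribe a reliability statistic. Coding the levels as the integers 1 to 5 is a hypothetical choice, and so are the sample ratings.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items.

    Assumption (illustrative only): each label is an evidence level,
    coded here as an integer such as 1-5.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items both raters placed at the same level.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal share, summed over levels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if p_expected == 1:
        # Both raters used one identical level for every item; kappa is
        # undefined, so treat it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical levels assigned by two evaluators to the same eight studies.
rater_a = [1, 2, 2, 3, 4, 4, 5, 3]
rater_b = [1, 2, 3, 3, 4, 5, 5, 3]
print(round(cohens_kappa(rater_a, rater_b), 3))  # prints 0.686
```

Unweighted kappa treats a one-level disagreement the same as a four-level one. For ordered levels, a weighted kappa (with linear or quadratic weights) may therefore be a better fit.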


Going forward  

Standards should be valuable to all members of the population, whether or not they value scientific evidence. Communicating those standards, as well as the evidence itself, is a major challenge (Broomell & Kane, 2017), especially in the context of health and medicine (Politi et al., 2007). In 2007, a team of researchers from Hong Kong (Cheng et al., 2007) published a review warning about the re-emergence of SARS-like coronaviruses and describing the threat as a “time bomb” (p. 683). Their work, which they support with over 400 evidence-based references, would clearly meet the highest levels of evidence-based policy thresholds, and it was backed by other studies of expert judgment (Bruine de Bruin et al., 2006). Yet the outbreak occurred with seemingly minimal preparation. This is not the fault of any single group, but it shows once again the urgent need for a simplified, informed standard for identifying the best-quality evidence for policy action.

In presenting THEARI, the ultimate benefit we envision is a common framework that serves as a starting point for using evidence in policy discussions, overcoming biases and the effects of inconsistent definitions or unreliable insights. This encourages policymakers to place more value on evidence by lending support to meaningful arguments that might otherwise be dismissed as incongruent with current thinking, even among scientists. Researchers can continue their studies without overemphasizing immediate application to the detriment of discovery, while understanding improves between researchers and those who seek to use current insights. Done effectively, this should result in improved public policy approaches that ultimately serve the well-being of populations around the world.

For emphasis, we run some risk of oversimplification.

References  

Ballarini, C., & Sloman, S. A. (2017). Reasons and the “Motivated Numeracy Effect”. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (pp. 1580-1585).

Broomell, S. B., & Kane, P. B. (2017). Public perception and communication of scientific uncertainty. Journal of Experimental Psychology: General, 146(2), 286.

Bruine de Bruin, W., Fischhoff, B., Brilliant, L., & Caruso, D. (2006). Expert judgments of pandemic influenza risks. Global Public Health, 1(2), 179-194.

Chater, N. (2020, 16 March). People Won't Get 'Tired' Of Social Distancing – The Government Is Wrong To Suggest Otherwise. The Guardian. Available at: https://www.theguardian.com/co... (Accessed: 21 March 2020).

Cheng, V. C., Lau, S. K., Woo, P. C., & Yuen, K. Y. (2007). Severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection. Clinical Microbiology Reviews, 20(4), 660-694.

Cialdini, R. B. (2012). The focus theory of normative conduct. In P. A. M. Van Lange, A. W. Kruglanski, & E. T. Higgins (Eds.), Handbook of Theories of Social Psychology (Vol. 2, pp. 295-312). London: Sage.

Cook, J., Oreskes, N., Doran, P. T., Anderegg, W. R., Verheggen, B., Maibach, E. W., ... & Nuccitelli, D. (2016). Consensus on consensus: a synthesis of consensus estimates on human-caused global warming. Environmental Research Letters, 11(4), 048002.

Doubleday, R., & Wilsdon, J. (2012). Science policy: Beyond the great and good. Nature, 485(7398), 301-302.

Funtowicz, S. O., & Ravetz, J. R. (1995). Science for the post normal age. In L. Westra & J. Lemons (Eds.), Perspectives on Ecological Integrity (pp. 146-161). Dordrecht: Springer.

Head, B. W. (2010). Reconsidering evidence-based policy: Key issues and challenges. Policy and Society, 29(2), 77-94.

Howlett, M., & Mukherjee, I. (2017). Policy design: From tools to patches. Canadian Public Administration, 60(1), 140-144.

Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313-19317.

Kvarven, A., Strømland, E., & Johannesson, M. (2019). Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour, 1-12. 

Lewandowsky, S., Oreskes, N., Risbey, J. S., Newell, B. R., & Smithson, M. (2015). Seepage: Climate change denial and its effect on the scientific community. Global Environmental Change, 33, 1-13.

Lindblom, C. (1959). The science of muddling through. Public Administration Review, 19(2), 79-88.

Mullainathan, S., & Shafir, E. (2013). Scarcity: Why having too little means so much. Macmillan.

Munafò, M. R., & Davey-Smith, G. (2018). Robust research needs many lines of evidence. Nature, 553(7689), 399-401.

National Academies of Sciences, Engineering, and Medicine. (2017). Communicating Science Effectively: A Research Agenda. The National Academies Press, Washington, DC.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., ... & Contestabile, M. (2015). Promoting an open research culture. Science, 348(6242), 1422-1425.

OECD. (2019). Delivering Better Policies Through Behavioural Insights: New Approaches. OECD Publishing, Paris.

Pew-MacArthur Foundation. (2017). How states engage in evidence-based policymaking: A national assessment. Pew Charitable Trusts, Washington, DC.

Politi, M. C., Han, P. K., & Col, N. F. (2007). Communicating the uncertainty of harms and benefits of medical interventions. Medical Decision Making, 27(5), 681-695.

Protezione Civile. (2020). Decree of the Head of Department n. 371, 5 February 2020, on the Institution of a Scientific Committee. Available at: http://www.protezionecivile.go...

Riesch, J., Han, Y., Almanstötter, J., Coenen, J. W., Höschen, T., Jasper, B., ... Neu, R. (2016). Development of tungsten fibre-reinforced tungsten composites towards their use in DEMO—potassium doped tungsten wire. Physica Scripta, (T167), 014006.

Ruggeri, K., Maguire, Á., & Cook, G. (2016). The “Next Big Thing” in Treatment for Relapsed or Refractory Multiple Myeloma May Be Held Back by Design—Between the Lines. JAMA Oncology, 2(11), 1405-1406.

Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384.

Smerdon, D., Offerman, T., & Gneezy, U. (2019). ‘Everybody’s doing it’: on the persistence of bad social norms. Experimental Economics, 1-29.

Wingen, T., Berkessel, J. B., & Englich, B. (2019). No replication, no trust? How low replicability influences trust in psychology. Social Psychological and Personality Science, 11(4), 454-463.

Zastrow, M. (2020, 18 March). South Korea is reporting intimate details of COVID-19 cases: has it helped? Nature News.


 

Acknowledgments: We thank Bhaven Sampat (Columbia University), Thomas Zurbuchen (NASA), Marion Barthelemy (United Nations), Maja Friedemann (University College London), Tomas Folke (Columbia University), and Tobias Wingen (University of Cologne) for input and coordination support on this manuscript. We also thank Pompa Debroy, Russ Burnett, and Michael Hand from the Office of Evaluation Sciences in the General Services Administration in Washington, D.C., for input on the final construction of the argument.

Author contributions: KR was responsible for conceptualization, project administration, resources, writing (all drafts, all sections and features), editing, reviewing, and submission; SvdL was responsible for broad writing and reviewing; FP also provided broad writing and reviewing; CW provided extensive comments and edits; ZA reviewed and edited specific sections; JR was responsible for writing and reviewing specific sections and the THEARI descriptions; JG was responsible for reviewing drafts and supervising KR.

Funding: No funding was received in direct support of this work. YCW's time is supported in part by grant R24AG064191 from the National Institute on Aging, National Institutes of Health.

Competing interests: Authors declare no competing interests.

Author note: The text was originally prepared on the importance of standards for evidence, with input built up iteratively over several years. It has been adapted to retain the broad scope while including multiple applications to the COVID-19 pandemic. This is a blunt version of the topic; supplemental materials complement it with further references and commentary.

How to cite: Ruggeri, K., van der Linden, S., Wang, Y. C., Papa, F., Riesch, J., Green, J. (2020). Standards for evidence in policy decision-making. Nature Research Social and Behavioural Sciences, 39905. go.nature.com/2zdTQIs

[1] Given the urgency of the topic, here we present foundational arguments; a supplementary document with further references for each will be available at https://psyarxiv.com/fjwvk/.



Kai Ruggeri

Assistant Professor, Columbia University

Professor at Columbia University, Mailman School of Public Health, studying population behaviors and decision-making in an effort to generate more effective policy. Defines “more effective policy” as interventions that protect and improve population security, stability, and well-being. If you know any students or early career researchers that want to get involved in large-scale, international collaborations, tell them to check out https://jrp.pscholars.org


Let science revolutionize politics

Rafael Yuste is a neuroscientist, a full professor at Columbia University (USA), and an Ikerbasque professor at the Donostia International Physics Center (DIPC) in San Sebastián.

Darío Gil holds a doctorate in Electrical Engineering and Computer Science. He currently directs IBM's research division.

https://elpais.com/ideas/2020-06-06/que-la-ciencia-revolucione-la-politica.html?ssm=TW_CC
-
Does the quality of governments matter?
https://alde.es/blog/importa-la-calidad-de-los-gobiernos/?fbclid=IwAR0gySAVquUoAjlWtAFF16_pJbl2FTVIMmncx1bD-zN_SJl-oL5HLRAAwIc
