Highlights
- To survive in the twenty-first century and beyond we must transform our secretive and fragile culture into a truly open and rigorous science—one that celebrates openness as much as it appreciates innovation, that prizes robustness as much as novelty. (Location 155)
- Faced with the career pressure to publish positive findings in the most prestigious and selective journals, it is now standard practice for researchers to analyze complex data in many different ways and report only the most interesting and statistically significant outcomes. (Location 607)
- The rules of engagement do not require authors to specify which analytic decisions were a priori (confirmatory) and which were post hoc (exploratory)—in fact, such transparency is likely to penalize authors competing for publication in the most prestigious journals. (Location 640)
- Although each of these scenarios is different, they all share one thing: in each case the researcher is attempting to push the p value just over the line. If this is your goal then it would make sense to stop p-hacking as soon as the p value drops below .05—after all, why spend additional resources only to risk that an effect that is currently publishable might “disappear” with the addition of more participants or by looking at the data in a different way? Focusing on merely crossing the significance threshold should therefore create a cluster of p values just below .05. (Location 720)
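A minimal simulation sketch (my own illustration, not from the book) of the optional-stopping scenario described in this highlight: even when the true effect is zero, a simulated researcher who adds participants one at a time and stops the moment p drops below .05 both inflates the false-positive rate and produces "significant" p values that cluster just under the threshold. Sample sizes, simulation counts, and function names here are assumptions made for illustration.

```python
# Illustrative sketch only (not from the book): optional stopping under a true null.
# The simulated researcher re-runs the test after every added participant and
# stops as soon as p < .05, or once a maximum sample size is reached.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def optional_stopping_p(n_start=20, n_max=100, alpha=0.05):
    """Return the p value at the point the simulated researcher stops collecting data."""
    data = list(rng.normal(0.0, 1.0, n_start))  # true effect is zero
    while True:
        p = stats.ttest_1samp(data, 0.0).pvalue
        if p < alpha or len(data) >= n_max:
            return p
        data.append(rng.normal(0.0, 1.0))  # "just one more participant"

p_vals = np.array([optional_stopping_p() for _ in range(2000)])
sig = p_vals[p_vals < 0.05]
print(f"False-positive rate despite a null effect: {(p_vals < 0.05).mean():.2f}")
print(f"Share of those 'hits' landing between .04 and .05: {(sig > 0.04).mean():.2f}")
```

Under these assumed settings the false-positive rate climbs well above the nominal 5 percent, and the stopped p values pile up just beneath .05, which is the telltale cluster the highlight describes.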
- The only way to verify that studies are not p-hacked is to show that the authors planned their methods and analysis before they analyzed the data—and the only way to prove that is through study preregistration. (Location 817)
- Preregistration. The most thorough solution to p-hacking and other forms of hidden flexibility (including HARKing) is to prespecify our hypotheses and primary analysis strategies before examining data. (Location 932)
- Replication is the immune system of science, identifying false discoveries by testing whether other scientists can repeat them. (Location 1033)
- Instead of valuing the reproducibility of results, psychology has embraced a tabloid culture where novelty and interest-value are paramount and the truth is left begging. Psychology thus succumbs to our third major transgression: the sin of unreliability. (Location 1040)
- Conceptual replications don’t actually replicate previous experiments; they instead assume (rather than test) the truth of the findings revealed in those experiments, infer the underlying cause, and then seek converging evidence for that inferred cause using an entirely different experimental procedure. (Location 1061)
- for every 1,000 papers published in psychology, only two will seek to directly replicate a previous experiment, and just one of those will be published by a different team to the original study. (Location 1087)
- While it is the case that the effect size required to achieve statistical significance is greater for low-powered experiments, this doesn’t mean that the probability of those significant results being true is necessarily higher. Here we must be careful not to confuse the probability of the data under the null hypothesis (p) with the probability that an obtained positive result is a true positive. (Location 1245)
- Therefore, high power not only helps us limit the rate of false negatives, but it also caps the rate of false positives—just as a powerful experiment increases the chances of finding an oasis, it also reduces the chances of leading us toward a mirage. (Location 1269)
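A worked calculation (my own illustration, not from the book) of the distinction drawn in the two highlights above: the p value is the probability of the data under the null, whereas the positive predictive value (PPV) is the probability that a significant result reflects a real effect. The alpha level of .05 and the assumed base rate of 10 percent true hypotheses are illustrative choices.

```python
# Illustrative calculation (not from the book): how statistical power changes the
# positive predictive value (PPV) of a significant result, assuming alpha = .05
# and that 10% of tested hypotheses are actually true.
def ppv(power, alpha=0.05, prior=0.10):
    """P(effect is real | p < alpha), given power, alpha, and the prior base rate."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

for power in (0.20, 0.50, 0.80, 0.95):
    print(f"power = {power:.2f}  ->  PPV = {ppv(power):.2f}")
# power = 0.20  ->  PPV = 0.31
# power = 0.80  ->  PPV = 0.64
```

Under these assumed numbers, raising power from 0.20 to 0.80 roughly doubles the chance that a significant finding is genuine, which is the sense in which high power caps the rate of false positives among published "discoveries."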
- For basic scientists, the decisive indicator of our empirical achievements is the advancement of theory that accurately predicts and explains future observations. As we saw in chapter 3, this path eventually leads us to refine theories into such razor-sharp explanatory instruments that they become laws. For applied psychological scientists, the goal is more practical, but the logic is the same: the greatest signifier of success is that we changed something in society, whether addressing a problem such as obesity or mental illness, using psychological evidence to develop a successful policy, or building something that in a demonstrable sense “works.” Yet in our obsession to quantify quality—to turn a thing from words into numbers—we have allowed these aspirations to become overshadowed by a culture of metrics-based performance management. In doing so, psychology commits our seventh and final transgression: the sin of bean counting. (Location 2977)
- But hang on, you might ask, is it entirely wrong to assess individual scientists or their institutions based on grant capture? Surely what makes an awarded grant an indicator of research quality is the fact that someone other than you felt your research was important enough to fund, right? While this is partially true, it is also misleading. What counts in science is the quality of the research that is actually conducted and published, not the hypothetical quality of research we plan to conduct, regardless of how impressive that plan may sound.22 (Location 3220)
- Because in judging a researcher’s achievements we sum inputs (money spent) with outputs (research produced) rather than dividing them. (Location 3233)
- Being awarded a research grant is a fine achievement and something for which a researcher can feel justly proud. But grant capture should be completely irrelevant for assessing the contribution of researchers to science because a grant is not a contribution to science. Rewarding academics for acquiring grants is like deciding the winner of a football tournament based on which team spent more on boots. (Location 3235)
- If you are a junior researcher in psychology you will want to tally up a high quantity of first-authorships, ideally in journals that are prestigious or with high JIFs. As you progress into research leadership and emerge as a “lab head,” it becomes more important to cultivate a rich crop of last authorships as an indicator of seniority. In this way, we can see how the sin of bean counting manifests not simply as a ledger of articles published, but as the quantity of articles published with specific attributions, over and above the scientific quality and importance of the work, and ignoring the murky imprecision of rank authorship. (Location 3300)
- Metrics like JIF are meaningless as indicators of scientific quality, but citations themselves at the level of individual articles—while not indicators of quality—are suggestive of influence and interest-value.37 (Location 3352)
- The sin of bean counting exists because we lack a robust way of assessing the quality of scientific output in the short to medium term. How do we weigh up the scientific contributions of different individuals in the competition for jobs and funding? How do we decide if a completed grant was successful? Do we check whether independent scientists replicated the funded studies, or at least that the studies were conducted rigorously and are therefore replicable? Until psychology develops a proper mechanism for attributing credit and assessing genuine research quality, bean counting will continue to reign supreme. (Location 3369)
- To adopt a Popperian perspective, a scientific discipline can be said to be one in which the phenomenon under investigation is quantifiable, the hypothesis testable, the experiment repeatable, and the theory falsifiable. (Location 3395)
- Note what is missing from the Stage 2 review criteria. There is no assessment of impact, novelty, or originality of results. There is no consideration of how conclusive the results happen to be. There is no weight placed on whether or not the experimental hypothesis was supported. Such considerations may be relevant for scientists in deciding whether a study is exciting or newsworthy, but they tell us nothing about scientific quality or the longer-term contribution of the study. For Registered Reports, the outcomes of hypothesis tests are irrelevant to whether a piece of scientific research meets a sufficiently high standard to warrant publication. (Location 3604)
- A 2015 analysis of medical trials in the prevention of heart disease found that, since the advent of mandatory clinical trial registration in 2000, the percentage of published trials showing a statistically significant effect dropped from 57 percent (17 out of 30) to just 8 percent (2 out of 25).28 Preregistration might therefore help put the brakes on false discoveries. (Location 3822)
- All data should be archived because, sooner or later, data that are not archived are lost to science and to future generations of scientists. (Location 3974)
- As a Department we do not discuss h-index metrics and we do not count publications or rank them as to who is first author. We just ask, has the candidate really changed significantly how we understand chemistry? (Location 4134)
- Therefore, this book should not be seen as an attack on defenders of the status quo but as an intervention on ourselves, and a mission plan for self-improvement. As the saying goes, there is nothing noble in being superior to your fellow person—true nobility is being superior to your former self. (Location 4216)