Could machine learning fuel a reproducibility crisis in science?
- Type: #article
- Date read: 2022-08-17
- Subject: machine learning
- Bibliography: https://www.nature.com/articles/d41586-022-02035-w
- #newsletter
Example citation
Key takeaways
- The pre-print is Kapoor2022 - Leakage and the Reproducibility Crisis in ML-based Science
- Researchers often apply ML models after only a few hours of self-study, and reviewers don’t have time to scrutinize the code.
- Risk of over-optimistic performance claims for ML models
- Hype around AI and inadequate checks and balances
- Most important issue: data leakage, where information from the data a model is evaluated on ends up in the data it learns from
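
One form of leakage, duplicate records landing on both sides of a train/test split, can be sketched in a few lines. This is an illustration only and is not taken from the article or the pre-print: the synthetic data, the `one_nn_accuracy` helper and the 1-nearest-neighbour classifier are all assumptions chosen for brevity. The labels are pure noise, so an honest evaluation should score close to 50%.

```python
import random

random.seed(0)

# Hypothetical data: 300 unique records with random features and random labels,
# so there is no real signal to learn. Each record appears twice (for example a
# repeated measurement) and carries a record id.
N_RECORDS, N_FEATURES = 300, 5
rows = []
for record_id in range(N_RECORDS):
    features = [random.gauss(0, 1) for _ in range(N_FEATURES)]
    label = random.randint(0, 1)
    rows.append((record_id, features, label))
    rows.append((record_id, features, label))  # exact duplicate


def one_nn_accuracy(train, test):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for _, x, y in test:
        nearest = min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[1], x)))
        correct += nearest[2] == y
    return correct / len(test)


# Leaky split: shuffle individual rows, so a test row's duplicate often sits in
# the training set and the classifier simply looks up the answer.
shuffled = rows[:]
random.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
leaky_acc = one_nn_accuracy(shuffled[:cut], shuffled[cut:])

# Clean split: split by record id, so both copies stay on the same side.
ids = list(range(N_RECORDS))
random.shuffle(ids)
train_ids = set(ids[: int(0.8 * N_RECORDS)])
train = [r for r in rows if r[0] in train_ids]
test = [r for r in rows if r[0] not in train_ids]
clean_acc = one_nn_accuracy(train, test)

print(f"leaky split accuracy: {leaky_acc:.2f}")
print(f"clean split accuracy: {clean_acc:.2f}")
```

The gap between the two numbers comes entirely from the split, since the data contain no signal. Grouping by record before splitting is one way to close it.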
Kapoor and Narayanan claim that, once errors are corrected, these models perform no better than standard statistical techniques.
The standard technique used for comparison is logistic regression.
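
A minimal sketch of the kind of check this suggests: score a logistic regression baseline and a more flexible model on the same held-out split before claiming an ML advantage. Everything here is assumed for illustration (the synthetic data, the hand-rolled gradient-descent fit in `fit_logistic`, and k-nearest neighbours standing in for the "complex" model); it does not reproduce the pre-print's analysis.

```python
import math
import random

random.seed(1)


def make_point():
    """Hypothetical data: two features, label driven by a noisy linear rule."""
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = 1 if x[0] + x[1] + random.gauss(0, 0.5) > 0 else 0
    return x, y


data = [make_point() for _ in range(400)]
train, test = data[:300], data[300:]  # split once, before any fitting


def fit_logistic(samples, lr=0.1, epochs=200):
    """Logistic regression fitted with plain batch gradient descent."""
    w, b = [0.0, 0.0], 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in samples:
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            grad_w[0] += (p - y) * x[0]
            grad_w[1] += (p - y) * x[1]
            grad_b += p - y
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def knn_predict(samples, x, k=5):
    """Majority vote among the k nearest training points."""
    nearest = sorted(
        samples, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2
    )[:k]
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def accuracy(predict, samples):
    return sum(predict(x) == y for x, y in samples) / len(samples)


w, b = fit_logistic(train)
baseline_acc = accuracy(lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0, test)
knn_acc = accuracy(lambda x: knn_predict(train, x), test)

print(f"logistic regression (baseline): {baseline_acc:.2f}")
print(f"k-nearest neighbours:           {knn_acc:.2f}")
```

If the flexible model does not clearly beat `baseline_acc` on data it never saw, the added complexity is hard to justify.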