Vector Faculty Member Marzyeh Ghassemi recently contributed to a Viewpoint article for JAMA online: Challenges to the Reproducibility of Machine Learning Models in Health Care. Written by Ghassemi, Andrew L. Beam and Arjun Manrai, the article looks at the intersection of machine learning and health care, and at the benefits and challenges of reproducibility and replicability.
Reproducibility refers to the ability of researchers to get the same results as the original study when given access to its underlying data and analysis code. However, even when a study is reproducible, the authors caution, it does not mean that it is replicable (where someone applies the same methods to a different set of data), or that generalizations can be made from its results.
Machine learning is helping to create many new clinical prediction tools. Though these tools offer potential improvements to the quality of care for patients and cost savings for hospitals, they also present unique obstacles to reproducibility, which must be carefully considered for patient safety as the techniques begin to be deployed in hospitals.
“There is a significant push in the machine learning community to create robust results,” says Ghassemi. Using high-capacity models that capture many relationships among many variables can make reproducibility harder: technical choices are often made by default, or as part of an installation process, and these can affect the outcome of a study. “A doctor in another place could run that same model on their data and get very different results.”
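As a minimal sketch of how a defaulted technical choice can change a reported number, the hypothetical Python example below scores a trivial rule on a random test split of a toy dataset. The dataset, the `make_dataset` and `evaluate` functions, and the 0.5 threshold are illustrative assumptions, not taken from the JAMA article. The only point is that leaving the seed unset lets the split, and so the reported accuracy, vary between runs, while fixing and reporting the seed makes the run repeatable.

```python
import random


def make_dataset(n=200):
    """Build a fixed toy dataset of (feature, noisy label) pairs (hypothetical data)."""
    rng = random.Random(0)  # fixed, so every caller sees the same data
    rows = []
    for _ in range(n):
        x = rng.random()
        # The label depends on x plus noise, so no simple rule is perfect.
        label = int(x + rng.gauss(0, 0.3) > 0.5)
        rows.append((x, label))
    return rows


def evaluate(seed=None, test_size=50):
    """Score the rule 'predict 1 when x > 0.5' on a randomly chosen test split.

    With seed=None the generator is seeded from system entropy or the clock,
    so the split differs from run to run. This stands in for a technical
    choice that is silently made by default.
    """
    rows = make_dataset()
    rng = random.Random(seed)
    rng.shuffle(rows)
    test = rows[:test_size]
    correct = sum(int(x > 0.5) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Unseeded runs: the reported accuracy may differ each time.
    print("unseeded:", [evaluate() for _ in range(3)])
    # Seeded runs: the same split, and the same accuracy, every time.
    print("seeded:  ", [evaluate(seed=42) for _ in range(3)])
```

Pinning the seed addresses reproducibility only in the narrow sense of rerunning the same code on the same data; it says nothing about how the rule would behave on another hospital's data.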
Code sharing and data accessibility are both important goals that Ghassemi believes must always be kept top of mind. “The long-term risks of creating niche unverifiable results in a clinical setting are real, and we should invest in a culture of research that allows for intellectual verification.”