The challenges of Reproducibility and Replicability (R and R) in computer science experiments have received growing attention over the last decade, as efforts to follow good research practices have increased. However, experiments using Deep Learning (DL) remain difficult to reproduce due to their complexity.
Tasks such as estimating poverty indicators (e.g., wealth index levels) from remote sensing imagery, which require large datasets across diverse geographic locations, would be impossible without DL. To examine reproducibility, we review three DL experiments that analyze visual indicators from satellite and street-level imagery, and we identify challenges in their data, methods, and workflows.
Based on this assessment, we propose a checklist incorporating FAIR principles to evaluate reproducibility and recommend actions to improve it and reduce wasted effort. This work is intended for a broad audience, including researchers, authors, and reviewers.