Result: Decoding Data Science Upskilling: Insights From 5 Years of Data Science Projects at the Centers for Disease Control and Prevention, 2019-2023.
Original Publication: Frederick, MD : Aspen Publishers, c1995-
Further Information
Context: Public health organizations are increasingly recognizing the value and potential of data science. However, a gap remains in understanding how data science is being applied in public health.
Objective: This article provides a comprehensive overview of data science applications in real-world public health settings. By describing the characteristics of projects supported by the Centers for Disease Control and Prevention's Data Science Upskilling (DSU) program during 2019-2023, we seek to guide future efforts in public health data science workforce development and data modernization.
Methods: We manually reviewed DSU applications and final project presentations compiled during 2019-2023. We analyzed projects on 7 characteristics: public health domain, public health task, data science topic, data science method, data modality, tools, and programming languages used.
Results: DSU supported 112 data science projects across 5 annual cohorts (2019-2023). Many projects addressed the COVID-19 pandemic (13%), infectious diseases (13%), and vaccines (11%). Approximately half the projects used data visualization (54%) and statistics (51%), with 42% employing artificial intelligence (AI) and machine learning (ML). Furthermore, 52% of projects were designed to support decision making, and 22% sought to improve processes and programs. Learners primarily used RStudio (50%), Jupyter Notebooks (41%), and Power BI (26%), along with Python (56%) and R (55%). AI and ML use increased from 33% of projects in 2019 to 56% in 2023, demonstrating an evolving focus on advanced methodologies.
Conclusions: Many teams prioritized data visualization, such as dashboards and visualization tools to support decision making, indicating opportunities for additional infrastructure and training in this area. We observed increasing use of AI and ML, suggesting a need for staff upskilling in these domains. Optimally leveraging data science technologies will require workforce development strategies and data modernization efforts to keep pace with the rapidly evolving field.
(Copyright © 2025 Wolters Kluwer Health, Inc. All rights reserved.)
The authors declare no conflicts of interest.