The challenges of big data in low- and middle-income countries: from paper to petabytes


Generation of digital data has expanded exponentially over the last decade, inspiring visions of data-driven healthcare and precision medicine. But the promise of big data is tempered by today’s reality in low resource settings: weak health systems and limited governance structures complicate its application. Many of the countries in greatest need continue to struggle to collect vital statistics on births and deaths, with epidemiological data of variable reliability typically coming from only small, sentinel sites. However, with the falling cost of aggregating and coordinating resources and services electronically, big data stands to deliver disproportionately large benefits to low- and middle-income countries (LMICs). Effective targeting of interventions is increasingly important when the availability of resources is limited.

The collection of individual level information – a prerequisite for big data – is fraught with ethical, regulatory and procedural challenges. Of widespread concern is the risk of breach of privacy, and, as a result, the thought of digitised and centralised repositories of personal records instils fear in many. This concern is further amplified when information is about individuals in vulnerable populations and communities. Even very basic health data – ethnicity, reproductive health history, sexually transmitted infections, diseases with a genetic basis, or risk exposures for disease – has the potential for misuse, leading to discrimination, personal danger or death. The risk of accidental or intentional breaches of data security may be increased with limited literacy, high corruption, or rapid technology transition. In many LMIC settings, legislation supporting the privacy and security of information is frequently underdeveloped and rarely enforced. Robust data sharing guidelines between LMIC stakeholders are often lacking, hampering big data solutions and compromising those in play.

The persistent tension between disease-specific (‘vertical’) programs and health-system-focused (‘horizontal’) approaches remains unresolved. Big data arguably fits best with a horizontal approach, potentially improving data for a breadth of diseases to support the new Sustainable Development Goals. However, global health remains a siloed undertaking, often driven by disease-specific interests. Ensuring inclusive data collection, dissemination and application is critical for maximising big data’s potential.

Informed, reflective and resourced stewardship is critical to enable positive outcomes from health big data in LMICs. Unfortunately, the global health community has a patchy record of cohesive and inclusive governance of technical developments. Optimising the application of big data is much more than establishing confidentiality safeguards and minimum standards. A broad effort to establish enforceable interoperability standards is imperative to creating meaningful insight.

Big data’s mechanism of action is magnification; sheer size makes risks and benefits larger. This magnification is greater in low resource settings where big data are most needed and most vulnerable to fragmentation and misuse. Conscious and committed leadership, analysis and technical guidance are needed to minimise these risks. Complexities should not be underestimated; the shift from paper to petabytes in LMICs is a seismic change. Shepherding that transition provides an opportunity for global health institutions to demonstrate governance.

Image caption: “Logo for the Big Data for Health in Africa meeting, hosted by the African Partnership for Chronic Disease Research in Entebbe, Uganda on 3rd-4th November 2016. An initiative building capacity and expertise in Big Data and data science to ensure that African countries are able to capitalise on the scientific, technical, social and economic benefits of this new global industry”


Statistical Modeling for Biomedical Researchers – online resources and class notes

Blog Post by William D. Dupont, PhD, Professor of Biostatistics and Preventive Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, USA

I teach a course in intermediate-level biostatistics as part of the Master of Public Health program at Vanderbilt University. This program is targeted at clinical fellows who are interested in academic careers in population-based medicine. Class notes for this course are posted online in both PDF and MS PowerPoint formats. The same web site also contains the data files used in this course and log files illustrating the analyses performed in the lecture notes. These notes are based on my text, Statistical Modeling for Biomedical Researchers. The goal of both this text and these notes is to provide hands-on instruction in modern multi-variable statistical analysis while using a minimum of mathematics. I also maintain a web page for this text.
