Generoso Roberto — Public Health Data Analyst

Professional case studies

COVID-19 data capacity building across seven local councils

USAID-RTI · Project manager

Programme management

Managed a USAID-funded initiative to strengthen COVID-19 data management capacity. Coordinated cross-functional teams spanning IT, data science, public health, and disaster risk reduction across seven local government councils.

Coordinated multi-disciplinary teams across seven councils

Strengthened local COVID-19 data collection and reporting systems

Bridged data science and public health practice at local government level

Data management·Capacity building·Cross-functional coordination

PALS evaluation across ten NHS Trusts

Healthwatch City of London · Research and projects volunteer

Research

Co-authored an evaluation of Patient Advice and Liaison Services across ten NHS Trusts, assessing the usability, accessibility, and clarity of patient-facing information on each Trust's website. Conducted qualitative coding of patient and public feedback and contributed written sections that were included verbatim in the final published report.

Published report on the Healthwatch City of London website

Qualitative coding of patient feedback across ten NHS Trusts

Identified variability in digital accessibility and PALS visibility across Trusts

Qualitative coding·Desk research·NHS system familiarity·Excel

National patient safety study across 41 hospitals

University of the Philippines Manila, College of Medicine · Science research specialist

Research

Contributed to a national-scale patient safety study. Built Excel-based automation tools that generated summaries for 600 patient safety indicators, reducing manual reporting burden. Co-authored a peer-reviewed journal article on the findings.

Automated reporting for 600 patient safety indicators

Co-authored peer-reviewed publication

Excel automation·Patient safety·Peer-reviewed research

AMR burden and costing study — five tertiary hospitals

Independent researcher

Data analysis

Oversaw data collection from paper-based patient records at one of the five tertiary hospitals and led statistical analysis across 684 patient records using logistic and multivariate regression in Stata. Findings were submitted to the Department of Health as a co-authored research report and presented at the 2nd National Antimicrobial Resistance Summit.

684 patient records, logistic and multivariate regression

Statistical modelling in Stata

Presented findings at the 2nd National Antimicrobial Resistance Summit

Stata·Regression analysis·Health economics

Municipal health leadership — disease surveillance and outbreak response

Department of Health · Municipal health officer

Leadership

Led all preventive health programmes, disease surveillance, and outbreak response for an underserved rural municipality. Coordinated community health workers, nurses, and midwives across ward-level service delivery.

Led municipality-wide disease surveillance system

Managed multi-disciplinary community health team

Delivered outbreak response for underserved population

National postgraduate programme — learning management

Development Academy · Learning manager

Operations

Managed full-cycle implementation of a two-year national postgraduate programme for approximately 100 rural physicians. Coordinated government officials, faculty, and administrative processes. Identified and resolved a six-month contract payment backlog that had stalled programme operations.

Managed programme for ~100 physicians across two years

Resolved six-month payment backlog blocking operations

Coordinated government, academic, and administrative stakeholders

Technical projects

COVID-19 Vaccine Prioritisation: Clustering Philippine Municipalities by Risk

Eskwelabs Data Science Fellowship Capstone · 2021

Python·Streamlit

In early 2021, the Philippines began COVID-19 vaccination with a limited supply and 1,480 municipalities to cover. The Department of Health needed a transparent, data-driven method to decide which communities should receive vaccines first. Five government datasets existed but had never been combined into a single analytical framework for this purpose. This project built that framework.

What I built

I collected, cleaned, and merged all five datasets at the municipality level, resolving inconsistent naming conventions across agencies and handling 94 municipalities with no recorded health facility bed data. I then ran exploratory analysis, spatial autocorrelation using Moran's I and LISA, and a hierarchical k-means clustering approach to group municipalities into four priority tiers. The analysis is deployed as a working Streamlit application where decision-makers can view choropleth maps of any risk variable, explore spatial autocorrelation hot spots and outliers, view the four-cluster priority map at national or provincial level, and update individual municipality data to rerun the model.
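
As a rough illustration of the spatial autocorrelation step, the global Moran's I statistic can be sketched in plain Python. This is not the project's code, which used libpysal and esda: the function name, the toy neighbour layout, and the values are all hypothetical, and the sketch uses simple binary contiguity weights, whereas library implementations may row-standardise the weights.

```python
from typing import Dict, List


def morans_i(values: List[float], neighbours: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary weights (w_ij = 1 when j neighbours i)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # deviations from the mean
    # S0 is the sum of all weights; with binary weights it is the
    # number of directed neighbour links.
    s0 = sum(len(js) for js in neighbours.values())
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    denom = sum(d * d for d in z)
    return (n / s0) * (cross / denom)


# Hypothetical toy layout: four areas in a row, each adjacent to the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1.0, 2.0, 3.0, 4.0], chain)    # similar values sit together
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)  # neighbours are dissimilar
```

Positive values indicate that similar values cluster in space and negative values indicate that neighbours tend to differ. In this toy layout the smoothly increasing series gives a positive statistic and the alternating series a negative one.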

What it found

27 municipalities had no health facility beds at all, including several with active COVID-19 cases at the time of analysis. These represent the sharpest equity gaps in the dataset. The highest-priority cluster captured Metro Manila and major cities: highest case burden, highest density, relatively better infrastructure but overwhelming absolute need. The lowest-priority cluster by immediate risk was also the lowest by healthcare access, which is the equity argument for parallel allocation even during early rollout. High case numbers and high poverty do not always coincide. The analysis separates these two dimensions rather than collapsing them into a single score.

Limitations: Data is cross-sectional as of March 2021. The model does not account for cold chain capacity, local government implementation capacity, or vaccine hesitancy. This is not an official DOH tool and it was not used in actual policy decisions.

1,480 municipalities clustered into four vaccination priority tiers

27 municipalities identified with no health facility beds, including several with active COVID-19 cases

Working Streamlit app deployed for interactive exploration by decision-makers

Python·Pandas·GeoPandas·scikit-learn·libpysal·esda·Streamlit

Open app·GitHub repository

NHS A&E Admissions Analysis: Synthetic Data

Personal project

Python·Dask

NHS A&E departments face persistent pressure on waiting times, admission rates, and resource allocation. Understanding which patients stay longest, when demand peaks, and what drives length of stay is foundational to operational planning. This project analyses four years of A&E admission data from a single NHS provider to identify the patient, clinical, and temporal factors most associated with extended A&E time and high investigation burden.

The dataset

The data source is the NHS England A&E Synthetic Dataset, a realistic but privacy-safe dataset generated as part of an NHS England pilot to enable data sharing without exposing real patient records. The full dataset is approximately 4.29GB and contains over 65 million rows covering A&E attendances across 200+ NHS providers from 2014 to 2018. At that size, the dataset could not be loaded into memory using standard pandas on a consumer laptop. I used Dask to process the full file in chunks, filter to admitted patients, identify the provider with the highest admission volume, and extract its 326,444 records for analysis.
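
The two-pass logic described above (count admissions per provider, then extract the busiest provider's rows) can be sketched without loading the file into memory. This sketch deliberately swaps Dask for standard-library streaming so it runs anywhere; it is not the project's code, and the column names, sample rows, and helper names are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical schema and rows; the real dataset's columns may differ.
SAMPLE = """provider,admitted,minutes_in_ae
P1,1,250
P2,0,90
P1,1,130
P2,1,300
P1,0,60
"""


def open_sample():
    """Stands in for opening the real multi-gigabyte CSV file."""
    return io.StringIO(SAMPLE)


def top_admitting_provider(open_file):
    """Pass 1: stream rows and count admitted attendances per provider."""
    counts = Counter()
    for row in csv.DictReader(open_file()):
        if row["admitted"] == "1":
            counts[row["provider"]] += 1
    return counts.most_common(1)[0][0]


def extract_admitted(open_file, provider):
    """Pass 2: stream again, keeping only that provider's admitted rows."""
    return [
        row
        for row in csv.DictReader(open_file())
        if row["provider"] == provider and row["admitted"] == "1"
    ]


top = top_admitting_provider(open_sample)
records = extract_admitted(open_sample, top)
```

Only one row is held in memory at a time during each pass, which is the same property that made out-of-core processing necessary for the full file.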

What it found

The strongest predictor of extended A&E time is investigation burden. Patients undergoing eight or more investigations have a median stay exceeding 240 minutes. But the more revealing finding is about acuity. Patients with high healthcare resource needs spend less time in A&E than those with low resource needs, likely because high-acuity cases move through faster clinical pathways. The combination driving the longest stays is low healthcare resource need, older age, ambulance arrival, and night admission. Operational pressure in this dataset is not driven primarily by the most complex patients. It is driven by a volume of lower-acuity older patients who require extensive workup before a clinical decision can be made. That distinction matters for how A&E capacity is planned and staffed.
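
The investigation-burden comparison comes down to a grouped median. A minimal sketch follows, assuming made-up attendance pairs and hypothetical band boundaries; only the "eight or more" threshold comes from the analysis itself, and the figures below are not real results.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (investigation count, minutes in A&E) pairs, not real data.
attendances = [(2, 95), (3, 120), (5, 180), (6, 200), (8, 250), (9, 310), (11, 270)]


def band(n_investigations: int) -> str:
    """Assign an attendance to an investigation-count band (bands are illustrative)."""
    if n_investigations >= 8:
        return "8+"
    if n_investigations >= 4:
        return "4-7"
    return "0-3"


# Collect the stay lengths for each band, then take the median per band.
minutes_by_band = defaultdict(list)
for n, minutes in attendances:
    minutes_by_band[band(n)].append(minutes)

median_by_band = {b: median(m) for b, m in minutes_by_band.items()}
```

The median is used rather than the mean because a few very long stays would otherwise dominate each band.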

Other findings: admissions increased year-on-year from 2014 to 2018. Demand peaks in March and troughs in August. Saturday is the busiest day of the week for admissions. The majority of patients in this dataset came from the most deprived areas by IMD decile.

Limitations: The dataset is synthetic and cannot be used for clinical or policy decisions. Analysis is limited to a single unidentified NHS provider and may not generalise across the system. The NHS England hosting page for this dataset is no longer active.

4.29GB dataset covering 200+ NHS providers processed using Dask

326,444 admission records analysed from the highest-volume provider

Investigation burden and acuity identified as key drivers of extended A&E stays

Python·Dask·Pandas·Matplotlib·Seaborn·Jupyter Notebook

GitHub repository

Life Expectancy and Health Indicators: Global Trends 2000 to 2015

Personal project

Python·Tableau

Life expectancy is one of the most direct measures of whether a health system is working for its population. This project examines how life expectancy changed between 2000 and 2015 across countries grouped by income level, and which health and socioeconomic indicators track most closely with those changes.

What I built

I combined WHO life expectancy data with UN country income classifications, cleaned and prepared the data in Python, and built an interactive Tableau dashboard exploring trends across income groups and nine associated indicators.

What it found

Globally, average life expectancy increased from 66 years in 2000 to 71 years in 2015. The largest gains were in low-income countries, where life expectancy rose by 8 years from a base of 54. Lower middle-income countries gained 5 years; upper middle and high-income countries each gained 4.
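
The figures above can be checked with a few lines of arithmetic, which also shows the difference between absolute and relative gains. The sketch uses only the numbers quoted in this section. The variable names are illustrative, and it assumes the global and low-income averages are directly comparable.

```python
global_2000, global_2015 = 66, 71
low_2000, low_gain = 54, 8
low_2015 = low_2000 + low_gain  # 62 years

# Relative gain: years gained as a share of the 2000 starting point.
relative_gain_global = (global_2015 - global_2000) / global_2000  # about 7.6%
relative_gain_low = low_gain / low_2000                           # about 14.8%

# Gap between low-income countries and the global average.
gap_2000 = global_2000 - low_2000  # 12 years
gap_2015 = global_2015 - low_2015  # 9 years
```

On these numbers, low-income countries gained roughly twice as much in relative terms as the global average, yet still sat nine years below it in 2015.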

Indicators associated with lower life expectancy, including adult deaths, under-five deaths, infant deaths, HIV/AIDS deaths, and adolescent thinness, declined across most income groups over the period. Indicators associated with higher life expectancy, including polio vaccination coverage, health expenditure, years of schooling, and income composition, increased across most groups.

Two findings cut against the general trend. In low-income countries, adolescent thinness did not decline, pointing to persistent food insecurity. In high-income countries, HIV/AIDS-related infant deaths increased over the period.

By 2015, countries with the lowest life expectancy and the poorest indicators across all nine measures were concentrated in Africa. The equity gap is not closing uniformly. Low-income African countries made the biggest relative gains but remained furthest from the global average.

Limitations: Analysis covers 2000 to 2015 only. Data is from a Kaggle WHO dataset and a UN income classification document. Correlations between indicators and life expectancy are descriptive, not causal.

Global life expectancy increased from 66 to 71 years between 2000 and 2015

Low-income countries gained the most, rising 8 years from a base of 54

African countries consistently worst-performing across all nine indicators by 2015

Python·Pandas·Tableau

View dashboard·GitHub repository

Diabetes Health Indicators: Risk Factors, Demographics, and Disease Burden

Personal project

Tableau

Chronic disease prevalence is not evenly distributed. Who gets diabetes, and how their health is affected, depends heavily on demographic characteristics, body weight, and modifiable risk factors. This project builds an interactive Tableau dashboard to let users explore those relationships across a US population health dataset.

What I built

Using the Diabetes Health Indicators dataset from Kaggle, I cleaned and prepared the data and built a three-panel interactive Tableau dashboard. Users can filter by demographic characteristics, disease status, and risk factors, then explore how those selections change the distribution of weight categories, disease prevalence, perceived general health, and days of physical and mental ill health per month. The middle panel runs a Pareto analysis to identify which population subgroups account for 80% of diabetes cases in the selected view.
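
The Pareto logic in the middle panel was built in Tableau, but the same idea can be sketched in Python: rank subgroups by case count and keep adding them until the cumulative share reaches 80%. The function name, subgroup labels, and counts below are hypothetical.

```python
def pareto_subgroups(cases_by_group: dict, threshold: float = 0.8) -> list:
    """Return the largest subgroups that together reach `threshold` of all cases."""
    total = sum(cases_by_group.values())
    if total == 0:
        return []
    selected, running = [], 0
    # Walk subgroups from largest to smallest, accumulating their share.
    ranked = sorted(cases_by_group.items(), key=lambda kv: kv[1], reverse=True)
    for group, cases in ranked:
        selected.append(group)
        running += cases
        if running / total >= threshold:
            break
    return selected


# Hypothetical subgroup case counts, not taken from the dataset.
cases = {"A": 50, "B": 25, "C": 12, "D": 8, "E": 5}
top_groups = pareto_subgroups(cases)
```

Here three of the five subgroups account for 87% of cases, which is the kind of concentration the Pareto view is designed to surface.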

What it shows

The dashboard makes visible what aggregate statistics obscure: that risk factors and disease burden concentrate in specific demographic and weight-category subgroups, and that the same risk factor can have very different associations with health outcomes depending on which group you are looking at. The Pareto view identifies where a small number of subgroups account for a disproportionate share of cases.

Limitations: US population health survey data only. Findings are descriptive and exploratory. No causal claims can be made from this analysis.

Tableau Public

View dashboard

Poverty Rates in the USA in 2015: Geographic Patterns and Associated Factors

Personal project

Tableau

Poverty alleviation requires knowing not just where poverty is highest but which factors are most closely associated with it and whether those patterns are consistent across geographies. This project uses US Census data to explore county-level poverty patterns across the contiguous United States in 2015.

What I built

Using a US Census demographic dataset from Kaggle, I selected six variables: poverty count, poverty percentage, median income, employment rate, percentage of minorities, and percentage of public sector workers. I cleaned and prepared the data in Tableau and built a multi-panel dashboard covering bar charts, scatter plots, and an interactive county-level map.

What it found

The headline finding is that the states with the largest number of people living below the poverty line are not the same as the states with the highest poverty percentage. These are two different problems requiring different responses. A state with a large poor population but a moderate poverty rate needs different resourcing than a state with a small population and an extremely high poverty rate.
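
A small sketch makes the count-versus-rate distinction concrete. The three states and their figures are invented for illustration and are not drawn from the Census data.

```python
# Hypothetical states with made-up figures.
states = {
    "State A": {"population": 10_000_000, "in_poverty": 1_400_000},
    "State B": {"population": 1_000_000, "in_poverty": 220_000},
    "State C": {"population": 5_000_000, "in_poverty": 600_000},
}

# Poverty percentage = people in poverty as a share of population.
rate = {name: s["in_poverty"] / s["population"] * 100 for name, s in states.items()}

# Rank the same states two ways: by headcount and by percentage.
by_count = sorted(states, key=lambda n: states[n]["in_poverty"], reverse=True)
by_rate = sorted(states, key=lambda n: rate[n], reverse=True)
```

In this toy example State A has the most people in poverty, while State B has the highest poverty percentage despite having the fewest, so the two rankings lead with different states.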

At county level, the variation within states is as striking as the variation between them. The gap between the highest and lowest poverty counties within a single state can be larger than the gap between states.

On associated factors: higher unemployment, higher minority population percentage, and higher public sector employment all correlate positively with poverty percentage. Higher median income correlates negatively. Geographically, the counties with the highest poverty percentages cluster in the southeastern and southwestern United States and in parts of South Dakota.

Limitations: Single-year cross-sectional data from 2015 only. Correlations are descriptive, not causal. Data is at county level and may mask variation within counties.

States with the highest poverty population differ from those with the highest poverty percentage

County-level variation within states as large as variation between states

High poverty counties cluster in southeastern and southwestern US and parts of South Dakota

Tableau

View dashboard

Coming soon

Women's health outcomes & PCOS prevalence — UK data analysis.

Skills & tools

Analysis & languages

Python·SQL·Stata·Pandas·scikit-learn

Visualisation & reporting

Tableau·Matplotlib·Excel automation·Data storytelling

Public health domain

Epidemiology·Disease surveillance·AMR·Health economics·Patient safety

Methods

Regression analysis·K-means clustering·Qualitative content analysis·Programme evaluation

About

Public health physician with a decade of experience in disease surveillance, health systems research, and programme management — including USAID-funded data initiatives. Now bringing formal data science training to public health analytics in the UK.

GMC Registered·Eskwelabs Data Science Fellowship·Masterschool Data Analytics·Peer-reviewed author

Looking for: Data, research, or evaluation roles in public health, health equity, or digital health implementation, where clinical background and analytical skills are an asset. UK-based, no sponsorship required.
