Methodology.
How we measure the gap between poverty and philanthropic funding across 651 Indian districts, using three government datasets and a composite scoring framework.
The screening tool.
The Philanthropic Opportunity Score (POS) quantifies where in India the gap between poverty and corporate social responsibility funding is widest. It is designed as a screening tool for CSR and philanthropic decision-makers who need district-level evidence to guide geographic strategy. The score does not prescribe investment decisions; it surfaces the districts where further due diligence is most warranted.
Six stages, end to end.
From raw PDFs to a ranked ledger. Every record passes through six transforms before it earns a score.
- Extract: parse PDF, Excel, and CSV sources.
- Clean: exclude unattributable records.
- Match: reconcile names across datasets.
- Tier: group districts by population tertiles.
- Normalize: min-max scale to 0–1.
- Score: weighted aggregation.
Three public datasets.
Poverty, spending, population. Each sourced from a government system of record.
NITI Aayog National Multidimensional Poverty Index 2023, based on the fifth round of the National Family Health Survey (NFHS-5, 2019-21). The index uses the Alkire-Foster methodology developed at Oxford's OPHI (Oxford Poverty and Human Development Initiative) to measure multidimensional poverty across health, education, and standard of living. District-level headcount ratios were extracted for both the 2015-16 (NFHS-4) and 2019-21 (NFHS-5) survey rounds, covering 653 districts across 36 states and union territories.
Ministry of Corporate Affairs National CSR Portal, via Dataful.in (Dataset 1612). Ten fiscal years of CSR expenditure data (FY2014-15 through FY2023-24) at the district-sector level. The most recent three years (FY2021-22 through FY2023-24) feed the supply gap calculation. Approximately 60.7% of gross CSR spending (₹1,29,660 crore of ₹2,13,594 crore, FY2014-15 to FY2023-24) is classified as "Pan India" or lacks a district code and cannot be attributed to specific districts. These records are excluded entirely, so district-level CSR totals are conservative lower bounds.
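The attribution rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `CsrRecord` layout in which a Pan India row carries the literal district label "Pan India" and a missing district code appears as `None`; the Dataful.in export may encode both differently. The sketch drops unattributable rows, restricts district totals to the three supply-gap fiscal years, and reports the excluded share of gross spending across all ten years.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CsrRecord:
    fiscal_year: str         # e.g. "FY2022-23" (hypothetical field layout)
    state: str
    district: Optional[str]  # None when the portal gives no district code (assumption)
    amount_crore: float


# The three most recent fiscal years feed the supply gap calculation.
SUPPLY_GAP_YEARS = {"FY2021-22", "FY2022-23", "FY2023-24"}


def is_attributable(r: CsrRecord) -> bool:
    """Pan India rows and rows without a district cannot be placed on the map."""
    return r.district is not None and r.district.strip().lower() != "pan india"


def district_totals(
    records: List[CsrRecord],
) -> Tuple[Dict[Tuple[str, str], float], float]:
    """Sum attributable supply-gap spending per (state, district) and return
    the share of gross spending (all years) that had to be excluded."""
    gross = sum(r.amount_crore for r in records)
    excluded = sum(r.amount_crore for r in records if not is_attributable(r))
    totals: Dict[Tuple[str, str], float] = {}
    for r in records:
        if is_attributable(r) and r.fiscal_year in SUPPLY_GAP_YEARS:
            key = (r.state, r.district)
            totals[key] = totals.get(key, 0.0) + r.amount_crore
    return totals, (excluded / gross if gross else 0.0)
```

Applied to the headline figures, the excluded share is 1,29,660 / 2,13,594 ≈ 0.607, the 60.7% quoted above. Because excluded rows simply vanish from `totals`, every district figure is a lower bound.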
Census of India 2011 district-level population data covering 640 districts. This remains the most recent complete district census, because the 2021 Census was postponed. Population figures serve as denominators for per capita calculations and as the basis for population-tier classification. Districts created after 2011 through administrative reorganization receive populations recast from their 2011 parent districts, as described below.
Reconciling the joins.
Three independently maintained datasets, none of which agrees on district names or boundaries. The MPI is the spine; CSR and Census are joined onto it.
- Census 2011 (Registrar General of India): 640 districts; spells the example district "Hooghly".
- MPI 2023 (NITI Aayog, NFHS-5): 653 districts; spells it "Hugli (Hooghly)".
- CSR / MCA (Ministry of Corporate Affairs): 663 (state, district) keys; spells it "Hugli".
All three resolve to one canonical district record: West Bengal / Hugli.
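State-scoped fuzzy match. District names are compared with rapidfuzz.fuzz.ratio, and a candidate is accepted at 75% similarity or above. The threshold is deliberately permissive, set low to maximize coverage; accepted matches scoring 75 to 90% are listed in the verification report for reviewer inspection. A match never crosses state lines.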
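Why state scoping matters. Himachal Pradesh and Uttar Pradesh each have a district named Hamirpur. The names are identical (100% similarity), but they are different districts, with populations of roughly 454k and 1.1M respectively. Only the state border blocks the false match.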
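A state-scoped matcher might look like the sketch below. It uses `rapidfuzz.fuzz.ratio` when rapidfuzz is installed and otherwise falls back to a standard-library implementation of the same normalized Indel similarity (2 × LCS / combined length × 100). The `variants` helper, which splits a parenthetical alias such as "Hugli (Hooghly)" into two candidate names, and the dictionary layout of `census` are hypothetical. Without some such normalization, "Hugli" and "Hooghly" score only 50 and would fall below the 75% threshold.

```python
import re
from typing import Dict, List, Optional, Tuple

try:
    from rapidfuzz.fuzz import ratio  # normalized Indel similarity on a 0-100 scale
except ImportError:  # stdlib fallback computing the same quantity
    def ratio(a: str, b: str) -> float:
        """2 * LCS(a, b) / (len(a) + len(b)) * 100."""
        if not a and not b:
            return 100.0
        prev = [0] * (len(b) + 1)
        for ch in a:
            cur = [0]
            for j, cb in enumerate(b, start=1):
                cur.append(prev[j - 1] + 1 if ch == cb else max(prev[j], cur[j - 1]))
            prev = cur
        return 200.0 * prev[-1] / (len(a) + len(b))

MATCH_THRESHOLD = 75.0  # accept at or above this score
REVIEW_CEILING = 90.0   # accepted matches below this go to the verification report


def variants(name: str) -> List[str]:
    """Hypothetical normalizer: 'Hugli (Hooghly)' -> ['hugli', 'hooghly']."""
    return [p.strip() for p in re.split(r"[()]", name.lower()) if p.strip()]


def score(a: str, b: str) -> float:
    """Best similarity over all spelling variants of the two names."""
    return max(ratio(x, y) for x in variants(a) for y in variants(b))


def match_district(
    state: str, name: str, census: Dict[str, List[str]]
) -> Optional[Tuple[str, str, float, bool]]:
    """Return (state, census_name, score, needs_review) or None if nothing clears 75."""
    best: Optional[Tuple[str, float]] = None
    for candidate in census.get(state, []):  # other states' districts are never considered
        s = score(name, candidate)
        if best is None or s > best[1]:
            best = (candidate, s)
    if best is None or best[1] < MATCH_THRESHOLD:
        return None
    return state, best[0], best[1], best[1] < REVIEW_CEILING
```

Looking up "Hamirpur" under Uttar Pradesh never sees Himachal Pradesh's Hamirpur, because candidates are drawn from one state only. A near-miss spelling such as "Hameerpur" scores about 82 against "Hamirpur" and is accepted but flagged for review.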
Rename aliases (no boundary change)
A pure rename keeps the Census 2011 population unchanged; only the district label is updated.
Post-2011 carve-out recast
Over fifty districts in the MPI list did not exist in Census 2011. They were carved from older parent districts after the census was taken, so the parent population must be split, never inherited whole.
Worked example: Maharashtra's Thane district was split in 2014 into Palghar and a residual Thane. Palghar's population is sourced from the gazette or the DCHB; the residual Thane is computed as the parent total minus Palghar. The residual is never the full pre-carve-out total, so Thane is not double-counted with Palghar.
Source-cited catalogue: scripts/csr/data/external/population_recast_2011.csv. Each row carries a URL pointing to a District Census Handbook (DCHB), the relevant state gazette notification, or a Wikipedia article that itself cites those primary sources.
Coverage (50+ districts)
- Telangana 2016 (21 children + 9 residual parents)
- Ladakh 2019
- Bardhaman split 2017 (Paschim + Purba)
- Jaintia Hills bifurcation 2012
- Palghar, Kondagaon, Gariyaband, and others
Hard fail (no silent fallback)
If any post-2011 district lacks a recast row, the pipeline aborts and names the missing district. There is no fallback to summing parent populations.
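The recast and the hard-fail rule can be sketched as two small functions. This is an illustration only: the function and exception names are hypothetical, populations are plain integers keyed by district name, and the guard against children exceeding their parent is an assumed sanity check rather than a documented rule.

```python
from typing import Dict, Iterable


class MissingRecastError(RuntimeError):
    """Raised when a post-2011 district has no row in the recast catalogue."""


def recast_parent(parent: str, parent_total: int, children: Dict[str, int]) -> Dict[str, int]:
    """Split a Census 2011 parent into sourced child populations plus a computed residual.

    The residual keeps the parent's name (e.g. residual Thane once Palghar is carved out)
    and is always parent_total minus the children, never the full pre-carve-out total.
    """
    residual = parent_total - sum(children.values())
    if residual < 0:  # assumed sanity guard
        raise ValueError(f"{parent}: sourced children exceed the Census 2011 parent total")
    # A parent dissolved entirely into children (e.g. Bardhaman) leaves no residual.
    return {**children, parent: residual} if residual > 0 else dict(children)


def require_recast(post_2011_districts: Iterable[str], catalogue: Dict[str, int]) -> None:
    """Abort and name every missing district; never fall back to summing parents."""
    missing = sorted(d for d in post_2011_districts if d not in catalogue)
    if missing:
        raise MissingRecastError("no recast row for: " + ", ".join(missing))
```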
Three conservation invariants (fatal on violation):
- The sum of children plus residual is within 0.5% of the Census parent total. This catches a wrong recast number.
- Matched population is within minus 5% to plus 0.5% of the Census state total. This catches a state-level double-count.
- Total matched population is at most the Census 2011 total plus 0.5%. This catches an overall over-count.
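The three invariants translate directly into checks. This sketch assumes populations arrive as plain dictionaries (recast groups keyed by parent, matched and Census totals keyed by state) and that the national matched total is the sum over states; all names are hypothetical, while the tolerances are the ones stated above. Violations are collected and raised together so that one run reports every failing check.

```python
from typing import Dict, List

TOLERANCE = 0.005       # 0.5%
STATE_SHORTFALL = 0.05  # a state may fall up to 5% short (unmatched districts)


class InvariantViolation(RuntimeError):
    """Fatal: the population join has created or lost people."""


def check_conservation(
    recast_groups: Dict[str, Dict[str, int]],  # parent -> {child or residual: population}
    census_parent: Dict[str, int],             # parent -> Census 2011 population
    matched_by_state: Dict[str, int],          # state -> population of matched districts
    census_by_state: Dict[str, int],           # state -> Census 2011 population
    census_total: int,                         # Census 2011 national total
) -> None:
    errors: List[str] = []
    # 1. Children plus residual reproduce the Census parent within 0.5%.
    for parent, parts in recast_groups.items():
        total, ref = sum(parts.values()), census_parent[parent]
        if abs(total - ref) > TOLERANCE * ref:
            errors.append(f"recast {parent}: {total} vs Census {ref}")
    # 2. Each state's matched population lies between -5% and +0.5% of its Census total.
    for state, matched in matched_by_state.items():
        ref = census_by_state[state]
        if not ref * (1 - STATE_SHORTFALL) <= matched <= ref * (1 + TOLERANCE):
            errors.append(f"state {state}: {matched} vs Census {ref}")
    # 3. The national matched population never exceeds the Census total by more than 0.5%.
    national = sum(matched_by_state.values())
    if national > census_total * (1 + TOLERANCE):
        errors.append(f"national: {national} vs Census {census_total}")
    if errors:
        raise InvariantViolation("; ".join(errors))
```

The asymmetric state band reflects the design: a small shortfall is expected because a few districts stay unmatched, but any excess beyond rounding means someone was counted twice.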
651 of 653 MPI districts scored.
The two unmatched districts either lack a CSR record entirely or have names that collide irrecoverably across parent states.
Three signals.
N for need, G for gap, U for unmoved. Each one a separate test; together, the whitespace index.
Poverty Severity
The MPI headcount ratio from NFHS-5, the share of a district's population that is multidimensionally poor across health, education, and standard of living indicators. Higher values indicate greater unmet need.
Funding Gap
How much less CSR per person a district receives compared with the median of its population tier. A district receiving more than its tier median scores zero on this dimension, because it is not underfunded relative to comparable peers.
Persistent Poverty
The retention ratio: what fraction of 2015-16 poverty persists in 2019-21, that is, the 2019-21 headcount ratio divided by the 2015-16 headcount ratio. Values near one indicate no improvement; values above one indicate that poverty worsened. Districts without a 2015-16 baseline (55 of 651, mostly post-2011 carve-outs that did not exist at NFHS-4) receive the median retention ratio as an imputation. This pulls U toward the middle of the distribution for those districts, a conservative choice in an index designed to surface outliers.
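The three raw signals can be computed as below, before any normalization. This sketch assumes the funding gap is the absolute shortfall in ₹ per person below the tier median (the text does not say whether the gap is absolute or relative), that headcount ratios are shares between 0 and 1, and that a missing or zero 2015-16 headcount counts as no baseline. Function and argument names are hypothetical.

```python
from statistics import median
from typing import Dict, Optional, Tuple


def raw_signals(
    h_2021: Dict[str, float],            # MPI headcount ratio, NFHS-5 (share of population)
    h_2016: Dict[str, Optional[float]],  # NFHS-4 headcount ratio; None where no baseline exists
    csr_per_person: Dict[str, float],    # attributable CSR, ₹ per person
    tier_median: Dict[str, float],       # median CSR density of the district's population tier
) -> Dict[str, Tuple[float, float, float]]:
    """Return raw (N, G, U) per district."""
    # U: share of 2015-16 poverty still present in 2019-21, where a baseline exists.
    # A zero or missing baseline is treated as "no baseline" (assumption).
    retention = {d: h_2021[d] / h_2016[d] for d in h_2021 if h_2016.get(d)}
    fill = median(retention.values())  # imputed for districts without a baseline
    signals = {}
    for d, need in h_2021.items():
        # G: shortfall below the tier median; zero if the district is at or above it.
        gap = max(0.0, tier_median[d] - csr_per_person[d])
        signals[d] = (need, gap, retention.get(d, fill))
    return signals
```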
Compare like with like.
CSR spending across Indian districts is heavily right-skewed. The mean CSR density is roughly ₹607 per person while the median is ₹100, a six-to-one gap that reflects a long right tail driven by corporate headquarters districts like Mumbai, Bengaluru, and Pune.
A single national median benchmark would flag nearly every rural district as underfunded. Instead, districts are split into three population tiers at the 33rd and 67th percentile boundaries. Each tier receives its own median CSR density as the benchmark.
Figure 02: Tier median CSR (₹ per person). The benchmark is tier-specific, not national.
Note: Tier 3 (Large) has a lower median than Tier 2 (Medium). This non-monotonic pattern reflects the composition of large districts, which include heavily rural districts with minimal corporate presence alongside urban commercial centres. The tier boundaries are population percentiles, not CSR-ordered groupings.
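Tier assignment and tier-specific benchmarks can be sketched with the standard library. Here `statistics.quantiles(..., n=3)` returns the two tertile cut points (about the 33rd and 67th percentiles) using its default "exclusive" interpolation; the pipeline may use a different percentile convention, which could move districts that sit near a boundary. Assigning a district that lies exactly on a cut point to the lower tier is also an assumption, and the function name is hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, Tuple


def tier_benchmarks(
    population: Dict[str, int], csr_per_person: Dict[str, float]
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Assign population tertile tiers (1 = small, 3 = large) and each district's tier median CSR."""
    cut_33, cut_67 = quantiles(population.values(), n=3)  # two tertile cut points
    tier = {
        d: 1 if p <= cut_33 else 2 if p <= cut_67 else 3
        for d, p in population.items()
    }
    tier_median = {
        t: median(csr_per_person[d] for d in tier if tier[d] == t) for t in (1, 2, 3)
    }
    # Each district is benchmarked against its own tier, not a national median.
    return tier, {d: tier_median[t] for d, t in tier.items()}
```

Nothing in this construction forces tier medians to rise with population; the tiers are cut on population alone, so the non-monotonic pattern noted above can arise naturally.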
Normalize, weight, score.
Min-max normalize each component.
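Each raw component is scaled to a 0–1 range across all 651 districts, so the lowest district on a component receives 0 and the highest receives 1. Min-max scaling is one of the normalization methods described in the OECD Handbook on Constructing Composite Indicators; it puts components with different units and magnitudes on a common scale, so that the weights, not the raw units, determine each component's contribution to the final score.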
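Min-max scaling for one component, as a minimal sketch. Returning zeros for a component with no spread (which would otherwise divide by zero) is an assumption, not documented pipeline behavior.

```python
from typing import Dict


def min_max(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale one component to [0, 1] across all districts."""
    lo, hi = min(raw.values()), max(raw.values())
    span = hi - lo
    if span == 0:  # assumed handling: a constant component cannot rank districts
        return {d: 0.0 for d in raw}
    return {d: (v - lo) / span for d, v in raw.items()}
```

Because the minimum and maximum are taken over whichever districts are passed in, filtering the input (for example to one sector) changes every normalized value, not just those of the districts removed.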
Weighted linear aggregation.
POS = (0.40 × N̂ + 0.40 × Ĝ + 0.20 × Û) × 100
Where N̂ is normalized poverty severity, Ĝ is normalized funding gap, and Û is normalized poverty persistence. The result is a score from 0 to 100. Users can adjust all weights through the interactive simulator.
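The aggregation itself is a weighted sum. The sketch below takes already-normalized components; the `weights` tuple stands in for the simulator's adjustable weights, and the check that they sum to 1 is an assumption made here so that the result stays on the 0–100 scale.

```python
import math
from typing import Tuple

DEFAULT_WEIGHTS: Tuple[float, float, float] = (0.40, 0.40, 0.20)  # need, gap, persistence


def pos(n_hat: float, g_hat: float, u_hat: float,
        weights: Tuple[float, float, float] = DEFAULT_WEIGHTS) -> float:
    """Philanthropic Opportunity Score from normalized components in [0, 1]."""
    w_n, w_g, w_u = weights
    if not math.isclose(w_n + w_g + w_u, 1.0):
        raise ValueError("weights must sum to 1 to keep POS on a 0-100 scale")
    return (w_n * n_hat + w_g * g_hat + w_u * u_hat) * 100
```

For example, a district with N̂ = 0.5, Ĝ = 0.25, and Û = 1.0 gets 0.40 × 0.5 + 0.40 × 0.25 + 0.20 × 1.0 = 0.5, a score of 50.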
Equal partners, one junior.
Need and Gap receive equal weight (40% each) because both are necessary conditions for philanthropic whitespace. A district must be both poor and underfunded to represent a genuine opportunity. A poor district receiving adequate CSR is not a whitespace; a well-funded district with low poverty is not a priority.
Persistence receives lower weight (20%) because the retention ratio derives from only two time points (NFHS-4 and NFHS-5), roughly five years apart. As a ratio of two survey estimates, it carries higher district-level sampling variance than the headcount ratio itself, making it the least precise of the three signals. Weighting components by measurement reliability is a deliberate choice here rather than a borrowed convention: the UNDP Human Development Index, for example, weights its three dimensions equally.
Simulator presets:
- Balanced: equal emphasis on need and gap.
- Highest Need: prioritize poverty severity.
- Most Underfunded: prioritize funding gap.
- Stuck Districts: prioritize persistent poverty.
How the scores fall.
Figure 03: POS distribution (N = 651; All Sectors, default weights). Scores lie on the 0–100 scale. Under a sector filter, unfunded sectors saturate Ĝ at 1.0, so scores can reach ~100.
Defining neglect.
A district is flagged as neglected when it falls simultaneously in the bottom quartile of CSR per person (below ₹13 per person) and the top quartile of MPI headcount ratio (above 21.1% poverty). These are districts where need is highest and funding lowest. The current dataset identifies 44 such districts.
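The neglect flag is a conjunction of two quartile tests. In this sketch the quartiles come from `statistics.quantiles(..., n=4)` with its default "exclusive" method; a different percentile convention would shift the cut points (₹13 and 21.1% on the current data) slightly and could change the count at the margins. The function name is hypothetical.

```python
from statistics import quantiles
from typing import Dict, Set


def neglected(csr_per_person: Dict[str, float], headcount: Dict[str, float]) -> Set[str]:
    """Districts in the bottom CSR-density quartile AND the top MPI-headcount quartile."""
    csr_q1 = quantiles(csr_per_person.values(), n=4)[0]  # 25th percentile of ₹ per person
    mpi_q3 = quantiles(headcount.values(), n=4)[2]       # 75th percentile of headcount ratio
    return {
        d for d, csr in csr_per_person.items()
        if csr < csr_q1 and headcount[d] > mpi_q3
    }
```

Unlike the POS, this flag uses no weights and no normalization, so it is unaffected by the simulator settings.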
Standing on the shoulders.
The composite scoring approach follows the OECD Handbook on Constructing Composite Indicators (Nardo et al., 2008), the standard international reference for composite index construction. Min-max normalization parallels the goalpost normalization of the UNDP Human Development Index (which, unlike this score, has aggregated its dimensions with a geometric mean since 2010), and weighted linear aggregation parallels the weighted deprivation score at the core of the Global Multidimensional Poverty Index.
What the score cannot see.
Every index has a blind spot. Here are seven we know about.
This is a screening tool for geographic prioritization, not a causal model. A high score does not guarantee that investment will produce impact. It indicates where the gap between need and funding is widest.
District-level CSR totals are conservative. Approximately 60.7% of gross CSR spending is classified as Pan India or lacks a district code and cannot be attributed to specific districts. This exclusion is not random: Pan-India programmes disproportionately originate from companies headquartered in a handful of urban districts. Any district that actually benefits from such programmes without receiving attribution, whether a well-connected urban district or a rural one, will have its measured CSR understated and its funding gap overstated.
Population denominators are from Census 2011, more than eight years before the MPI survey. Districts carved from parents after 2011 carry populations recast from their Census 2011 parents (flagged `population_imputed`), so their per-capita figures rest on estimated rather than enumerated denominators. Populations of fast-growing districts may also be understated by 10–15% relative to current levels.
MPI data reflects conditions in 2019-21 (NFHS-5). District-level poverty may have shifted in the years since data collection.
The persistence component (U) imputes the median retention ratio (0.555, IQR: 0.442–0.674, std: 0.274) for 55 of 651 districts that lack a 2015-16 baseline. These are predominantly districts carved from parent districts after Census 2011 and therefore absent from NFHS-4. U values for imputed districts are estimates, not observations.
The score does not account for government spending beyond CSR, private philanthropy outside the Companies Act framework, or international development aid flowing to these districts.
The three components are treated as independent dimensions. In practice, poverty severity, funding gaps, and poverty persistence may be correlated, which could amplify the signal for districts that score high on multiple dimensions.