Methodology

How we measure the gap between poverty and philanthropic funding across 651 Indian districts, using three government datasets and a composite scoring framework.

Districts scored: 651

POS range: 0–100 (All Sectors, default weights). Under a sector filter, scores can reach ~100.

01 / Objective

The screening tool.

The Philanthropic Opportunity Score (POS) quantifies where in India the gap between poverty and corporate social responsibility (CSR) funding is widest. It is designed as a screening tool for CSR and philanthropic decision-makers who need district-level evidence to guide geographic strategy. The score does not prescribe investment decisions; it surfaces the districts where further due diligence is most warranted.

02 / Pipeline

Six stages, end to end.

From raw PDFs to a ranked ledger. Every record passes through six transforms before it earns a score.

Step 1: Extract. Parse PDF, Excel, and CSV sources.
Step 2: Clean. Exclude unattributable records.
Step 3: Match. Reconcile names across datasets.
Step 4: Tier. Group by population tertiles.
Step 5: Normalize. Min-max scale to 0–1.
Step 6: Score. Weighted aggregation.

03 / Sources

Three public datasets.

Poverty, spending, population. Each sourced from a government system of record.

Poverty 01

NITI Aayog National Multidimensional Poverty Index 2023, based on the National Family Health Survey 5 (2019-21). This report uses the Alkire-Foster methodology from Oxford's OPHI to measure multidimensional poverty across health, education, and standard of living. District-level headcount ratios were extracted for both the 2015-16 and 2019-21 survey rounds, covering 653 districts across 36 states and union territories.

CSR 02

Ministry of Corporate Affairs National CSR Portal, via Dataful.in (Dataset 1612). Ten fiscal years of CSR expenditure data (FY2014-15 through FY2023-24) at the district-sector level. The most recent three years (FY2021-22 through FY2023-24) feed the supply gap calculation. Approximately 60.7% of gross CSR spending (₹1,29,660 crore of ₹2,13,594 crore, FY2014-15 to FY2023-24) is classified as "Pan India" or lacks a district code and cannot be attributed to specific districts. These records are excluded entirely, which means district-level CSR totals are conservative.
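The attribution share quoted above can be reproduced from the two rupee totals. This is a minimal arithmetic sketch; the variable names are illustrative and not taken from the pipeline.

```python
# Totals quoted in the CSR source description (₹ crore, FY2014-15 to FY2023-24).
gross_csr_crore = 213_594
unattributed_crore = 129_660  # "Pan India" or missing district code

# Share of spending that cannot be tied to a district, and what remains.
unattributed_share = unattributed_crore / gross_csr_crore
attributable_crore = gross_csr_crore - unattributed_crore

print(f"Unattributed share: {unattributed_share:.1%}")
print(f"Attributable to districts: {attributable_crore:,} crore")
```

Only the attributable remainder enters the district-level totals.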

Census 03

Census of India 2011 district-level population data covering 640 districts. This remains the most recent complete district census because the 2021 Census was postponed. Population figures serve as denominators for per capita calculations and as the basis for population-tier classification. Districts created after 2011 through administrative reorganization are assigned a share of their parent district's 2011 population, as described under 04 / Integration.

04 / Integration

Reconciling the joins.

The Join Pipeline

Three independently maintained datasets, none of which agrees on district names or boundaries. The MPI is the spine; CSR and Census are joined onto it.

Sources: RGI / NITI Aayog / MCA

Census 2011 (Registrar General of India): "Hooghly", 640 districts
MPI 2023 (NITI Aayog, NFHS-5): "Hugli (Hooghly)", 653 districts
CSR / MCA (Ministry of Corporate Affairs): "Hugli", 663 (state, district) keys

All three resolve to one canonical district record: West Bengal / Hugli

Step 01

State-scoped fuzzy match

Algorithm: rapidfuzz.fuzz.ratio
Threshold: >= 75% similarity. Conservative, set low to maximize coverage.
Caution band: 75 to 90%. Matches in this band are listed in the verification report for reviewer inspection.
State scoping: a match never crosses states.

Why state scoping matters

Himachal Pradesh: Hamirpur (pop ~454k)
Uttar Pradesh: Hamirpur (pop ~1.1M, a different district)

The names are identical (100% similarity), but the state border blocks the match.
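The matching rule above can be sketched in a few lines. To stay dependency-free, this sketch uses the standard library's difflib as a stand-in for rapidfuzz.fuzz.ratio, so scores can differ from the real pipeline. The function names, the "Ahmadabad" spelling pair, and the choice to treat the caution band as 75 up to (but excluding) 90 are all assumptions for illustration.

```python
from difflib import SequenceMatcher

THRESHOLD = 75.0      # minimum similarity to accept a match (from the text)
CAUTION_UPPER = 90.0  # accepted matches below this go to the review list

def similarity(a: str, b: str) -> float:
    """Stand-in for rapidfuzz.fuzz.ratio: a 0-100 similarity score."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_match(state: str, district: str, candidates):
    """candidates: (state, district) keys from the dataset being joined."""
    # State scoping: only names from the same state are ever compared.
    same_state = [d for s, d in candidates if s == state]
    if not same_state:
        return None
    score, name = max((similarity(district, d), d) for d in same_state)
    if score < THRESHOLD:
        return None
    return {"match": name, "score": score, "review": score < CAUTION_UPPER}

# Hypothetical candidate keys for illustration.
candidates = [("Gujarat", "Ahmadabad"), ("Uttar Pradesh", "Hamirpur")]
print(best_match("Gujarat", "Ahmedabad", candidates))          # accepted, flagged for review
print(best_match("Himachal Pradesh", "Hamirpur", candidates))  # None: other state only
```

Filtering by state before scoring, rather than after, means an identical name in another state can never outrank a weaker but correct same-state candidate.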

Step 02

Rename aliases (no boundary change)

Gurgaon to Gurugram
Mysore to Mysuru
Allahabad to Prayagraj
Belgaum to Belagavi
Faizabad to Ayodhya
Sikkim 2021 (x4)
+ 20 more

A pure rename keeps the Census 2011 population unchanged; only the district label is updated.
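A pure rename amounts to a label lookup that leaves every other field alone. The sketch below lists only the five pairs named above. The record layout, the function name, and the population figure are hypothetical.

```python
# Old-name -> current-name pairs quoted in the text; the full catalogue is longer.
RENAMES = {
    "Gurgaon": "Gurugram",
    "Mysore": "Mysuru",
    "Allahabad": "Prayagraj",
    "Belgaum": "Belagavi",
    "Faizabad": "Ayodhya",
}

def apply_rename(record: dict) -> dict:
    """Return a copy with the district label updated; population is untouched."""
    updated = dict(record)
    updated["district"] = RENAMES.get(record["district"], record["district"])
    return updated

# Illustrative record; the population value is a placeholder, not a Census figure.
census_row = {"state": "Haryana", "district": "Gurgaon", "population_2011": 1_000_000}
print(apply_rename(census_row))
```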

Step 03

Post-2011 carve-out recast

Over fifty districts in the MPI list did not exist in Census 2011. They were carved from older parent districts after the census was taken, so the parent population must be split, never inherited whole.

Worked example: Thane (Maharashtra), split into Palghar (2014) and a residual Thane

Census 2011 Thane (pre-carve-out): 11,060,148
Palghar (gazette / DCHB sourced): 2,990,116
Thane (residual; computed as parent minus child): 8,070,032

Check: 2,990,116 + 8,070,032 = 11,060,148 (conserved)

The residual is never the full pre-carve-out total, so Thane is not double-counted with Palghar.

Source-cited catalogue

scripts/csr/data/external/population_recast_2011.csv

Each row carries a clickable URL pointing to a Census District Census Handbook (DCHB), the relevant state gazette notification, or a Wikipedia article that itself cites those primary sources.

Coverage (50+ districts)

Telangana 2014 (21 children + 9 residual parents)
Ladakh 2019
Bardhaman split 2017 (Paschim + Purba)
Jaintia Hills bifurcation 2012
Palghar, Kondagaon, Gariyaband, and others

Hard fail (no silent fallback)

If any post-2011 district lacks a recast row, the pipeline aborts and names the missing district. There is no fallback to summing parent populations.
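The recast lookup and its hard-fail behaviour might look like the sketch below. The in-memory dictionaries stand in for population_recast_2011.csv, whose actual column layout is not shown in this document. The exception class, function names, and the "Example District" key are hypothetical; the Thane and Palghar figures come from the worked example.

```python
class MissingRecastError(RuntimeError):
    """Raised when a post-2011 district has no recast row."""

# Hypothetical stand-in for the recast catalogue.
RECAST = {
    ("Maharashtra", "Palghar"): {"parent": "Thane", "population_2011": 2_990_116},
}
PARENT_TOTALS = {("Maharashtra", "Thane"): 11_060_148}  # Census 2011, pre-carve-out

def recast_population(state: str, district: str) -> int:
    """Look up a carved-out district; abort loudly instead of falling back."""
    try:
        return RECAST[(state, district)]["population_2011"]
    except KeyError:
        raise MissingRecastError(
            f"no recast row for post-2011 district: {state} / {district}"
        ) from None

def residual_population(state: str, parent: str) -> int:
    """Parent total minus all children carved out of it."""
    children = sum(
        row["population_2011"]
        for (s, _), row in RECAST.items()
        if s == state and row["parent"] == parent
    )
    return PARENT_TOTALS[(state, parent)] - children

print(residual_population("Maharashtra", "Thane"))  # 8070032
```

Raising on a missing row surfaces catalogue gaps at build time instead of letting an inflated denominator flow into the per-capita figures.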

Step 04

Three conservation invariants (fatal on violation)

✓ Per parent: sum of children + residual within 0.5% of Census parent total (catches a wrong recast number)
✓ Per state: matched population within -5% to +0.5% of Census state total (catches a state-level double-count)
✓ National: matched population at most Census 2011 total + 0.5% (catches an overall over-count)
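The three invariants reduce to simple tolerance checks. In this sketch the tolerances come from the list above, while the function names, the exception class, and the state-level figures in the usage lines are assumptions for illustration.

```python
class ConservationError(RuntimeError):
    """Fatal: matched populations do not reconcile with Census 2011."""

def check_parent(children_plus_residual: int, census_parent: int, tol: float = 0.005) -> None:
    # Children plus residual must land within 0.5% of the parent total.
    if abs(children_plus_residual - census_parent) > tol * census_parent:
        raise ConservationError("parent split does not sum to Census parent total")

def check_state(matched: int, census_state: int,
                lower: float = -0.05, upper: float = 0.005) -> None:
    # Asymmetric band: up to 5% may be missing, but at most 0.5% extra.
    deviation = (matched - census_state) / census_state
    if not (lower <= deviation <= upper):
        raise ConservationError(f"state total off by {deviation:+.2%}")

def check_national(matched: int, census_total: int, upper: float = 0.005) -> None:
    # One-sided: the national sum may fall short but must not overshoot.
    if matched > census_total * (1 + upper):
        raise ConservationError("national matched population exceeds Census total")

check_parent(2_990_116 + 8_070_032, 11_060_148)  # Thane example: passes
check_state(96_000_000, 100_000_000)             # hypothetical state, 4% short: passes
```

The state band is asymmetric because unmatched districts can legitimately leave a shortfall, whereas an excess almost always signals a double-count.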

Coverage

651 / 653 MPI districts scored

651 matched and scored, 2 unmatched

Unmatched districts lack a CSR record entirely or name-collide irrecoverably across parent states.

05 / Dimensions

Three signals.

N for need, G for gap, U for unmoved. Each one a separate test; together, the whitespace index.

N (40%)

Poverty Severity

The MPI headcount ratio from NFHS-5, the share of a district's population that is multidimensionally poor across health, education, and standard of living indicators. Higher values indicate greater unmet need.

G (40%)

Funding Gap

How much less CSR per person a district receives compared to its population-tier median. A district receiving more than its tier median scores zero on this dimension, because it is not underfunded relative to comparable peers.

U (20%)

Persistent Poverty

The retention ratio: what fraction of 2015-16 poverty persists in 2019-21. Values near one indicate no improvement. Values above one indicate poverty worsened. Districts without a 2015-16 baseline (55 of 651, mostly post-2011 carve-outs that did not exist at NFHS-4) receive the median retention ratio as an imputation. This pulls the U dimension toward the centre of the distribution for those districts, which is a conservative choice in an index designed to surface outliers.
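The retention ratio and its median imputation can be sketched as follows. The district names and headcount values are made up for illustration, the function names are hypothetical, and treating a zero baseline as missing is an assumption of this sketch.

```python
from statistics import median

def retention_ratio(headcount_2016, headcount_2021):
    """Share of 2015-16 poverty still present in 2019-21; None without a baseline."""
    if not headcount_2016:  # None or 0 is treated as "no baseline" (assumption)
        return None
    return headcount_2021 / headcount_2016

def impute_median(ratios: dict) -> dict:
    """Fill missing ratios with the median of the observed ones."""
    observed = [r for r in ratios.values() if r is not None]
    fill = median(observed)
    return {name: fill if r is None else r for name, r in ratios.items()}

# Hypothetical MPI headcount ratios in percent: (2015-16, 2019-21).
headcounts = {"A": (40.0, 20.0), "B": (30.0, 33.0), "C": (20.0, 15.0), "D": (None, 12.0)}
ratios = {name: retention_ratio(h16, h21) for name, (h16, h21) in headcounts.items()}
filled = impute_median(ratios)
print(filled)
```

District B has a ratio above one (poverty worsened), while district D, lacking a baseline, takes the median of the other three.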

06 / Stratification

Compare like with like.

CSR spending across Indian districts is heavily right-skewed. The mean CSR density is roughly ₹607 per person while the median is ₹100, a ratio that reflects a long right tail driven by corporate headquarters districts like Mumbai, Bengaluru, and Pune.

A single national median benchmark would flag nearly every rural district as underfunded. Instead, districts are split into three population tiers at the 33rd and 67th percentile boundaries. Each tier receives its own median CSR density as the benchmark.

Tier 1: Up to 10.9 lakh
Tier 2: 10.9 lakh to 22.7 lakh
Tier 3: Above 22.7 lakh
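Tier assignment and the tier-relative gap might be computed as in the sketch below. The tier boundaries are those listed above (1 lakh = 100,000). The gap is taken here as the absolute rupee shortfall below the tier median, floored at zero. That formula, the treatment of boundary values, the function names, and the three example districts are all assumptions; the source does not spell them out.

```python
from statistics import median

TIER_BOUNDS = (1_090_000, 2_270_000)  # 10.9 lakh and 22.7 lakh

def tier(population: int) -> int:
    """Population tier 1-3; boundary values fall in the lower tier (assumption)."""
    if population <= TIER_BOUNDS[0]:
        return 1
    if population <= TIER_BOUNDS[1]:
        return 2
    return 3

def funding_gaps(districts: dict) -> dict:
    """districts: {name: (population, csr_rupees)} -> {name: gap in rupees per person}."""
    density = {name: csr / pop for name, (pop, csr) in districts.items()}
    by_tier = {}
    for name, (pop, _) in districts.items():
        by_tier.setdefault(tier(pop), []).append(density[name])
    tier_median = {t: median(values) for t, values in by_tier.items()}
    # Districts at or above their tier median score zero on this dimension.
    return {
        name: max(0.0, tier_median[tier(pop)] - density[name])
        for name, (pop, _) in districts.items()
    }

# Three hypothetical Tier 1 districts receiving 10, 100 and 400 rupees per person.
example = {
    "A": (500_000, 5_000_000),
    "B": (800_000, 80_000_000),
    "C": (1_000_000, 400_000_000),
}
print(funding_gaps(example))
```

Only district A, sitting below the tier median of 100, registers a gap.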

Figure 02: Tier Median CSR (₹ per person). Benchmark is tier-specific, not national.

Note: Tier 3 (Large) has a lower median than Tier 2 (Medium). This non-monotonic pattern reflects the composition of large districts, which include heavily rural districts with minimal corporate presence alongside urban commercial centres. The tier boundaries are population percentiles, not CSR-ordered groupings.

07 / Aggregation

Normalize, weight, score.

Step 01

Min-max normalize each component.

Each raw component is scaled to a 0–1 range across all 651 districts. This approach, recommended by the OECD Handbook on Constructing Composite Indicators, ensures components with different units and magnitudes contribute proportionally to the final score.

Step 02

Weighted linear aggregation.

POS = (0.40 × N̂ + 0.40 × Ĝ + 0.20 × Û) × 100

Where N̂ is normalized poverty severity, Ĝ is normalized funding gap, and Û is normalized poverty persistence. The result is a score from 0 to 100. Users can adjust all weights through the interactive simulator.
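The two aggregation steps can be sketched directly from the formula. The function names are illustrative, and returning zeros when every district has the same raw value is an assumption of this sketch, not documented behaviour.

```python
def min_max(values):
    """Scale raw component values to the 0-1 range across all districts."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: no spread to scale (assumption: all zeros)
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pos(n_hat: float, g_hat: float, u_hat: float, weights=(0.40, 0.40, 0.20)) -> float:
    """Weighted linear aggregation of the normalized components, on a 0-100 scale."""
    w_n, w_g, w_u = weights
    return (w_n * n_hat + w_g * g_hat + w_u * u_hat) * 100

print(min_max([10, 20, 30]))         # [0.0, 0.5, 1.0]
print(round(pos(0.5, 1.0, 0.0), 1))  # 60.0
```

Because each component is normalized across districts before weighting, a score of 100 requires a district to be the maximum on all three components at once.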

08 / Weighting

Equal partners, one junior.

Need and Gap receive equal weight (40% each) because both are necessary conditions for philanthropic whitespace. A district must be both poor and underfunded to represent a genuine opportunity. A poor district receiving adequate CSR is not a whitespace; a well-funded district with low poverty is not a priority.

Persistence receives lower weight (20%) because the retention ratio derives from only two time points (NFHS-4 and NFHS-5), spaced roughly five years apart. District-level sampling variance is higher for a ratio of two survey estimates than for the headcount ratio itself, making this the least precise of the three signals. Down-weighting a less reliable component is consistent with the OECD Handbook on Constructing Composite Indicators, which notes that weights may reflect the statistical quality of the underlying data.

Preset             N     G     U     Emphasis
Balanced           40%   40%   20%   Equal emphasis on need and gap
Highest Need       60%   25%   15%   Prioritize poverty severity
Most Underfunded   20%   65%   15%   Prioritize funding gap
Stuck Districts    25%   30%   45%   Prioritize persistent poverty
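To see how the presets shift a ranking, the same normalized components can be scored under each weight set. The preset weights are those listed above, in N, G, U order, with Balanced taken as the default 40/40/20 split. The example district and the function name are hypothetical.

```python
# Weights as (N, G, U) for each preset.
PRESETS = {
    "Balanced": (0.40, 0.40, 0.20),
    "Highest Need": (0.60, 0.25, 0.15),
    "Most Underfunded": (0.20, 0.65, 0.15),
    "Stuck Districts": (0.25, 0.30, 0.45),
}

def score(n_hat: float, g_hat: float, u_hat: float, preset: str) -> float:
    """POS on a 0-100 scale under the named preset."""
    w_n, w_g, w_u = PRESETS[preset]
    return (w_n * n_hat + w_g * g_hat + w_u * u_hat) * 100

# Hypothetical district: very poor, close to its tier median, middling persistence.
district = (0.9, 0.2, 0.5)
for name in PRESETS:
    print(f"{name}: {score(*district, name):.1f}")
```

A district like this one scores highest under Highest Need and lowest under Most Underfunded, which is why the choice of preset matters for screening.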

09 / Distribution

How the scores fall.

Figure 03 · POS Distribution (N = 651)

All Sectors, default weights: scores span 0–100. Under a sector filter, unfunded sectors saturate Ĝ at 1.0 and scores can reach ~100.

10 / Classification

Defining neglect.

A district is flagged as neglected when it falls simultaneously in the bottom quartile of CSR per person (below ₹13 per person) and the top quartile of MPI headcount ratio (above 21.1%). These are districts where need is highest and funding lowest. The current dataset identifies 44 such districts.
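The neglect flag is a conjunction of two threshold tests. The sketch below hard-codes the two cut-offs quoted above. In practice they would be recomputed from the data, and the function name and the strict inequalities at the boundary are assumptions.

```python
CSR_P25 = 13.0   # rupees per person: bottom-quartile cut-off quoted in the text
MPI_P75 = 21.1   # percent headcount: top-quartile cut-off quoted in the text

def is_neglected(csr_per_person: float, mpi_headcount_pct: float) -> bool:
    """True only when a district is both low on funding and high on poverty."""
    return csr_per_person < CSR_P25 and mpi_headcount_pct > MPI_P75

print(is_neglected(5.0, 30.0))   # True: low funding, high poverty
print(is_neglected(5.0, 10.0))   # False: low funding but low poverty
print(is_neglected(50.0, 30.0))  # False: high poverty but funded above the cut-off
```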

11 / Precedent

Standing on the shoulders.

The composite scoring approach follows the OECD Handbook on Constructing Composite Indicators (Nardo et al., 2008), the standard international reference for composite index construction. Min-max normalization is the same rescaling the UNDP Human Development Index applies to its dimension indices, and weighted summation of indicators is how the Global Multidimensional Poverty Index itself combines deprivations.

12 / Caveats

What the score cannot see.

Every index has a blind spot. Here are seven we know about.

This is a screening tool for geographic prioritization, not a causal model. A high score does not guarantee that investment will produce impact. It indicates where the gap between need and funding is widest.

District-level CSR totals are conservative. Approximately 60.7% of gross CSR spending is classified as Pan India or lacks a district code and cannot be attributed to specific districts. This exclusion is not random: Pan-India programmes disproportionately originate from districts housing corporate headquarters, meaning the funding gap is likely overstated for well-connected urban districts and understated for rural districts that may benefit from these programmes without receiving attribution.

Population denominators are from Census 2011, eight or more years before the NFHS-5 fieldwork behind the MPI. Districts carved from parents after 2011 use recast 2011 populations (see 04 / Integration) rather than the parent's full total, so their denominators carry the same 2011 vintage. Fast-growing districts may also be 10–15% understated relative to current population.

MPI data reflects conditions in 2019-21 (NFHS-5). District-level poverty may have shifted in the years since data collection.

The persistence component (U) imputes the median retention ratio (0.555; IQR 0.442–0.674; standard deviation 0.274) for the 55 of 651 districts that lack a 2015-16 baseline. These are predominantly districts carved from parent districts after Census 2011 and therefore absent from NFHS-4. U values for imputed districts are estimates, not observations.

The score does not account for government spending beyond CSR, private philanthropy outside the Companies Act framework, or international development aid flowing to these districts.

The three components are treated as independent dimensions. In practice, poverty severity, funding gaps, and poverty persistence may be correlated, which could amplify the signal for districts that score high on multiple dimensions.