Air quality datasets for India

Gautam Pradhan
Jun 21, 2022

India has some of the most noxious air you can breathe.

IQAir’s World Air Quality Report 2021 says:

  • Central and South Asia have 46 of the world’s 50 most polluted cities
  • 70% of air pollution deaths (5 million per year) occur in this region
  • India accounts for 11 of the 15 most polluted cities in this region

How do we fix this? You cannot fix what you do not measure.

The sources of air pollution are numerous and dispersed. Once a pollutant is emitted, the prevailing weather decides how long it stays in the air and where it spreads.

How we measure air pollution

Pollutant concentrations can be measured by a wide range of pollution monitors, from low-cost devices (like PurpleAir) to research-grade reference monitors.

These monitors are located at specific points. You could also move them around, but at any given time they measure the pollution at one specific point.

This measurement tells us what the exposure level at that location is. It can help us take precautionary measures during peak exposure times. We can also understand the pollution trends over a period of time for that location.
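As a rough illustration of what one monitor's readings allow, the sketch below takes a day of hourly PM2.5 values for a single location and pulls out the peak hour and the daily mean. The readings are invented for the example, and `summarize_day` is a hypothetical helper, not part of any CPCB or PurpleAir interface.

```python
from statistics import mean

def summarize_day(hourly_pm25):
    """Return (peak_hour, peak_value, daily_mean) for one day of readings.

    hourly_pm25 maps hour of day (0-23) to a PM2.5 reading in ug/m3.
    Missing hours are simply absent from the mapping.
    """
    if not hourly_pm25:
        raise ValueError("no readings supplied")
    peak_hour = max(hourly_pm25, key=hourly_pm25.get)
    return peak_hour, hourly_pm25[peak_hour], mean(hourly_pm25.values())

# Invented readings for illustration only, not real measurements.
readings = {6: 80.0, 8: 120.0, 10: 150.0, 12: 110.0, 16: 60.0}
peak_hour, peak_value, daily_mean = summarize_day(readings)
print(f"Peak of {peak_value} ug/m3 at {peak_hour}:00, mean {daily_mean:.1f} ug/m3")
```

Repeating the same summary over many days gives the trend for that location.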

We provide data from the continuous air quality monitoring program of the CPCB and from Purpleair sensors to our customers through Earthmetry Signals.

But there are fewer than 400 research-grade monitors in India today. For most locations, you cannot really make proper exposure decisions.

Locations (and average measurements) of pollution monitors in India.

We could turn to satellite data. Satellites can measure the presence of different types of aerosols depending on what instruments they are equipped with.

Satellites measure what is called a column concentration. It is one value for an entire column of air extending from the satellite to the ground. Reducing this column concentration to a value at ground level needs additional data and models.
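One very crude way to picture this reduction is to assume the whole column sits in a well-mixed layer of known height near the ground. The sketch below does that arithmetic for a gas such as NO2. The uniform-mixing assumption, the function name and the input numbers are all illustrative; this is not how operational products derive ground-level values.

```python
def column_to_surface_ugm3(column_mol_m2, molar_mass_g_mol, mixing_height_m):
    """Crude surface concentration (ug/m3) from a column amount (mol/m2).

    ASSUMPTION: all of the pollutant in the column is spread evenly
    through a layer of depth mixing_height_m above the ground.
    """
    if mixing_height_m <= 0:
        raise ValueError("mixing height must be positive")
    grams_per_m2 = column_mol_m2 * molar_mass_g_mol
    return grams_per_m2 / mixing_height_m * 1e6  # g/m3 -> ug/m3

NO2_MOLAR_MASS = 46.0055  # g/mol
# Illustrative inputs: a 1e-4 mol/m2 column and a 1 km mixing layer.
surface = column_to_surface_ugm3(1e-4, NO2_MOLAR_MASS, 1000.0)
print(f"{surface:.2f} ug/m3")
```

Under this assumption, the same column over a layer half as deep gives twice the surface concentration, which is why the extra data matters.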

While satellites do cover all regions, the spatial resolution of Sentinel-5P’s TROPOMI instrument is 7 km x 7 km. A small city can fit entirely within one pixel.

At its most granular, this data can show differences between one zone of a large city and another.

This instrument is better suited to large-scale analysis. The image below, of a wildfire event on the Australian coast, shows one such use case.

Twitter/X link

The other problem with current satellite-based measurements is their frequency. The Sentinel-5P satellite provides global coverage, but only once a day. This limits use cases that need more frequent data.

There is a program to launch geostationary satellites to cover North America (by NASA), Europe (by ESA) and Asia (by South Korea). Geostationary satellites keep pace with the Earth’s rotation, so they appear to be at rest and cover the same geography continuously.

The South Korean satellite will only cover India at the edge of its swathe. But this will be much better than the current state of the art.

All these types of pollution measurements can tell us what the exposure is at a given place and time. What they do not tell us is how to fix the problem.

Fixing the problem needs us to look at where pollution comes from and how it spreads.

Where pollution comes from

The sources of pollution are numerous and hard to characterize. They can be stationary (like industrial or power plant chimneys) or moving (like vehicles).

Open fires, both natural (forest fires) and man-made (farm or landfill fires), contribute to it. So do fires for heating and cooking, both indoors and outdoors.

You don’t always need to burn fuel to pollute. Dust from construction, industry and transport also contributes significantly.

Very little of these emissions is captured before release.

We need to try and learn as much as possible about all these sources and the rate at which they emit. Such data is called an emissions inventory.

Building these inventories lets us identify key sources of pollution and their respective contribution to overall pollution. It allows us to prioritize and fix what is possible.
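To make the idea concrete, here is a minimal sketch of ranking sources by their share of total emissions. The inventory is a plain dictionary with placeholder numbers chosen for easy arithmetic; the source names and values are assumptions for illustration, not figures from any real inventory.

```python
def source_shares(inventory):
    """Return (source, share) pairs sorted from largest to smallest share.

    inventory maps a source name to its emission rate; any consistent
    unit works (for example tonnes per year).
    """
    total = sum(inventory.values())
    if total <= 0:
        raise ValueError("inventory has no emissions")
    shares = {name: rate / total for name, rate in inventory.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Placeholder numbers, not real data.
inventory = {"transport": 30.0, "industry": 40.0, "open fires": 20.0, "dust": 10.0}
ranked = source_shares(inventory)
for name, share in ranked:
    print(f"{name}: {share:.0%}")
```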

Some sources of pollution and how pollution is measured

It takes a lot of resources to build an emissions inventory dataset and keep it updated.

Urban Emissions has compiled an emissions inventory across India that they use for their pollution forecast models.

Several other institutions and Government bodies in India have built emissions inventories but these are not available to the public.

Is it sufficient to measure pollution and build the emissions inventory? To truly link the two, we need to model how pollution moves around.

How pollutants move

Pollutants can move around and dissipate. They can accumulate in the same area. The weather dictates all of this. Temperature, wind, rain, humidity and sunlight all play a role.

The chemicals that constitute pollutants can also transform due to various reactions into byproducts that have different effects.

How do you model this? Let us first imagine how this works.

At 6AM as you wake up, there is already some traffic. Power plants are working. Some industries are operating. All these activities result in emissions. It is a cold winter morning without much wind. There is not much sunlight and the pollutants just accumulate in an area close to where they are emitted.

By 10AM, 4 more hours of emissions have accumulated and the conditions are such that you are advised to stay indoors or wear a mask if you step out.

By 12PM, the sun is strong enough to cut through the haze and can transform some pollutants. Sunlight triggers hundreds of different types of reactions and creates different byproducts.

By 4PM, the wind picks up a bit. The wind can take away the pollution. Or bring it to you if you are downwind!
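The day described above can be sketched as a toy single-box model: emissions fill a shallow layer of air over a city, and wind flushes it out. This is a deliberately simplified illustration, not a chemical transport model. It ignores chemistry and sunlight entirely, starts from clean air to keep the arithmetic simple, and every number in it is an assumption.

```python
import math

def step_box(conc, emission_flux, mixing_height, wind_speed, box_length, dt):
    """Advance a single well-mixed box by dt seconds.

    Solves dC/dt = E/H - (u/L) * C exactly for constant inputs, where
    conc is C (ug/m3), emission_flux is E (ug/m2/s), mixing_height is H (m),
    wind_speed is u (m/s) and box_length is L (m).
    """
    source = emission_flux / mixing_height   # ug/m3/s added by emissions
    flush = wind_speed / box_length          # 1/s removed by ventilation
    if flush == 0:
        return conc + source * dt
    steady = source / flush
    return steady + (conc - steady) * math.exp(-flush * dt)

# ASSUMED values for illustration: calm air until 16:00, then a breeze.
EMISSION_FLUX = 1.0     # ug/m2/s, held constant all day
MIXING_HEIGHT = 300.0   # m, a shallow winter layer
BOX_LENGTH = 20_000.0   # m, roughly a city-sized box

conc = 0.0
history = {}
for hour in range(6, 20):
    wind = 0.0 if hour < 16 else 3.0  # m/s
    conc = step_box(conc, EMISSION_FLUX, MIXING_HEIGHT, wind, BOX_LENGTH, 3600.0)
    history[hour + 1] = conc          # concentration at the end of the hour

for hour in (10, 16, 20):
    print(f"{hour}:00 -> {history[hour]:.1f} ug/m3")
```

With no wind the concentration climbs steadily through the morning and afternoon. Once the breeze starts, it decays toward a lower level set by the balance between emissions and ventilation.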

As you can see, the weather plays a significant role. So to model pollution, you plug an emissions inventory into a weather model and see where the pollution goes.

These models are called chemical transport models. If you want to learn more about this you can try this online course or many others like it.

These models are highly complicated and building them is best done through collaborative efforts of many scientists. The most commonly used models are GEOS-Chem and WRF-Chem.

So you need an emissions inventory, weather forecasts (such as GFS) and the compute power to run GEOS-Chem or WRF-Chem.

After this you need data delivery mechanisms that take the gridded output and transform it into useful summaries.
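One such summary is an average over the grid cells that cover a city. The sketch below assumes the gridded output has already been loaded into a dictionary keyed by cell-center coordinates. The grid spacing, bounding box and values are invented for illustration, and `mean_in_box` is a hypothetical helper.

```python
def mean_in_box(grid, lat_min, lat_max, lon_min, lon_max):
    """Average the grid cells whose centers fall inside a bounding box.

    grid maps (lat, lon) cell centers to a concentration value.
    Returns None when no cell center lies inside the box.
    """
    values = [
        value
        for (lat, lon), value in grid.items()
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]
    return sum(values) / len(values) if values else None

# Invented 0.25-degree cells and values, for illustration only.
grid = {
    (28.50, 77.00): 140.0,
    (28.50, 77.25): 160.0,
    (28.75, 77.00): 120.0,
    (28.75, 77.25): 180.0,
    (29.00, 77.00): 90.0,
}
city_mean = mean_in_box(grid, 28.4, 28.8, 76.9, 77.3)
print(city_mean)
```

This is an unweighted mean over a rectangle. A real pipeline would more likely clip to the city's actual boundary and weight cells by the area that falls inside it.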

This is a tall order. The Indian Institute of Tropical Meteorology, Pune runs the SAFAR model. But access to the model inputs or outputs is difficult outside the research community.

Urban Emissions runs India-level models and more granular city-level models. We have partnered with them to build pipelines that help customers use the output data in their workflows, and to validate the model outputs against measurements from pollution monitors.
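As an idea of what such validation can look like, the sketch below computes two generic error metrics, mean bias and root-mean-square error, for model values paired with monitor readings. The choice of metrics and the numbers are assumptions for illustration; this is not a description of the actual validation pipeline.

```python
import math

def validation_stats(modeled, observed):
    """Return (mean_bias, rmse) for paired model and monitor values.

    Both sequences must be the same length and aligned in time and place.
    A positive bias means the model reads higher than the monitor.
    """
    if len(modeled) != len(observed) or not modeled:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [m - o for m, o in zip(modeled, observed)]
    mean_bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_bias, rmse

# Invented values for illustration only.
modeled = [100.0, 120.0, 90.0, 110.0]
observed = [90.0, 125.0, 95.0, 100.0]
bias, rmse = validation_stats(modeled, observed)
print(f"bias={bias:.2f}, rmse={rmse:.2f}")
```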

What do these models help us do?

They help us work out where the pollutants you breathe in come from. For example, we can identify that crop burning in Punjab contributes XX% of the pollution in Delhi in the winter, and vehicles YY%.

This allows planners to correctly prioritize actions that have the biggest impact.

It answers questions like: is moving to EVs in Delhi more impactful than buying crop waste from Punjab? Or do we need to do both to have a meaningful outcome?
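One common way to get such contribution estimates is a "zero-out" experiment: run the model once as a baseline, then rerun it with one source switched off and take the difference. The sketch below shows the bookkeeping only. The `toy_model` function is a hypothetical stand-in for a chemical transport model, and the source names, weights and emission values are placeholders.

```python
def zero_out_contributions(model, emissions):
    """Estimate each source's contribution by switching it off in turn.

    model is any callable that maps an emissions dict to a concentration.
    The contribution of a source is the baseline concentration minus the
    concentration when that source's emissions are set to zero.
    """
    baseline = model(emissions)
    contributions = {}
    for name in emissions:
        without = dict(emissions, **{name: 0.0})
        contributions[name] = baseline - model(without)
    return baseline, contributions

# HYPOTHETICAL stand-in for a chemical transport model: a weighted sum,
# where each weight says how strongly a source affects the receptor.
WEIGHTS = {"source_a": 0.5, "source_b": 2.0}

def toy_model(emissions):
    return sum(WEIGHTS[name] * rate for name, rate in emissions.items())

baseline, parts = zero_out_contributions(toy_model, {"source_a": 40.0, "source_b": 30.0})
for name, value in parts.items():
    print(f"{name}: {value / baseline:.0%} of {baseline} units")
```

In this linear toy the contributions add up to the baseline exactly. In a real model, chemistry makes the response nonlinear, so the pieces need not sum so neatly.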

You can sample the air quality datasets on Earthmetry Signals.

We also offer datasets on electricity, climate and other sectors. DM me on Twitter or contact us through our website for deeper conversations.
