Open data resources spanning multiple areas of science

I’m interested in data sets that span a wide range of topics, including but not limited to social, climate, energy, sports, and healthcare.

Some of the useful data sources I’ve looked at to understand the world around me better are listed below in no particular order:

  1. The World Bank: The World Bank provides a versatile, searchable data catalog. Specifically, this URL is useful for downloading a particular data set and analyzing it with R / Python.
    The R package wbstats provides an interface to access the data.
  2. Air Quality Life Index: The Air Quality Life Index estimates how much longer we could live if we breathed clean air. The data is broken down to the regional level.
  3. NOAA: The National Oceanic and Atmospheric Administration provides free access to archives of global historical weather and climate data.
  4. WHO: This dataset is the World Health Organization’s principal health statistics repository. It contains statistics for more than 1000 indicators for the WHO’s 194 member states.
  5. FiveThirtyEight: The datasets by FiveThirtyEight range from sports to politics. Some of the data is well-curated as CSV files and ready to use. I specifically like their polling data and soccer data.
  6. UN Data: UN Data brings together a variety of data resources compiled by the UN. They cover a vast range of statistical themes, from agriculture to trade, grouped by country.
  7. Happy Planet Index: The Happy Planet Index is another unique dataset that measures “sustainable wellbeing” by country. I was surprised to find that Costa Rica ranked 1st on the HPI.
  8. Soccer Power Index (SPI), FiveThirtyEight: The FiveThirtyEight SPI is powered mainly by the R package engsoccerdata. The SPI informs the world of soccer with predictions and is used by many platforms like ESPN.
  9. Living while black: The “living while black” dataset was collected by Baratunde Thurston. He talks about this topic in a fantastic TED talk, and I encourage everyone to watch it. This is a very original and unique dataset.
  10. Gapminder: The Gapminder dataset accompanies the fantastic book Factfulness. The goal of the Gapminder dataset and the book is to provide a more fact-based view of the world. The overdramatized picture shown in the media does not particularly help our worldview.
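
Several of the sources above offer CSV downloads, and those files often come in a "wide" layout with one row per country and one column per year. As a rough sketch of how such a file could be reshaped for analysis in Python, the snippet below uses only the standard library. The column names, the helper `wide_to_long`, and the sample rows are all assumptions made for illustration: the values are synthetic placeholders, not real statistics, and a real download may need extra steps such as skipping metadata lines.

```python
import csv
import io

# Synthetic placeholder rows, NOT real statistics. The layout (one row per
# country, one column per year) is an assumption about a downloaded CSV.
SAMPLE_CSV = """Country Name,Country Code,2018,2019,2020
Exampleland,EXL,10.5,11.0,
Samplestan,SMP,7.2,,7.9
"""


def wide_to_long(csv_text, id_columns=("Country Name", "Country Code")):
    """Turn one-column-per-year rows into one record per country and year.

    Hypothetical helper: every column not listed in id_columns is assumed
    to be named after a year and to hold a numeric value or an empty cell.
    """
    records = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, cell in row.items():
            if column in id_columns:
                continue
            if cell is None or cell.strip() == "":
                continue  # skip missing observations
            records.append({
                "country": row[id_columns[0]],
                "code": row[id_columns[1]],
                "year": int(column),
                "value": float(cell),
            })
    return records


records = wide_to_long(SAMPLE_CSV)
for record in records:
    print(record)
```

The long format (one record per country and year) tends to be easier to filter, group, and plot than the wide one. To try this on a real file, read its contents into a string and pass that in place of `SAMPLE_CSV`, adjusting `id_columns` to match the actual header.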

Originally published on June 10, 2020.


