Recently I took the course Visualizing Geospatial Data in Python on DataCamp’s interactive learning platform. To consolidate the new learning, I visualized some spatial datasets for Kenya. Let us find out how the location of Financial Service Providers relates to population.
DataCamp’s course introduced me to the GeoPandas and Folium modules with an emphasis on the visualization of geospatial data. I wouldn’t recommend Python for beautiful cartography, but being able to model geospatial data in a Python environment is extremely valuable. Within this context, map visualizations are important for exploratory data analysis and the presentation of results.
- GeoPandas – this module was developed to make working with geospatial data in Python easier. It combines the capabilities of Pandas and Shapely to manipulate geographic data and geometries. GeoPandas also depends on Fiona to access various GIS data formats. Because of these dependencies, it is recommended to install GeoPandas in an Anaconda data science environment using: conda install -c conda-forge geopandas.
- Folium – this module makes beautiful interactive maps using the leaflet.js library. Maps contain a basemap for location reference and can be styled to your requirements. I added it to my Anaconda environment using: conda install folium -c conda-forge.
And here is an overview of the spatial datasets that were used:
- Administrative areas – county boundaries and sub-county boundaries were obtained from the ArcGIS Online public portal. These datasets lack accreditation since Kenya doesn’t disseminate authoritative spatial datasets, but this is the best we could get.
- Facilities – the dataset FinAccess geospatial mapping 2013 was downloaded from Harvard Dataverse. The data contains the Lat/Long location of Financial Service Providers (FSPs) in Kenya by type as collected in 2013. The data was published by the Bill & Melinda Gates Foundation, Central Bank of Kenya, and FSD Kenya in 2015.
In addition, we used population numbers from the Kenya Population and Housing Census 2009, which were downloaded from the KNBS website. More specifically we used total population aggregated at county and sub-county levels.
A typical data science project will include more extensive modeling, but we will use the map visualizations with some basic data manipulations to gain insights into the following:
- The distribution of Financial Service Providers by county and sub-county.
- The relationship between the distribution of Financial Service Providers and Population.
Let’s hope that these insights will be memorable and thought-provoking, so that we scrutinize our workflows and start thinking about more advanced analysis and additional datasets that we could use.
Preparing the Data
The data on the Financial Service Providers was imported into a Python notebook using the Pandas read_csv function. I imported a subset for Garissa County and a dataset for the whole of Kenya.
Here is what the head of the Garissa Data frame looks like after dropping redundant columns with the drop method and a bit of clean-up:
Notice that the Data frame has Latitude and Longitude columns which capture the location of the Financial Service Providers in geographic coordinates.
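The original notebook cell is not reproduced here, so the following is only a minimal sketch of the import and clean-up step. The CSV content and the column names (Name, Type, Surveyor, Latitude, Longitude) are hypothetical stand-ins, and an in-memory string takes the place of the real file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for the FSP csv file; the real file has other columns.
csv_text = """Name,Type,Surveyor,Latitude,Longitude
Agent One,mobile_money_agent,X1,-0.4569,39.6583
Bank Two,commercial_bank,X2,-0.4532,39.6461
"""

# read_csv accepts a file path or any file-like object
fsp_df = pd.read_csv(io.StringIO(csv_text))

# Drop a column that is not needed for the analysis
fsp_df = fsp_df.drop(columns=["Surveyor"])

print(fsp_df.head())
```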
To import the county and sub-county boundary shapefiles I used the GeoPandas read_file function with the path to the file as the only argument. After dropping a few redundant columns and a bit of clean-up the Geodata frame for Kenya looks like this:
Notice the geometry column, which stores the geographic coordinates of the county polygon. Below are the first rows of the Geodata frame for the sub-counties of Garissa with a different set of attributes:
Notice the addition of the Area_sqkm column in square kilometers while the geometry is in geographic coordinates. Here is the code that I used to calculate it:
And these are the logical steps:
- Print the existing CRS for reference.
- Convert the geometry to the desired projected coordinate system with the to_crs method. In our case we use EPSG:3857 for WGS84 Web Mercator (Auxiliary Sphere).
- Calculate the Area_sqkm column from the projected coordinate values in meters with the help of the area attribute of the geometry column.
- Convert the geometry column back to geographic coordinates for easier visualization and interpretation.
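The steps above can be sketched as follows. This is not the original notebook code: the single square polygon and the Subcounty column are made-up stand-ins for the Garissa sub-county boundaries.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the sub-county boundaries: one 1 x 1 degree square
subcounties = gpd.GeoDataFrame(
    {"Subcounty": ["Example"]},
    geometry=[box(39.0, -1.0, 40.0, 0.0)],
    crs="EPSG:4326",
)

# 1. Print the existing CRS for reference
print(subcounties.crs)

# 2. Project to Web Mercator so that the coordinates are in meters
subcounties = subcounties.to_crs(epsg=3857)

# 3. The area attribute returns square meters; convert to square kilometers
subcounties["Area_sqkm"] = subcounties.geometry.area / 1_000_000

# 4. Convert back to geographic coordinates for visualization
subcounties = subcounties.to_crs(epsg=4326)

print(subcounties[["Subcounty", "Area_sqkm"]])
```

Web Mercator is not an equal-area projection, so areas computed this way are approximate. The distortion is small this close to the equator, but an equal-area projection would be the safer choice at higher latitudes.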
To convert a Data frame into a Geodata frame we can use the GeoDataFrame function. Here is how the Financial Service Providers for Kenya were imported as a Geodata frame with a single block of code:
In case the code is hard to read, here are the logical steps:
- Use the Pandas read_csv function to import the FSP csv file as a Data frame.
- Drop redundant columns with the drop method.
- Rename the column headers with the DataFrame’s columns attribute.
- Use string functions (title, replace) to format the values for FSP type.
- Create the coordinate reference system (CRS) for the Geodata frame. In this case, we use epsg:4326 since the FSP coordinates were collected by GPS using WGS84 as the datum.
- Create a point geometry column using the Shapely Point constructor with the values from the Longitude and Latitude columns.
- Use the GeoDataFrame constructor with the DataFrame, CRS, and geometry column as arguments to create a Geodata frame.
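A minimal sketch of these steps, assuming a small made-up CSV in place of the real FSP file. The raw column names (name, fsp_type, surveyor, lat, lon) and the sample rows are hypothetical.

```python
import io

import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Hypothetical stand-in for the FSP csv file
csv_text = """name,fsp_type,surveyor,lat,lon
Agent One,mobile_money_agent,X1,-0.4569,39.6583
Bank Two,commercial_bank,X2,-1.2864,36.8172
"""

# 1. Import the csv as a Data frame
fsp = pd.read_csv(io.StringIO(csv_text))

# 2. Drop redundant columns
fsp = fsp.drop(columns=["surveyor"])

# 3. Rename the column headers
fsp.columns = ["Name", "Type", "Latitude", "Longitude"]

# 4. Tidy up the FSP type values with string functions
fsp["Type"] = fsp["Type"].str.replace("_", " ").str.title()

# 5. GPS coordinates use WGS84, so the CRS is EPSG:4326
crs = "EPSG:4326"

# 6. Build point geometries; note the (x, y) = (Longitude, Latitude) order
geometry = [Point(xy) for xy in zip(fsp["Longitude"], fsp["Latitude"])]

# 7. Combine Data frame, CRS, and geometry into a Geodata frame
fsp_gdf = gpd.GeoDataFrame(fsp, crs=crs, geometry=geometry)

print(fsp_gdf.head())
```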
The head of the GeoDataFrame resulting from the execution of this code looks like this:
Geo-Visualization with GeoPandas
Let’s move on to the exciting part of our analysis, which is visualizing geographic data in Python. The easiest way is to create a scatter plot with Matplotlib using Longitude for the x-values and Latitude for the y-values. Here are the code and the resulting plot.
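A scatter plot of this kind could look roughly like the sketch below. The coordinates are made-up points near Garissa town, not the real FSP data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical FSP locations; the real Data frame has many more rows
fsp_df = pd.DataFrame(
    {
        "Latitude": [-0.4569, -0.4532, 0.0520],
        "Longitude": [39.6583, 39.6461, 40.3120],
    }
)

fig, ax = plt.subplots(figsize=(6, 6))

# Longitude on the x-axis, Latitude on the y-axis
ax.scatter(fsp_df["Longitude"], fsp_df["Latitude"], s=10)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("FSP locations (illustrative data)")
```

In a notebook the figure is rendered automatically at the end of the cell.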
It works, but a background map would put the location of the FSPs in context. We can use the plot method of GeoPandas with the Garissa sub-county Geodata frame. Here are the code and the output:
Notice that the x- and y-axes now use an equidistant scale and that the FSP locations are plotted on top of the Garissa sub-counties. The only problem is that the patches for the sub-counties don’t appear in the legend. We can fix this by plotting both the Garissa sub-county and FSP Geodata frames with the GeoPandas plot method. Here are the code and the output:
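One way to layer both Geodata frames with the GeoPandas plot method is sketched below. The two square "sub-counties" and the point locations are made up, and the styling choices (colormap, marker size) are assumptions rather than the settings used for the original map.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical sub-county polygons
subcounties = gpd.GeoDataFrame(
    {"Subcounty": ["Sub A", "Sub B"]},
    geometry=[box(39.0, -1.0, 40.0, 0.0), box(40.0, -1.0, 41.0, 0.0)],
    crs="EPSG:4326",
)

# Hypothetical FSP locations
fsps = gpd.GeoDataFrame(
    {"Name": ["Agent One", "Bank Two"]},
    geometry=[Point(39.5, -0.5), Point(40.5, -0.3)],
    crs="EPSG:4326",
)

# Plot the polygons by name; categorical=True with legend=True adds one
# legend entry per sub-county
ax = subcounties.plot(
    column="Subcounty",
    categorical=True,
    legend=True,
    cmap="Pastel1",
    edgecolor="grey",
    figsize=(8, 8),
)

# Draw the FSP points on the same axes, on top of the polygons
fsps.plot(ax=ax, color="black", markersize=15)
ax.set_title("FSPs by sub-county (illustrative data)")
```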
Finally, we’ve managed to produce a map that shows the location of Financial Service Providers in relation to the Garissa sub-counties. If you are familiar with Garissa, you might notice that the FSPs are concentrated in Garissa town and the refugee camps in the north, and that they also appear along Garissa’s western border, which is formed by the Tana River.
To determine the number of FSPs per sub-county we can perform a spatial join with GeoPandas’ sjoin function. Here are the code and the results:
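The spatial join can be sketched as follows, again with made-up polygons and points. The sketch assumes GeoPandas 0.10 or later, where the keyword is predicate; older releases used op for the same purpose.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical sub-county polygons
subcounties = gpd.GeoDataFrame(
    {"Subcounty": ["Sub A", "Sub B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical FSP locations: two fall in Sub A, one in Sub B
fsps = gpd.GeoDataFrame(
    {"Name": ["Agent One", "Agent Two", "Bank Three"]},
    geometry=[Point(0.5, 0.5), Point(0.2, 0.8), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# Attach the attributes of the containing sub-county to every FSP
joined = gpd.sjoin(fsps, subcounties, how="inner", predicate="within")

# Count the FSPs per sub-county
counts = joined.groupby("Subcounty").size()
print(counts)
```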
A similar method was used to calculate the number of FSPs for each county in Kenya. After a bit of data wrangling, I was able to calculate and visualize the number of FSPs per million people by county. Here are the code and the output:
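The normalization itself is simple arithmetic: divide the FSP count by the population and multiply by one million. The sketch below uses invented counties and numbers purely for illustration; they are not the census or FinAccess figures.

```python
import pandas as pd

# Hypothetical FSP counts per county (e.g. the result of a spatial join)
counts = pd.DataFrame(
    {"County": ["County A", "County B"], "FSP_count": [500, 120]}
)

# Hypothetical population totals per county
population = pd.DataFrame(
    {"County": ["County A", "County B"], "Population": [250_000, 1_200_000]}
)

# Combine both tables on the county name
density = counts.merge(population, on="County")

# FSPs per million people
density["FSP_per_million"] = (
    density["FSP_count"] / density["Population"] * 1_000_000
)

print(density)
```

Merging the result back onto the county Geodata frame and passing column="FSP_per_million" to the plot method would then give a choropleth map.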
Geo-Visualization with Folium
So far so good, but now we want to use Folium to add the following capabilities to our map:
- Interactivity that allows us to pan, zoom, and query the map.
- A tiled basemap from a reputable source (e.g. OpenStreetMap).
The default basemap used by Folium is OpenStreetMap. For starters, let’s create a Folium map centered on Garissa county and add the sub-counties and FSPs. Here are the code and the output:
The code that was used contains the following logical steps:
- Calculate the center point of Garissa county using GeoPandas’ dissolve method on the Garissa Geodata frame and the centroid attribute of the geometry column.
- Create the basemap using the Folium Map class with the calculated center point as an argument. We set the initial zoom level to 8 to zoom to the extent of Garissa county.
- Use the Folium GeoJson class to add the Garissa sub-county and Garissa FSP Geodata frames to the map.
- Display the map by simply typing its name.
We can improve the map by using a choropleth rather than a single-color map and adding a pop-up to the FSPs. Here are the code and the output:
The marker symbols clutter the display, so it’s difficult to tell how many FSPs there are in Garissa town and the refugee camps in the north. We can use the Folium MarkerCluster plugin to declutter the map. Here are the code and the output.
Would you have guessed that there are more than 100 FSPs in both places?
Finally, let’s use Folium to display the number of FSPs per million people for each county in Kenya. The example also illustrates how pop-ups can be configured to show data from different fields. Here are the code and the results:
Not surprisingly, Nairobi county has the highest density of FSPs in Kenya. High FSP densities in the neighboring counties of Kiambu and Kajiado could be evidence of urbanization and urban sprawl.
We used the Python modules GeoPandas and Folium to analyze and visualize Financial Service Providers and population statistics for Garissa and Kenya’s 47 counties. Here are some of the key observations:
- GeoPandas does an excellent job of manipulating geospatial data in Geodata frames. It was used to read shapefiles, create Geodata frames, calculate areas and centroids, project data coordinates, dissolve polygons, perform spatial joins, and make maps.
- Folium creates beautiful interactive maps. It was used to access OpenStreetMap basemaps, create maps, add markers, and configure pop-ups. We only scratched the surface and might explore more in a subsequent article.
- Garissa County has a high concentration of Financial Service Providers (FSPs) in Garissa town and the Somali refugee camps in the northern part. Nairobi County has the highest density in Kenya with 4,794 FSPs per million people. It is followed by Kajiado and Kiambu, which have densities of 3,354 and 3,280 FSPs per million people respectively.