11 min read

Lesson 12.1.2 | Adding map features

This second part of the introduction to mapping in \({\bf\textsf{R}}\) covers adding features to maps with ggplot2 and sf. Specific objectives include:

  • Modify spatial data layers
  • Add multiple layers of information to maps
  • Find the interaction of two layers (crop one to another)
  • Combine data from multiple sources

Materials

Video lecture

Video on YouTube

Walking through the script

Packages and data

pacman::p_load(lwgeom)              # dependency
pacman::p_load(tidyverse, sf, 
               rnaturalearth, 
               rnaturalearthdata)

Load Natural Earth data for South America as sf object:

sa_countries <- ne_countries(continent = "South America", 
                             returnclass = "sf")
names(sa_countries)
##  [1] "scalerank"  "featurecla" "labelrank"  "sovereignt" "sov_a3"    
##  [6] "adm0_dif"   "level"      "type"       "admin"      "adm0_a3"   
## [11] "geou_dif"   "geounit"    "gu_a3"      "su_dif"     "subunit"   
## [16] "su_a3"      "brk_diff"   "name"       "name_long"  "brk_a3"    
## [21] "brk_name"   "brk_group"  "abbrev"     "postal"     "formal_en" 
## [26] "formal_fr"  "note_adm0"  "note_brk"   "name_sort"  "name_alt"  
## [31] "mapcolor7"  "mapcolor8"  "mapcolor9"  "mapcolor13" "pop_est"   
## [36] "gdp_md_est" "pop_year"   "lastcensus" "gdp_year"   "economy"   
## [41] "income_grp" "wikipedia"  "fips_10"    "iso_a2"     "iso_a3"    
## [46] "iso_n3"     "un_a3"      "wb_a2"      "wb_a3"      "woe_id"    
## [51] "adm0_a3_is" "adm0_a3_us" "adm0_a3_un" "adm0_a3_wb" "continent" 
## [56] "region_un"  "subregion"  "region_wb"  "name_len"   "long_len"  
## [61] "abbrev_len" "tiny"       "homepart"   "geometry"

Quick refresher on how to view Natural Earth data as a tibble…

sa_countries %>%
  select(name, economy)
## Simple feature collection with 13 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.41094 ymin: -55.61183 xmax: -34.72999 ymax: 12.4373
## CRS:            +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 10 features:
##             name                    economy                       geometry
## 4      Argentina    5. Emerging region: G20 MULTIPOLYGON (((-65.5 -55.2...
## 21       Bolivia    5. Emerging region: G20 MULTIPOLYGON (((-62.84647 -...
## 22        Brazil   3. Emerging region: BRIC MULTIPOLYGON (((-57.62513 -...
## 29         Chile    5. Emerging region: G20 MULTIPOLYGON (((-68.63401 -...
## 35      Colombia       6. Developing region MULTIPOLYGON (((-75.37322 -...
## 46       Ecuador       6. Developing region MULTIPOLYGON (((-80.30256 -...
## 54  Falkland Is. 2. Developed region: nonG7 MULTIPOLYGON (((-61.2 -51.8...
## 67        Guyana       6. Developing region MULTIPOLYGON (((-59.75828 8...
## 124         Peru    5. Emerging region: G20 MULTIPOLYGON (((-69.59042 -...
## 131     Paraguay    5. Emerging region: G20 MULTIPOLYGON (((-62.68506 -...

… then map in ggplot using geom_sf():

ggplot() +
  geom_sf(data=sa_countries) 

Modifying layers

sf works very well with tidyverse and there are several useful functions that follow a familiar data wrangling format to customize a map.

Dissolve boundaries

Data often come with several levels of information that aren’t always necessary to include every time one makes a map. Note how the map above plots country boundaries, but perhaps one wants just the outline of the continent. One could go and find an layer of the continental outline and hope it all lines up. Better yet, one can just remove the lower-level boundaries in the current data.

Dissolving is the term given to removing feature boundaries to create an ‘empty’ outline. Here is a quick tidy trick to perform a dissolve that uses the st_area function:

sa_cont <- 
  sa_countries %>%
    mutate(area = st_area(.) ) %>% 
      summarise(area = sum(area)) 
    
ggplot() +
  geom_sf(data=sa_cont, fill="white") 

Multiple layers can be added to the same map with repeated calls to geom_sf. Note that each needs to have its own data argument. The order of the layers determines which features are covered up or sit on top.

ggplot() +
  geom_sf(data=sa_cont, fill="white") + 
  geom_sf(data=sa_countries, color="darkgrey", fill=NA) 

Add feature information

Basic aesthetic mapping in ggplot makes it easy to plot features by a column of information stored in the data used in geom_sf(). Here we use the pop_est column to shade each country by its estimated human population:

ggplot() +
  geom_sf(data=sa_cont, fill="white") + 
  geom_sf(data=sa_countries, 
          aes(fill=pop_est), 
          color="darkgrey") 

As with any ggplot, note the difference between fill= outside vs. inside of aes().

Create and plot new data column

Likewise, one can use basic tidyverse data wrangling functions to create new columns based on information available in the dataset. Here we use the same st_area function as above to calculate human population density for each South American country:

sa_countries <-
  sa_countries %>%
    mutate(area = st_area(.) / 1000000, 
           density = pop_est / area, 
           density = as.numeric(density)) 

ggplot() +
  geom_sf(data=sa_cont, fill="white") + 
  geom_sf(data=sa_countries,
          aes(fill=density), 
          color="darkgrey") +
  scale_fill_viridis_c("Human population\ndensity (sq. km)")

A perk of sf objects is that they aren’t exclusively spatial data. One can still treat them as a tibble for any sort of summary or analysis:

sa_countries %>%
  as_tibble %>% 
    select(name, density) %>%
      arrange(desc(density)) %>%
  rename(Country = name, `Humans per sq. km` = density) %>% 
pander::pander("Human population density by country in South America. Lo siento para Las Malvinas.") 
Human population density by country in South America. Lo siento para Las Malvinas.
Country Humans per sq. km
Ecuador 58.12
Colombia 39.63
Venezuela 29.52
Brazil 23.36
Peru 22.56
Chile 20.37
Uruguay 19.76
Paraguay 17.43
Argentina 14.69
Bolivia 9.007
Guyana 3.681
Suriname 3.336
Falkland Is. 0.1919

Finally, use dplyr verbs to focus on one country in a tidy way:

sa_countries <-  
  sa_countries %>%
    mutate(name = case_when(
                  name == "Falkland Is." ~ "Argentina", 
                  T ~ name ) )
sa_countries %>%
  filter(name == "Argentina") %>%
    ggplot() +
      geom_sf() 

Add features from other data sources

So far we’ve just used columns from a single dataset (although we did create a new object when we did the dissolve). Now we will add a totally new type of data from an external data source.

All of the features we’ve plotted so far have been polygons, but now we will add point features.

Download and open files from the Web

First, we need to fetch the data, which are stored on the course GitHub page as a .zip file, which is typical for ESRI shapefiles.

Typically we simply follow the URL in a browser, save the .zip file locally, and unzip it into a folder to which we can direct \({\bf\textsf{R}}\) to find it. But often with spatial data, one might need to download and unpack several .zip files, and doing so one by one is onerous. It would be helpful to be able to write a script that did it automatically. Furthermore, often we don’t need the original .zip file once we’ve unpacked it and converted it to an sf object.

For these two reasons, we demonstrate here how to download and unzip a file from the Web via \({\bf\textsf{R}}\) script. While it could be saved anywhere, this script puts it in a temporary folder that will be automatically cleared out as per the local system settings.

The steps are as follows:

Store the file location as an object:

sa_cities_url = "https://github.com/devanmcg/IntroRangeR/raw/master/data/SouthAmericanCities.zip"

Create a temporary directory called tmp_dir:

tmp_dir = tempdir()

Create an empty .zip file in tmp_dir called tmp:

tmp = tempfile(tmpdir = tmp_dir, fileext = ".zip")

Download the .zip file (argument 1) to the desired folder (argument 2):

download.file(sa_cities_url, tmp) 

Now unzip the folder…

unzip(tmp, exdir = tmp_dir)

… and load it into the global environment as an sf object:

sa_cities <- read_sf(tmp_dir, "South_America_Cities") 
sa_cities 
## Simple feature collection with 60 features and 3 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -79.90939 ymin: -51.71 xmax: -34.91464 ymax: 11.01429
## geographic CRS: WGS 84
## # A tibble: 60 x 4
##    NAME               COUNTRY   CAPITAL              geometry
##    <chr>              <chr>     <chr>             <POINT [°]>
##  1 Rio Gallegos       Argentina N             (-69.41 -51.71)
##  2 Comodoro Rivadavia Argentina N              (-67.5 -45.83)
##  3 Puerto Montt       Chile     N                (-73 -41.48)
##  4 Bahia Blanca       Argentina N       (-62.27407 -38.72527)
##  5 Concepcion         Chile     N        (-72.85164 -36.8833)
##  6 Montevideo         Uruguay   Y             (-56.17 -34.92)
##  7 Buenos Aires       Argentina Y        (-58.40959 -34.6654)
##  8 Santiago           Chile     Y       (-70.64751 -33.47503)
##  9 Rosario            Argentina N       (-60.66394 -32.93774)
## 10 Valparaiso         Chile     N           (-71.29934 -32.9)
## # ... with 50 more rows

Note that the geometry is identified as \(\textsf{Point}\).

We can easily add it to the map; geom_sf will automatically take care of the geometries:

ggplot() +
  geom_sf(data=sa_cont, fill="white") + 
  geom_sf(data=sa_countries, color="darkgrey", fill=NA) +
  geom_sf(data=sa_cities)

One can use filter to zoom in on one country, but it must be done to each call to geom_sf when different data objects are involved:

ggplot() + 
  geom_sf(data=sa_countries  %>% 
            filter(name == "Argentina"),
          color="darkgrey", fill="white") +
  geom_sf(data=sa_cities %>%
            filter(COUNTRY == "Argentina")) +
  geom_sf_label(data=sa_cities %>%
            filter(COUNTRY == "Argentina"), 
            aes(label=NAME), nudge_y = 1.5 ) +
  theme(axis.title = element_blank())

sf might return a warning about correct results for long/lat data, but you can ignore it.

Add features with different CRS

Next we will download an additional shapefile, with features stored as polygons that identify ecological regions for South America.

The data are stored online as a .zip file, so we can use the same steps as above to download, unzip, and read into \({\bf\textsf{R}}\) as an sf object:

EPA_URL = "http://www.ecologicalregions.info/data/sa/sa_eco_l3.zip"
tmp_dir = tempdir()
tmp     = tempfile(tmpdir = tmp_dir, fileext = ".zip")
download.file(EPA_URL, tmp) 
unzip(tmp, exdir = tmp_dir)
EPA_SCA <- read_sf(tmp_dir, "sa_eco_l3")
ggplot(EPA_SCA) +
  geom_sf() 

The default graph simply shows the outlines of the lowest-level features.

Ecoregions are stored at three levels. Level 1 are the broadest, most general biomes, while Level 3 is the most specific, often defined by specific vegetation types.

One maps a specific level by using the fill= aesthetic:

EPA_SCA %>%
  mutate_at(vars(LEVEL3:LEVEL1), as.factor) %>% 
  ggplot() +
    geom_sf(aes(fill=LEVEL1))

Cropping via intersect

One might notice a difference between the ecological data and the political maps: the former includes Central America while the other does not. It is straightforward to remove extraneous features from one spatial data object, essentially by cropping its outermost boundary to that of another spatial data object’s boundary.1

In sf this action is called an intersection, and we use the st_intersection function.

But to get spatial data objects to work together, they must be in the same projection:

EPA_SCA 
## Simple feature collection with 3546 features and 9 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -3988232 ymin: -2605909 xmax: 2743070 ymax: 6129532
## projected CRS:  International_1967_Albers
## # A tibble: 3,546 x 10
##      AREA PERIMETER SA_ECO_ALB SA_ECO_A_1 LEVEL3 LEVEL2 LEVEL1 Shape_Leng
##     <dbl>     <dbl>      <int>      <int> <chr>   <dbl>  <int>      <dbl>
##  1 1.81e6     6853.          2          2 16.1.1   16.1     16      6853.
##  2 5.75e5     3528.          3          3 16.1.1   16.1     16      3528.
##  3 8.31e5     6550.          4          4 16.1.1   16.1     16      6550.
##  4 2.27e6     8769.          5          5 16.1.1   16.1     16      8769.
##  5 2.13e6     7619.          6          8 16.1.1   16.1     16      7619.
##  6 5.84e5     4320.          7          7 16.1.1   16.1     16      4320.
##  7 1.37e6     4964.          8          6 16.1.1   16.1     16      4964.
##  8 2.97e6    14901           9         10 16.1.1   16.1     16     14901.
##  9 1.47e6     9004.         10         12 16.1.1   16.1     16      9004.
## 10 2.68e6     6752.         11         17 16.1.1   16.1     16      6752.
## # ... with 3,536 more rows, and 2 more variables: Shape_Area <dbl>,
## #   geometry <POLYGON [m]>

Note the curved lines of the Albers projection.

The EPA ecoregion data uses the Albers 1967 projection, which is wholly incompatible with the longitude/latitude systems our other objects are in. So the first necessary step is to convert to a common CRS, then perform the intersection:

sa_epa <- EPA_SCA %>%
           mutate_at(vars(LEVEL3:LEVEL1), as.factor) %>% 
            st_transform(4326) %>%
              st_intersection(sa_cont)

Now and then one might get an error at this point in the workflow, something like

TopologyException: Input geom 1 is invalid: Ring Self-intersection at or near point…

The error is discussed here. Basically, the solution is to make the geometry valid, with a handy function called st_make_valid. Depending on which sf object is causing the error, either introduce this function above st_intersection in the pipe, or wrap it around the object in st_intersection().

Remember that the lines mapped represent the delineations of the finest-scale features in the dataset, in this case, the Level 3 ecoregions:

ggplot(sa_epa) +
  geom_sf(aes(fill=LEVEL3), show.legend = FALSE ) 

To map higher levels without the internal divisions, we perform another dissolve:

sa_epa_L1 <- sa_epa %>%
              mutate(area = st_area(.) ) %>% 
                group_by(LEVEL1) %>% 
                summarise()     
      
ggplot(sa_epa_L1) +
  geom_sf(aes(fill=LEVEL1)) 

Associating spatial and non-spatial data

The map would be much more informative if it had meaningful labels for the legend, rather than simply the ecoregion number. Unfortunately this information is not in this specific dataset, but they are available on the course GitHub page:

L1_names_URL = "https://raw.githubusercontent.com/devanmcg/IntroRangeR/master/12_RasGIS/SouthAmericaLevel1Names.csv"

sa_L1_names <- read_csv(L1_names_URL) %>%
                mutate(Level1 = as.factor(Level1))
Level1 Level 1 Ecoregion
13 Temperate Sierras
14 Mexican Tropical Dry Forests
15 Middle American Tropical Wet Forests
16 West Indies
17 Northern Andes
18 Central Andes
19 Southern Andes
20 Amazonian-Orinocan Lowland
21 Eastern Highlands
22 Gran Chaco
23 Pampas
24 Monte-Patagonian

These are clearly helpful labels, but this is a simple tibble, not a spatial sf object. Fortunately we can add the labels to the sf object in a tidy way:

sa_epa_L1 <-
  sa_epa_L1 %>%
    left_join(sa_L1_names, by=c("LEVEL1"="Level1"))
sa_epa_L1
## Simple feature collection with 8 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -81.32674 ymin: -55.61017 xmax: -34.79012 ymax: 12.43177
## geographic CRS: WGS 84
## # A tibble: 8 x 3
##   LEVEL1                                           geometry `Level 1 Ecoregion` 
##   <fct>                                  <MULTIPOLYGON [°]> <chr>               
## 1 17     (((-69.69544 -54.86476, -69.67552 -54.8624, -69.6~ Northern Andes      
## 2 18     (((-65.43958 -28.33546, -65.37718 -28.34834, -65.~ Central Andes       
## 3 19     (((-69.02959 -55.46252, -69.03994 -55.46033, -69.~ Southern Andes      
## 4 20     (((-63.42212 -15.09734, -63.35991 -15.1597, -63.2~ Amazonian-Orinocan ~
## 5 21     (((-48.52598 -27.45306, -48.52886 -27.45169, -48.~ Eastern Highlands   
## 6 22     (((-62.23239 -39.20609, -62.23465 -39.20393, -62.~ Gran Chaco          
## 7 23     (((-62.35636 -38.81733, -62.36428 -38.81409, -62.~ Pampas              
## 8 24     (((-69.99024 -53.71317, -69.98009 -53.70756, -69.~ Monte-Patagonian

Now re-plot, but set fill to the new column with meaningful labels:

sa_L1_gg <-
  ggplot() +
    geom_sf(data=sa_epa_L1, aes(fill=`Level 1 Ecoregion`))
sa_L1_gg

To further dress up the map, we can add the political boundaries and specify the fill gradient to use the trendy viridis color palette:

sa_L1_gg +
  geom_sf(data=sa_countries, fill=NA, color="white") +
  scale_fill_viridis_d(alpha=0.75)


  1. The raster package provides a function literally called crop.