There are a lot of arguments to plot()! Several of these arguments will be discussed in these slides, but not all of them. That means that making plots often involves teaching yourself something new each time with the help pages, Stack Overflow, and various other websites and blogs.
plot(x, y = NULL, type = "p", xlim = NULL, ylim = NULL,
     log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL,
     ann = par("ann"), axes = TRUE, frame.plot = axes,
     panel.first = NULL, panel.last = NULL, asp = NA,
     xgap.axis = NA, ygap.axis = NA, ...)
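Rather than memorizing this list, you can ask R for it. The sketch below uses base R only and lists the formal arguments of plot.default(), which is the method plot() dispatches to for plain numeric vectors such as mtcars$wt.

```r
# List the formal arguments of the default plot method
plot_args <- names(formals(graphics::plot.default))
print(plot_args)

# Open the full help page (only in an interactive session)
if (interactive()) help("plot.default", package = "graphics")
```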
Scatterplot
# Vectors of coordinates
plot(x = mtcars$wt,
     y = mtcars$mpg)
What do we notice?
Plots points (type = "p") of a specific shape (pch = 1)
Axis labels are the R code supplied to the x and y arguments (mtcars$wt, mtcars$mpg)
No main title (main = NULL)
Chooses axis ticks for you
…and many more defaults!
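Each of these defaults can be overridden by passing the matching argument. A minimal sketch with the same mtcars data; the particular values (pch = 19, the colour, the label text) are arbitrary illustrative choices.

```r
# Override some of the defaults noted above
plot(x = mtcars$wt,
     y = mtcars$mpg,
     type = "p",                      # points (this is already the default)
     pch = 19,                        # filled circles instead of open circles (pch = 1)
     col = "steelblue",               # point colour
     xlab = "Weight (1000 lbs)",      # replaces the default label "mtcars$wt"
     ylab = "Miles Per Gallon (MPG)") # replaces the default label "mtcars$mpg"
```

Arguments passed this way apply only to this call; they do not change the global graphical parameters returned by par().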
Change the main title
plot(x = mtcars$wt,
     y = mtcars$mpg,
     main = "Vehicle Efficiency by Weight") # adding the title
Change the axis titles
plot(x = mtcars$wt,
     y = mtcars$mpg,
     main = "Vehicle Efficiency by Weight",
     xlab = "Vehicle Weight (1000 lbs)", # adding x-axis title
     ylab = "Miles Per Gallon (MPG)")    # adding y-axis title
You can create other types of graphs, for instance, a histogram:
hist(x = mtcars$mpg,                  # data to plot
     breaks = 15,                     # change default number of bars
     xlim = c(10, 35),                # change size of x-axis
     main = "",                       # no main title
     xlab = "Miles Per Gallon (MPG)", # x-axis title
     las = 1,                         # y-axis tick labels horizontal
     border = "darkblue",             # bar border color
     col = "lightblue")               # bar fill color
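A single number passed to breaks is only a suggestion: hist() lets pretty() choose "nice" bin edges, so the number of bars can differ from 15. A small base R sketch that computes the histogram without drawing it, so the bins actually used can be inspected:

```r
# Compute the histogram object but skip drawing it
h <- hist(mtcars$mpg, breaks = 15, plot = FALSE)

h$breaks  # bin edges actually used (chosen via pretty())
h$counts  # number of cars falling in each bin
```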
plot() works great for exploratory data analysis (EDA) but not for more advanced figures
Something like this is hard to create with plot()
Animated GIFs are possible with the gganimate extension
Understanding the philosophy is 90% of understanding how to create figures with ggplot2
The remaining 10% is learning the various functions that correspond with each part of the philosophy
The Grammar of Graphics
Central Idea: Instead of creating a function for every single type of plot, decompose graphics into their separate components/layers that can be used flexibly to create (almost) any type of plot you want.
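To make the idea concrete, here is a minimal sketch (assuming ggplot2 is installed, and using mtcars rather than the lecture data): the data and the aesthetic mapping are defined once, and swapping or stacking geometry layers yields different kinds of plots.

```r
library(ggplot2)

# The same data and aesthetic mapping...
base <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))

# ...combined with different geometry layers gives different plot types
p_points <- base + geom_point()
p_line   <- base + geom_line()
p_smooth <- base + geom_point() + geom_smooth(method = "lm", formula = y ~ x)

print(p_points)
```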
Example Data: gapminder
We will be using the gapminder data from the gapminder package for this lecture
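A quick way to load and inspect the data, assuming the gapminder package is installed (for example with install.packages("gapminder")):

```r
library(gapminder)

# One row per country per year
str(gapminder)
head(gapminder)
```

The columns used throughout these slides are continent, year, lifeExp, pop, and gdpPercap.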
Geometries take all the scaled values that come from the mapping (which may have been transformed by statistics) and interpret/plot them in some way
e.g., a line geometry (geom_line()) interprets the data one way and draws lines on your figure, while a boxplot geometry (geom_boxplot()) interprets the data another way
ggplot(data = gapminder,
       mapping = aes(x = year)) +
  geom_boxplot() # No `y` mapping needed for boxplot
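The call above summarizes the year values themselves in a single box. To get one boxplot of life expectancy per year, a y mapping is needed, and because year is numeric (continuous) ggplot2 also needs an explicit group. A sketch, assuming ggplot2 and gapminder are installed:

```r
library(ggplot2)
library(gapminder)

# year is continuous, so group = year is needed to get one box per year
p_box <- ggplot(data = gapminder,
                mapping = aes(x = year, y = lifeExp, group = year)) +
  geom_boxplot()

print(p_box)
```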
Some examples
The geoms
geom is short for geometric object. Geoms specify which type of graphic you want to produce (boxplot, barplot, scatterplot, histogram, …). All ggplot2 geoms start with the geom_ prefix.
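Because of that shared prefix, you can list every geometry your installed version of ggplot2 provides. A small sketch; the exact list depends on the ggplot2 version:

```r
library(ggplot2)

# All exported objects of ggplot2 whose name starts with "geom_"
geoms <- grep("^geom_", ls("package:ggplot2"), value = TRUE)
print(geoms)
length(geoms)
```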
☝ Plots mean values for all observations within each year
Multiple Geometries
Different geometries do not necessarily share the same mapping. For example, geom_point() needs (at minimum) an x and y mapping, but geom_histogram() only needs an x mapping (its statistic computes the y-axis values)
There is an “Aesthetics” section in the help page for each geom that describes the required and optional mapping parameters
You can (and often will) have multiple layers of geometries in the same figure
⚠️ The order of your geometries matters, because each layer is plotted on top of the previous layers
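The points above can be sketched with mtcars (an illustrative choice, assuming ggplot2 is installed): geom_point() gets x and y, geom_histogram() gets only x, and reversing the order of two layers changes which one is drawn on top.

```r
library(ggplot2)

# Points drawn first, fitted line drawn on top of them
p_line_on_top <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3, color = "grey60") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red")

# Same layers in reverse order: the points now cover the line
p_points_on_top <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  geom_point(size = 3, color = "grey60")

# A histogram only needs x; its statistic (bin) computes the counts on y
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)
```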
Adding geometries example
ggplot(data = gapminder,
       mapping = aes(x = year,
                     y = lifeExp)) +
  geom_bar(stat = "summary", width = 3, fill = "red") + # adding bar graph
  geom_line(stat = "summary", color = "blue")           # adding line graph
Layer: Statistics
Your data do not always have the required statistics for each type of figure
For example, plotting a boxplot requires calculating the 25th, 50th, and 75th percentiles of your data and the interquartile range
Sometimes your data are exactly what is needed (e.g., creating a scatterplot), in which case you set your statistic to "identity", which just passes your data on to that layer
Statistics: Errorbars example
However, sometimes you do need to manipulate your data to get the correct aesthetic mapping for a geom (e.g., creating error bars)
Step 1: create summary statistics from our data (mean and standard error)
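A possible version of both steps, assuming ggplot2, dplyr, and gapminder are installed; grouping by year is an illustrative choice. Step 1 computes the mean and standard error of life expectancy per year, and Step 2 maps those columns to the ymin and ymax aesthetics that geom_errorbar() requires.

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Step 1: one row per year with the mean life expectancy and its standard error
life_by_year <- gapminder %>%
  group_by(year) %>%
  summarise(mean_life = mean(lifeExp),
            se_life   = sd(lifeExp) / sqrt(n()))

# Step 2: map the summary columns to the aesthetics geom_errorbar() needs
p_err <- ggplot(life_by_year, aes(x = year, y = mean_life)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_life - se_life,
                    ymax = mean_life + se_life),
                width = 1)

print(p_err)
```

An alternative is to let a statistic do Step 1 for you, e.g. stat_summary(fun.data = mean_se, geom = "errorbar") on the raw gapminder data; mean_se() is the ggplot2 helper that returns the mean plus or minus one standard error.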
Statistics are linked to geometries such that each geometry requires a statistic (and vice versa: each statistic requires a geometry)
Thus, geometries have default statistics that try to guess what you want to plot but that can also be changed
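A sketch of changing a default statistic, assuming ggplot2 and gapminder are installed: geom_point() normally uses stat = "identity" (one point per row), but with stat = "summary" it plots one mean per year. The second plot expresses the same layer starting from the statistic instead of the geometry.

```r
library(ggplot2)
library(gapminder)

# Geometry first: override geom_point()'s default statistic
p_mean_point <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point(stat = "summary", fun = mean)

# Statistic first: choose the geometry that draws the summary
p_mean_stat <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  stat_summary(geom = "point", fun = mean)
```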
Defaults for common geometries:
geom_point(stat = "identity")
geom_count(stat = "sum")
geom_jitter(stat = "identity")
geom_bar(stat = "count")
geom_density(stat = "density")
geom_histogram(stat = "bin")
geom_boxplot(stat = "boxplot")
geom_violin(stat = "ydensity")
geom_rug(stat = "identity")
geom_freqpoly(stat = "bin")
geom_quantile(stat = "quantile")
geom_smooth(stat = "smooth")
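To see a default statistic at work, compare geom_bar(), which counts rows itself (stat = "count"), with a bar layer fed pre-computed counts through stat = "identity" (equivalent to geom_col()). A sketch using mtcars, assuming ggplot2 is installed; both layers end up with the same bar heights.

```r
library(ggplot2)

# Default stat = "count": geom_bar() tallies the cars in each cylinder group
p_count <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Pre-computed counts: stat = "identity" passes Freq through unchanged
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
p_identity <- ggplot(cyl_counts, aes(x = cyl, y = Freq)) +
  geom_bar(stat = "identity")   # same as geom_col()
```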
Layer: Scales
How are properties of the axes, colors, and other aesthetics determined?
Scales control the details of how data values are translated to visual properties (e.g., plot Africa with #F8766D, Americas with #B79F00, etc.)
Every aesthetic mapping is given a default scale, which you can override with the scale_*() functions
Scale functions have the syntax: scale_<aesthetic>_<type> where <aesthetic> refers to each aesthetic mapping (x, y, color, etc.) and <type> refers to the type of scale (continuous, discrete, log10, etc.)
There are dozens of different types of scales in ggplot2, all of which are listed in the ggplot2 reference documentation.
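A short sketch of the scale_<aesthetic>_<type> pattern, assuming ggplot2 and gapminder are installed. The colour values are hypothetical choices; any valid R colours would work.

```r
library(ggplot2)
library(gapminder)

# Hypothetical colour choices, one per continent
continent_colors <- c(Africa = "#E41A1C", Americas = "#377EB8", Asia = "#4DAF4A",
                      Europe = "#984EA3", Oceania = "#FF7F00")

p_scales <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10(name = "GDP per capita (log scale)") +   # aesthetic x, type log10
  scale_y_continuous(name = "Life expectancy (years)") + # aesthetic y, type continuous
  scale_color_manual(values = continent_colors)          # aesthetic color, type manual
```

Note that the log10 scale transforms the data before any statistic is computed, which is why it can change summaries, not just the axis labels.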
Layer: Scales – example
ggplot(data = gapminder,
       mapping = aes(x = year, y = gdpPercap * pop, color = continent)) +
  geom_line(stat = "summary", linewidth = 1.5) +
  geom_point(stat = "summary", shape = 21, fill = "white", size = 2) +
  scale_x_continuous(name = "Year", breaks = unique(gapminder$year)) +
  scale_y_continuous(name = "Gross Domestic Product (USD)") +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Gross Domestic Product Over Time by Continent")
Layer: Facets
Often we are focused on creating one figure per plotting area, but we are not constrained to this and may want to create multiple subplots when looking at our data
Facets are multiple panels of plots, with the same plotting logic, on different groups of your data
Use facets to prevent overplotting (plotting too much data in one figure)
Two different kinds of facets: facet_wrap() and facet_grid()
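facet_grid() lays panels out in a grid whose rows come from one variable and whose columns come from another. A minimal sketch with mtcars (an illustrative choice, assuming ggplot2 is installed): cyl has 3 values and am has 2, so the grid has 6 panels.

```r
library(ggplot2)

# Rows defined by cyl, columns defined by am
p_grid <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_grid(rows = vars(cyl), cols = vars(am))
```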
Layer: Facets – facet_wrap()
facet_wrap() takes a column from your data with a grouping structure and creates one subplot for each group
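For example, wrapping on continent gives one panel per continent. A sketch assuming ggplot2 and gapminder are installed; drawing one line per country is an illustrative choice.

```r
library(ggplot2)
library(gapminder)

# One panel per continent, wrapped into rows of 3 panels
p_wrap <- ggplot(gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.3) +
  facet_wrap(~ continent, ncol = 3)
```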
Layer: Theme
To tweak other aspects of your plots, you can use the theme() function, which has 94 arguments to give you complete control over all elements of your plot
To demonstrate, we’ll use the following plot from previous slides:
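The code defining p is not included here. One plausible definition, assuming p is the continent plot from the Scales example (with fun = mean added to silence ggplot2's default-summary message, and linewidth, which requires ggplot2 3.4.0 or later), is:

```r
library(ggplot2)
library(gapminder)

# Assumed definition of `p`, reusing the Scales example
p <- ggplot(data = gapminder,
            mapping = aes(x = year, y = gdpPercap * pop, color = continent)) +
  geom_line(stat = "summary", fun = mean, linewidth = 1.5) +
  geom_point(stat = "summary", fun = mean, shape = 21, fill = "white", size = 2) +
  scale_x_continuous(name = "Year", breaks = unique(gapminder$year)) +
  scale_y_continuous(name = "Gross Domestic Product (USD)") +
  scale_color_brewer(palette = "Set1") +
  ggtitle("Gross Domestic Product Over Time by Continent")
```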
Layer: Theme – panel features
p +
  theme(panel.grid.major = element_line(color = "black", linetype = 2, linewidth = 0.25),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "white"),
        panel.border = element_rect(color = "black", fill = NA, linewidth = 1))
Layer: Theme – axes features
p +
  theme(title = element_text(family = "Ubuntu Mono", face = "bold"),
        axis.title.y = element_text(family = "Ubuntu Mono"),
        axis.title.x = element_blank(),
        axis.text = element_text(family = "Ubuntu Mono", color = "black", size = 11),
        axis.text.x = element_text(angle = 45, hjust = 1))
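theme() settings are ordinary R objects, so a set of tweaks can be saved once and added to any number of plots. A small sketch with mtcars, assuming ggplot2 is installed; the specific settings are illustrative choices, not taken from the slides.

```r
library(ggplot2)

# Store theme tweaks once...
my_theme <- theme(legend.position = "bottom",
                  panel.grid.minor = element_blank(),
                  plot.title = element_text(face = "bold"))

# ...and add them to a plot like any other component
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Fuel economy by weight", color = "Cylinders") +
  my_theme

# Build the final graphical table (gtable) for the themed plot
g <- ggplotGrob(p_demo)
```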
There is a lot to learn about ggplot2 and data visualization, but I hope you have learned something today. If that is not the case…
TidyTuesday
TidyTuesday is a weekly challenge where people use (mostly) ggplot2 to explore a new dataset.
A weekly data project aimed at the R ecosystem. As this project was born out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem.