ggplot2

The grammar of graphics is the grammar to build graphics with intuitive language. ggplot2 is an implementation of the grammar of graphics.

ggplot2 enables you to compose graphs by combining different components. Specifically, you

  1. Add data using ggplot()
  2. Add layers using layer() or geom_*
  3. Adjust the graph using coordinate_*, etc

Summarized in a template:

ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(
     mapping = aes(<MAPPINGS>),
     stat = <STAT>, 
     position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTIONS> +
  <LABEL_FUNCTIONS> + 
  <THEME_FUNCTION>

An example:

library("ggplot2")
g <- ggplot(data = iris) +  # Data part
  geom_point(aes(Petal.Length, Petal.Width)) +  # layer 1 with mapping
  geom_point(aes(Sepal.Length, Sepal.Width), color="red")  # layer 2 with a different mapping
plot(g)

This template is the grammar of graphics.

Let’s look into a layer

Layer

A layer is a combination of data, stat, and geom with a potential position adjustment. Usually, layers are created using geom_* or stat_* calls but it can also be created directly using this function.

layer(
  geom = "point",               # geometric object
  stat = "identity",            # statistical transformation
  data = mpg,         
  mapping = aes(displ, hwy),    # aesthetic mapping
  position = "identity",        # position adjustment
  params = list(na.rm = FALSE), # additional parameters
  inherit.aes = TRUE,
  check.aes = TRUE,
  check.param = TRUE,
  show.legend = NA,
  key_glyph = NULL,
  layer_class = Layer
)

Function geom_*(...) is a shortcut for layer(geom="*", ...).

Here are some examples for each component

  • Aesthetic Mapping
    • shape
    • linetype
    • size
    • fill
    • color
    • alpha
    • group
  • Geom
    • point
    • bar
    • boxplot
    • line
    • histogram
    • density
    • hex
  • Statistical Transformation
    • identity
    • bin
    • boxplot
    • density
  • Position Adjustment ^569ce1
    • identity
    • stack
    • fill stretch the object to fill the space
    • dodge places overlapping objects beside one another
    • jitter add noise to objects to separate them
      • geom_jitter() is shorthand for geom_point(position = "jitter")

Geom

A geom is the geometric object to represent data, and is the most crucial part of a layer, because different geoms make totally different plots. Thus, People often describe plots by the type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms, boxplots use boxplot geoms, and so on.

By creating multiple layers, you can overlap different geoms in the same graph. In code, just add multiple geom_*() functions.

Aesthetic Mapping

  • An aesthetic is a visual property of the objects in your plot
  • Different geom has different aesthetic mappings
  • mapping always pairs with aes(), whose arguments are aesthetic-value pairs
  • x = and y = can be omitted
  • To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside aes()
  • geom_point(mapping = aes(x = displ, y = hwy, color = class))
  • x,y being aesthetics highlights a useful insight about x and y: the x and y locations of a point are themselves aesthetics, visual properties that you can map to variables to display information about the data
  • You can also set the aesthetic properties outside the mapping=aes(), treating them as standalone components; but then the RHS needs to be a specific value rather than a variable
    • geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
  • You can put mapping in the ggplot() function to get global mappings for all layers
    • ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + geom_point() + geom_smooth()
    • And then the mapping in a specific can overwrite the global mappings
    • Similarly, you can overwrite data for a specific layer

Statistical Transformation

Not all graphs plot the same information about the data. Scatter plots plot the raw value, while Histograms plots the frequencies. The algorithm used to calculate the statistics to plot is called a stat, short for statistical transformation.

Like bin, it first puts data in different bins, and then calculates the count in each bin.

stat_*(...) functions are another groups of functions short for layer(stat = "*", ...). You can generally use geoms and stats interchangeably. Every geom has a default stat; and every stat has a default geom. To see which stat a geom_* is using by default, use ?geom_*

Position Adjustment

By default, ggplot2 will plot objects “where they are”, i.e., position = "identity". However, this may cause many overlaps that hide information. So there are [[#^569ce1|four other positions]].

Scale

A scale controls how data is mapped to aesthetic attributes, so one scale for one Aesthetic Mapping. Some examples:

MappingScale (example)
xscale_x_date()
yscale_y_continuous()
colorscale_color_manual()
fillscale_fill_viridis_c()
ggplot(data = iris) +
  geom_histogram(mapping=aes(x=Petal.Length, fill=Species), stat = 'bin',position = 'stack') +
  scale_x_continuous(limits = c(0, 10)) +
  scale_y_continuous(limits = c(0, 50))

Another useful scale for Continuous Variable is the log scale. It can present log-ly related data.

  • scale_x_log10(breaks = c(1,10,100,1000,10000))

Coordinate

A coordinate controls how the axes and grid lines are drawn. One ggplot can only have one coord.

library(ggplot2)
p <- ggplot(data = iris) +
  geom_histogram(mapping=aes(x=Petal.Length, fill=Species), stat = 'bin',position = 'stack') +
  coord_polar()
plot(p)

The default coordinate is the Cartesian coordinate. Others are

  • coord_flip()
  • coord_polar()
  • coord_fixed()
  • coord_map() and coord_quickmap() useful for spatial data

Facet

Faceting can be used to split the data up into subsets of the entire dataset.

ggplot(data = iris) +
  geom_histogram(mapping=aes(x=Petal.Length), stat = 'bin') +
  facet_wrap(iris$Species)

The output graph will be three, one for each species.

To create two-dimensional facets:

  • Use option nrow=x in facet_wrap()
  • Use facet_grid() with R Type - Formulax ~ y; then x will be mapped to rows and y to columns ^b6alii
    • Use facet_grid(. ~ x) and facet_grid(x ~ .) when you only want facet on one variable

To use different adaptive scales for each facet, use option scales = "free", scales = "free_y", or scales = "free_x".

  • When comparing different facets, do not free the scales; use a consistent scale instead.

Labels

  • ggtitle()
  • xlab(), ylab(), labs()
  • annotate()