
Fitted-Model-Based Annotations :: Cheat Sheet
‘ggpmisc’ 0.7.0.9003
Pedro J. Aphalo
2026-06-05
Source: vignettes/cheat-sheet.Rmd
Basics
ggpmisc follows the grammar of graphics implemented in ggplot2, based on the idea that many different data visualizations can be built by combining the same components: a data set, a coordinate system, and geoms—visual marks that represent data or summaries derived from data. These elements are complemented by stats that compute data summaries to be passed to geoms and scales that describe the mapping of data into graphical elements.
There are multiple variations of each element of the grammar, providing a vocabulary. Thus, the grammar allows us to ‘speak/write’ a graph from composable elements, instead of being limited to a predefined set of charts. ‘ggpmisc’ adds new stats and scales, expanding the vocabulary while remaining consistent with the grammar. ‘ggpmisc’ relies on geoms from packages ‘ggpp’ and ‘ggplot2’ for its defaults, while also being compatible with geoms from other R packages including ‘ggtext’, ‘marquee’, ‘xdvir’, ‘ggrepel’ and ‘gganimate’.
If you are not already familiar with the grammar of graphics and ggplot2 you should visit the ggplot2 Cheat Sheet first, and afterwards come back to this Cheat Sheet.
Unlike ggplot2, ggpmisc provides no geometries that have the new
stats as their default. The plot layers described here are
always added with a stat and, when necessary, the stat's default
geom argument can be overridden.
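As a minimal sketch of this convention, assuming packages ‘ggplot2’ and ‘ggpmisc’ are installed and using the mpg data set shipped with ‘ggplot2’, the layers below are added with stats. The second plot overrides the default geom of the annotation layer with "label_npc", which is assumed here to be available from ‘ggpp’.

```r
library(ggplot2)
library(ggpmisc)

# Layers are added with stats; each stat supplies its own default geom.
p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line() +  # fitted line plus confidence band
  stat_poly_eq()      # text annotation using the stat's default label

# Same plot, overriding the default geom of the annotation layer.
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq(geom = "label_npc")
```

Printing p1 or p2 renders the plots.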
Most of the layer functions in ggpmisc aim to make
it easier to add to plots information derived from model fitting, tests
of significance or statistical summaries. All the stats
from ‘ggpmisc’ do computations by data group except for
stat_fit_tb() and stat_multcomp(), which do
computations by plot panel.
The statistics that return predicted values for regressions return
x and y where one of the variables is a
sequence of numbers for the explanatory variables and the other contains
the predictions based on them; depending on the orientation
or formula, ymin and ymax, or
xmin and xmax, give the lower and upper
confidence limits for the fitted line or curve.
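The returned variables can be inspected with layer_data() from ‘ggplot2’. The sketch below assumes the default orientation and a linear fit to the mpg data; the exact set of additional columns may vary between versions.

```r
library(ggplot2)
library(ggpmisc)

p <- ggplot(mpg, aes(displ, hwy)) +
  stat_poly_line(formula = y ~ x)

# Data computed by the stat for the first (and only) layer.
line_data <- layer_data(p, 1)

# x: sequence of values of the explanatory variable
# y: predictions; ymin and ymax: confidence limits
head(line_data[, c("x", "y", "ymin", "ymax")])
```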
The statistics returning fitted or residual values return these
values as variables y.fitted or x.fitted,
y.resid or x.resid, weights and
robustness.weights. Variables x and
y contain the observed values. When present,
weights are the prior weights, and
robustness.weights are posterior weights, those actually
used by the model fit function, possibly computed by it.
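A short sketch of how these stats might be used, assuming a linear model fitted to the mpg data: the first plot highlights the deviations of the observations from the fitted line, and the second plots the residuals themselves. Calling names() on the layer data lists the variables actually returned by the installed version.

```r
library(ggplot2)
library(ggpmisc)

# Segments joining each observation to its fitted value.
p_dev <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = y ~ x, se = FALSE) +
  stat_fit_deviations(formula = y ~ x, colour = "red")

# Residuals from the same model fit, plotted against x.
p_res <- ggplot(mpg, aes(displ, hwy)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  stat_fit_residuals(formula = y ~ x)

# List the variables returned by stat_fit_residuals() (layer 2).
res_vars <- names(layer_data(p_res, 2))
res_vars
```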
The statistics that return text labels for annotating plots return,
in x and y, the label coordinates: the values
passed as arguments to parameters label.x and
label.y, or values computed based on them. The character
strings are returned as variables with names ending in
.label. These variables can be used in mappings created
with aes() or with use_label(). The difference
is that use_label() accepts short names for the labels,
recognizes them as computed by the stats from package ‘ggpmisc’ and
pastes them into a single character string, saving some typing. For example,
use_label("eq", "R2", "n", sep = ", ") is equivalent to
aes(label = paste(after_stat(eq.label), after_stat(rr.label), after_stat(n.label), sep = ", ")).
In most cases numeric values for the parameter
estimates are also returned, making it possible for user code to
assemble labels.
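The two ways of mapping labels can be compared side by side. This is a sketch assuming the default, parsed (plotmath) label format, for which the separator `"*\", \"*"` is assumed to be the default of use_label(); the data set and formula are illustrative only.

```r
library(ggplot2)
library(ggpmisc)

formula <- y ~ x

# Short form: use_label() looks up and pastes the *.label variables.
p_short <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("eq", "R2", "n"), formula = formula)

# Long form: the same mapping written out with aes() and after_stat().
p_long <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = paste(after_stat(eq.label),
                                 after_stat(rr.label),
                                 after_stat(n.label),
                                 sep = "*\", \"*")),
               formula = formula)
```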
Correlation
- stat_correlation() computes the parametric r or the non-parametric τ and ρ correlation coefficients, and optionally their confidence intervals, P, and n, the number of observations, flexibly adding an annotation to the plot.
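As an illustration, the sketch below annotates a scatter plot of the mpg data with the correlation coefficient, P and n. The short label names passed to use_label() and the method value are assumptions based on the description above.

```r
library(ggplot2)
library(ggpmisc)

# Default method: annotate with r, P-value and n.
p_cor <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_correlation(use_label("R", "P", "n"))

# Rank-based alternative, using the stat's default label.
p_rank <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_correlation(method = "spearman")
```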
Fitted models
The statistics for fitted models come in matched pairs, one that adds
a plot layer with one or more curves and confidence band(s), and one
that annotates the plot with the fitted model equation and/or other
parameter estimates. These depend on the type of fitted model and
include R^2, F, P, AIC, BIC,
n, and in most cases also the
fitted-model equation. The curve plotting stats fulfil a similar role to
ggplot2::stat_smooth() while the ones for textual
annotations have no equivalent in ‘ggplot2’.
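A matched pair is typically added with the same model formula passed to both stats. The sketch below assumes a second-degree polynomial, with raw = TRUE so that the coefficients correspond to the terms of the displayed equation, and a quantile regression pair that relies on ‘quantreg’ being installed.

```r
library(ggplot2)
library(ggpmisc)

# Polynomial regression: curve with band, plus equation, R^2 and P.
formula <- y ~ poly(x, 2, raw = TRUE)
p_poly <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("eq", "R2", "P"), formula = formula)

# Quantile regression: band for the quartiles, label for the median.
p_quant <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_quant_band(formula = y ~ x) +
  stat_quant_eq(formula = y ~ x, quantiles = 0.5)
```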
- stat_poly_line() and stat_poly_eq() support a broad set of model fit functions: e.g., linear models (OLS, resistant and robust), generalised least squares (gls), linear splines, cubic splines, additive models (gam), major axis (MA) and standardised major axis (SMA) regression, etc. The fitted model equation is automatically generated for polynomials, but can be assembled manually for other model formulas.
- stat_quant_line(), stat_quant_band() and stat_quant_eq() support quantile regression based on both polynomials and smoothing splines (using ‘quantreg’).
- stat_ma_line() and stat_ma_eq() support major axis (MA), standardised major axis (SMA) and ranged major axis (RMA) regression (using ‘lmodel2’).
- stat_distrmix_line() and stat_distrmix_eq() support fitting of univariate Normal-distribution mixture models or of a single Normal distribution.
- stat_fit_fitted() and stat_fit_deviations() can be used to highlight the fitted values and their distance to the observations in a scatter plot, supporting a wide range of model fit functions.
- stat_fit_residuals() can be used to create consistent plots of residuals for a wide range of model fit functions.
- stat_fit_augment() works with model fit functions supported by broom::augment() methods, including non-linear models. It provides an alternative to stat_poly_line() for an even broader range of model fit functions.
- stat_fit_tidy() works with model fit functions supported by broom::tidy() methods, including non-linear models. It provides numeric values from which equation labels can be created for an even broader range of model fit functions than those supported by stat_poly_eq(). broom::tidy() is similar to R’s summary() for fitted models.
- stat_fit_glance() works with model fit functions supported by broom::glance() methods, including non-linear models. It provides an alternative to stat_poly_eq() for an even broader range of model fit functions. broom::glance() is similar to R’s anova() applied to a single fitted model object.
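For models handled through ‘broom’, labels are assembled by user code from the returned numeric values. The sketch below assumes that ‘broom’ is installed and that stat_fit_tidy() names its computed variables by pasting the term and the column name (e.g., x_estimate and x_p.value); these names should be checked against the installed version.

```r
library(ggplot2)
library(ggpmisc)

# Label built by hand from the estimates returned by broom::tidy().
p_tidy <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x) +
  stat_fit_tidy(method = "lm",
                method.args = list(formula = y ~ x),
                label.x = "right",
                mapping = aes(label = sprintf("Slope = %.3g\np-value = %.3g",
                                              after_stat(x_estimate),
                                              after_stat(x_p.value))))
```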
ANOVA or summary tables
- stat_fit_tb() fits any model supported by a broom::tidy() method and adds an ANOVA or summary table. Which columns are included and their naming can be set by the user.
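A minimal sketch of adding an ANOVA table as a plot inset, assuming a linear model and the mtcars data; the tb.type value and the label positions are illustrative choices.

```r
library(ggplot2)
library(ggpmisc)

# ANOVA table for the fitted linear model, placed at the top right.
p_tb <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point() +
  stat_fit_tb(method = "lm",
              method.args = list(formula = y ~ x),
              tb.type = "fit.anova",
              label.x = "right",
              label.y = "top")
```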
Multiple comparisons
- stat_multcomp() fits a model, computes ANOVA and subsequently calls functions from package ‘multcomp’ to test the significance of Tukey, Dunnett or arbitrary sets of pairwise contrasts, with a choice of the adjustment method for the P-values. Significance of differences can be indicated with letters, asterisks or P-values. Sizes of differences are also computed and available for user-assembled labels.
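As a sketch, multiple comparisons among the levels of a factor can be added to box plots as follows. It assumes ‘multcomp’ is installed; the label.type value selects letters instead of the default display.

```r
library(ggplot2)
library(ggpmisc)

p_box <- ggplot(mpg, aes(factor(cyl), hwy)) +
  stat_boxplot(width = 0.33)

# Default settings for contrasts and labels.
p_default <- p_box + stat_multcomp()

# Differences indicated with letters.
p_letters <- p_box + stat_multcomp(label.type = "letters")
```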
Aesthetic mappings
- Function use_label() pastes together the labels automatically generated by the stats and maps the combined string to the label aesthetic.
Peaks and valleys
- stat_peaks() finds and labels peaks (= global or local maxima).
- stat_valleys() finds and labels valleys (= global or local minima).
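A brief sketch using the lynx time series from base R, converted to a data frame: the span value (window width, in number of observations) and the label format are illustrative assumptions.

```r
library(ggplot2)
library(ggpmisc)

# Convert the time series to a data frame with explicit x and y.
lynx_df <- data.frame(year = as.numeric(time(lynx)),
                      trapped = as.numeric(lynx))

p_peaks <- ggplot(lynx_df, aes(year, trapped)) +
  geom_line() +
  stat_peaks(span = 5, colour = "red") +     # mark local maxima
  stat_valleys(span = 5, colour = "blue") +  # mark local minima
  stat_peaks(span = 5, geom = "text", colour = "red",
             vjust = -0.5, x.label.fmt = "%.0f")  # label peaks with the year
```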
Volcano and quadrant plots
These plots are frequently used with gene expression data, with each
of the many genes labelled based on the ternary outcome from a
statistical test. In addition, the data are usually transformed. ‘ggpmisc’
provides several variations on continuous, colour, fill and shape
scales, with defaults set as needed. Scales support log fold-change
(logFC) on multiple logarithm bases both for input and for
output, false discovery rate (FDR), P-value
(Pvalue) and binary or ternary test outcomes
(outcome).
- Discrete manual scales: scale_colour_outcome(), scale_fill_outcome(), scale_shape_outcome().
- Continuous scales for logFC: scale_x_logFC(), scale_y_logFC(), scale_colour_logFC(), scale_fill_logFC().
- Continuous scales for P-values and FDR: scale_x_Pvalue(), scale_y_Pvalue(), scale_x_FDR(), scale_y_FDR().
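A volcano plot might be assembled from these scales as sketched below. The data are simulated, the thresholds used to derive the ternary outcome are hypothetical, and outcome2factor() is assumed to be available from ‘ggpmisc’ to convert the numeric outcomes (-1, 0, +1) into a factor.

```r
library(ggplot2)
library(ggpmisc)

# Simulated data: log2 fold-change and P-values for 200 'genes'.
set.seed(1)
genes <- data.frame(logFC = rnorm(200, sd = 1.5),
                    PValue = 10^-runif(200, min = 0, max = 6))

# Hypothetical thresholds: -1 = down, 0 = uncertain, +1 = up.
genes$outcome <- ifelse(genes$PValue < 0.01 & abs(genes$logFC) > 1,
                        sign(genes$logFC), 0)
genes$outcome_fct <- outcome2factor(genes$outcome)

p_volcano <- ggplot(genes, aes(logFC, PValue, colour = outcome_fct)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_Pvalue() +
  scale_colour_outcome()
```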
Utility functions
Most of the functions used to generate formatted labels in layers and scales are also exported.
Learn more at docs.r4photobiology.info/ggpmisc/.