Data Dailies
💾 Updated on June 16, 2020

A tale of two packages, that is a parable of two perspectives....

  1. A short (maybe misinformed) history
  2. Plots.jl
  3. Gadfly (and the Grammar of Graphics)
  4. Comparison
  5. References and Extras

A short (maybe misinformed) history

This is my very opinionated perspective of the history of Julia visualization development... as an outsider and onlooker who hasn't actually contributed to either of these packages.

Data visualization and plotting in Julia has had a bit of a mottled history[1]. Unlike the scientific Python community, which had an early, publication-quality incumbent in matplotlib, Julia visualization development has been more ad hoc and grassroots.

[1] A great example of this is the list of supported backends of Plots.jl (and even the need for Plots.jl in the first place).

In the early days (I believe), the first libraries were simply wrappers around existing libraries[2] since, on the whole, the Julia ecosystem was very young. In this pre-Cambrian-explosion world PyPlot.jl ruled and, being an interface to matplotlib, it could do everything the Pythonistas could do. However, as things progressed and other essential statistical/mathematical packages matured, space opened up to build out the visualization side of the package ecosystem.

[2] Writing a visualization library is far from a trivial task....

But the ease of creating and distributing packages in Julia, combined with the rapidly growing and enthusiastic Julia community, led to a multiplicity of new libraries. Many of these were created to fill a scientific visualization niche[3] (and still do). But with this sort of proliferation there was naturally some overlap for common/general visualization tasks, and it was confusing/overwhelming for a newcomer to know which package was best suited to their goals (if they didn't fall into one of those existing niches[3]).

[3] Things like creating performant 3D visualizations, publication-quality LaTeX figures, or interactive scientific GUIs.

In this post we will look at (what seem to be) the two frontrunners for general data visualization and plotting. Each package has a unique approach, and I will highlight things to keep in mind when choosing a data visualization tool. We will do this by recreating an example simple enough to understand quickly but complex enough to actually highlight the differences between these two libraries.

julia> ] # enter Pkg REPL
(@v1.4) pkg> activate .
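(data-dailies) pkg> add CSV, HTTP, DataFrames, Plots, Gadfly, RollingFunctions

using CSV, HTTP, DataFrames, Dates

# daily COVID-19 testing data for California
url = "https://covidtracking.com/api/v1/states/ca/daily.csv"

# download the raw CSV bytes
res = HTTP.request("GET", url).body
columns = [:date, :totalTestResultsIncrease]
fmt = "yyyymmdd"
t = Dict(:date=>Date)

# parse only the two columns we need, reading :date as a Date, then sort by date
# (recent CSV.jl versions require the DataFrame sink argument)
data = sort(CSV.read(res, DataFrame; dateformat=fmt, select=columns, types=t), :date)
first(data, 6)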
6×2 DataFrames.DataFrame
│ Row │ date       │ totalTestResultsIncrease │
│     │ Dates.Date │ Int64                    │
├─────┼────────────┼──────────────────────────┤
│ 1   │ 2020-03-04 │ 0                        │
│ 2   │ 2020-03-05 │ 0                        │
│ 3   │ 2020-03-06 │ 7                        │
│ 4   │ 2020-03-07 │ 9                        │
│ 5   │ 2020-03-08 │ 19                       │
│ 6   │ 2020-03-09 │ 254                      │

Plots.jl

Plots.jl was developed as a common interface to many of the existing Julia visualization packages and provides a huge convenience for end users. Instead of needing to rewrite a visualization for every backend you want to render to, you can develop and fine-tune a graphic once using the Plots.jl API and optionally export it to any of the supported environments[4].

[4] A GUI for rapid development, HTML/JS for interactive web plots, PDF for publications, etc.

But what you gain in flexibility of output, you necessarily give up in expressivity of API. Since Plots.jl needs to interface with a variety of backends, the API is something of an average of all the backend APIs. And most scientific visualization APIs can trace their lineage back to the MATLAB plotting tradition (which is also true of the matplotlib API).
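
As a rough illustration of the "develop once, export anywhere" idea above, here is a minimal sketch that assumes the GR backend (the Plots.jl default) is installed. The toy data, the file names, and the temporary output directory are my own placeholders and not part of the original example. Other backends are selected through their own analogous functions, but they generally need extra packages installed, so they are left out here.

```julia
# Assumption: there may be no display attached (e.g. a server or CI job),
# so ask GR for a non-interactive workstation. Safe to drop on a desktop.
ENV["GKSwstype"] = "100"

using Plots

gr()  # select the GR backend

# hypothetical toy data standing in for the testing data above
x = 1:10
y = x .^ 2

p = Plots.plot(x, y, seriestype=:sticks, label="toy series", lw=2)

# the same plot object can be written to several formats;
# the file extension determines the output type
outdir = mktempdir()
png_path = joinpath(outdir, "toy.png")
svg_path = joinpath(outdir, "toy.svg")
Plots.savefig(p, png_path)
Plots.savefig(p, svg_path)
```

The point of the sketch is that `p` is built once and only the export step changes per target format.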

While not intrinsically good or bad, know that this level of abstraction is really designed for rapidly creating scientific visualizations of tabular data (and often numeric data from experiments). As such, it should feel very easy and natural to use if you have mostly numeric tabular data that you want to visualize with conventional plots[5].

[5] Or as Hadley Wickham would say: named graphics.

using Plots, RollingFunctions

# plot daily test increase as bars/sticks
Plots.plot(data.date,
    data.totalTestResultsIncrease,
    seriestype=:sticks,
    label="Test Increase",
    title = "California Total Testing Capacity",
    lw = 2)

# compute the 7-day average
window = 7
average = rollmean(data.totalTestResultsIncrease, window)

# to add another series we mutate the existing plot
# (rollmean returns length(x) - window + 1 values, so the front
# is padded with zeros to line the average up with the dates)
Plots.plot!(data.date,
    cat(zeros(window - 1), average, dims=1),
    label="7-day Average",
    lw=3)

Plots.jl natively operates on Arrays, but it has a large ecosystem that extends the base Plots.jl package for statistical plotting, machine learning, and domain-specific visualizations (among others).
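
To make explicit what the 7-day average above computes, here is a dependency-free sketch of a trailing rolling mean. The function name `trailing_mean` and the toy counts are hypothetical. This is not how RollingFunctions.jl is implemented, only an illustration of the same arithmetic and of why the result is `window - 1` elements shorter than the input.

```julia
using Statistics  # standard library, provides `mean`

# Hypothetical helper: mean of each run of `window` consecutive values.
# Returns length(xs) - window + 1 values, one per complete window.
function trailing_mean(xs::AbstractVector{<:Real}, window::Integer)
    1 <= window <= length(xs) ||
        throw(ArgumentError("window must be between 1 and length(xs)"))
    return [mean(@view xs[i:i+window-1]) for i in 1:(length(xs) - window + 1)]
end

# toy daily counts (made up for illustration)
counts = [0, 0, 7, 9, 19, 254, 300, 320, 410]
window = 7

avg = trailing_mean(counts, window)

# pad the front so the series lines up with the original dates,
# mirroring the zero-padding used in the plot! call above
padded = vcat(zeros(window - 1), avg)
```

Zero-padding draws the first six days as if the average were zero. Padding with `NaN` is a common alternative, since plotting libraries typically render `NaN` as a gap, though I have not checked how every backend treats it.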

Gadfly (and the Grammar of Graphics)

While the Plots.jl API is a bit more high level than Gadfly.jl, it is much less expressive. I like to conceptualize Plots.jl as a plotting framework that enables customization (in the convention-over-configuration sense), whereas Gadfly.jl is a library that provides you with visualization building blocks[6]. This distinction can be further analogized to the general difference between libraries and frameworks.

[6] This type of componentization applied to visualizations has its roots in the Grammar of Graphics and more recently in Hadley Wickham's ggplot2.

using Gadfly

labels = ["Test Increase", "7-day Average"]
colors = ["deepskyblue", "tomato"]

# Gadfly can work with DataFrames directly
Gadfly.plot(data,
    layer(
        x=:date,
        y=:totalTestResultsIncrease,
        Geom.hair,
        Theme(line_width=1.5pt)
    ),
    layer(
        x=:date,
        y=:totalTestResultsIncrease,
        Geom.line,
        Stat.smooth(method=:loess, smoothing=.15),
        Theme(
            default_color=colors[2], line_width=2pt
        )
    ),
    Guide.xlabel("Date"),
    Guide.ylabel(labels[1]),
    Guide.title("California Total Testing Capacity"),
    Guide.manual_color_key("", labels, colors),
    Theme(background_color="white")
)

While a LOESS smooth is not identical to the 7-day moving average used in the Plots.jl example above, a smoothing parameter of 0.15 approximates a 7-day moving average in this example, and LOESS is built into Gadfly.jl (so there is no need for RollingFunctions.jl).

As you can see by comparing the Plots.jl and Gadfly.jl examples, even though the Plots.jl API on the whole is a bit more high level, a composition of Gadfly.jl geometries is much more expressive than the Plots.jl series types.
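
To illustrate what "a composition of geometries" means in practice, here is a small sketch that assumes Gadfly and DataFrames are installed. The data frame, its column names, and the output file name are made up for illustration. Each `layer` pairs the same aesthetics with a different geometry, and the plot is just the combination of those layers plus guides.

```julia
using Gadfly, DataFrames

# hypothetical toy data (not the COVID testing data)
df = DataFrame(day = 1:10, tests = [1, 3, 2, 5, 8, 7, 11, 13, 12, 16])

# each layer maps the same columns to x and y but draws a different geometry
points = layer(x=:day, y=:tests, Geom.point)
steps  = layer(x=:day, y=:tests, Geom.step)

p = Gadfly.plot(df, points, steps,
    Guide.xlabel("Day"),
    Guide.ylabel("Tests"),
    Guide.title("Two geometries, one plot"))

# write the result to an SVG file in a temporary directory
out = joinpath(mktempdir(), "layers.svg")
draw(SVG(out, 6inch, 4inch), p)
```

Swapping `Geom.step` for `Geom.line` or `Geom.hair` changes only that one layer, which is the kind of recombination that a fixed set of series types does not offer as directly.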

Comparison

If you only need to make fairly common charts, Plots.jl is probably the way to go since it is easier/quicker for common plots. And it is convenient to have the flexibility of the various backends. If you are creating a more ad-hoc visualization however, Gadfly.jl is more expressive.

Package     | Use if you want                       | Weakness                              | Most Similar To
Plots.jl    | Multiple backends                     | High level but inflexible API         | matplotlib
Gadfly.jl   | A declarative Grammar of Graphics API | No built-in interactivity             | ggplot2
VegaLite.jl | Web based interactivity               | Not designed for non web environments | Altair

References and Extras

CC0
To the extent possible under law, Jonathan Dinu has waived all copyright and related or neighboring rights to Visualization w/ Julia: A tale of two packages.

This work is published from: United States.

🔮 A production of the hyphaebeast.club 🔮