A set of commands to visualize and compare plot metrics in structured files (JSON, YAML, CSV, or TSV): show, diff, and modify.
usage: dvc plots [-h] [-q | -v] {show,diff,modify} ...
positional arguments:
COMMAND
show Generate plot from a metrics file.
diff Plot differences in metrics between commits.
modify Modify plot properties associated with a target file.
DVC has two concepts for metrics that represent different results of machine learning training or data processing:
- dvc metrics represent scalar numbers such as AUC, true positive rate, etc.
- dvc plots can be used to visualize data series such as AUC curves, loss functions, confusion matrices, etc.
DVC provides a set of commands to visualize these metrics of machine learning experiments as plots.
These metrics files are created by users or generated by user data processing code, and can optionally be defined in dvc.yaml (plots field) for tracking.
DVC generates plots as HTML files that can be opened with a web browser. These HTML files use Vega-Lite, a declarative grammar for defining plots using JSON. The plots can also be saved as SVG or PNG image files from the browser.
In contrast to dvc metrics, these metrics should be stored as data series.
Unlike its dvc metrics counterpart, dvc plots diff cannot calculate numeric differences between the metrics in different experiments.
Plot metrics can be organized as data series in JSON, YAML 1.2, CSV, or TSV files. DVC expects to see an array (or multiple arrays) of objects (usually float numbers) in the file.
In tabular file formats such as CSV and TSV, each column is an array. dvc plots subcommands can produce plots for a specified column or a set of them. For example, epoch, AUC, and loss are the column names below:
epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
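To illustrate the "each column is an array" idea, here is a small Python sketch that reads the table above with the standard library csv module and returns one list per column. It is only an illustration, not DVC's own parser; read_columns is a hypothetical helper.

```python
import csv
import io

# The sample table from above; note the space after each comma.
SAMPLE = """epoch, AUC, loss
34, 0.91935, 0.0317345
35, 0.91913, 0.0317829
36, 0.92256, 0.0304632
37, 0.92302, 0.0299015
"""


def read_columns(text: str) -> dict[str, list[float]]:
    """Hypothetical helper: return each CSV column as a list of floats, keyed by header name."""
    # skipinitialspace drops the blank that follows each delimiter (header included).
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    columns: dict[str, list[float]] = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


columns = read_columns(SAMPLE)
print(list(columns))    # ['epoch', 'AUC', 'loss']
print(columns["loss"])  # [0.0317345, 0.0317829, 0.0304632, 0.0299015]
```

Each key of the returned dictionary is a column name that could be passed to the plot commands as a field.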
In hierarchical file formats (JSON or YAML), an array of consistent objects is expected: every object should have the same structure.
dvc plots subcommands can produce plots for a specified field or a set of them, from the array's objects. For example, val_loss is one of the field names in the train array below:
{
"train": [
{"val_accuracy": 0.9665, "val_loss": 0.10757},
{"val_accuracy": 0.9764, "val_loss": 0.07324},
{"val_accuracy": 0.8770, "val_loss": 0.08136},
{"val_accuracy": 0.8740, "val_loss": 0.09026},
{"val_accuracy": 0.8795, "val_loss": 0.07640},
{"val_accuracy": 0.8803, "val_loss": 0.07608},
{"val_accuracy": 0.8987, "val_loss": 0.08455}
]
}
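The "consistent objects" requirement can be made concrete with a short Python sketch: it extracts one field from every object of a named array and rejects arrays whose objects do not share the same keys. The check is an illustration of the requirement, not DVC's validation logic, and field_series is a hypothetical helper. Only the first three objects of the train array are copied here.

```python
import json

# The first three objects of the "train" array shown above.
SAMPLE = """{
  "train": [
    {"val_accuracy": 0.9665, "val_loss": 0.10757},
    {"val_accuracy": 0.9764, "val_loss": 0.07324},
    {"val_accuracy": 0.8770, "val_loss": 0.08136}
  ]
}"""


def field_series(doc: dict, array_name: str, field: str) -> list:
    """Hypothetical helper: extract one field from every object in a named array.

    Raises ValueError if the objects do not all have the same set of keys.
    """
    records = doc[array_name]
    expected_keys = set(records[0])
    for i, record in enumerate(records):
        if set(record) != expected_keys:
            raise ValueError(
                f"object {i} has keys {sorted(record)}, expected {sorted(expected_keys)}"
            )
    return [record[field] for record in records]


doc = json.loads(SAMPLE)
print(field_series(doc, "train", "val_loss"))  # [0.10757, 0.07324, 0.08136]
```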
Users can change how plots are displayed by modifying the Vega specification, generating plots in the style that best fits their needs. This keeps DVC projects programming-language agnostic, since DVC is independent of user display configuration and visualization code.
Built-in plot templates are stored in the .dvc/plots/ directory. The default one is called default.json. It can be changed with the --template (-t) option of dvc plots show and dvc plots diff. For templates in the .dvc/plots/ directory, the path and the json extension are not required: you can specify only the base name, e.g. --template scatter.
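The base-name shortcut can be pictured with the following Python sketch. It assumes that a value with no directory part and no extension maps to .dvc/plots/<name>.json and that anything else is used as a path unchanged; resolve_template is a hypothetical helper, and DVC's actual lookup rules may differ.

```python
from pathlib import Path

# Assumption for this sketch: built-in templates live in .dvc/plots/ as .json files.
TEMPLATES_DIR = Path(".dvc/plots")


def resolve_template(name: str, templates_dir: Path = TEMPLATES_DIR) -> Path:
    """Hypothetical helper: map a --template value to a file path.

    A bare base name such as "scatter" is looked up in the templates directory
    with a .json extension; a value with a directory part or an extension is
    treated as a path and returned unchanged.
    """
    candidate = Path(name)
    if candidate.suffix == "" and candidate.parent == Path("."):
        return templates_dir / f"{name}.json"
    return candidate


print(resolve_template("scatter"))  # .dvc/plots/scatter.json (on POSIX systems)
```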
DVC has the following built-in plot templates:
- default: linear plot
- scatter: scatter plot
- smooth: linear plot with LOESS smoothing (see the example below)
- confusion: confusion matrix (see the example below)
Plot template files are Vega specification files that use predefined DVC anchors as placeholders for DVC to inject the plot values. You can create a custom template from scratch, or modify an existing one from .dvc/plots/.
💡 Note that custom templates can be safely added to the template directory.
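To make the anchor mechanism concrete, here is a toy Python sketch of placeholder substitution. The template below is made up for illustration (it is not DVC's default.json), the anchor names are the ones documented further down, and fill_template is a hypothetical helper; plain string replacement is simply one way to inject values and is not necessarily how DVC does it.

```python
import json

# A made-up Vega-Lite template using DVC anchor names as placeholders.
TEMPLATE = """{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "data": {"values": <DVC_METRIC_DATA>},
  "title": "<DVC_METRIC_TITLE>",
  "mark": "line",
  "encoding": {
    "x": {"field": "<DVC_METRIC_X>", "type": "quantitative"},
    "y": {"field": "<DVC_METRIC_Y>", "type": "quantitative"}
  }
}"""


def fill_template(template: str, data: list, title: str, x: str, y: str) -> dict:
    """Hypothetical helper: replace anchors with concrete values and parse the result.

    The data array is serialized to JSON; title, x and y are inserted verbatim,
    so they must not contain characters that would break the JSON string.
    """
    filled = (
        template.replace("<DVC_METRIC_DATA>", json.dumps(data))
        .replace("<DVC_METRIC_TITLE>", title)
        .replace("<DVC_METRIC_X>", x)
        .replace("<DVC_METRIC_Y>", y)
    )
    return json.loads(filled)


spec = fill_template(TEMPLATE, [{"index": 0, "rev": "workspace", "loss": 0.5}], "Loss", "index", "loss")
print(spec["encoding"]["y"])  # {'field': 'loss', 'type': 'quantitative'}
```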
All metrics files given to dvc plots show and dvc plots diff as input are combined together into a single data array for injection into a template file.
There are two important fields that DVC adds to the plot data:
- index - self-incrementing, zero-based counter for the data rows/values. In many cases it corresponds to a machine learning training epoch or step number.
- rev - Git commit hash, tag, or branch of the metrics file. This helps distinguish between different versions when using the dvc plots diff command.
Note that in the case of CSV/TSV metrics files, column names from the table header (first row) are equivalent to field names.
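The following Python sketch shows one way rows from several revisions could be flattened into a single array carrying the index and rev fields described above. It assumes that index restarts at zero for each revision so that versions line up on the X axis; combine is a hypothetical helper, and the revision labels and loss values are made up for illustration.

```python
def combine(series_by_rev: dict) -> list:
    """Hypothetical helper: flatten per-revision rows into one array.

    Each row keeps its own fields and gains a zero-based "index" (counted
    per revision, an assumption of this sketch) and a "rev" label.
    """
    combined = []
    for rev, rows in series_by_rev.items():
        for index, row in enumerate(rows):
            combined.append({**row, "index": index, "rev": rev})
    return combined


# Made-up values for two hypothetical revisions of the same metrics file.
data = combine({
    "HEAD^": [{"loss": 0.20}, {"loss": 0.08}],
    "workspace": [{"loss": 0.19}, {"loss": 0.07}],
})
for row in data:
    print(row)
```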
The following anchors are supported in template files:
- <DVC_METRIC_DATA> (required) - the plot data from any type of metrics file is converted to a single JSON array internally and injected in place of this anchor. Two additional fields are added: index and rev (explained above).
- <DVC_METRIC_TITLE> (optional) - a title for the plot, which can be defined with the --title option of the dvc plots subcommands.
- <DVC_METRIC_X> (optional) - field name of the data for the X axis. It can be defined with the -x option of the dvc plots subcommands. The auto-generated index field (explained above) is the default.
- <DVC_METRIC_Y> (optional) - field name of the data for the Y axis. It can be defined with the -y option of the dvc plots subcommands. The default is the last one found in the metrics file: the last column for CSV/TSV, or the last field for JSON/YAML.
- <DVC_METRIC_X_TITLE> (optional) - field name to display as the X axis label.
- <DVC_METRIC_Y_TITLE> (optional) - field name to display as the Y axis label.
Options:
- -h, --help - prints the usage/help message and exits.
- -q, --quiet - does not write anything to standard output.
- -v, --verbose - displays detailed tracing information.
Examples:
We'll use the tabular metrics file logs.csv for this example:
epoch,accuracy,loss,val_accuracy,val_loss
0,0.9418667,0.19958884770199656,0.9679,0.10217399864746257
1,0.9763333,0.07896138601688048,0.9768,0.07310650711813942
2,0.98375,0.05241111190887168,0.9788,0.06665669009438716
3,0.98801666,0.03681169906261687,0.9781,0.06697812260198989
4,0.99111664,0.027362171787042946,0.978,0.07385754839298315
5,0.9932333,0.02069501801203781,0.9771,0.08009233058886166
6,0.9945,0.017702101902437668,0.9803,0.07830339228538505
7,0.9954,0.01396906608727198,0.9802,0.07247738889862157
Let's plot the last column (default behavior):
$ dvc plots show logs.csv
file:///Users/usr/src/plots/logs.csv.html
Difference in this metric between the current project version and the previous commit:
$ dvc plots diff -d logs.csv HEAD^
file:///Users/usr/src/plots/logs.csv.html
Visualize a specific field:
$ dvc plots show -y loss logs.csv
file:///Users/usr/src/plots/logs.html
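The Y-axis rule behind the two commands above (an explicit -y field if given, otherwise the last column) can be sketched in Python using the header of logs.csv. pick_y is a hypothetical helper written for illustration; it mirrors the documented default but is not DVC's code.

```python
import csv
import io
from typing import List, Optional


def pick_y(header: List[str], y: Optional[str] = None) -> str:
    """Hypothetical helper: return the explicit Y field, else the last column."""
    if y is not None:
        if y not in header:
            raise KeyError(f"field {y!r} not found in {header}")
        return y
    return header[-1]


LOGS_HEADER = "epoch,accuracy,loss,val_accuracy,val_loss\n"
header = next(csv.reader(io.StringIO(LOGS_HEADER)))
print(pick_y(header))          # val_loss, like `dvc plots show logs.csv`
print(pick_y(header, "loss"))  # loss, like `dvc plots show -y loss logs.csv`
```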
In some cases we may want to smooth a noisy plot. In this example we use a metrics file with 1000 data points:
$ dvc plots show data.csv
file:///Users/usr/src/plots/plots.html
We can use the -t option with the smooth template to make it less noisy:
$ dvc plots show -t smooth data.csv
file:///Users/usr/src/plots/plots.html
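For intuition about what LOESS smoothing does, here is a minimal pure-Python sketch of the general technique: for each point, take the nearest fraction of neighbours, weight them with the tricube kernel, and fit a weighted straight line. This is a generic LOESS illustration, not the computation performed by the smooth template or Vega-Lite; the loess function and its bandwidth of 0.3 are illustrative choices.

```python
import math


def loess(xs: list, ys: list, bandwidth: float = 0.3) -> list:
    """Locally weighted linear regression evaluated at every x in xs.

    bandwidth is the fraction of points used in each local fit (at least 3).
    """
    n = len(xs)
    k = max(3, math.ceil(bandwidth * n))
    smoothed = []
    for x0 in xs:
        # Indices of the k points closest to x0 (x0 itself comes first).
        nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
        max_dist = max(abs(xs[i] - x0) for i in nearest) or 1.0
        # Tricube weights: 1 at x0, falling to 0 at the farthest neighbour.
        weights = [(1 - (abs(xs[i] - x0) / max_dist) ** 3) ** 3 for i in nearest]
        w_sum = sum(weights)
        x_mean = sum(w * xs[i] for w, i in zip(weights, nearest)) / w_sum
        y_mean = sum(w * ys[i] for w, i in zip(weights, nearest)) / w_sum
        var = sum(w * (xs[i] - x_mean) ** 2 for w, i in zip(weights, nearest))
        cov = sum(w * (xs[i] - x_mean) * (ys[i] - y_mean) for w, i in zip(weights, nearest))
        slope = cov / var if var > 0 else 0.0
        # Evaluate the local weighted line at x0.
        smoothed.append(y_mean + slope * (x0 - x_mean))
    return smoothed


# A made-up zig-zag series; the smoothed values vary much less than the input.
xs = [float(i) for i in range(10)]
print(loess(xs, [float(i % 2) for i in range(10)]))
```

A larger bandwidth averages over more neighbours and gives a flatter curve; a smaller one follows the data more closely.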
We'll use classes.csv for this example:
actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
...
Let's visualize it:
$ dvc plots show classes.csv --template confusion -x actual -y predicted
file:///Users/usr/src/plots/classes.csv.html
A confusion matrix template is predefined in DVC (found in .dvc/plots/confusion.json).
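A confusion matrix essentially counts how many rows share each (actual, predicted) pair. The Python sketch below computes those counts from the rows of classes.csv shown above (the elided "..." rows are not included). It illustrates what the chart encodes; it is not how the confusion template computes its values, and confusion_counts is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# The rows of classes.csv shown above, without the elided remainder.
CLASSES = """actual,predicted
cat,cat
cat,cat
cat,cat
cat,dog
cat,dinosaur
cat,dinosaur
cat,bird
turtle,dog
turtle,cat
"""


def confusion_counts(text: str) -> Counter:
    """Hypothetical helper: count rows per (actual, predicted) pair."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter((row["actual"], row["predicted"]) for row in reader)


counts = confusion_counts(CLASSES)
print(counts[("cat", "cat")])       # 3
print(counts[("cat", "dinosaur")])  # 2
print(counts[("turtle", "dog")])    # 1
```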