Visualize the pipeline(s) in dvc.yaml
as one or more graph(s) of connected
stages.
usage: dvc dag [-h] [-q | -v] [--dot] [--full] [target]

positional arguments:
  target      Stage or output to show pipeline for (optional)
              Uses all stages in the workspace by default.

A data pipeline, in general, is a series of data processing stages (for example, console commands that take an input and produce an output). A pipeline may produce intermediate data, and has a final result.
Data science and machine learning pipelines typically start with large raw datasets, include intermediate featurization and training stages, and produce a final model, as well as accuracy metrics.
In DVC, pipeline stages and commands, their data I/O, interdependencies, and
results (intermediate or final) are specified in dvc.yaml, which can be
written manually or built using the helper command dvc run. This allows DVC to
restore one or more pipelines later (see dvc repro).
DVC builds a dependency graph (DAG) to do this.
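As a rough illustration of how dvc.yaml encodes the graph, the sketch below writes a hypothetical file for four stages named prepare, featurize, train, and evaluate into a temporary directory. The script names and data files (prepare.py, raw.csv, model.pkl, and so on) are assumptions made up for this example; only the stages/cmd/deps/outs/metrics layout comes from the dvc.yaml format. An edge of the DAG arises wherever one stage lists under deps a file that another stage declares under outs.

```shell
# Write an illustrative dvc.yaml into a scratch directory.
# All script and data file names below are hypothetical.
workdir=$(mktemp -d)
cat > "$workdir/dvc.yaml" <<'EOF'
stages:
  prepare:
    cmd: python prepare.py raw.csv prepared.csv
    deps:
      - prepare.py
      - raw.csv
    outs:
      - prepared.csv
  featurize:
    cmd: python featurize.py prepared.csv features.csv
    deps:
      - featurize.py
      - prepared.csv    # output of prepare: edge prepare -> featurize
    outs:
      - features.csv
  train:
    cmd: python train.py features.csv model.pkl
    deps:
      - train.py
      - features.csv    # output of featurize: edge featurize -> train
    outs:
      - model.pkl
  evaluate:
    cmd: python evaluate.py model.pkl features.csv scores.json
    deps:
      - evaluate.py
      - features.csv    # edge featurize -> evaluate
      - model.pkl       # edge train -> evaluate
    metrics:
      - scores.json
EOF
echo "wrote $workdir/dvc.yaml"
```

Copied into an initialized DVC project, a file like this would be expected to yield a graph with one prepare stage feeding featurize, which in turn feeds both train and evaluate.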
The dvc dag command displays the stages of a pipeline up to the target stage.
If target is omitted, it shows the full project DAG.
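To make "up to the target stage" concrete, the following sketch computes the ancestors of a target from a small edge list. It is only a conceptual illustration in plain shell and awk, not how DVC implements the command; the edge list and the ancestors function are assumptions for this example, using four stage names (prepare, featurize, train, evaluate).

```shell
# Hypothetical edge list, one "upstream downstream" pair per line.
edges='prepare featurize
featurize train
featurize evaluate
train evaluate'

# Print the target and every stage it transitively depends on, sorted by name.
ancestors() {
  printf '%s\n' "$edges" | awk -v target="$1" '
    { up[NR] = $1; down[NR] = $2; n = NR }
    END {
      seen[target] = 1
      # Keep adding upstream stages until a full pass finds nothing new.
      do {
        changed = 0
        for (i = 1; i <= n; i++)
          if ((down[i] in seen) && !(up[i] in seen)) { seen[up[i]] = 1; changed = 1 }
      } while (changed)
      for (s in seen) print s
    }' | sort
}

# Analogous to what "dvc dag train" would show: evaluate is left out
# because it is downstream of train, not an ancestor of it.
ancestors train
```

Here ancestors train lists featurize, prepare, and train. Showing evaluate as well corresponds to asking for the whole graph that train belongs to, which is what the --full option is for.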
--full - show the full DAG that the target stage belongs to, instead of
showing only its ancestors.
--dot - show the DAG in DOT format. It can be passed to third-party
visualization utilities.
-h, --help - print the usage/help message and exit.
-q, --quiet - do not write anything to standard output. Exit with 0 if no
problems arise, otherwise 1.
-v, --verbose - display detailed tracing information.

This command's output is automatically piped to Less, if available in the
terminal. (The exact command used is less --chop-long-lines --clear-screen.)
If less is not available (e.g. on Windows), the output is simply printed out.
It's also possible to enable Less paging on Windows.
The default pager can be overridden via the DVC_PAGER environment variable.
For example, the following command replaces the default pager with more for a
single run:
$ DVC_PAGER=more dvc dag
For a persistent change, define DVC_PAGER in the shell configuration. For
example, in Bash, we could add the following line to ~/.bashrc:
export DVC_PAGER=more
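If the same shell configuration is shared across machines, the export could be guarded so that DVC_PAGER is only set where more is actually installed. This is a small sketch of one way to organize ~/.bashrc, not something DVC requires.

```shell
# Set DVC_PAGER only when the `more` executable can be found on PATH.
if command -v more >/dev/null 2>&1; then
  export DVC_PAGER=more
fi
```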
Visualize the prepare, featurize, train, and evaluate stages of a pipeline as
defined in dvc.yaml:
$ dvc dag
         +---------+
         | prepare |
         +---------+
              *
              *
              *
        +-----------+
        | featurize |
        +-----------+
         **        **
       **            *
      *               **
+-------+               *
| train |             **
+-------+            *
         **        **
           **    **
             *  *
        +----------+
        | evaluate |
        +----------+
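The --dot output can also be handed to Graphviz to render the same pipeline as an image. The sketch below assumes the Graphviz dot executable is installed; the function name dag_to_png and the default file name dag.png are hypothetical. It returns 127 with a message when either dvc or dot is missing, so it can be sourced safely on machines without them.

```shell
# Render the project DAG to a PNG via Graphviz.
# dag_to_png is a hypothetical helper, not part of DVC.
dag_to_png() {
  out=${1:-dag.png}
  for tool in dvc dot; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s not found on PATH\n' "$tool" >&2
      return 127
    fi
  done
  # dvc dag --dot prints DOT text; dot -Tpng converts it to an image file.
  dvc dag --dot | dot -Tpng -o "$out"
}
```

Running dag_to_png pipeline.png inside a DVC project would then write the rendered graph to pipeline.png.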