Show changes in the project pipelines, as well as file mismatches either between the cache and workspace, or between the cache and remote storage.
usage: dvc status [-h] [-v] [-j <number>] [-q] [-c] [-r <name>] [-a] [-T]
[--all-commits] [-d] [-R] [--show-json]
[targets [targets ...]]
positional arguments:
targets Limit command scope to these tracked files/directories,
.dvc files, or stage names.
Searches for changes in the existing tracked data and pipelines, either showing which files or directories have changed in the workspace and should be added or reproduced again (with dvc add or dvc repro); or differences between cache vs. remote storage (implying dvc push or dvc pull should be run to synchronize them). The remote mode is triggered by using the --cloud or --remote options:
Mode | Option | Description |
---|---|---|
local | none | Comparisons are made between data files in the workspace and corresponding files in the cache directory (e.g. .dvc/cache ) |
remote | --remote | Comparisons are made between the cache, and the given remote. Remote storage is defined using the dvc remote command. |
remote | --cloud | Comparisons are made between the cache, and the default remote (typically defined with dvc remote default ). |
Without arguments, this command checks all stages (defined in dvc.yaml) and .dvc files, and compares the hash values of their outputs (found in dvc.lock for stages) against the actual data files or directories in the workspace. The --all-branches, --all-tags, and --all-commits options enable checking data for multiple Git commits.
The targets given to this command (if any) limit what to check. It accepts paths to tracked files or directories (including paths inside tracked directories), .dvc files, and stage names (found in dvc.yaml).
If no differences are detected, dvc status prints "Data and pipelines are up to date." or "Cache and remote 'myremote' are in sync" (if the -c or -r option is used). If differences are detected, the changes in dependencies and/or outputs for each stage that differs are listed. For each item listed, either the file name or hash is shown, along with a state description, as detailed below:
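- changed checksum means that the .dvc file hash has changed (e.g. someone manually edited it).
- always changed means that this is a .dvc file with no dependencies (see dvc add) or that the stage in dvc.yaml has the always_changed: true value set (see --always-changed option in dvc run).
- changed deps or changed outs means that there are changes in dependencies or outputs tracked by the stage or .dvc file. Depending on the use case, commands like dvc commit, dvc repro, or dvc run can be used to update the file. Possible states are:
  - new: An output is found in the workspace, but there is no corresponding file hash saved in the dvc.lock or .dvc file yet.
  - modified: An output or dependency is found in the workspace, but the corresponding file hash in the dvc.lock or .dvc file is not up to date.
  - deleted: The output or dependency is referenced in the dvc.lock or .dvc file, but does not exist in the workspace.
  - not in cache: An output exists in the workspace, and the corresponding file hash in the dvc.lock or .dvc file is up to date, but there is no corresponding cache file or directory.
- update available means that an import stage is outdated (the original file or directory has changed). The imported data can be brought to its latest version by using dvc update.
For new and deleted data, the cache is different from remote storage. Bringing the two into sync requires dvc pull or dvc push. For the typical process to update the workspace, see Sharing Data And Model Files.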
For missing data, there's nothing to retrieve from storage. This can happen
for example in fresh DVC repository clones if the data wasn't
uploaded from the original repo, or after certain uses of dvc gc
. You can try
dvc repro
to regenerate the output locally, and dvc push
remotely after
that.
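The workspace states reported under changed deps and changed outs (new, modified, deleted, not in cache) amount to a comparison between the hash recorded in dvc.lock or the .dvc file, the data currently in the workspace, and the contents of the cache. The following is a minimal sketch of that decision logic, not DVC's actual implementation; the function name classify_output and its parameters are hypothetical and exist only for illustration.

```python
from typing import Optional


def classify_output(
    recorded_hash: Optional[str],
    workspace_hash: Optional[str],
    in_cache: bool,
) -> Optional[str]:
    """Return a state label for one tracked item, or None if it is up to date.

    recorded_hash:  hash saved in dvc.lock or the .dvc file (None if not saved yet)
    workspace_hash: hash of the data currently in the workspace (None if absent)
    in_cache:       whether the cache holds data for the recorded hash

    Hypothetical helper for illustration only; not part of DVC.
    """
    if workspace_hash is None:
        # Referenced in the metafile, but nothing exists in the workspace.
        return "deleted" if recorded_hash is not None else None
    if recorded_hash is None:
        # Present in the workspace, but no hash has been saved for it yet.
        return "new"
    if workspace_hash != recorded_hash:
        # Present in the workspace, but the saved hash is out of date.
        return "modified"
    if not in_cache:
        # Workspace and metafile agree, but the cache has no matching entry.
        return "not in cache"
    return None
```

The order of the checks matters: "not in cache" is only meaningful once the workspace data and the saved hash are known to agree.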
-c, --cloud - enables comparison against a remote (see dvc remote). If the --remote option is not used, DVC will compare against the default remote (specified in the core.remote config option).
-a, --all-branches - compares cache content against all Git branches instead of just the current workspace. This basically runs the same status command in every branch of this repo. The corresponding branches are shown in the status output. Applies only if --cloud or a -r remote is specified. Note that this can be combined with -T below, for example using the -aT flag.
-T, --all-tags - same as -a above, but applies to Git tags as well as the workspace. Note that both options can be combined, for example using the -aT flag.
-R, --recursive - determines the files to check status for by searching each target directory and its subdirectories for stages (in dvc.yaml) and .dvc files to inspect. If there are no directories among the targets, this option is ignored.
--show-json - prints the command's output in easily parsable JSON format, instead of a human-readable table.
--all-commits - same as -a or -T above, but applies to all Git commits as well as the workspace. This compares the cache content for the entire commit history of the project.
-d, --with-deps - determines files to check by tracking dependencies to the targets. If none are provided, this option is ignored. By traversing all stage dependencies, DVC searches backward from the target stages in the corresponding pipelines. This means DVC will not show changes occurring in later stages than the targets. Applies whether or not --cloud is specified.
-r <name>, --remote <name> - specifies which remote storage (see dvc remote list) to compare against. Implies --cloud.
-j <number>, --jobs <number> - parallelism level for DVC to retrieve information from remote storage. This only applies when the --cloud option is used, or a --remote is given. The default value is 4 * cpu_count(). For SSH remotes, the default is 4. Note that the default value can be set using the jobs config option with dvc remote modify. Using more jobs may speed up the operation.
-h, --help - prints the usage/help message, and exits.
-q, --quiet - do not write anything to standard output. Exit with 0 if data and pipelines are up to date, otherwise 1.
-v, --verbose - displays detailed tracing information.
$ dvc status
dofoo:
changed deps:
modified: baz
changed outs:
modified: foo
dobar:
changed deps:
modified: foo
changed outs:
deleted: bar
baz.dvc:
changed outs:
modified: baz
This shows that for stage dofoo, the dependency baz and the output foo have changed. Likewise for stage dobar, the dependency foo has changed and the output bar doesn't exist in the workspace. For baz.dvc, the file baz tracked by it has changed.
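In scripts, the exit code documented for the --quiet option (0 when data and pipelines are up to date, 1 otherwise) is usually easier to act on than the printed report. Below is a minimal sketch of such a check, assuming dvc is available on the PATH; the helper name is_up_to_date and its run parameter are hypothetical, the latter added only so the command runner can be swapped out.

```python
import subprocess
from typing import Callable, Sequence


def is_up_to_date(
    extra_args: Sequence[str] = (),
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `dvc status --quiet` exits with code 0.

    extra_args: additional arguments such as targets (e.g. a stage name).
    run:        command runner; defaults to subprocess.run (hypothetical hook).
    """
    cmd = ["dvc", "status", "--quiet", *extra_args]
    # check=False: a non-zero exit code is an expected answer here, not an error.
    return run(cmd, check=False).returncode == 0
```

For example, `is_up_to_date(["dobar"])` would limit the check to the dobar stage from the example above.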
dvc status only checks the tracked data corresponding to any given targets:
$ dvc status foo.dvc dobar
foo.dvc:
changed outs:
modified: foo
changed checksum
dobar:
changed deps:
modified: foo
changed outs:
not in cache: bar
In this case, the target foo.dvc is a .dvc file to track the foo file, while dobar is the name of a stage defined in dvc.yaml.
Note that you can check data within tracked directories, such as the data/raw directory (tracked with data/raw.dvc):
$ tree data
data
├── raw
│ ├── partition.1.dat
│ ├── ...
│ └── partition.n.dat
└── raw.dvc
$ dvc status data/raw/partition.1.dat
new: data/raw
$ vi code/featurization.py
... edit the code
$ dvc status model.p
Data and pipelines are up to date.
$ dvc status model.p --with-deps
matrix-train.p:
changed deps:
modified: code/featurization.py
The dvc status command may be limited to a target that had no changes, but by adding --with-deps, any change in a preceding stage will be found.
Let's now assume that we have a shared remote on S3:
$ dvc remote list
storage s3://bucket/path
And would like to check what files we have generated but haven't pushed to the remote yet:
$ dvc status --remote storage
...
new: data/model.p
new: data/eval.txt
new: data/matrix-train.p
new: data/matrix-test.p
The output shows the location of the remote storage, as well as any differences between the cache and the remote.
Let's import a data file (data.csv) from a different DVC repository into our current project using dvc import.
$ dvc import different/repo/location data.csv
The resulting data.csv.dvc file is called an import stage. If the original file or directory changes later, dvc status will show update available as output:
$ dvc status
data.csv.dvc:
changed deps:
update available: data.csv (different/repo/location)
The imported data can be brought to its latest version by using dvc update.