To add dependencies or outputs to a
stage, edit the dvc.yaml file (by hand or using
dvc run with the -f and --no-exec flags). dvc repro will then execute the stage and
cache the output files when ready.
If the stage has already been executed and the desired outputs are present in
the workspace, you can avoid dvc repro (which can be expensive
and is unnecessary) and use dvc commit instead.
Note that both alternatives update dvc.lock too.
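
The choice between the two commands can be sketched as a small shell helper. This is only an illustration: next_dvc_step is a hypothetical function name (not part of DVC), and it assumes that the presence of every output path in the workspace means the stage has already been run. It prints the suggested command instead of running it.

```shell
# Hypothetical helper (not a DVC command): suggest "dvc commit" when every
# listed output already exists in the workspace, otherwise "dvc repro".
next_dvc_step() {
    for out in "$@"; do
        if [ ! -e "$out" ]; then
            # At least one output is missing, so the stage has to be executed.
            echo "dvc repro"
            return 0
        fi
    done
    # All outputs are present: caching them with dvc commit is enough.
    echo "dvc commit"
}

# Example call with the outputs used later on this page.
next_dvc_step data/train data/validate
```

Existing files do not prove they were produced by the current command, so treat the suggestion as a hint and fall back to dvc repro when in doubt.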
We start with an example prepare stage, which has a single dependency and
output. To add a missing dependency (data/raw.csv) as well as a missing output
(data/validate), we can edit dvc.yaml like this:
 stages:
   prepare:
     cmd: python src/prepare.py
     deps:
+      - data/raw.csv
       - src/prepare.py
     outs:
       - data/train
+      - data/validate

We could also use dvc run with -f and --no-exec to add another
dependency/output to the stage:

$ dvc run -f --no-exec \
          -n prepare \
          -d data/raw.csv \
          -d src/prepare.py \
          -o data/train \
          -o data/validate \
          python src/prepare.py

-f overwrites the stage in dvc.yaml, while --no-exec updates the stage
without executing it.
If the data/raw.csv or data/validate files already exist, we can use
dvc commit to cache the newly specified outputs (and to update the deps and
outs file hashes in dvc.lock):
$ dvc commit
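
To confirm that the commit recorded the new entries, one option is to look for their paths in dvc.lock. The sketch below assumes the lock file lists each dependency and output as an unquoted "path: <file>" entry; lock_tracks is a hypothetical helper name, not a DVC command.

```shell
# Hypothetical helper: succeed if the lock file has an entry "path: <file>".
# $1: path to look for, $2: lock file (defaults to dvc.lock).
lock_tracks() {
    lock="${2:-dvc.lock}"
    [ -f "$lock" ] || return 1
    awk -v want="path: $1" '
        { line = $0; sub(/^[ -]*/, "", line) }  # drop indentation and list dash
        line == want { found = 1 }              # exact match, not a prefix match
        END { exit found ? 0 : 1 }
    ' "$lock"
}

# Example check for the two entries added on this page.
for p in data/raw.csv data/validate; do
    if lock_tracks "$p"; then
        echo "$p is recorded in dvc.lock"
    else
        echo "$p is not in dvc.lock"
    fi
done
```

This only checks that the paths are present, not that their hashes match the workspace; dvc status is the command to use for that.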