Initialize a DVC project in the current working directory.
usage: dvc init [-h] [-q | -v] [--no-scm] [-f] [--subdir]
DVC works best in a Git repository. This enables all features, providing the most value. For this reason, dvc init (without flags) expects to run in a Git repository root (a .git/ directory should be present).
At DVC initialization, a new .dvc/ directory is created for internal configuration and cache files and directories that are hidden from the user. This directory is automatically staged with git add, so it can be easily committed with Git.
The command options can be used to start an alternative workflow for advanced scenarios:
- Initializing DVC in subdirectories (--subdir) - for monorepos and nested DVC projects
- Initializing DVC without Git (--no-scm) - for very simple projects, version control systems other than Git, deployment automation, among other uses
--subdir must be provided to initialize DVC in a subdirectory of a Git repository. DVC still expects to find a Git root (it checks all directories up to the system root to find .git/). This option does not affect any config files; the .dvc/ directory is created the same way as in the default mode. This way, multiple DVC projects can be initialized in a single Git repository, providing isolation between projects.
This option is mostly useful in the scenario of a monorepo (a Git repo split into several project directories), but can also be used with other patterns when such isolation is needed. dvc init --subdir mitigates possible limitations of initializing DVC in the Git repo root:
- … .dvc/ directory, especially if DVC is already being used by several sub-projects (monorepo).
- dvc pull and dvc repro explore the whole DVC repository to find DVC-tracked data and pipelines to work with. This can be inefficient for large monorepos.
- dvc status and dvc metrics show would produce unexpected results if not constrained to a single project scope.
DVC finds the project root by looking for a .dvc/ directory, starting from the current working directory and moving up. The project root defines the scope of action for most DVC commands (e.g. dvc repro, dvc pull, dvc metrics diff, etc.), meaning that only the dvc.yaml files, .dvc files, etc. inside the project are usable by the commands.
With --subdir, the project root is found before the Git root, so the scope of DVC commands run inside it is constrained to this project alone.
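The upward lookup can be illustrated with a small POSIX sh sketch. The find_up function below is hypothetical: it is not part of DVC and is not how DVC implements the search; it only mimics the described order (current directory first, then each parent up to the filesystem root). The demo builds a throwaway --subdir style layout out of plain directories, so neither Git nor DVC needs to be installed.

```shell
# Hypothetical helper (not part of DVC): print the nearest directory, starting
# at the current one and walking up, that contains an entry named $1.
# Returns 1 if the filesystem root is reached without a match.
find_up() {
  dir=$(pwd -P)
  while :; do
    if [ -e "$dir/$1" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    # Stop once the filesystem root itself has been checked.
    [ "$dir" = / ] && return 1
    dir=$(dirname "$dir")
  done
}

# Throwaway layout mimicking `git init` in repo/ and `dvc init --subdir`
# in repo/project-a/ (plain directories only).
demo=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo/repo/.git" "$demo/repo/project-a/.dvc" "$demo/repo/project-a/src"

cd "$demo/repo/project-a/src" || exit 1
dvc_root=$(find_up .dvc)   # nearest .dvc/ wins: repo/project-a
git_root=$(find_up .git)   # the Git root is further up: repo
echo "DVC project root: $dvc_root"
echo "Git root:         $git_root"

# At the Git root there is no .dvc/ at or above it (assuming none exists
# above the temp dir), so no DVC project is found there.
cd "$demo/repo" || exit 1
find_up .dvc || echo "no DVC project at or above $(pwd -P)"
```

Because the nearest .dvc/ wins, commands run anywhere under repo/project-a are scoped to that project, while the Git root sits further up. In a real project, the dvc root command reports the project root that DVC resolved.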
If there are multiple --subdir projects, but not nested, e.g.:
. # git init
├── .git
├── project-A
│ ├── .dvc # dvc init --subdir
│ ...
├── project-B
│ ├── .dvc # dvc init --subdir
│ ...
DVC considers A and B separate projects. Any DVC command run in project-A is not aware of project-B. However, commands that involve versioning (like dvc diff, among others) access the commit history from the Git root (.). The Git root (.) itself is not a DVC project in this case, so most DVC commands can't be run there.
If there are nested --subdir projects, e.g.:
project-A
├── .dvc # git init && dvc init
├── .git
├── dvc.yaml
├── ...
├── project-B
│ ├── .dvc # dvc init --subdir
│ ├── data-B.dvc
│ ...
Nothing changes for the inner projects. Any DVC command run in the outer one actively ignores the nested project directories. For example, running dvc pull in project-A wouldn't download data for the data-B.dvc file.
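The scoping rule for nested projects can be approximated with find: any subdirectory that contains its own .dvc/ is pruned, together with the .dvc/ and .git/ directories themselves. The list_own_dvc_files function is a hypothetical illustration, not DVC's actual traversal (which also handles dvc.yaml files, among other things); the demo recreates the nested layout above with empty placeholder files.

```shell
# Hypothetical helper (not part of DVC): list the .dvc files that belong to the
# project rooted at $1, skipping nested DVC projects as the outer project does.
list_own_dvc_files() {
  root=$1
  # 1) prune any subdirectory (other than the root) that has its own .dvc/
  # 2) prune the .dvc/ and .git/ directories themselves
  # 3) print regular files whose names end in .dvc
  find "$root" \
    \( -type d ! -path "$root" -exec sh -c 'test -d "$1/.dvc"' sh {} \; \) -prune \
    -o \( -type d \( -name .dvc -o -name .git \) \) -prune \
    -o -type f -name '*.dvc' -print
}

# Nested layout: project-A (git init && dvc init) containing
# project-B (dvc init --subdir), each with one placeholder .dvc file.
outer=$(mktemp -d)/project-A
mkdir -p "$outer/.git" "$outer/.dvc" "$outer/project-B/.dvc"
: > "$outer/data-A.dvc"
: > "$outer/project-B/data-B.dvc"

list_own_dvc_files "$outer"   # prints only $outer/data-A.dvc
```

Pruning at the first directory that has a .dvc/ is what keeps data-B.dvc out of the outer project's scope, no matter how deep the nested project sits.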
In rare cases, the --no-scm option might be desirable: to initialize DVC in a directory that is not part of a Git repo, or to make DVC ignore Git. Examples include:
- … dvc.yaml files to manage pipelines, data, etc., they can be added into any version control system, thus providing versioning for large data files and directories.
- … cron.
In this mode, DVC features related to versioning are not available. For example, .gitignore files are not automatically created and updated on dvc add or dvc run, and dvc diff and dvc metrics diff, which require Git revisions to compare, cannot be used.
DVC sets the core.no_scm config option value to true in the DVC config when initialized this way. This means that even if the project is tracked by Git, or if Git is initialized in it later, DVC will keep operating detached from Git in this project.
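Since the setting is stored in the DVC config, it can be inspected there. The is_no_scm function below is a hypothetical helper, not a DVC command; it assumes the INI-style layout of .dvc/config, with no_scm set under a [core] section, and accepts the boolean in any letter case. In a real project, dvc config core.no_scm is the more robust way to read the value.

```shell
# Hypothetical helper (not part of DVC): exit 0 if the config file given as $1
# sets no_scm to true inside its [core] section, 1 otherwise.
# Assumed layout (INI style):
#   [core]
#       no_scm = true
is_no_scm() {
  awk '
    # Any section header updates whether we are inside [core].
    /^[ \t]*\[/ { in_core = ($0 ~ /^[ \t]*\[core\][ \t]*$/) }
    # Match "no_scm = true" case-insensitively while inside [core].
    in_core && tolower($0) ~ /^[ \t]*no_scm[ \t]*=[ \t]*true[ \t]*$/ { found = 1 }
    END { exit !found }
  ' "$1"
}

# Demo with a throwaway config file (contents assumed, not generated by DVC).
cfg=$(mktemp)
printf '[core]\n    no_scm = True\n' > "$cfg"
if is_no_scm "$cfg"; then
  echo "detached from Git"
fi
```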
-f, --force - remove .dvc/ if it exists before initialization. This will remove any existing local cache. Useful when a previous dvc init has been corrupted.
--subdir - initialize the DVC project in the current working directory, even if it's not the Git repository root. (If run in a project root, this option is ignored.) It affects how other DVC commands behave afterwards; please see the section on initializing DVC in subdirectories above for more details.
--no-scm - initialize the DVC project detached from Git. This means that DVC doesn't try to find or use Git in the directory it's initialized in. Certain DVC features are not available in this mode; please see the section on initializing DVC without Git above for more details.
-h, --help - print the usage/help message, and exit.
-q, --quiet - do not write anything to standard output. Exit with 0 if no problems arise, otherwise 1.
-v, --verbose - display detailed tracing information.
Create a new DVC repository (this must be run in the Git repository root):
$ mkdir example && cd example
$ git init
$ dvc init
$ git status
...
new file: .dvc/.gitignore
new file: .dvc/config
$ git commit -m "Init DVC"
Note that the cache directory (among others) is not tracked with Git. It contains data and model files, and will be managed by DVC.
$ cat .dvc/.gitignore
/state
/lock
...
/cache
Create a new DVC repository in a subdirectory of a Git repository:
$ mkdir repo && cd repo
$ git init
$ mkdir project-a && cd project-a
$ dvc init --subdir
In this case, the Git repository is in the repo directory, while the DVC repository is in repo/project-a.
$ cd ../..
$ tree repo -a
repo
├── .git
.
.
.
└── project-a
    └── .dvc