Download a file or directory from a supported URL (for example `s3://`, `ssh://`, and other protocols) into the local file system.
See `dvc get` to download data/model files or directories from other DVC repositories (e.g. hosted on GitHub).
usage: dvc get-url [-h] [-q | -v] url [out]
positional arguments:
  url    (See supported URLs in the description.)
  out    Destination path to put files in.
In some cases it's convenient to get a file or directory from a remote location into the local file system. The `dvc get-url` command helps the user do just that.
Note that unlike `dvc import-url`, this command does not track the downloaded data files (it does not create a `.dvc` file). For that reason, this command doesn't require an existing DVC project to run in.
The `url` argument should provide the location of the data to be downloaded, while `out` can be used to specify the directory and/or file name desired for the downloaded data. If an existing directory is specified, then the file or directory will be placed inside.
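As a rough illustration of how `url` and `out` interact, the sketch below predicts where a download would end up. `resolve_dest` is a hypothetical helper, not part of DVC, and it assumes that omitting `out` places the data in the current directory under the URL's last path component.

```shell
# Hypothetical helper (not part of DVC): print the path where the data
# named by a URL would land for a given `out` argument.
resolve_dest() {
  url=$1
  out=${2:-.}                 # assumption: no `out` means the current directory
  name=$(basename "$url")     # last path component of the URL
  if [ -d "$out" ]; then
    # `out` is an existing directory: the data is placed inside it
    printf '%s\n' "${out%/}/$name"
  else
    # otherwise `out` is taken as the target path itself
    printf '%s\n' "$out"
  fi
}

resolve_dest https://example.com/path/to/data.csv              # prints ./data.csv
resolve_dest https://example.com/path/to/data.csv renamed.csv  # prints renamed.csv (if no such directory exists)
```

The actual download would then be something like `dvc get-url https://example.com/path/to/data.csv renamed.csv`.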
DVC supports several types of (local or) remote locations (protocols):
Type | Description | url format example |
---|---|---|
s3 | Amazon S3 | s3://bucket/data |
azure | Microsoft Azure Blob Storage | azure://container/data |
gdrive | Google Drive | gdrive://<folder-id>/data |
gs | Google Cloud Storage | gs://bucket/data |
ssh | SSH server | ssh://user@example.com/path/to/data |
hdfs | HDFS to file* | hdfs://user@example.com/path/to/data.csv |
http | HTTP to file* | https://example.com/path/to/data.csv |
webdav | WebDAV to file* | webdavs://example.com/endpoint/path |
webhdfs | HDFS REST API* | webhdfs://user@example.com/path/to/data.csv |
local | Local path | /path/to/local/data |
If you installed DVC via `pip` and plan to use cloud services as remote storage, you might need to install these optional dependencies: `[s3]`, `[azure]`, `[gdrive]`, `[gs]`, `[oss]`, `[ssh]`. Alternatively, use `[all]` to include them all. The command should look like this: `pip install "dvc[s3]"`. (This example installs the `boto3` library along with DVC to support S3 storage.)
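To make the link between URL schemes and optional dependencies concrete, here is a minimal sketch that picks a `pip` requirement from a URL. `pip_extra_for_url` is a hypothetical helper; it only knows the extras listed above whose names match a URL scheme in the table, assumes local paths need no extra, and falls back to `[all]` for anything else, which may install more than is strictly needed.

```shell
# Hypothetical helper: map a URL to the pip requirement that likely covers it.
pip_extra_for_url() {
  case $1 in
    s3://*)     echo 'dvc[s3]' ;;
    azure://*)  echo 'dvc[azure]' ;;
    gdrive://*) echo 'dvc[gdrive]' ;;
    gs://*)     echo 'dvc[gs]' ;;
    ssh://*)    echo 'dvc[ssh]' ;;
    /*)         echo 'dvc' ;;        # assumption: local paths need no extra
    *)          echo 'dvc[all]' ;;   # fallback: include every optional dependency
  esac
}

pip_extra_for_url s3://bucket/data   # prints dvc[s3]
```

It could be combined with `pip` as `pip install "$(pip_extra_for_url s3://bucket/data)"`.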
* Notes on remote locations:
`remote://myremote/path/to/file` notation just means that a DVC remote `myremote` is defined; when DVC runs, it automatically expands this URL into a regular S3, SSH, GS, etc. URL by appending `/path/to/file` to `myremote`'s configured base path.

Another way to understand the `dvc get-url` command is as a tool for downloading data files. On GNU/Linux systems, for example, instead of `dvc get-url` with HTTP(S) it's possible to use:
$ wget https://example.com/path/to/data.csv
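Returning to the `remote://` notation from the notes on remote locations: the expansion amounts to string concatenation. The following is a minimal sketch, assuming the remote's base URL is already known. `expand_remote_url` and the base URL `s3://bucket/data` are hypothetical; DVC performs the real expansion internally from its own configuration.

```shell
# Hypothetical helper: expand remote://<name>/<path> against a base URL.
expand_remote_url() {
  url=$1
  base=$2                       # assumed base URL configured for the remote
  case $url in
    remote://*/*)
      rest=${url#remote://}     # e.g. myremote/path/to/file
      path=${rest#*/}           # drop the remote name: path/to/file
      printf '%s/%s\n' "${base%/}" "$path"
      ;;
    *)
      printf '%s\n' "$url"      # not a remote:// URL: leave it unchanged
      ;;
  esac
}

expand_remote_url remote://myremote/path/to/file s3://bucket/data
# prints s3://bucket/data/path/to/file
```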
- `-h`, `--help` - prints the usage/help message, and exits.
- `-q`, `--quiet` - does not write anything to standard output. Exits with 0 if no problems arise, otherwise 1.
- `-v`, `--verbose` - displays detailed tracing information.
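Since `--quiet` reports the result only through the exit status, a script can branch on that status. Here is a small sketch; `run_and_report` is a hypothetical wrapper that runs whatever command it is given, so it can be tried without DVC installed.

```shell
# Hypothetical wrapper: run a command and report success or failure
# based solely on its exit status.
run_and_report() {
  if "$@"; then
    echo "download ok"
  else
    status=$?                   # exit status of the command that just failed
    echo "download failed (exit $status)" >&2
    return 1
  fi
}

run_and_report true             # prints "download ok"; `true` stands in for a real command
```

In practice the wrapped command might be `run_and_report dvc get-url -q https://example.com/path/to/data.csv data.csv`.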