Returns the contents of a tracked file.
def read(path: str,
         repo: str = None,
         rev: str = None,
         remote: str = None,
         mode: str = "r",
         encoding: str = None)

import dvc.api

modelpkl = dvc.api.read(
    'model.pkl',
    repo='https://github.com/example/project.git',
    mode='rb')
This function wraps dvc.api.open(), for a simple way to return the complete contents of a file tracked in a DVC project. The file can be tracked by DVC (as an output) or by Git.
This is similar to the dvc get command in our CLI.
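The wrapping relationship can be sketched roughly as follows. This is a minimal illustration, not DVC's actual source: read_via_open and its opener parameter are hypothetical names introduced here, and the built-in open() stands in for dvc.api.open() so that the snippet runs without a DVC project.

```python
import os
import tempfile


def read_via_open(path, opener=open, **kwargs):
    """Return the full contents of `path` using a context-manager opener.

    `opener` is a hypothetical hook: any callable that returns a file-like
    context manager, such as the built-in open() or dvc.api.open().
    """
    with opener(path, **kwargs) as fd:
        return fd.read()


# Demonstration with the built-in open() on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "params.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("lr: 0.01\n")
    contents = read_via_open(sample_path, mode="r", encoding="utf-8")

print(contents)
```

With DVC installed, passing opener=dvc.api.open along with keyword arguments such as repo or rev should behave much like dvc.api.read() itself, although that is an assumption about usage rather than a statement of how the library is implemented.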
The returned contents can be a string or a bytes object. These are loaded to memory directly (without using any disk space).
The type returned depends on the mode used. For more details, please refer to Python's open() built-in, which is used under the hood.
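As a small illustration of how the mode determines the returned type, the sketch below uses only the built-in open() on a temporary file. The file name and contents are made up for the example; dvc.api.read() is expected to follow the same rule, since it mirrors the mode parameter of open().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")

    # Text mode: contents are decoded with the given encoding into a str.
    with open(sample_path, mode="r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary mode: contents come back undecoded, as a bytes object.
    with open(sample_path, mode="rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, type(as_bytes).__name__)
```

Note that the binary result is one byte longer than the text result here, because the accented character takes two bytes in UTF-8.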
path (required) - location and file name of the target to read, relative to the root of the project (repo).
repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: The current project is used (the current working directory tree is walked up to find it).
rev - Git commit (any revision such as a branch or tag name, or a commit hash). If repo is not a Git repo, this option is ignored. Default: HEAD.
remote - name of the DVC remote to look in for the target data. Default: The default remote of repo is used if a remote argument is not given. For local projects, the cache is tried before the default remote.
mode - specifies the mode in which the file is opened. Defaults to "r" (read). Mirrors the namesake parameter in builtin open().
encoding - codec used to decode the file contents to a string. This should only be used in text mode. Defaults to "utf-8". Mirrors the namesake parameter in builtin open().

dvc.exceptions.FileMissingError - file in path is missing from repo.
dvc.exceptions.PathMissingError - path cannot be found in repo.
dvc.exceptions.NoRemoteError - no remote is found.

Any file tracked in a DVC project (and stored remotely) can be loaded directly in your Python code with this API. For example, let's say that you want to load and deserialize a binary model from a repo on GitHub:
import pickle
import dvc.api

model = pickle.loads(
    dvc.api.read(
        'model.pkl',
        repo='https://github.com/example/project.git',
        mode='rb'
    )
)
We're using 'rb' mode here for compatibility with pickle.loads().
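To see why binary mode matters here, the following sketch round-trips an object through pickle without touching DVC at all. The dictionary is a made-up stand-in for a model; the point is only that pickle.loads() accepts bytes and rejects the str that a text-mode read would produce.

```python
import pickle

# A stand-in for a trained model; any picklable object works here.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

# pickle.dumps() produces bytes, which is what an 'rb' read would return.
serialized = pickle.dumps(model)
restored = pickle.loads(serialized)

# Passing a str (what a text-mode read returns) is rejected.
try:
    pickle.loads("not bytes")
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

print(restored == model, type(text_mode_error).__name__)
```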