Returns the contents of a tracked file.
def read(path: str,
         repo: str = None,
         rev: str = None,
         remote: str = None,
         mode: str = "r",
         encoding: str = None)

import dvc.api

modelpkl = dvc.api.read(
    'model.pkl',
    repo='https://github.com/example/project.git',
    mode='rb')

This function wraps dvc.api.open() to provide a simple way to return the complete
contents of a file tracked in a DVC project. The file can be
tracked by DVC (as an output) or by Git.
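Conceptually, the wrapping follows a familiar pattern: open a file object, read it to the end, and close it. The sketch below shows that pattern with a hypothetical helper, read_all(), which takes the opener as an argument. It is an illustration, not DVC's source code. It is exercised here with Python's built-in open() on a local file. The same call shape should work with dvc.api.open(), which accepts path, repo, rev, remote, mode and encoding, but that variant needs a reachable DVC project and is not run here.

```python
import os
import tempfile
from typing import Any, Callable, Optional


def read_all(opener: Callable[..., Any], path: str, mode: str = "r",
             encoding: Optional[str] = None, **kwargs: Any) -> Any:
    """Hypothetical helper: open `path` with `opener`, read it to the end, close it.

    Extra keyword arguments (for example repo=, rev= or remote= when the
    opener is dvc.api.open) are forwarded to the opener unchanged.
    """
    # The context manager guarantees the file object is closed afterwards.
    with opener(path, mode=mode, encoding=encoding, **kwargs) as fd:
        # read() with no size argument returns all remaining content.
        return fd.read()


if __name__ == "__main__":
    # Demonstrate with the built-in open() on a throwaway local file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "params.yaml")
        with open(path, "wb") as f:
            f.write(b"lr: 0.01\n")
        print(read_all(open, path))             # text mode gives a str
        print(read_all(open, path, mode="rb"))  # binary mode gives bytes
```

With DVC, the equivalent call would be read_all(dvc.api.open, 'model.pkl', mode='rb', repo='https://github.com/example/project.git'). That is roughly the convenience dvc.api.read() already provides.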
dvc.api.read() is similar to the dvc get command in our CLI.

The returned contents can be a string or a bytes object. They are loaded directly into memory (without using any disk space).
The type returned depends on the mode used. For more details, please refer to Python's open() built-in, which is used under the hood.
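Because the return type follows open() semantics, the rules are easy to check locally with the built-in open(). Text mode ("r") decodes the bytes into a str using the given encoding. Binary mode ("rb") returns the raw bytes. Passing an encoding in binary mode is rejected. The snippet illustrates those built-in rules only and does not call dvc.api.read() itself. The helper name show_modes() is hypothetical.

```python
import os
import tempfile


def show_modes(path: str) -> dict:
    """Hypothetical helper: read the same file in text and binary mode."""
    results = {}
    # Text mode: the bytes are decoded with the given codec, giving a str.
    with open(path, "r", encoding="utf-8") as f:
        results["text"] = f.read()
    # Binary mode: no decoding, the raw bytes come back unchanged.
    with open(path, "rb") as f:
        results["binary"] = f.read()
    # A different codec decodes the same bytes into a different string.
    with open(path, "r", encoding="latin-1") as f:
        results["latin1"] = f.read()
    # Binary mode does not accept an encoding: open() raises ValueError.
    try:
        open(path, "rb", encoding="utf-8")
    except ValueError as exc:
        results["binary_with_encoding"] = str(exc)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "greeting.txt")
        with open(path, "wb") as f:
            f.write("café\n".encode("utf-8"))
        for key, value in show_modes(path).items():
            # ascii() keeps the output printable on any console encoding.
            print(key, type(value).__name__, ascii(value))
```

The latin-1 entry shows why the encoding parameter matters in text mode. The same UTF-8 bytes decode to a different, garbled string when the wrong codec is used.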
The parameters are:

path (required) - location and file name of the target to read, relative to the root of the project (repo).
repo - specifies the location of the DVC project. It can be a URL or a file system path. Both HTTP and SSH protocols are supported for online Git repos (e.g. [user@]server:project.git). Default: the current project is used (the current working directory tree is walked up to find it).
rev - Git commit (any revision such as a branch or tag name, or a commit hash). If repo is not a Git repo, this option is ignored. Default: HEAD.
remote - name of the DVC remote to look for the target data. Default: the default remote of repo is used if a remote argument is not given. For local projects, the cache is tried before the default remote.
mode - specifies the mode in which the file is opened. Defaults to "r" (read). Mirrors the namesake parameter in the built-in open().
encoding - codec used to decode the file contents to a string. This should only be used in text mode. Defaults to "utf-8". Mirrors the namesake parameter in the built-in open().

The following exceptions may be raised:

dvc.exceptions.FileMissingError - the file in path is missing from repo.
dvc.exceptions.PathMissingError - path cannot be found in repo.
dvc.exceptions.NoRemoteError - no remote is found.

Any file tracked in a DVC project (and stored remotely) can be loaded directly in your Python code with this API. For example, let's say that you want to load and deserialize a binary model from a repo on GitHub:
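import pickle
import dvc.api

# Read the tracked file as bytes, then deserialize it into a Python object.
model = pickle.loads(
    dvc.api.read(
        'model.pkl',
        repo='https://github.com/example/project.git',
        mode='rb'
    )
)

We're using 'rb' mode here for compatibility with pickle.loads(), which expects bytes rather than a string.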
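The need for 'rb' can be checked without a remote repo. The sketch below pickles a small stand-in "model" (a plain dict, purely hypothetical) to a local file and reads it back in binary mode with the built-in open(), as dvc.api.read() does with mode='rb'. It also shows that pickle.loads() rejects a str.

```python
import os
import pickle
import tempfile


def load_pickle(path: str):
    """Read a pickle file in binary mode and deserialize it."""
    with open(path, "rb") as f:
        data = f.read()  # bytes, which is what pickle.loads() expects
    return pickle.loads(data)


if __name__ == "__main__":
    # Hypothetical stand-in for a trained model.
    model = {"weights": [0.1, 0.2], "bias": 0.5}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        with open(path, "wb") as f:
            pickle.dump(model, f)
        print(load_pickle(path) == model)  # True

    # pickle.loads() refuses text: a str is not a bytes-like object.
    try:
        pickle.loads("not bytes")
    except TypeError as exc:
        print("TypeError:", exc)
```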