DeeVee avoids the summer sun at Mount Rainier National Park.
Welcome to our August roundup of cool news, new releases, and recommended reading in the MLOps world!
At the beginning of July, we went live with a new project: Continuous Machine Learning, or CML for short. If you haven't heard, CML is an open-source toolkit for adapting popular continuous integration systems like GitHub Actions and GitLab CI for machine learning and data science. This release marks a new stage for our organization: while CML can work with DVC, and both are built around Git, CML is designed for standalone use. That means we're supporting TWO projects now!
Luckily, we received plenty of encouraging and helpful feedback following the CML release. CML was on the front page of Hacker News for most of release day! We also got covered on Heise, a popular German IT news source. I (Elle, a proud part of the CML team!) also gave a talk presenting our approach as part of the MLOps World meeting, which is now available for online viewing.
Of course, we're fielding lots of questions too! We've compiled some of the most common questions (and their answers!) in our last Community Gems post, and CML developer David G. Ortega has written a tutorial for a much-asked-for use case: doing continuous integration with on-demand GPUs.
If you have comments, questions, or feature requests about CML, we really want to hear from you. A few ways to be in touch:
Last week, we had another meetup! DVC Ambassador Marcel kicked us off with a short talk about how he's using DVC as part of his causal modeling approach to bioinformatics. It's cool stuff. Then, I talked a bit about CML and did some live-coding. The beauty of live-coding is getting to answer questions in real time, and if you're totally new to the idea of continuous integration (or want to understand how CML works with GitHub Actions/GitLab CI), seeing a project in action is one of the best ways to learn.
You can watch a recording of the meetup online now (it's lightly edited to remove some pesky Zoom trolls), and join our Meetup group to get updates for the next one. In future meetups, we'd love to support community members sharing their work, so get in touch if you'd like to present.
We're starting up some new YouTube features! If you haven't seen our channel, check it out and consider subscribing for hands-on tutorials and demos. Our first video introduced continuous integration and GitHub Actions, and the second showed how to use DVC and free Google Drive storage to add external data storage to a GitHub project.
In the coming weeks, we'll be covering:
We're huge fans of a recent Python Bytes episode featuring Ines Montani, founder of Explosion and one of the makers of the incredible SpaCy library for NLP (seriously, I have the highest recommendations for SpaCy).
My @PythonBytes episode is out now!
🎙️ Listen here: https://t.co/fHLF2hR4cM
My picks of the week are:
🐙 TextAttack by @jxmorris12: https://t.co/jySYrtzzp8
🦉 Data Version Control (DVC) @DVCorg: https://t.co/3610F6kv8v
🐍 Built-in generic types in 3.9
- Ines Montani 〰️ (@_inesmontani), July 23, 2020
Ines' episode discussed DVC, and DVC is going to be integrated with SpaCy in their 3.0 release. SpaCy + DVC is going to be a powerhouse and we can't wait.
Another cool software project:
Casper da Costa-Luis, DVC contributor and creator of the popular tqdm library, has published a tab-completion script generator for Python applications! shtab, as it's called, was originally designed for DVC, but Casper developed it into a generic tool that can be used for virtually any Python CLI application. Check out shtab on GitHub and read the release blog.
Our friends at DAGsHub have released a script to help DVC users upgrade their pipelines to the new DVC 1.0 format! Says Simon, a DAGsHub engineer, in his tutorial:
In this post, I'll walk you through the process of migrating your existing project from DVC ≤ 0.94 to DVC 1.X using a single automated script, and then demonstrate a way to check that your migration was successful.
Read the blog and get migrating (but don't worry if you can't; DVC 1.0 is backwards compatible).
Automatically migrate your project from DVC≤ 0.94 to DVC 1.x
Here are some of our favorite blogs from around the internet 🌏.
Using Continuous Machine Learning to Run Your ML Pipeline
Ryan Gross, a VP at Pariveda Solutions, blogged about the future of data governance and the lessons from DevOps that might save the day. Honestly, you should probably start reading for this cover image alone.
DataOps is accurately depicted as a badass flaming eagle. Check out the blog here:
There's also a noteworthy counterpoint by Michael Kaminsky. Read them both!
Thanks everyone, that's it for this month. We hope you're staying safe and making cool things!