Our CEO Dmitry Petrov was interviewed on the much-beloved Software Engineering Daily podcast! Host Jeff Meyerson kicked off the discussion:
Code is version controlled through Git, the version control system originally built to manage the Linux codebase. For decades, software has been developed using git for version control. More recently, data engineering has become an unavoidable facet of software development. It is reasonable to ask–why are we not version controlling our data?
For the rest of the episode, listen here!
Data Version Control with Dmitry Petrov
Last week, we held a meetup for contributors to DVC! Core maintainer Ruslan Kupriev hosted a get-together for folks who contribute new features, bug fixes, and more to the community. If you missed it, you can watch it on YouTube.
We've released several new videos to our growing YouTube channel, and cool news: we passed 1,000 subscribers! The support has been surprising in the best way possible. We're seeing a lot of repeat commenters and folks from the DVC meetups! It's been so rewarding to get positive feedback from the community, and we're planning to build our YouTube presence even more.
Even Skeletor finds joy in this.
We now have 4 tutorials in our MLOps series. In the latest, we cover how to use your own GPU (on-premises or in the cloud) to run GitHub Actions workflows. Check it out and give it a try; the code examples are freely available :)
We also made our first ever "explainer" video to talk through how DVC works in five minutes.
As always, video requests are welcome! Reach out and let us know what topics and tutorials you want to see covered. And we appreciate any likes, shares, and subscribes on our growing YouTube channel.
DVC ambassador Marcel Ribeiro-Dantas has published two of three tutorial blogs in a series on CML! Marcel's use case is especially cool because he's using R, plus some causal modeling related to his work in bioinformatics, with GitHub Actions.
In Part I, Marcel introduces his project and how he uses DVC, CML and GitHub Actions together (with his custom R library).
Continuous Machine Learning - Part I
In Part II, Marcel takes a deeper dive into Docker. He explains how to create your own Docker image and test it. This case should be helpful for folks who want to include the CML library in their own Docker container.
Continuous Machine Learning - Part II
Kristijan Ivancic of Real Python, a library of online Python tutorials and lessons, created a seriously impressive DVC tutorial (this thing is a beast 🐺: it has a table of contents!).
And, the Real Python podcast discussed their DVC tutorial (plus the joys of version control for data!) on a recent episode.
There's a lot of cool stuff happening out there in the data science world 🌏!
4 reasons why data scientists should version data
Data Versioning for CD4ML
Too many data and technology implementations start with poor or no problem statements and with inadequate time, tools, and subject matter expertise to ensure adequate data quality. Organizations must first start with asking smart questions about big data, investing in dataops, and then using agile methodologies in data science to iterate toward solutions.
Read the rest here:
Thanks everyone, that's a wrap for this month. Be safe, stay in touch, and get ready for pumpkin spice latte season 🎃.