Industrial ML, Building Tools for Data and Model Monitoring, and Ensuring Reproducibility

Feb 19, 2021 4:01 am

An exciting announcement: I'm collaborating with the amazing ML Ops Community to organize a paper reading group focused on ML engineering and MLOps. We'll be reading case studies of applied ML systems, operations best practices, and more!


I've put together a Notion page that has more details here.


And if you'd like to participate, you can fill out the survey here.


In this week's edition:

  • New Podcast 🎙 with Evidently AI on Data and Model Monitoring
  • Using Evidently for Production Model Analytics
  • 12 Factors of Reproducible Machine Learning in Production
  • Lecture Notes: Intro to ML System Design
  • The Difference Between Data Science, ML, and AI


New Podcast 🎙 with Evidently AI on Data and Model Monitoring

Elena Samuylova and Emeli Dral are the co-founders of Evidently AI, where they build open-source tools to analyze and monitor machine learning models. Elena was previously the head of the startup ecosystem at Yandex, director of business development at their data factory, and chief product officer at Mechanica AI. Emeli was previously a data scientist at Yandex and chief data scientist at both the data factory and Mechanica AI, in addition to teaching machine learning online and at multiple universities.


In this episode, they discuss what they've learned applying ML across a wide variety of industries, including manufacturing and industrial process improvement. They then explain why they started building tools for data and ML monitoring and how teams can do monitoring better.


Click here to listen to the episode, or find it in your podcast player of choice: https://www.mlengineered.com/listen


If you prefer to read instead of listen, I've also written out the major takeaways from the episode on my blog, which you can find here.


Using Evidently for Production Model Analytics

Elena and Emeli also just released an incredible tutorial that uses Evidently to monitor how data drift affects model performance on a well-known public dataset (Kaggle Bike Sharing Demand). It really shows off the dashboards they've built and is sure to give you some ideas for how to better monitor your own models.
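
If you want a rough feel for what a data drift check does under the hood, here is a minimal sketch in plain Python. To be clear, this is not Evidently's API or the method used in the tutorial: the function names `ks_statistic` and `has_drifted` and the 0.3 threshold are my own hypothetical choices for illustration. It computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the empirical CDFs of a reference sample and a current sample) for a single numeric feature.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric feature.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is enough to find the max gap.
    for x in ref + cur:
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a threshold.

    The default threshold is an arbitrary, hypothetical value; in practice
    you would tune it or use a proper significance test instead.
    """
    return ks_statistic(reference, current) > threshold
```

You would call `has_drifted(training_values, recent_values)` once per feature, with `training_values` being what the model was trained on and `recent_values` a window of production data. A real monitoring setup, like the one in the tutorial, goes further and connects drift to changes in model performance.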


If you haven't come across their blog before, you should also check out their series on ML Monitoring.


12 Factors of Reproducible Machine Learning in Production

"Can you reproduce the results you’ve had two months ago, now, fast? Can you compare today’s results against historic one’s? Can you give provenance over your data throughout training? And what happens if your model goes stale?"


Forthcoming podcast guest Benedikt Koller wrote a blog post, inspired by the 12 Factor App, that lays out the dozen things you need in order to run reproducible ML models in production.


Reading it gave me a ton of ideas for how a current system I'm working on could be improved, one of which I started implementing just this week. I highly recommend you check it out!


Lecture Notes: Intro to ML System Design

In a previous newsletter, I featured Chip Huyen's new Stanford course on ML System Design. Since then, the first few lecture notes and slides have been posted online. As expected, they are amazing!


Lecture 2 went over the goals of ML system design, the data science process, and how to scope a project. The notes also end with a list of really good case studies of production ML systems.


I've been following along with the course and will post my notes and summary at the end of it. Hopefully in the future the lecture videos will be available as well.


The Difference Between Data Science, ML, and AI

"So in this post, I’m proposing an oversimplified definition of the difference between the three fields:
Data science produces insights
Machine learning produces predictions
Artificial intelligence produces actions"


To be honest, before reading this post, I didn't have a good way to differentiate between the three fields. The author, David Robinson, does a really nice job of clearly explaining where the lines are and even gives a case study at the end.


Hat-tip to @svpino on Twitter for surfacing this article to me.


Thanks for reading and I hope you have a wonderful rest of your week!

Charlie
