Rodrigo Rivera's Journey from Data Science Leader to ML Researcher (PLUS: Karpathy on AI at TSLA, Andrew Ng on ML Ops, and more...)

Mar 26, 2021 2:18 am

Hi,


Next Tuesday at 5pm PT I'll be joining Adithya and Krish in their Clubhouse room to discuss deep learning in production!


Click here to view the event and get notified when it starts.


Onto the newsletter!


Going from Data Science Leader to ML Researcher



Rodrigo Rivera is a machine learning researcher at the Advanced Data Analytics in Science and Engineering Group at Skoltech and technical director of Samsung Next. He has previously held data science and research leadership roles at companies around the world, including Rocket Internet and Philip Morris.


In this episode, Rodrigo details his journey from selling a company to leading data science teams at top companies to researching machine learning. He also touches on his research interests in time series data and topological data analysis.


Listen to the episode!


Andrej Karpathy on the Visionary AI in Tesla's Autonomous Driving

Pieter Abbeel is, among many notable achievements, the co-director of the UC Berkeley AI Research lab and co-founder of covariant.ai. Last week, he released the first episode of his new podcast, "Robot Brains," featuring Andrej Karpathy, director of AI at Tesla.


Andrej talks about how he got to where he is now, his current work at Tesla, and the future of AI and software. There were SO many interesting quotes and insights:


"As long as the data set is improving, there's no real upper bound on the performance of these [deep neural networks]. So most of the engineering is on the dataset, [not writing algorithms]. And primarily it comes from sourcing examples where it's not working well."


I wrote out 11 of my favorite takeaways from the episode in a Twitter thread.


Andrew Ng on ML Ops: From Model-Centric to Data-Centric AI



Yesterday, ML education company deeplearning.ai held an event featuring Andrew Ng talking about the skills he sees as fundamental to the next generation of machine learning practitioners. He argues that teams building both apps and tools should focus more on the data behind their models rather than the code.


This aligns exactly with much of what I've learned on the job and heard from podcast guests. I highly recommend you watch the replay.


Nature Paper Concludes: Applying ML in Medicine is (Really) Hard

"Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 62 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases."


A recent open-access Nature paper reviewed all the studies it could find that applied machine learning to COVID detection and found that NONE of them were rigorous enough to be used clinically.


So what was wrong with all these studies? Here are the most common reasons:

  • Lack of data provenance
  • Data quality issues
  • High risk of data bias
  • Lack of external validation
  • Lack of reproducibility


Sound familiar?


These are the same issues that ML practitioners face every day. This technology holds obvious promise for use in and out of medicine, but only if mistakes like the ones above are avoided by following best practices.


Thanks for reading and have a great rest of your week!

Charlie

