How can I use applied-ml to implement a feature store like the ones at Uber or Airbnb?

Browse the Feature Stores category for articles on platforms such as Uber's Michelangelo Palette or Airbnb's Zipline, which describe architecture and lessons learned, then cross-reference them with open-source tools linked in the repository, such as Feast (a feature store) or Amundsen (a data discovery and metadata tool rather than a feature store).

What are some real-world examples of ML fails from applied-ml?

Check the 'Fails' category for case studies in which companies describe techniques or deployments that did not work in production and the lessons they drew from them.

Applied ML vs Awesome ML repositories: which is better for production insights?

Applied ML focuses exclusively on curated industry case studies with practical details, while Awesome ML is a broader list of tools and papers; choose Applied ML for deep dives into real-world implementations and Awesome ML for a wider range of resources.

How do I contribute new articles or papers to the applied-ml repository?

Follow the contribution guidelines in the linked CONTRIBUTING.md file. This typically involves submitting a pull request with relevant, high-quality links from reputable companies and making sure each link fits one of the existing categories.

Are the links in applied-ml peer-reviewed or just blog posts?

The repository includes a mix of peer-reviewed papers, tech blogs, and conference slides from companies like Google and Netflix, so users should judge each source's rigor from its context and the references it provides.

Can applied-ml help me design an ML system for a startup?

Yes. Categories such as Practices and Team Structure offer insights on scaling, but many examples come from large companies, so adapt the lessons to resource-constrained environments.

eugeneyan/applied-ml GitHub repository — Real-World ML & Data Science Papers

What is the eugeneyan/applied-ml GitHub repository?

Applied ML is a curated GitHub repository that aggregates papers, articles, and blog posts from major tech companies about their real-world applications of data science and machine learning in production. It helps practitioners understand how ML projects are implemented at scale, covering problem framing, technique selection, scientific rationale, and business outcomes. The collection addresses the gap between academic theory and industrial practice by providing concrete examples.

Target Audience

The repository targets data scientists, ML engineers, researchers, and technical leaders who want to learn from documented industry experience when designing, implementing, and scaling their own ML systems. It is particularly valuable for those moving models from research to production.

Value Proposition

It offers a centralized, organized, and vetted source of practical ML knowledge from top companies, saving practitioners time searching for quality case studies. The focus on production details—including failures and ROI—provides insights often missing from academic papers or generic tutorials.

📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.

Use Cases

Best For

Researching how specific ML techniques (e.g., recommendation systems, forecasting) are applied at companies like Netflix or Uber
Understanding end-to-end ML project lifecycles and MLOps practices in industry
Finding references and methodologies for framing new ML problems
Learning about real-world results and ROI of ML implementations
Staying updated on industry trends and best practices across different domains
Gaining insights into data engineering, feature stores, and model management at scale

Not Ideal For

Developers seeking plug-and-play code libraries or ready-to-run tutorials
Teams needing the latest academic papers or pre-print research for cutting-edge theory
Individuals who prefer interactive, video-based learning over reading articles
Projects focused exclusively on a single company's proprietary tech stack without comparative insights

Pros & Cons

Pros

Curated Real-World Insights

Aggregates case studies from top companies like Google and Netflix, focusing on practical implementations beyond academic theory, as evidenced by the detailed breakdowns in categories like Recommendation and Feature Stores.

Broad Industry Coverage

Organized into 30+ categories spanning Data Quality to MLOps, providing diverse examples from e-commerce, social media, finance, and healthcare, as listed in the README's table of contents.

Actionable Production Details

Emphasizes the 'how', 'what', and 'why' of ML in production, including techniques that worked or didn't and ROI metrics, helping users learn from documented successes and failures.

Time-Saving Centralized Resource

Saves practitioners from scouring the internet by vetting and linking to quality articles, blogs, and papers from leading tech companies, as highlighted in the project's philosophy.

Cons

No Original Analysis

The repository is solely a collection of external links without synthesized summaries or critical commentary, limiting its value as a standalone learning tool beyond curation.

Potential Staleness and Link Rot

As a manually maintained list, it may lag behind recent work and accumulate dead links over time, and beyond pull requests it has no mechanism for community-driven validation of existing entries.
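Readers who want to gauge link rot themselves could run a small check over the URLs they care about. The following is a minimal sketch using only the Python standard library, not a tool shipped with the repository. The function names `http_status` and `find_broken` are hypothetical, and the choice of a HEAD request is an assumption: some servers reject HEAD, so a reported failure is a prompt to check by hand rather than proof that a link is dead.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if no response arrived."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 405.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def find_broken(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> dict[str, Optional[int]]:
    """Map each URL that failed (no response or status >= 400) to its status."""
    broken: dict[str, Optional[int]] = {}
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken[url] = status
    return broken
```

Passing `fetch` as a parameter keeps the filtering logic testable without network access; in normal use, `find_broken(list_of_urls)` falls back to the real HEAD request.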

Variable Article Quality

Relies on external sources that can range from deep technical blogs to marketing pieces, so users must independently assess the credibility and depth of each linked resource.

Limited Interactive Elements

Offers no search functionality, filtering, or discussion forums, making it less suitable for dynamic exploration or peer interaction compared to platforms like GitHub Discussions.
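Because the list lives in a README, a local copy can be searched with a short script. The sketch below assumes the README uses Markdown headings (`## Category`) followed by entries containing `[title](url)` links; that layout, the names `parse_readme` and `search`, and the sample entries (which point at example.com and are not real articles) are all illustrative assumptions.

```python
from __future__ import annotations

import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    category: str
    title: str
    url: str


# Assumed layout: "## Category" headings, then lines with [title](http...) links.
HEADING = re.compile(r"^#{2,}\s+(.*\S)\s*$")
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def parse_readme(text: str) -> list[Entry]:
    """Collect every external link together with the heading it appears under."""
    entries: list[Entry] = []
    category = ""
    for line in text.splitlines():
        heading = HEADING.match(line)
        if heading:
            category = heading.group(1)
            continue
        for title, url in LINK.findall(line):
            entries.append(Entry(category, title, url))
    return entries


def search(
    entries: list[Entry], keyword: str, category: Optional[str] = None
) -> list[Entry]:
    """Case-insensitive title search, optionally restricted to one category."""
    needle = keyword.lower()
    return [
        entry
        for entry in entries
        if needle in entry.title.lower()
        and (category is None or entry.category.lower() == category.lower())
    ]


# Made-up sample in the assumed layout; not taken from the real README.
SAMPLE = """\
## Feature Stores
1. [A hypothetical feature store write-up](https://example.com/feature-store) `ExampleCo` `2020`
## Fails
1. [A hypothetical postmortem on a failed model launch](https://example.com/postmortem) `ExampleCo` `2021`
"""
```

With a downloaded copy of the README, `search(parse_readme(text), "feature", category="Feature Stores")` would return the matching entries; in-page anchors such as a table of contents are skipped because only `http` and `https` links are collected.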
