A Scala/Spark library for measuring fairness and mitigating bias in large-scale machine learning workflows.
LiFT (LinkedIn Fairness Toolkit) is a Scala/Spark library for measuring fairness and mitigating bias in machine learning workflows. It provides tools to evaluate bias in training data, compare model performance across demographic subgroups, and apply post-processing techniques that promote equality of opportunity. The library is designed to handle large-scale datasets efficiently within existing ML pipelines.
Machine learning engineers, data scientists, and researchers working on large-scale ML systems who need to audit and improve fairness, particularly in enterprise or production environments.
LiFT offers a production-ready, scalable solution for fairness analysis integrated directly with Apache Spark. Unlike many research-oriented tools, it provides configuration-driven Spark jobs and APIs that fit seamlessly into existing ML workflows, with robust support for both measurement and mitigation.
The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large-scale machine learning workflows.
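To make "measuring fairness across subgroups" concrete, here is a minimal plain-Python sketch of two common group metrics: the rate of positive predictions per group (demographic parity) and the true positive rate per group (equality of opportunity). It is not LiFT's API. The function names `group_rates` and `max_gap` and the toy data are hypothetical, chosen only for illustration, and a real deployment would compute the same aggregates with Spark.

```python
from collections import defaultdict

def group_rates(labels, predictions, groups):
    """Return {group: (positive_rate, true_positive_rate)} for binary 0/1 inputs.

    Hypothetical helper for illustration; not part of LiFT.
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "actual_pos": 0, "true_pos": 0})
    for y, y_hat, g in zip(labels, predictions, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += y_hat        # predicted positive
        c["actual_pos"] += y          # labeled positive
        c["true_pos"] += y * y_hat    # both labeled and predicted positive
    rates = {}
    for g, c in counts.items():
        positive_rate = c["pred_pos"] / c["n"]
        # TPR is undefined for a group with no labeled positives.
        tpr = c["true_pos"] / c["actual_pos"] if c["actual_pos"] else float("nan")
        rates[g] = (positive_rate, tpr)
    return rates

def max_gap(values):
    """Largest difference between any two groups for a given metric."""
    values = list(values)
    return max(values) - min(values)

# Toy data: two groups of four examples each (made up for this sketch).
labels      = [1, 0, 1, 1, 0, 1, 0, 1]
predictions = [1, 0, 1, 0, 0, 1, 0, 0]
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(labels, predictions, groups)
dp_gap = max_gap(pr for pr, _ in rates.values())     # demographic parity gap: 0.25
tpr_gap = max_gap(tpr for _, tpr in rates.values())  # equal opportunity gap: 2/3 - 1/2

print(rates)
print(dp_gap, tpr_gap)
```

A gap of zero means the groups are treated identically under that metric. Which metric and what tolerance are appropriate depends on the application.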
Leverages Apache Spark for distributed computation, combined with strategic caching, to make fairness analysis efficient on large datasets, as highlighted in the project's features section.
Provides ready-to-use Spark jobs for scheduled deployments with support for custom metrics via UDFs, allowing plug-and-play integration into existing ML pipelines without extensive coding.
Exposes APIs at various levels for building custom jobs or exploratory analysis in notebooks, enabling both high-level usage and deep customization as described in the usage examples.
Implements post-processing methods like equality of opportunity for rankings that can be applied without retraining models, offering a pragmatic approach to bias reduction in production systems.
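The idea behind post-processing for equality of opportunity can be sketched in a few lines of plain Python. The sketch below uses a simplified technique, one decision threshold per group on binary-classification scores, so that every group reaches at least the same true positive rate without retraining the model. It is not LiFT's ranking method or API. The names `equal_opportunity_thresholds` and `apply_thresholds`, the target rate, and the toy scores are all hypothetical, and the sketch assumes every group has at least one labeled positive.

```python
import math

def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """Pick one score threshold per group so each group's TPR is at least target_tpr.

    Hypothetical helper for illustration; assumes every group has a labeled positive.
    """
    positives = {}
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            positives.setdefault(g, []).append(s)
    thresholds = {}
    for g, pos_scores in positives.items():
        pos_scores.sort(reverse=True)
        # Number of positives that must be accepted to reach the target rate.
        # The small epsilon guards against float error such as 0.7 * 10 > 7.
        k = max(1, math.ceil(target_tpr * len(pos_scores) - 1e-9))
        thresholds[g] = pos_scores[k - 1]
    return thresholds

def apply_thresholds(scores, groups, thresholds):
    """Turn scores into 0/1 decisions using each example's group threshold."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]

# Toy data (made up): group "b" receives systematically lower scores.
scores = [0.9, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1]
labels = [1,   1,   0,   1,   1,   0,   1,   1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

thresholds = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.5)
decisions = apply_thresholds(scores, groups, thresholds)

print(thresholds)  # {'a': 0.7, 'b': 0.3}
print(decisions)   # [1, 1, 0, 0, 1, 1, 1, 0]
```

With a single global threshold of 0.5, group "a" would have a true positive rate of 2/3 and group "b" only 1/3. With the per-group thresholds, both reach 2/3, at the cost of accepting one additional negative in group "b".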
The provided Spark jobs assume uniformly formatted data and perform no preprocessing, so users must resolve data inconsistencies themselves. The usage section warns about this, including the assumptions the jobs make about join keys.
Tied to specific Scala and Spark versions (e.g., Scala 2.11.8 and Spark 2.3.0 recommended), limiting flexibility for teams using other ML stacks or newer versions, as seen in the build instructions.
Focuses primarily on post-processing for equality of opportunity in rankings and lacks support for the pre-processing and in-processing bias mitigation methods common in the fairness literature, so it may not cover all use cases.