How do I use REVISE with my own image dataset?

Create a custom dataloader based on the template in datasets.py, test it with tester_script.py, then run main_measure.py with your dataset name. The README provides detailed steps for integration and measurement collection.
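As a rough illustration of the first step, a custom dataset class mainly has to tell the tool where the images live and which labels each image carries. The sketch below is a hypothetical stand-in, not the actual template: the class name `MyImageDataset`, its attributes, and the annotation layout are all assumptions, and the required interface is whatever datasets.py defines.

```python
import os


class MyImageDataset:
    """Hypothetical dataset wrapper; the real interface is defined by the
    template in datasets.py, so adapt names and return values to match it."""

    IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

    def __init__(self, img_folder, labels_by_file):
        # img_folder: directory containing the images (assumed flat layout).
        # labels_by_file: dict mapping file name -> list of object labels.
        self.img_folder = img_folder
        self.labels_by_file = labels_by_file
        # Keep only image files, sorted so indexing is reproducible.
        self.image_ids = sorted(
            name for name in os.listdir(img_folder)
            if name.lower().endswith(self.IMAGE_EXTENSIONS)
        )

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, index):
        # Return the image path and its labels; an image with no entry in
        # labels_by_file gets an empty label list instead of raising.
        name = self.image_ids[index]
        path = os.path.join(self.img_folder, name)
        return path, self.labels_by_file.get(name, [])
```

Checking a few indices by hand like this before running any measurements serves the same purpose as the tester_script.py step: it catches missing files and mismatched labels early.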

Is REVISE better than IBM's AI Fairness 360 for visual data?

REVISE is specialized for visual datasets with multi-axis analysis like object and geography biases, while AI Fairness 360 is broader for general ML fairness. For image-specific audits, REVISE offers deeper insights into visual patterns and representations.

Can REVISE detect biases in real-time during model inference?

No, REVISE is designed for offline dataset analysis before training, not for real-time monitoring. It focuses on auditing image collections to identify biases that could affect model performance.

What does it cost to use Amazon Rekognition with REVISE?

Costs depend on Amazon's pay-per-use pricing for facial detection, so expenses scale with the number of images processed. The README notes this charge and suggests free alternatives like cv2 by changing the FACE_DETECT variable.
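Because the charge scales with the number of images, a first-order budget estimate is a single multiplication. The helper below is a minimal sketch under that assumption; `estimate_detection_cost` is a hypothetical name, the per-image price is a placeholder you must look up yourself, and any pricing tiers or free allowances are ignored.

```python
def estimate_detection_cost(num_images, price_per_image):
    """Estimate the total charge for running face detection once per image.

    price_per_image is a placeholder the caller must supply from Amazon's
    current pricing page; no real price is assumed here.
    """
    if num_images < 0 or price_per_image < 0:
        raise ValueError("num_images and price_per_image must be non-negative")
    # Flat pay-per-use model: every image costs the same amount.
    return num_images * price_per_image
```

If the estimate is too high for your dataset, the free cv2 backend selected through FACE_DETECT avoids the charge entirely.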

How do I fix the PROJ_LIB error when importing basemap?

Set the PROJ_LIB environment variable to the location of the epsg file, as described in the Potential Environment Issues section. You may need to download the file manually from the provided GitHub link and adjust the path accordingly.
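One way to apply this from Python is to set the variable before basemap is imported. The sketch below assumes PROJ_LIB should name the directory that holds the epsg file, which is the usual PROJ convention; `set_proj_lib` is a hypothetical helper and the path in the usage comment is a placeholder.

```python
import os


def set_proj_lib(proj_dir):
    """Point PROJ_LIB at proj_dir if it contains a file named 'epsg'."""
    epsg_path = os.path.join(proj_dir, "epsg")
    # Fail early with a clear message instead of a confusing import error.
    if not os.path.isfile(epsg_path):
        raise FileNotFoundError(f"no epsg file found in {proj_dir}")
    os.environ["PROJ_LIB"] = proj_dir
    return proj_dir


# Call this before importing basemap, for example:
# set_proj_lib("/path/to/dir/containing/epsg")  # placeholder path
# from mpl_toolkits.basemap import Basemap
```

Setting the variable inside the script only affects that process; exporting it in your shell profile or conda environment makes the fix persistent.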

Does REVISE work without Jupyter notebooks?

Core measurements can be run via Python scripts like main_measure.py, but analysis and visualization require Jupyter notebooks from the analysis_notebooks folder, limiting automation for script-only workflows.
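A common workaround for notebook-only steps is to execute the notebook headlessly with `jupyter nbconvert`. The sketch below only shows how such a call could be assembled; the source does not confirm that REVISE's analysis notebooks run cleanly this way, notebooks that expect interactive input may fail, and the function names and file names are hypothetical.

```python
import subprocess


def build_nbconvert_command(notebook_path, output_name):
    """Build the argv for executing a notebook without opening the Jupyter UI."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",     # write the result as a notebook again
        "--execute",            # run every cell top to bottom
        notebook_path,
        "--output", output_name,
    ]


def run_notebook(notebook_path, output_name):
    """Execute the notebook; raises CalledProcessError if any cell fails."""
    subprocess.run(build_nbconvert_command(notebook_path, output_name), check=True)
```

For example, `run_notebook("analysis_notebooks/some_notebook.ipynb", "executed.ipynb")` would write an executed copy next to the original, assuming Jupyter and nbconvert are installed in the active environment.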

Open-Awesome

REVISE: REvealing VIsual biaSEs

MIT · Jupyter Notebook

A tool for automatically detecting and suggesting mitigation for object, attribute, and geography-based biases in visual datasets.


110 stars · 17 forks · 0 contributors

What is REVISE: REvealing VIsual biaSEs?

REVISE is a research tool for measuring and mitigating bias in visual datasets. It automatically detects potential biases along object-based, attribute-based, and geography-based patterns, providing actionable insights to improve dataset fairness. The tool helps identify imbalances in representation, attribute distribution, and geographic coverage that could lead to skewed computer vision models.

Target Audience

Computer vision researchers, data scientists, and AI ethics practitioners who need to audit visual datasets for fairness before model training. It is particularly useful for teams building or curating large-scale image datasets for machine learning.

Value Proposition

REVISE offers a comprehensive, multi-axis approach to bias detection that goes beyond simple demographic checks. Its automated measurement pipelines and visual summaries enable systematic dataset auditing, reducing manual effort and providing clear pathways for mitigation.

Overview

REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets --- https://arxiv.org/abs/2004.07999

Use Cases

Best For

Auditing image datasets for representation biases before model training
Identifying geographic imbalances in crowdsourced visual data
Analyzing gender or attribute distributions in person-centric datasets
Research on fairness and ethics in computer vision
Comparing bias metrics across different dataset versions
Educational use in teaching AI ethics and dataset curation

Not Ideal For

Projects requiring real-time bias monitoring during model inference
Teams with tight budgets that cannot absorb Amazon Rekognition charges and do not want to switch to a free detector such as cv2
Environments where manual Jupyter notebook interaction is not feasible for automation

Pros & Cons

Pros

Multi-Axis Bias Detection

Analyzes biases along object, attribute, and geography axes, giving a holistic view of the dataset rather than a single demographic check.

Automated Visual Reporting

Generates summary PDFs with visualizations and interpretations for each bias axis, reducing manual effort in bias analysis as shown in the sample_summary_pdfs folder.

Flexible Integration Options

Supports custom datasets via a template dataloader and allows switching between facial detection backends, including free tools like cv2, as detailed in the setup instructions.

Research-Backed Methodology

Builds on peer-reviewed publications at ECCV and IJCV, so the bias measurements rest on methodology that has been through academic review.

Cons

Complex Setup and Dependencies

Requires conda environment creation, model downloads, and troubleshooting for issues like PROJ_LIB errors, as noted in the Potential Environment Issues section, increasing initial overhead.

Costly Proprietary Reliance

Recommends Amazon Rekognition for facial detection, which incurs charges and introduces vendor lock-in. Free alternatives such as cv2 are available but may require code changes.

Manual Analysis Workflow

Exploring the detected biases means running Jupyter notebooks by hand, which is hard to automate fully for continuous integration or production pipelines.

Frequently Asked Questions

