Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

LargeVis

Apache-2.0 · C++

Official implementation of the LargeVis algorithm for visualizing large-scale, high-dimensional data and networks.

711 stars · 168 forks · 0 contributors

What is LargeVis?

LargeVis is an open-source tool for visualizing large-scale, high-dimensional data and networks. It implements the LargeVis algorithm: for feature vectors it first builds a K-nearest neighbor graph, and it then lays out that graph (or an input network directly) in 2D or 3D, so that complex datasets can be inspected as ordinary scatter plots. It addresses a problem common in machine learning, scientific computing, and network analysis: exploring data that is both massive and high-dimensional.

Target Audience

Data scientists, machine learning researchers, and analysts working with high-dimensional datasets (e.g., embeddings, feature vectors) or network/graph data who need scalable visualization. It is also suitable for developers building visualization pipelines into data platforms.

Value Proposition

Developers choose LargeVis because it is the official implementation by the algorithm's authors, because it handles large datasets efficiently, and because it accepts both feature vectors and network data. Its optimized construction of the K-nearest neighbor graph (K-NNG) and its configurable parameters give fine-grained control over the trade-off between visualization quality and performance.

Overview

LargeVis is an open-source implementation of the LargeVis model, designed for visualizing large-scale and high-dimensional datasets. It provides an efficient approach to dimensionality reduction, transforming complex data into 2D or 3D representations suitable for exploration and analysis.
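As described in the LargeVis paper (Tang et al., WWW 2016), the layout stage treats each graph edge as evidence that two points should be close, and uses negative sampling to push randomly chosen points apart, optimizing with stochastic gradient steps. The sketch below shows one such step for a single edge. It is a minimal illustration, not code from the repository: it assumes the edge probability p = 1 / (1 + d²), and the function name `sgd_edge_update` plus the `eps` offset and `clip` bound (added here for numerical safety) are our own hypothetical choices.

```python
def sgd_edge_update(Y, i, j, negatives, lr=1.0, gamma=7.0, eps=0.1, clip=5.0):
    """One gradient-ascent step for edge (i, j) plus negative samples.

    Hypothetical sketch. Y is a list of coordinate lists, updated in place.
    Edge probability is modeled as p = 1 / (1 + d^2), with d = ||y_i - y_j||.
    """
    dim = len(Y[i])
    grad_i = [0.0] * dim  # accumulated gradient for y_i, applied at the end

    def clipped(x):
        # Bound each gradient component (our choice, for stability).
        return max(-clip, min(clip, x))

    # Positive edge: ascend log p = -log(1 + d^2), pulling i and j together.
    diff = [Y[i][k] - Y[j][k] for k in range(dim)]
    d2 = sum(x * x for x in diff)
    coef = -2.0 / (1.0 + d2)
    for k in range(dim):
        g = clipped(coef * diff[k])
        grad_i[k] += g
        Y[j][k] -= lr * g  # y_j receives the opposite gradient

    # Negative samples: ascend gamma * log(1 - p), pushing i and n apart.
    for n in negatives:
        if n == i or n == j:
            continue
        diff = [Y[i][k] - Y[n][k] for k in range(dim)]
        d2 = sum(x * x for x in diff)
        coef = 2.0 * gamma / ((eps + d2) * (1.0 + d2))
        for k in range(dim):
            g = clipped(coef * diff[k])
            grad_i[k] += g
            Y[n][k] -= lr * g

    for k in range(dim):
        Y[i][k] += lr * grad_i[k]
```

Repeating this step over many sampled edges, with negatives drawn at random, gradually turns the graph into a 2D or 3D layout.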

Key Features

  • High-Dimensional Data Visualization — Reduces feature vectors (e.g., from machine learning models) into 2D/3D coordinates for plotting.
  • Network Visualization — Embeds graph or network data into low-dimensional spaces while preserving structural properties.
  • Efficient K-NNG Construction — Includes a highly optimized algorithm for building K-nearest neighbor graphs, a critical step in many visualization pipelines.
  • Multi-Platform Support — Offers C++ source code and a Python wrapper compatible with Linux, OS X, and Windows.
  • Configurable Parameters — Allows fine-tuning of learning rates, neighbor counts, sampling, and other algorithmic parameters for optimal results.
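To make clear what the K-NNG step produces, here is an exact, brute-force construction. This is deliberately not LargeVis's method: it costs O(n² · d) and only serves as a small-scale reference. According to the LargeVis paper, the tool instead builds an approximate graph with random projection trees and then refines it by exploring neighbors of neighbors, which is what lets it reach millions of points. The function name `exact_knn_graph` is hypothetical.

```python
import heapq

def exact_knn_graph(points, k):
    """Return {i: [(j, squared_distance), ...]} for the k nearest neighbors of each point.

    Brute force, O(n^2 * d): a correctness baseline, not the approximate
    algorithm LargeVis uses at scale.
    """
    graph = {}
    for i, p in enumerate(points):
        candidates = []
        for j, q in enumerate(points):
            if j == i:
                continue  # a point is not its own neighbor
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            candidates.append((d2, j))
        nearest = heapq.nsmallest(k, candidates)  # sorted by distance, then index
        graph[i] = [(j, d2) for d2, j in nearest]
    return graph
```

Comparing an approximate graph against this baseline on a small sample is a simple way to estimate its recall.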

Philosophy

LargeVis prioritizes scalability and efficiency, enabling visualization of datasets with millions of points through optimized neighbor graph construction and parallel processing.
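One ingredient of that scalability, per the LargeVis paper, is edge sampling: instead of scaling every gradient by its edge weight, the optimizer draws edges with probability proportional to their weights and treats each sampled edge as unweighted. A standard way to draw from such a distribution in constant time is the alias method; the sketch below uses Vose's variant. Treat it as an illustration of the technique, not as the repository's code; `build_alias_table` and `sample_index` are hypothetical names.

```python
import random

def build_alias_table(weights):
    """Vose's alias method: O(n) setup, then O(1) sampling proportional to weights."""
    n = len(weights)
    total = float(sum(weights))
    if n == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    scaled = [w * n / total for w in weights]  # scaled values average to 1
    prob = [0.0] * n
    alias = [0] * n
    small = [k for k, s in enumerate(scaled) if s < 1.0]
    large = [k for k, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        sm, lg = small.pop(), large.pop()
        prob[sm] = scaled[sm]   # keep column sm with this probability...
        alias[sm] = lg          # ...otherwise fall through to lg
        scaled[lg] -= 1.0 - scaled[sm]
        (small if scaled[lg] < 1.0 else large).append(lg)
    for k in small + large:     # leftovers equal 1 up to rounding error
        prob[k] = 1.0
    return prob, alias

def sample_index(prob, alias, rng=random):
    """Draw one index with probability proportional to the original weights."""
    k = rng.randrange(len(prob))
    return k if rng.random() < prob[k] else alias[k]
```

In an optimizer of this kind, `weights` would hold the edge weights and each sampled index would select the edge for the next gradient step.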

Use Cases

Best For

  • Visualizing high-dimensional machine learning embeddings (e.g., word2vec, image features)
  • Exploring large-scale network or graph data with millions of nodes
  • Reducing dimensionality of scientific datasets for publication-ready plots
  • Building custom visualization pipelines for research or analytics platforms
  • Comparing clustering or classification results in low-dimensional space
  • Producing 2D/3D coordinates that feed interactive visualizations of complex, multidimensional data

Not Ideal For

  • Projects requiring browser-based interactive visualizations without backend processing
  • Small datasets, where lightweight tools such as scikit-learn's t-SNE suffice with less setup
  • Teams without C++ build experience or without access to the required external libraries (GSL or Boost)
  • Applications needing real-time or streaming data visualization, as LargeVis is batch-oriented

Pros & Cons

Pros

Official Author Implementation

Written by the authors of the original paper, so the code follows the algorithm as published; this is a key trust factor.

Efficient Large-Scale Handling

Optimized K-nearest neighbor graph construction lets it visualize datasets with millions of points, removing a bottleneck that limits earlier methods such as t-SNE on large data.

Dual Data Type Support

Handles both high-dimensional feature vectors and network data, making it versatile for diverse data sources like embeddings or graphs.
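When the input is feature vectors, the data must first be written to a plain-text file. The helper below assumes a layout of one header line with the number of vectors and their dimension, followed by one space-separated vector per line; that is our reading of the LargeVis README, so confirm it against the repository before relying on it. `format_feature_vectors` is a hypothetical name.

```python
def format_feature_vectors(vectors):
    """Return text with an 'N D' header followed by one space-separated vector per line.

    ASSUMED layout (our reading of the LargeVis README); verify before use.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    rows = [" ".join(repr(float(x)) for x in v) for v in vectors]
    return f"{len(vectors)} {dim}\n" + "\n".join(rows) + "\n"
```

The returned string can be saved with an ordinary `open(path, "w")` call and passed to LargeVis as its input file.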

Highly Configurable Parameters

Offers fine-tuning with options like learning rates and neighbor counts, allowing users to optimize for specific visualization quality and performance.

Cons

Complex Compilation Setup

Requires installing external libraries (GSL on Linux/OS X, Boost on Windows) and making manual code modifications (e.g., in annoylib.h), which can be error-prone.

No Built-in Visualization

Only outputs 2D/3D coordinates; users must run a separate script such as plot.py to draw them, which adds a step to the workflow.
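Because LargeVis only emits coordinates, a small reader is the first step of any plotting script. This sketch assumes the output file starts with a header line giving the number of points and the output dimension, followed by one row of coordinates per point, optionally preceded by a point label; verify this against your own output before relying on it. `parse_layout` is a hypothetical helper, not part of LargeVis or plot.py.

```python
def parse_layout(text):
    """Parse an 'N D' header plus N coordinate rows into a list of float lists.

    ASSUMED format; a row with D + 1 tokens is read as 'label x1 ... xD'.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    n, dim = int(lines[0][0]), int(lines[0][1])
    coords = []
    for tokens in lines[1:]:
        if len(tokens) == dim + 1:
            tokens = tokens[1:]  # drop the leading point label
        if len(tokens) != dim:
            raise ValueError(f"expected {dim} coordinates, got {len(tokens)}")
        coords.append([float(t) for t in tokens])
    if len(coords) != n:
        raise ValueError(f"header says {n} points, found {len(coords)}")
    return coords
```

The resulting lists can be handed to any plotting library, for example as the x and y values of a scatter plot.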

Rigid Input Format Requirements

Network input must list directed edges, so each undirected edge has to appear twice, once in each direction. This means an extra preprocessing pass for most graph datasets.
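The required preprocessing is small. This sketch assumes each line of the network file has the form `source target weight`, which is how we understand the README's network format; it writes every undirected edge once in each direction and, as our own choice, writes a self-loop only once. `to_directed_edge_lines` is a hypothetical name.

```python
def to_directed_edge_lines(edges):
    """Expand undirected (u, v, weight) edges into 'u v weight' directed-edge lines.

    ASSUMED line format; each non-loop edge is emitted in both directions.
    """
    lines = []
    for u, v, w in edges:
        lines.append(f"{u} {v} {w}")
        if u != v:  # avoid duplicating self-loops
            lines.append(f"{v} {u} {w}")
    return lines
```

Joining the result with newlines yields a file that lists every undirected edge in both directions.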


Quick Stats

Stars: 711
Forks: 168
Contributors: 0
Open Issues: 22
Last commit: 3 years ago
Created: 2016

Tags

#scientific-computing #dimensionality-reduction #c-plus-plus #python #high-dimensional-data #data-visualization #network-embedding #machine-learning #k-nearest-neighbors #graph-layout

Built With

GSL
Python
Boost
C++

Included in

Data Visualization (4.3k)
Auto-fetched 1 day ago

Related Projects

PlotJuggler

The Time Series Visualization Tool that you deserve.

Stars: 5,966
Forks: 794
Last commit: 1 month ago

Visualization Toolkit (VTK)

Open-source library for 3D graphics, image processing and visualization.

Stars: 0
Forks: 0
Last commit: