Open-Awesome

duplicut

GPL-3.0 · C++ · v2.4

A fast, memory-optimized C tool to remove duplicates from massive wordlists while preserving order, designed for password cracking.

968 stars · 96 forks · 0 contributors

What is duplicut?

Duplicut is a high-performance, memory-optimized command-line tool written in C that removes duplicate lines from massive wordlists without sorting them. It solves the specific problem in password cracking where wordlist order must be preserved to keep the most probable passwords at the front for efficient cracking, while still handling files larger than available RAM.

Target Audience

Security researchers, penetration testers, and password cracking enthusiasts who work with large, combined wordlists and need efficient deduplication without losing the strategic order of passwords.

Value Proposition

Developers choose Duplicut because it combines order preservation with the ability to process wordlists that exceed system memory, using optimized C code and multithreading for speed. General-purpose deduplication tools typically either sort the input or need the whole file in RAM.

Overview

Remove duplicates from a MASSIVE wordlist without sorting it (for dictionary-based password cracking)

Use Cases

Best For

  • Deduplicating combined password wordlists for hashcat or John the Ripper
  • Processing multi-gigabyte wordlists on systems with limited RAM
  • Maintaining password probability order in cracking dictionaries
  • Filtering wordlists by line length or printable characters during deduplication
  • Security research workflows involving large text corpus cleaning
  • Pre-processing wordlists for brute-force or dictionary attacks

Not Ideal For

  • General data cleaning tasks where sorting the output is acceptable or desired
  • Processing small files where the overhead of compilation and optimization isn't necessary
  • Scenarios requiring deduplication based on fuzzy matching or semantic similarity
  • Environments where easy installation via package managers is preferred over compiling from source

Pros & Cons

Pros

Massive File Handling

Can process wordlists larger than available RAM by splitting them into virtual chunks, so multi-gigabyte files can be deduplicated without loading the whole file into memory at once.
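
The chunking idea can be sketched in a few lines of Python. This is a simplified illustration, not Duplicut's actual C implementation: the function name `dedupe_in_chunks` and the `chunk_size` parameter are hypothetical, and the wordlist is held in a Python list here, whereas a real tool would keep the lines in the file itself. The point is that only one chunk's set of lines needs to be in the lookup structure at any time.

```python
def dedupe_in_chunks(lines, chunk_size):
    """Order-preserving dedup that only indexes one chunk of lines at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    keep = [True] * len(lines)
    for start in range(0, len(lines), chunk_size):
        end = min(start + chunk_size, len(lines))
        seen = set()  # holds at most chunk_size distinct lines
        # Pass 1: drop repeats inside the current chunk.
        for i in range(start, end):
            if not keep[i]:
                continue  # already struck out by an earlier chunk
            if lines[i] in seen:
                keep[i] = False
            else:
                seen.add(lines[i])
        # Pass 2: strike later occurrences of this chunk's lines.
        for i in range(end, len(lines)):
            if keep[i] and lines[i] in seen:
                keep[i] = False
    return [line for line, kept in zip(lines, keep) if kept]


wordlist = ["password", "123456", "password", "qwerty",
            "123456", "letmein", "password"]
print(dedupe_in_chunks(wordlist, chunk_size=2))
# ['password', '123456', 'qwerty', 'letmein']
```

The result is the same for any chunk size; a smaller chunk trades extra passes over the remaining data for a smaller lookup set.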

Order Preservation

Deduplicates without altering line order, which is critical for password cracking efficiency where probable passwords must remain at the front.
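
A minimal Python sketch shows why this matters. It is illustrative only (the helper name `dedupe_preserving_order` is hypothetical and this is not Duplicut's code): a sort-based approach reorders the candidates, while a first-occurrence filter keeps the original ranking.

```python
def dedupe_preserving_order(lines):
    """Yield each line the first time it appears, keeping the input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


# Candidates ordered from most to least probable, with repeats.
candidates = ["password", "123456", "password", "qwerty", "123456"]

print(list(dedupe_preserving_order(candidates)))
# ['password', '123456', 'qwerty']   (original ranking kept)

print(sorted(set(candidates)))
# ['123456', 'password', 'qwerty']   (ranking lost to alphabetical order)
```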

Memory Optimization

Uses compressed hashmap items and tagged pointers to minimize the memory footprint of each entry, as described in the project's technical documentation, so more lines fit in memory per pass.
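
The general idea behind a compressed item can be illustrated with plain bit packing: a line's location and its length share a single 64-bit value instead of occupying two separate fields. The sketch below is an assumption-laden illustration, not Duplicut's actual layout; the 48/16 bit split and the names `pack_item` and `unpack_item` are hypothetical.

```python
OFFSET_BITS = 48   # assumed width for the line's address or file offset
LENGTH_BITS = 16   # assumed width for the line length
OFFSET_MASK = (1 << OFFSET_BITS) - 1
LENGTH_MASK = (1 << LENGTH_BITS) - 1


def pack_item(offset, length):
    """Pack an offset and a length into one 64-bit integer."""
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset does not fit in 48 bits")
    if not 0 <= length <= LENGTH_MASK:
        raise ValueError("length does not fit in 16 bits")
    return (length << OFFSET_BITS) | offset


def unpack_item(item):
    """Recover (offset, length) from a packed 64-bit item."""
    return item & OFFSET_MASK, item >> OFFSET_BITS


item = pack_item(0x7F00DEADBEEF, 12)
print(hex(item))          # 0xc7f00deadbeef
print(unpack_item(item))  # (139642271645423, 12)
```

Packing this way caps the representable line length at whatever the length field can hold, which is one reason a tool built on this technique would enforce a maximum line size.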

Multithreading Support

Leverages multiple threads for faster processing on modern hardware, speeding up deduplication for performance-intensive workflows.

Additional Filtering

Includes options to filter by line length, ASCII printable characters, and case conversion, adding utility beyond basic deduplication for tailored wordlist preparation.
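
As a rough illustration of this kind of per-line filtering, the Python sketch below applies a length limit, a printable-ASCII check, and optional lowercasing before a line would be considered for deduplication. The function name `filter_line` and its parameters are hypothetical, and treating "printable" as bytes 0x20 through 0x7E is an assumption, not a statement of how Duplicut defines it.

```python
def filter_line(line, max_size=None, printable_only=False, lowercase=False):
    """Return the (possibly lowercased) line, or None if it is filtered out.

    `line` is a bytes object without its trailing newline.
    """
    if max_size is not None and len(line) > max_size:
        return None  # too long
    if printable_only and not all(0x20 <= b <= 0x7E for b in line):
        return None  # contains a byte outside printable ASCII
    return line.lower() if lowercase else line


raw = [b"Password1", b"caf\xc3\xa9", b"a-very-long-candidate", b"tab\there"]
kept = [
    out
    for out in (filter_line(l, max_size=12, printable_only=True, lowercase=True)
                for l in raw)
    if out is not None
]
print(kept)
# [b'password1']
```

Lowercasing before deduplication means that lines differing only in case collapse into a single entry.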

Cons

Line Size Limitation

The --line-max-size option cannot exceed 4095 characters, which may be restrictive for use cases with very long lines or non-standard data.

Performance Trade-off with Dupfile

Using the --dupfile option to save duplicates to a separate file slows down processing, as the README notes, which makes it a poor fit for time-sensitive runs.

Niche ASCII Focus

Optimized for ASCII text and password cracking; lacks built-in support for non-ASCII encodings or binary files, limiting general-purpose applicability.

Compilation Barrier

Requires compilation from C source, which can be a hurdle for users without development tools or those seeking quick, package-based installation.

Quick Stats

Stars: 968
Forks: 96
Contributors: 0
Open Issues: 7
Last commit: 6 months ago
Created: 2014

Tags

#cracking #deduplication #wordlist #command-line-tool #hash-cracking #c #password #cybersecurity-tools #password-cracking #c-programming #hashcat #dedupe #wordlist-processing #wordlists #memory-optimization

Built With

C
C++

Included in

Password Cracking · 913

Related Projects

CUPP

Common User Passwords Profiler (CUPP)

Stars: 5,903
Forks: 1,968
Last commit: 4 months ago

StringZilla

Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

Stars: 3,444
Forks: 123
Last commit: 1 month ago

Mentalist

Mentalist is a graphical tool for custom wordlist generation. It utilizes common human paradigms for constructing passwords and can output the full wordlist as well as rules compatible with Hashcat and John the Ripper.

Stars: 1,974
Forks: 258
Last commit: 19 days ago

bopscrk

Generate smart and powerful wordlists

Stars: 1,076
Forks: 123
Last commit: 1 year ago