Open-Awesome

tikalinkextract

HTML · tle-0.0.3

Extracts hyperlinks from files using Apache Tika for batch processing and web archiving workflows.


Overview

Tika-based link (URL) extractor for httpreserve.
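This listing does not show how tikalinkextract itself is implemented, so the following is only a minimal Python sketch of the general idea: once a tool such as Apache Tika has reduced a file to plain text, the hyperlinks can be pulled out with a pattern match. The `URL_PATTERN` expression and the `extract_links` helper are hypothetical names chosen for illustration, and the sketch assumes the text has already been extracted.

```python
import re

# Hypothetical pattern (an assumption, not taken from tikalinkextract):
# match http(s) URLs up to whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text: str) -> list[str]:
    """Return the unique URLs found in text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a URL in running text.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


if __name__ == "__main__":
    # Stand-in for text that a Tika extraction step would have produced.
    sample = "See https://example.com/a, then http://example.org/b."
    for link in extract_links(sample):
        print(link)
```

For batch processing, the same function could be applied to the extracted text of each file in turn, with the results written out one URL per line.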

Quick Stats

Stars: 11
Forks: 0
Contributors: 0
Open Issues: 6
Last commit: 1 year ago
Created: 2017

Tags

#text-extraction #code4lib #link-extraction #batch-processing #digital-preservation #web-archiving #webarchiving #archives

Built With

Apache Tika

Included in

Web Archiving (2.5k)
Auto-fetched 18 hours ago

Related Projects

wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2026, WikiTeam has preserved more than 600,000 wikis.

Stars: 852
Forks: 175
Last commit: 5 months ago

warcdb

WarcDB: Web crawl data as SQLite databases.

Stars: 406
Forks: 10
Last commit: 1 year ago

Go Get Crawl

Extract web archive data using Wayback Machine and Common Crawl.

Stars: 181
Forks: 17
Last commit: 1 year ago

MemGator

A Memento Aggregator CLI and Server in Go.

Stars: 80
Forks: 13
Last commit: 2 months ago