A Go library and command-line tool to extract URLs from text using regular expressions.
xurls is a Go library and command-line tool that extracts URLs from plain text using regular expressions. It offers two matching modes: strict, which requires an explicit scheme, and relaxed, which also matches URLs written without one. This makes it practical to identify web addresses reliably in unstructured text.
It is aimed at Go developers who need to extract URLs from text data, particularly those building web scrapers, log analyzers, text processing pipelines, or command-line utilities.
Developers choose xurls because it offers well-tested, production-ready regular expressions for URL extraction with a simple API that integrates directly with Go's standard regexp package, eliminating the need to write and maintain complex URL matching patterns.
Extract URLs from text
Returns compiled *regexp.Regexp objects that integrate seamlessly with Go's standard library, allowing use of all regexp methods like FindString or FindAllIndex.
Provides both strict and relaxed matching: strict mode matches only URLs with an explicit scheme, while relaxed mode also matches URLs written without one. The README includes examples of both.
Compiles each regular expression lazily, once, on first use, so repeated calls in text processing tasks reuse the same compiled expression.
Includes a standalone xurls command, installable via 'go install', that extracts URLs from piped terminal input for quick, ad-hoc text analysis.
Focuses on reliable, production-ready regular expressions, reducing the burden on developers to write and maintain complex URL matching logic from scratch.
Regular expressions may not perfectly handle all URL edge cases, such as unconventional schemes or embedded markup, potentially leading to missed matches or false positives.
The library is limited to Go projects, so it does not suit multi-language environments, and developers working in other languages need to find a comparable library.
Only extracts URL strings without parsing them into components like host, path, or query parameters, requiring additional steps for detailed analysis or validation.