HTML Parsing

36 projects

GoQuery Go

A jQuery-like HTML manipulation and traversal library for Go, built on net/html and cascadia CSS selectors.

#jquery #css-selectors #net-html

Stars 15.0k

Forks 935

Last commit 6 days ago

webmagic Java

A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.

#distributed-systems #crawler #html-parsing

Stars 11.7k

Forks 4.1k

Last commit 7 months ago

trafilatura Python

A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.

#text-extraction #readability #article-extractor

Stars 6.3k

Forks 397

Last commit 5 days ago

parse5 TypeScript

A fast, spec-compliant HTML parsing and serialization toolset for Node.js.

#dom-manipulation #whatwg-html #html5

A sensible XML and HTML parsing library for iOS and macOS, offering a modern DOM-style API with XPath and CSS query support.

#ios #libxml2 #objective-c

Stars 2.6k

Forks 195

Last commit 6 years ago

scraper Rust

A Rust library for parsing HTML and querying elements using CSS selectors.

#dom-manipulation #hacktoberfest #css-selectors

Stars 2.4k

Forks 127

Last commit 3 days ago

HTML to Markdown PHP

A PHP library that converts HTML to Markdown with configurable options for clean, editable output.

#dom-manipulation #hacktoberfest #composer

Stars 1.9k

Forks 215

Last commit 1 month ago

rvest R

A tidyverse package for web scraping in R, inspired by Beautiful Soup and designed for data extraction workflows.

#r-package #r-language #html-parsing

A Swift library to build NSAttributedString from HTML-like text with clickable tags, links, hashtags, and mentions.

#ios #clickable-links #textkit

Stars 1.5k

Forks 156

Last commit 2 years ago

Wombat Ruby

A lightweight Ruby web crawler and scraper with an elegant DSL for extracting structured data from web pages.

#dsl #crawler #ruby-gem

Stars 1.4k

Forks 128

Last commit 3 months ago

Fuzi Swift

A fast and lightweight XML/HTML parser for Swift with XPath and CSS query support.

#ios #libxml2 #css-selectors

Stars 1.1k

Forks 166

Last commit 2 years ago

MetaInspector Ruby

A Ruby gem for web scraping that extracts titles, meta tags, links, images, and structured data from URLs.

#link-extraction #nokogiri #metadata-extraction

Stars 1.1k

Forks 166

Last commit 2 months ago

select.rs Rust

A Rust library for extracting structured data from HTML documents, designed for web scraping tasks.

#css-selectors #dom-traversal #html-parsing

Stars 1.0k

Forks 68

Last commit 1 year ago

Ox C

A fast Ruby XML parser, object marshaller, and SAX parser designed as a high-performance alternative to Nokogiri and Marshal.

#object-marshalling #sax-parser #ruby-gem

Stars 911

Forks 80

Last commit 10 days ago

Draft.js: Export ContentState to HTML JavaScript

A monorepo of utilities for importing and exporting DraftJS ContentState to and from HTML and Markdown.

#draftjs #content-management #html-parsing

Stars 881

Forks 229

Last commit 3 years ago

F# Data F#

F# library providing type providers and helpers for accessing CSV, JSON, XML, HTML, and WorldBank data.

#typeprovider #type-providers #strongly-typed

F# type providers and utilities for accessing structured data formats (CSV, HTML, JSON, XML) and WorldBank data.

#typeprovider #type-providers #http

A command-line tool to extract data from HTML/XML pages and JSON APIs using CSS, XPath, XQuery, JSONiq, and pattern matching.

#rest #css-selectors #http

Stars 840

Forks 46

Last commit 1 year ago

htmlquery Go

A Go package for querying HTML documents using XPath expressions with built-in caching for performance.

#caching #xpath-selector #html-parsing

Stars 784

Forks 80

Last commit 21 days ago

xpath Go

A Go package for querying XML, HTML, and JSON documents using XPath expressions.

#xpath-query #selects-descendants #document-query

Stars 743

Forks 98

Last commit 3 days ago

Hickory Clojure

A Clojure/ClojureScript library that parses HTML into Clojure data structures for analysis, transformation, and serialization.

#dom-manipulation #hiccup #clojurescript

Stars 678

Forks 55

Last commit 3 months ago

Erik Swift

A Swift headless browser based on WebKit for functional testing and webpage manipulation via JavaScript.

#dom-manipulation #ios #html-parsing

Stars 613

Forks 47

Last commit 4 years ago

shiva Rust

A Rust library for parsing and generating documents across 13+ formats using a unified Common Document Model.

#http-server #document-generator #common-document-model

Stars 436

Forks 23

Last commit 1 year ago

readability Elixir

An Elixir library for extracting and curating the primary readable content from webpages.

#readability #hacktoberfest #elixir

Stars 283

Forks 64

Last commit 8 months ago

Internet Tools Pascal

XPath/XQuery 3.1 interpreter for Pascal with HTTP/S, JSON, HTML, and web scraping capabilities.

#library #interpreter #pascal

Stars 135

Forks 38

Last commit 22 days ago

ngx-dynamic-hooks TypeScript

Automatically insert live Angular components into dynamic strings or HTML structures using selectors or custom patterns.

#aot-compilation #server-side-rendering #html-parsing

Stars 126

Forks 8

Last commit 1 year ago

eml Elixir

A library for writing, parsing, and manipulating HTML markup as first-class Elixir data structures.

#functional-programming #elixir #code-as-data

Stars 114

Forks 12

Last commit 2 years ago

WC-Loader JavaScript

A webpack loader that enables seamless integration of web components (Polymer, x-tags) with hot code reload support.

#web-components #webcomponents-webpack-loader #wc-loader

Stars 99

Forks 14

Last commit 9 years ago

lxml-stubs Python

External type annotations (stubs) for the lxml Python package, enabling static type checking.

#lxml #static-typing #mypy-stubs

Stars 51

Forks 30

Last commit 2 months ago

modest_ex Elixir

An Elixir library for pipeable HTML transformations using CSS selectors, powered by the Lexbor C library.

#dom-manipulation #functional-programming #css-selector

Stars 34

Forks 5

Last commit 7 months ago

phoenix_html_sanitizer Elixir

A Phoenix library for sanitizing HTML user input by stripping or allowing specific tags and attributes.

#user-input #elixir #web-security

Stars 27

Forks 12

Last commit 1 year ago

sax-ts JavaScript

A SAX-style parser for XML and HTML written in TypeScript, designed for browser compatibility and large data streams.

#event-driven #sax-parser #sax

Stars 20

Forks 1

Last commit 2 years ago

myhtmlex HTML

Elixir/Erlang bindings for the myhtml C library, providing fast and safe HTML parsing.

#safe-html-parsing #c-bindings #beam-ecosystem

Stars 15

Forks 4

Last commit 6 years ago

tidy_ex C

Elixir bindings for tidy-html5 to correct and clean up HTML content by fixing markup errors.

#elixir #markup-correction #c-node

Stars 9

Forks 2

Last commit 2 years ago

htree Go

A Go package for traversing, navigating, filtering, and processing trees of html.Node objects.

#html-nodes #go-package #dom-traversal

Stars 5

Forks 0

Last commit 7 months ago

universal_html Dart

A cross-platform implementation of dart:html for parsing, manipulating, and querying HTML/XML documents across browsers, mobile, desktop, and server-side environments.

#dom-manipulation #dart #html-parsing

Stars 0

Forks 0

Last commit 2 years ago
