Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.


gojieba

MIT · Go · v1.4.7

A high-performance Golang port of the Jieba Chinese text segmentation library.

2.6k stars · 304 forks · 0 contributors

What is gojieba?

GoJieba is a Golang port of Jieba, the Chinese text segmentation library. Written Chinese does not put spaces between words, so text must first be split into meaningful words before it can be indexed for search, analyzed, or fed into most natural language processing pipelines. GoJieba performs this segmentation accurately and efficiently.

Target Audience

Go developers building applications that require Chinese text processing, such as search engines, NLP pipelines, content analysis tools, or chatbots targeting Chinese-speaking users.

Value Proposition

Developers choose GoJieba for its high performance (thanks to its C++ core), its multiple segmentation modes tailored to different scenarios, and its easy integration into Go projects: the C++ dependencies are bundled, so no separate libraries need to be installed, although cgo and a C/C++ toolchain are still required. It has been developed since 2015 and focuses on accuracy and speed.

Overview

The Golang version of the "Jieba" (结巴) Chinese word segmentation library.

Use Cases

Best For

  • Building search engines that need to index Chinese content efficiently
  • Implementing NLP pipelines for Chinese text analysis and preprocessing
  • Adding Chinese language support to Go-based chatbots or virtual assistants
  • Performing keyword extraction from Chinese documents or web content
  • Developing text mining tools for Chinese social media or news articles
  • Creating applications that require part-of-speech tagging for Chinese text

Not Ideal For

  • Projects requiring easy cross-compilation without C++ toolchains
  • Applications needing multilingual text segmentation beyond Chinese
  • Environments where pure Go solutions are preferred to avoid cgo overhead

Pros & Cons

Pros

High Performance Core

Core algorithms are implemented in C++ for speed, and the README links to benchmarks of its segmentation performance.

Flexible Segmentation Modes

Supports maximum probability, HMM-based new word discovery, search engine, and full modes, catering to different use cases like precise analysis or search indexing.

Self-Contained Dependencies

C++ dependencies are bundled in the deps/ directory, requiring no submodule initialization and allowing quick setup with go get, as stated in the README.

Extended NLP Features

Includes keyword extraction, part-of-speech tagging, and tokenization beyond basic segmentation, providing a comprehensive toolkit for Chinese text processing.
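Keyword extraction in the original Jieba defaults to TF-IDF scoring: a word ranks highly when it occurs often in the document but is rare across a reference corpus. The sketch below shows that scoring on text that has already been segmented. The IDF table, stopword list, Keyword type and extractTFIDF function are hypothetical, and term frequency here is a raw count rather than a normalized one; it illustrates the technique and is not gojieba's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Keyword is one scored term (hypothetical type for this sketch).
type Keyword struct {
	Word   string
	Weight float64
}

// extractTFIDF scores each non-stopword by (occurrences × IDF) and returns the top k.
// Words missing from idf fall back to defaultIDF. Ties are broken alphabetically so
// the result is deterministic.
func extractTFIDF(words []string, idf map[string]float64, defaultIDF float64,
	stop map[string]bool, k int) []Keyword {
	counts := map[string]int{}
	for _, w := range words {
		if !stop[w] {
			counts[w]++
		}
	}
	kws := make([]Keyword, 0, len(counts))
	for w, c := range counts {
		v, ok := idf[w]
		if !ok {
			v = defaultIDF
		}
		kws = append(kws, Keyword{Word: w, Weight: float64(c) * v})
	}
	sort.Slice(kws, func(a, b int) bool {
		if kws[a].Weight != kws[b].Weight {
			return kws[a].Weight > kws[b].Weight // higher score first
		}
		return kws[a].Word < kws[b].Word
	})
	if k < len(kws) {
		kws = kws[:k]
	}
	return kws
}

func main() {
	// Hypothetical pre-segmented input, IDF values, and stopword list.
	words := []string{"北京", "大学", "的", "北京", "学生"}
	idf := map[string]float64{"北京": 2.0, "大学": 1.5, "学生": 3.0}
	stop := map[string]bool{"的": true}
	for _, kw := range extractTFIDF(words, idf, 1.0, stop, 2) {
		fmt.Printf("%s %.1f\n", kw.Word, kw.Weight)
	}
	// 北京 4.0
	// 学生 3.0
}
```

Here 北京 wins on frequency (2 × 2.0), while 学生 outranks 大学 despite equal counts because its IDF is higher; the stopword 的 is never scored.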

Cons

Cross-Compilation Complexity

The README explicitly warns that cross-compilation requires CGO_ENABLED=1 and target C/C++ toolchains, making deployment to different platforms cumbersome and error-prone.

C++ Dependency Overhead

Relies on C++ code called through cgo, which adds build complexity and portability issues compared to pure Go solutions, and that code sits outside Go's memory-safety guarantees.

Chinese-Only Focus

Designed solely for Chinese text segmentation, with no support for other languages, which limits its usefulness in multilingual applications.


Quick Stats

Stars: 2,638
Forks: 304
Contributors: 0
Open issues: 0
Last commit: 1 month ago
Created: 2015

Tags

#part-of-speech-tagging #cgo #cpp-bindings #natural-language-processing #golang-library #text-segmentation #keyword-extraction #chinese-nlp

Built With

Go, cgo, C, C++

Included in

Go (169.1k)
Auto-fetched 1 day ago

Related Projects

gse

Efficient multilingual NLP and text segmentation in Go; supports English, Chinese, Japanese and others.

Stars: 2,831
Forks: 228
Last commit: 1 month ago

sentences

A multilingual command-line sentence tokenizer in Golang

Stars: 470
Forks: 42
Last commit: 2 years ago

segment

A Go library for performing Unicode Text Segmentation as described in Unicode Standard Annex #29

Stars: 89
Forks: 15
Last commit: 3 years ago

textcat

A Go package for n-gram based text categorization, with support for UTF-8 and raw text

Stars: 73
Forks: 11
Last commit: 1 year ago