Utilities for processing Wikipedia dumps in Go
A Go package providing utilities for processing Wikipedia dumps.
Features:
- Supports Wikidata entities JSON dumps.
- Supports Wikimedia Enterprise HTML dumps.
- Decompression and JSON decoding are parallelized for maximum throughput on a single machine.
- Parses into idiomatic Go structs.
- Can download and process a dump at the same time.
- Caches downloaded files locally.
Installation
This is a Go package. You can add it to your project using go get:
go get gitlab.com/tozd/go/mediawiki
There is also a read-only GitHub mirror available, if you need to fork the project there.
Usage
See the full package documentation with examples on pkg.go.dev.