A light libxml wrapper for Go

Gokogiri

LibXML bindings for the Go programming language.

By Zhigang Chen and Hampton Catlin

This is a major rewrite of v0, with changes in the following areas:

  • Separation of XML and HTML
  • More of the memory allocation/deallocation burden placed on Go
  • Fragment parsing -- no more deep-copy
  • Serialization
  • Some API adjustments

Installation

# Linux
sudo apt-get install libxml2-dev
# Mac
brew install libxml2

go get github.com/moovweb/gokogiri

Running tests

go test github.com/moovweb/gokogiri/...

Basic example

package main

import (
  "io/ioutil"
  "log"
  "net/http"

  "github.com/moovweb/gokogiri"
)

func main() {
  // fetch and read a web page
  resp, err := http.Get("http://www.google.com")
  if err != nil {
    log.Fatal(err)
  }
  defer resp.Body.Close()

  page, err := ioutil.ReadAll(resp.Body)
  if err != nil {
    log.Fatal(err)
  }

  // parse the web page
  doc, err := gokogiri.ParseHtml(page)
  if err != nil {
    log.Fatal(err)
  }
  // important -- don't forget to free the resources when you're done!
  defer doc.Free()

  // perform operations on the parsed page -- consult the tests for examples
}
Owner: Moovweb
Comments
  • memory leak under heavy load

    I am parsing around 200-300 HTML snippets of about 3 KB each per second, which in itself proves how cool your lib is ;) Sadly, it's leaking memory at around 1-2 MB/min. It's not constant, though, so I am guessing it could be triggered by some kind of error during parsing.

    If I can help you fix this, let me know.

    Thx,

    Paul
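libxml2 allocates parsed documents in C memory, which Go's garbage collector does not track, so any document that is never freed stays allocated for good. One possible cause of slow, uneven growth like this is an early return or error path that skips `Free`. A minimal sketch, assuming a hypothetical `document` interface standing in for the document type that `gokogiri.ParseHtml` returns (the README shows it has a `Free()` method), wraps parsing so that `Free` runs via `defer` on every path after a successful parse:

```go
package main

import (
	"errors"
	"fmt"
)

// document is a hypothetical stand-in for a parsed, C-backed document;
// gokogiri's parsed documents expose Free() in the same spirit.
type document interface {
	Free()
}

// withDocument parses input, hands the result to use, and guarantees that
// Free is called once, even if use returns an error or panics.
func withDocument(parse func([]byte) (document, error), input []byte, use func(document) error) error {
	doc, err := parse(input)
	if err != nil {
		return err // nothing was allocated, so there is nothing to free
	}
	defer doc.Free()
	return use(doc)
}

// fakeDoc counts frees so the helper can be exercised without libxml2.
type fakeDoc struct{ freed *int }

func (d fakeDoc) Free() { *d.freed++ }

func main() {
	freed := 0
	parse := func(b []byte) (document, error) { return fakeDoc{&freed}, nil }

	_ = withDocument(parse, []byte("<p>ok</p>"), func(document) error { return nil })
	_ = withDocument(parse, []byte("<p>bad</p>"), func(document) error { return errors.New("search failed") })

	fmt.Println("documents freed:", freed) // documents freed: 2
}
```

Routing every parse through one helper like this makes a missing `Free` a single place to audit rather than one per call site.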

  • can't seem to easily build on OS X

    The README is obviously outdated since the makefile is gone, but I still didn't manage to build/install on Mountain Lion:

    https://gist.github.com/4383203

    I installed libxml2 from Homebrew and updated the xpath import statement to reflect the path of the brew files. When I tried to build, I got an error.

    An updated README would be much appreciated, since this lib seems very useful.

    Thanks,

    Matt
  • change import paths; fix `go get`

    Currently this package refers to its imports with the path gokogiri/..., which doesn't exist. This makes it impossible to install under any other path (i.e., "github.com/moovweb/gokogiri"). I think this can be resolved by making all local references to gokogiri components relative imports.

    For example, in gokogiri/html/document.go the reference to util should be "../util".

    ~$ pkg-config --cflags libxml-2.0 libxml-2.0
    -I/usr/include/libxml2  
    ~$ go get github.com/moovweb/gokogiri
    package gokogiri/html: unrecognized import path "gokogiri/html"
    package gokogiri/xml: unrecognized import path "gokogiri/xml"
    $ go get github.com/moovweb/gokogiri/html
    package gokogiri/util: unrecognized import path "gokogiri/util"
    package gokogiri/xml: unrecognized import path "gokogiri/xml"
    
  • TestDisableOutputEscaping fails in Darwin

    Not sure why; it seems to work fine on other platforms (Windows and Linux included).

    Below is the output:

    gokogiri/xml $ go test .
    
    Testing: Basic Parsing [....]
    
    All (4) tests passed!
    
    Testing: Buffered Parsing [....]
    
    All (4) tests passed!
    --- FAIL: TestDisableOutputEscaping (0.00 seconds)
        node_test.go:364: TestDisableOutputEscaping (escaping disabled) Expected: <br/>
            Actual: &lt;br/&gt;
    FAIL
    FAIL    github.com/moovweb/gokogiri/xml 0.134s
    
  • clang: error: argument unused during compilation: '-fno-eliminate-unused-debug-types'

    I seem to get this both when trying to use Gokogiri and when I try to `go get` gokogiri again. :S Hope this isn't just me being stupid, haha.

    Cheers

    George

  • Better XPath support

    This pull request addresses both #42 and #39.

    Node.EvalXPath handles evaluating an XPath that returns a string or number instead of a nodeset. Unhandled return types are now coerced into a string.

    Node.SearchWithVariables and Node.EvalXPath both take a VariableScope that allows XPath expressions to resolve any variable names. This is specifically needed for my XSLT processor and may be useful in other contexts.
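The PR text does not show the `VariableScope` interface itself, so the following is a hypothetical stand-in that only illustrates the resolution idea: a map-backed scope keyed by namespace URI and local name (the two parts of an XPath variable reference such as `$ns:name`), with an optional parent scope, as an XSLT processor might use for nested variable bindings. The names `variableScope`, `mapScope`, and `ResolveVariable` here are assumptions, not gokogiri's API.

```go
package main

import "fmt"

// variableScope is a hypothetical interface: given the local name and
// namespace URI of an XPath variable reference, return its value (or nil).
type variableScope interface {
	ResolveVariable(local, namespace string) interface{}
}

// qname identifies a variable by namespace URI and local name.
type qname struct{ namespace, local string }

// mapScope resolves variables from a map and falls back to a parent scope,
// so inner bindings shadow outer ones.
type mapScope struct {
	vars   map[qname]interface{}
	parent variableScope
}

func (s *mapScope) ResolveVariable(local, namespace string) interface{} {
	if v, ok := s.vars[qname{namespace, local}]; ok {
		return v
	}
	if s.parent != nil {
		return s.parent.ResolveVariable(local, namespace)
	}
	return nil // unknown variable
}

func main() {
	// XPath numbers are IEEE doubles, so numeric values are stored as float64.
	global := &mapScope{vars: map[qname]interface{}{{"", "title"}: "Gokogiri", {"", "count"}: 3.0}}
	inner := &mapScope{vars: map[qname]interface{}{{"", "title"}: "override"}, parent: global}

	fmt.Println(inner.ResolveVariable("title", ""))   // override
	fmt.Println(inner.ResolveVariable("count", ""))   // 3
	fmt.Println(inner.ResolveVariable("missing", "")) // <nil>
}
```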

  • Inject HTML into a node

    There should be a way to inject HTML into a node. For instance,

    node.String() // ""
    node.Inject("<div />")
    node.String() // "<div />"
    

    And, furthermore, this new div has to be properly doc'd, i.e. owned by the same document as the node it was injected into:

    node.FirstElement().Doc() == node.Doc()
    // and ensure this happens in C-world too!
    
  • Encoding support

    Gokogiri doesn't seem to support the encoding of some pages, although http://www.xmlsoft.org/encoding.html says libxml will use iconv on Unix systems. Here's a small test:

    package main
    
    import (
        "fmt"
        "io/ioutil"
        "net/http"
        "github.com/moovweb/gokogiri"
    )
    
    func get(url string) []byte {
        r, err := http.Get(url)
        if err != nil { panic(err) }
        body, err := ioutil.ReadAll(r.Body)
        if err != nil { panic(err) }
        return body
    }
    
    func main() {
        buf := get("http://bbs.chinaunix.net/thread-4080291-1-1.html")
        doc, err := gokogiri.ParseHtml(buf)
        if err != nil { panic(err) }
        fmt.Println("MetaEncoding:", doc.MetaEncoding())
        title, _ := doc.Search("//title")
        fmt.Println(title[0].Content())
    }
    

    Output:

    ~/gtest > go run gokogiritest.go
    MetaEncoding: gbk
    AIXÉÏlibxml2²»֧³Ögb2312±àÂë-AIX-ChinaUnix.net
    ~/gtest > go run gokogiritest.go | iconv -f gbk
    MetaEncoding: gbk
    AIX上libxml2不支持gb2312编码-AIX-ChinaUnix.net
    

    Any idea why it's not working? Did I misunderstand the libxml page?
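Piping the output through `iconv -f gbk` recovers the title, which indicates the text still reached the terminal GBK-encoded rather than converted to UTF-8. One possible workaround is to detect the declared charset before parsing and then either transcode the page to UTF-8 or pass the encoding to the parser explicitly; whether gokogiri's lower-level parse functions accept an input-encoding argument is not confirmed here. A minimal, standard-library-only sketch of the detection step follows. `sniffMetaCharset` is a hypothetical helper that scans only the first 1024 bytes, like the HTML prescan, and it is a rough regexp heuristic rather than a full implementation of the HTML encoding-sniffing algorithm.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// metaCharset matches charset=... inside a <meta> tag, covering both
// <meta charset="gbk"> and <meta http-equiv=... content="text/html; charset=gbk">.
var metaCharset = regexp.MustCompile(`(?i)<meta[^>]*?charset\s*=\s*["']?\s*([a-z0-9_:.\-]+)`)

// sniffMetaCharset (hypothetical) returns the charset declared in a <meta>
// tag within the first 1024 bytes, lower-cased, or "" if none is found.
func sniffMetaCharset(page []byte) string {
	if len(page) > 1024 {
		page = page[:1024]
	}
	m := metaCharset.FindSubmatch(page)
	if m == nil {
		return ""
	}
	return strings.ToLower(string(m[1]))
}

func main() {
	page := []byte(`<html><head><meta http-equiv="Content-Type" content="text/html; charset=GBK"></head></html>`)
	fmt.Println(sniffMetaCharset(page)) // gbk
}
```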

  • cannot build, test, or install gokogiri

    I've tried several avenues, including what's detailed in the README. Here are the steps I took:

    hobbsc@ea:~/incoming/gokogiri 1014:0% make test
    make: *** No rule to make target `test'.  Stop.
    
    hobbsc@ea:~/incoming/gokogiri 1015:2% go build
    gokogiri.go:4:2: import "gokogiri/html": cannot find package
    gokogiri.go:5:2: import "gokogiri/xml": cannot find package
    
    hobbsc@ea:~/incoming/gokogiri 1016:1% go get github.com/moovweb/gokogiri
    # pkg-config --cflags libxml-2.0 libxml-2.0
    exec: "pkg-config": executable file not found in $PATH
    
    hobbsc@ea:~/incoming/gokogiri 1017:2% make install
    make: *** No rule to make target `install'.  Stop.
    
    hobbsc@ea:~/incoming/gokogiri 1018:2% go test
    gokogiri.go:4:2: import "gokogiri/html": cannot find package
    gokogiri.go:5:2: import "gokogiri/xml": cannot find package
    
  • Make gokogiri compile with go 1.6

    In Go 1.6, the new cgo pointer-passing rules make it effectively forbidden to pass Go pointers through C to Go functions that are used as callbacks from C.

    Fix this by funneling those pointers through global variables.

    Fixes #92
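Funneling pointers through global state is commonly done with a handle table: C receives only an integer key, and the Go callback looks the real value up when it fires, so C never holds a Go pointer. A minimal sketch in plain Go (no cgo, so it runs anywhere) follows; the names `handleTable`, `Register`, `Lookup`, and `Release` are hypothetical and not taken from the patch. Since Go 1.17 the standard library provides the same idea as `runtime/cgo.Handle`.

```go
package main

import (
	"fmt"
	"sync"
)

// handleTable maps integer handles to Go values. The integer can be passed
// to C (e.g. as a uintptr) because it is not a Go pointer.
type handleTable struct {
	mu   sync.Mutex
	next uintptr
	vals map[uintptr]interface{}
}

func newHandleTable() *handleTable {
	return &handleTable{vals: make(map[uintptr]interface{})}
}

// Register stores v and returns a new non-zero handle for it.
func (t *handleTable) Register(v interface{}) uintptr {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.next++
	t.vals[t.next] = v
	return t.next
}

// Lookup returns the value for h; a callback invoked from C would call this
// with the handle it was given.
func (t *handleTable) Lookup(h uintptr) (interface{}, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	v, ok := t.vals[h]
	return v, ok
}

// Release drops the entry so the value can be garbage collected.
func (t *handleTable) Release(h uintptr) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.vals, h)
}

// handles is the global table that callbacks consult.
var handles = newHandleTable()

func main() {
	h := handles.Register("per-call state")
	v, ok := handles.Lookup(h)
	fmt.Println(v, ok) // per-call state true

	handles.Release(h)
	_, ok = handles.Lookup(h)
	fmt.Println(ok) // false
}
```

The mutex matters because libxml callbacks for different documents may run on different goroutines; forgetting `Release` leaks table entries the same way a missing `Free` leaks C memory.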

  • Node.Search() uses the wrong XPath context

    Node.Search() should create a new XPath context using the current node instead of using the document context to allow searching from the current node.

  • Get error when start Docker container

    I install libxml2 in my Dockerfile:

    RUN apt-get update && apt-get install -y build-essential libxml2 libxml2-dev libxmlsec1-dev

    When I start the container, I get this error: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory. How can I fix it?

  • identifier "_Ctype_struct__xmlDoc" may conflict with identifiers generated by cgo

    I'm trying to install gokogiri on macOS 10.14.4 (Mojave) with Go 1.12.3. I've installed libxml2 using Homebrew. Installing gokogiri with:

    LDFLAGS="-L/usr/local/opt/libxml2/lib" CPPFLAGS="-I/usr/local/opt/libxml2/include" PKG_CONFIG_PATH="/usr/local/opt/libxml2/lib/pkgconfig" go get github.com/moovweb/gokogiri

    Outputs the error:

    # github.com/moovweb/gokogiri/xml
    ../../github.com/moovweb/gokogiri/xml/document.go:330:19: identifier "_Ctype_struct__xmlDoc" may conflict with identifiers generated by cgo
    

    How may I compile gokogiri?

  • pkg-config: exec: "pkg-config": executable file not found in %PATH%

    go get github.com/moovweb/gokogiri
    pkg-config --cflags libxml-2.0
    pkg-config: exec: "pkg-config": executable file not found in %PATH%
    pkg-config --cflags libxml-2.0 libxml-2.0
    pkg-config: exec: "pkg-config": executable file not found in %PATH%

    Is it normal for `go get` to require an executable to be on the PATH?
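This is expected for cgo packages: when a package declares its C dependencies with a `#cgo pkg-config:` directive, `go build` (and therefore `go get`) runs the `pkg-config` executable to obtain compiler and linker flags, so the tool and the `libxml-2.0` package file must both be findable. The `pkg-config --cflags libxml-2.0 libxml-2.0` lines in the output are consistent with gokogiri using such a directive. A small diagnostic sketch that reproduces the check outside of the build follows; `pkgConfigCflags` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pkgConfigCflags (hypothetical) fails with a clear error if pkg-config is
// not on the PATH, and otherwise returns the compiler flags pkg-config
// reports for the given module, e.g. "libxml-2.0".
func pkgConfigCflags(module string) (string, error) {
	path, err := exec.LookPath("pkg-config")
	if err != nil {
		return "", fmt.Errorf("pkg-config not found on PATH: %w", err)
	}
	out, err := exec.Command(path, "--cflags", module).Output()
	if err != nil {
		return "", fmt.Errorf("pkg-config cannot find %q: %w", module, err)
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	flags, err := pkgConfigCflags("libxml-2.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("cflags:", flags)
}
```

If the first error appears, install pkg-config and make sure its directory is on PATH; if the second appears, the libxml2 development files (and their `.pc` file) are missing or not on `PKG_CONFIG_PATH`.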

  • build constraints exclude all Go files in /moovweb/gokogiri/help, failed to build with arch=386

    $ GOOS=windows GOARCH=386 go build -o anan
    go build github.com/moovweb/gokogiri/help: build constraints exclude all Go files in /home/javier/go/src/github.com/moovweb/gokogiri/help
    go build github.com/moovweb/gokogiri/xpath: build constraints exclude all Go files in /home/javier/go/src/github.com/moovweb/gokogiri/xpath
    

    any thoughts?
