Pure Go implementation of the prediction part of GBRT (Gradient Boosting Regression Trees) models from popular frameworks

leaves


Introduction

leaves is a library implementing prediction code for GBRT (Gradient Boosting Regression Trees) models in pure Go. The goal of the project is to make it possible to use models from popular GBRT frameworks in Go programs without C API bindings.

NOTE: Before the 1.0.0 release, the API is subject to change.

Features

  • General Features:
    • support parallel predictions for batches (see the sketch after this list)
    • support sigmoid, softmax transformation functions
    • support getting leaf indices of decision trees
  • Support LightGBM (repo) models:
    • read models from text format and from JSON format
    • support gbdt, rf (random forest) and dart models
    • support multiclass predictions
    • additional optimizations for categorical features (for example, the one-hot decision rule)
    • additional optimizations that exploit prediction-only usage
  • Support XGBoost (repo) models:
    • read models from binary format
    • support gbtree, gblinear, dart models
    • support multiclass predictions
    • support missing values (NaN)
  • Support scikit-learn (repo) tree models (experimental support):
    • read models from pickle format (protocol 0)
    • support sklearn.ensemble.GradientBoostingClassifier
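
Parallel batch predictions are exposed through the PredictDense method. Below is a minimal sketch of scoring a row-major batch on 4 threads; the PredictDense and NOutputGroups signatures are assumed as documented in the godoc:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", true)
	if err != nil {
		panic(err)
	}

	// Two objects with three features each, stored row-major.
	vals := []float64{
		1.0, 2.0, 3.0,
		4.0, 5.0, 6.0,
	}
	nrows, ncols := 2, 3
	predictions := make([]float64, nrows*model.NOutputGroups())

	// The last argument sets the number of threads used for the batch.
	if err := model.PredictDense(vals, nrows, ncols, predictions, 0, 4); err != nil {
		panic(err)
	}
	fmt.Println(predictions)
}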

Usage examples

In order to start, go get this repository:

go get github.com/dmitryikh/leaves

Minimal example:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
)

func main() {
	// 1. Read model
	useTransformation := true
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", useTransformation)
	if err != nil {
		panic(err)
	}

	// 2. Do predictions!
	fvals := []float64{1.0, 2.0, 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}

In order to use an XGBoost model, just change leaves.LGEnsembleFromFile to leaves.XGEnsembleFromFile.
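
For example, here is a minimal sketch of the XGBoost case that also exercises missing-value support; the file name xgboost_model.bin is illustrative, and math.NaN() marks an absent feature:

package main

import (
	"fmt"
	"math"

	"github.com/dmitryikh/leaves"
)

func main() {
	model, err := leaves.XGEnsembleFromFile("xgboost_model.bin", true)
	if err != nil {
		panic(err)
	}

	// XGBoost models support missing values: pass math.NaN()
	// in place of an absent feature.
	fvals := []float64{1.0, math.NaN(), 3.0}
	p := model.PredictSingle(fvals, 0)
	fmt.Printf("Prediction for %v: %f\n", fvals, p)
}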

Documentation

Documentation is hosted on godoc (link). The documentation contains detailed usage examples and a full API reference. Some additional information about usage examples can be found in leaves_test.go.

Compatibility

Most leaves features are tested for compatibility with both older and upcoming versions of GBRT libraries. In compatibility.md one can find a detailed report on leaves' correctness against different versions of external GBRT libraries.

Some additional information on new features and backward compatibility can be found in NOTES.md.

Benchmark

Below are comparisons of prediction speed on batches (~1000 objects per API call). Hardware: MacBook Pro (15-inch, 2017), 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. C API implementations were called from Python bindings, but the large batch size should make the overhead of the Python bindings negligible. leaves benchmarks were run by means of the Go test framework: go test -bench. See benchmark for more details on measurements. See testdata/README.md for data preparation pipelines.

Single thread:

Test Case             Features  Trees  Batch size  C API  leaves
LightGBM MS LTR       137       500    1000        49ms   51ms
LightGBM Higgs        28        500    1000        50ms   50ms
LightGBM KDD Cup 99*  41        1200   1000        70ms   85ms
XGBoost Higgs         28        500    1000        44ms   50ms

4 threads:

Test Case             Features  Trees  Batch size  C API  leaves
LightGBM MS LTR       137       500    1000        14ms   14ms
LightGBM Higgs        28        500    1000        14ms   14ms
LightGBM KDD Cup 99*  41        1200   1000        19ms   24ms
XGBoost Higgs         28        500    1000        ?      14ms

(?) - currently I'm unable to utilize multithreading for XGBoost predictions by means of the Python bindings

(*) - KDD Cup 99 problem involves continuous and categorical features simultaneously

Limitations

  • LightGBM models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
  • XGBoost models:
    • limited support of transformation functions (only sigmoid and softmax are supported)
    • predictions may diverge slightly from the C API because of floating-point conversions and comparison tolerances
  • scikit-learn tree models:
    • no support for transformation functions; output scores are raw scores (as from GradientBoostingClassifier.decision_function — see the sketch after this list)
    • only pickle protocol 0 is supported
    • predictions may diverge slightly from sklearn because of floating-point conversions and comparison tolerances
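
When a model is loaded without its transformation, probabilities can be recovered by applying the sigmoid manually. Below is a minimal sketch using the util helper that also appears in the comments below (the github.com/dmitryikh/leaves/util import path is assumed); it is shown with a LightGBM binary-classification model, and the same recipe applies to raw sklearn scores:

package main

import (
	"fmt"

	"github.com/dmitryikh/leaves"
	"github.com/dmitryikh/leaves/util"
)

func main() {
	// useTransformation = false: the model outputs raw scores.
	model, err := leaves.LGEnsembleFromFile("lightgbm_model.txt", false)
	if err != nil {
		panic(err)
	}

	raw := []float64{model.PredictSingle([]float64{1.0, 2.0, 3.0}, 0)}
	util.SigmoidFloat64SliceInplace(raw) // raw score -> probability
	fmt.Printf("probability: %f\n", raw[0])
}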

Contacts

If you are interested in the project or have questions, please contact me by email: khdmitryi at gmail.com

Owner
Dmitry Khominich
Software & Machine Learning Engineer
Comments
  • Support outputting leaf indices for all the `predict*` functions

    Thanks to the popular paper https://research.fb.com/wp-content/uploads/2016/11/practical-lessons-from-predicting-clicks-on-ads-at-facebook.pdf

    Many people use GBDT to extract features from a dataset instead of predicting results directly. The extracted features are the leaf indices from each estimator's decision path.

    In LightGBM this is achieved by setting pred_leaf: https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.predict

    I am adding the same support in this pull request. Basically, the user can set this parameter and then the predict function will return the leaf indices.

    The test data are generated by LightGBM to make sure that the indices are generated in the same way.

  • Skips transformation for regression for LightGBM models

    This patch fixes the behaviour of the LGEnsembleFromFile function: if a regression model is loaded, the loadTransformation parameter is automatically set to false.

  • Understanding the output of Predict

    Hi,

    I'm not sure I fully understand the output of the Predict() methods.

    I have a fully trained model with 9 classes and 100 estimators. I then run:

    predictions := make([]float64, 9)
    err = model.Predict(values, 100, predictions)
    util.SigmoidFloat64SliceInplace(predictions)
    log.Infof("Prediction for %v:\n %v", values, predictions)
    

    That yields:

    Prediction for [110 0 12 0 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]: 
    [0.2276 0.1822 0.2664 0.0594 0.0682 0.9859 0.1283 0.6349 0.0706]
    

    I understand those are the probabilities for EACH of the 9 classes being the right one. However, how am I able to get the actual value of the class? In Python, if I do y_pred = model.predict(values), it correctly shows the expected class values; e.g. my class values look like 1242, 1152, 1552, 6662, etc. How can I map the prediction output from above to the class values? I haven't provided any specific ordering of them to the model.
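
    leaves returns one score per class in class-index order, so mapping back to the original labels must mirror the encoding used at training time (the sklearn wrappers, for instance, sort the distinct labels). A minimal sketch of the argmax step, assuming classValues is that sorted label list:

    // argmaxClass maps multiclass scores back to the original class labels.
    // classValues must reproduce the label encoding used at training time;
    // a sorted label list, e.g. []int{1152, 1242, 1552, 6662 /* ... */},
    // is assumed here.
    func argmaxClass(predictions []float64, classValues []int) int {
        best := 0
        for i, p := range predictions {
            if p > predictions[best] {
                best = i
            }
        }
        return classValues[best]
    }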

  • Return explicit error for predictSingle.

    PredictSingle returns 0.0 when there is an error, which can be confusing unless you read the code.

    Maybe it's a good idea to return explicit errors instead, in order to be friendlier to the client code.
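
    A sketch of such a wrapper, built on Predict (which already returns an error); the NOutputGroups accessor and the github.com/dmitryikh/leaves import are assumed:

    // predictSingleChecked is a PredictSingle variant that surfaces errors
    // explicitly instead of silently returning 0.0.
    func predictSingleChecked(model *leaves.Ensemble, fvals []float64, nEstimators int) (float64, error) {
        predictions := make([]float64, model.NOutputGroups())
        if err := model.Predict(fvals, nEstimators, predictions); err != nil {
            return 0.0, err
        }
        return predictions[0], nil
    }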

  • xgboost consistency failed

    I built an XGBoost model in Python and ran it on a test dataset. But when I use leaves to load the model and predict, the results are inconsistent with the Python results.

    I also tested a LightGBM model with the same dataset, and those results are consistent.

  • Allow v3 as well

    AFAIK v3 "just" contains additional values for debugging purposes. So at least for the time being it should be possible to allow v3 as well IMHO.

  • xgEnsemble prediction results are different from xgboost in python

    I train and test data with XGBoost in Python, then use leaves in the production environment. More details:

    In the Python XGBoost test, the data structure I set up with pd.DataFrame is [0:value1, 1:v2, 2:v3, ..., n:v(n+1)], where value1 is an int and v2, ..., v(n+1) are float64. Column 0 is the prediction target. This setup reproduces the test results.

    With this structure: [feature1:v2, f2:v3, ..., f(n):v(n+1)], the results do NOT match the test results.

    In Go, using leaves' XGEnsembleFromFile -> model.PredictCSR(), the results also do NOT match the test results.

    I have tried for over 5 hours to find the cause, e.g. by adding {0:0} as the first feature group, but I can't figure it out. What's wrong with my test data?
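
    A likely culprit here is feature ordering: leaves addresses features purely by index, so the feature vector passed from Go must follow exactly the column order used at training time, with the label column excluded. A sketch of building such a vector (featureOrder is a hypothetical list of the training column names; the math import is assumed):

    // buildFeatures assembles a dense feature vector in training column order.
    // featureOrder is hypothetical: the exact column names, in the exact order,
    // used at training time (label column excluded).
    func buildFeatures(row map[string]float64, featureOrder []string) []float64 {
        fvals := make([]float64, len(featureOrder))
        for i, name := range featureOrder {
            v, ok := row[name]
            if !ok {
                v = math.NaN() // XGBoost treats NaN as a missing value
            }
            fvals[i] = v
        }
        return fvals
    }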

  • Lightgbm dart support

    #25

    It seems that LightGBM DART works out of the box thanks to the generality of the LightGBM model format.

    Here I added documentation and tests for LightGBM DART support.

  • adds ability to support multi:softprob for xgboost

    Allows use of multi:softprob for XGBoost by defining a Softprob transformation and making the appropriate registration in transformation.go. Lastly, adds a transformation loading check for "multi:softprob" when loading an XGBoost model.

  • Compatibility tests

    This PR adds scripts to perform compatibility tests: running leaves against models built with different versions of third-party libraries (LightGBM and XGBoost for now).

    See compatibility.md for results

  • Unexpected objective field: 'lambdarank'

    leaves.LGEnsembleFromFile() fails when loading an objective=lambdarank model (LightGBM).

    Error message: unexpected objective field: 'lambdarank', model:

    tree
    version=v2
    num_class=1
    num_tree_per_iteration=1
    label_index=0
    max_feature_idx=24
    objective=lambdarank
    feature_names=t quality freshness navboost pctr video_type lctr_1_3 lctr_4_7 lctr_8_30 sctr_1_3 sctr_4_7 sctr_8_30 ctr_1_3 ctr_4_7 ctr_8_30 loglclick_1_3 logclick_1_3 logsclick_1_3 lctr_ins ctr_ins sctr_ins loglclick_ins logclick_ins logsclick_ins instant_navboost
    feature_infos=[0:1.3200000524520874] [3.3299998904112726e-05:1] [0.36787900328636169:1] [0.36790001392364502:0.9999966025352478] [0:0.98189848661422729] [1:200] [0:10.87989330291748] [0:9.3969650268554688] [0:11.486390113830566] [0:4.2822332382202148] [0:3.7750816345214844] [0:2.8636219501495361] [0:8.7641057968139648] [0:7.1885638236999512] [0:10.401005744934082] [0:6.4371075630187988] [0:6.4220900535583496] [0:5.9216046333312988] [0:5.5910482406616211] [0:3.3260509967803955] [0:1.3753839731216431] [0:5.3813371658325195] [0:5.3396997451782227] [0:4.9291071891784668] [0.36790001392364502:0.9999929666519165]
    tree_sizes=1308 911 993 1073 1235 1154 992 1316 1234 1151 997 1163 1234 1077 1090 1244 1237 1152 1400 1228 1246 1310 1240 1072 1327 1068 1242 1081 1312 1082 1162 1000 1330 1310 1408 1253 1165 1328 1082 1004 1172 1328 1161 1081 1151 1323 1325 1321 1410 1166 1073 1403 996 1242 991 1336 1232 1250 995 1309
    
  • Is there any way to support tweedie regression models?

    I have a model trained with tweedie regression in LightGBM. With leaves I got a panic:

    panic: unexpected objective field: 'tweedie'
    

    It works perfectly in Python's LightGBM.

    lgbmodel_ww.txt

  • Support the use of sklearn pipelines with prediction model

    • I found this super handy. It would be great if we could not only predict with a trained model but also use a sklearn pipeline, including the transformation steps, before the actual prediction.
  • Question: support for objective:quantile

    I have a model trained with quantile regression in LightGBM. I get an error that this is not a valid option for objective when I use my model. Is there a workaround to get it working?

  • Support for newer versions of XGBoost

    Something has changed in the XGBoost model binary format. The highest version I've managed to make leaves work with is 1.0. Starting from 1.1+, I keep getting "panic: unexpected EOF". Is support for newer versions planned? Moreover, they've started saving models in JSON format, and it looks like they're going to deprecate binaries altogether.
