Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies

m2cgen


m2cgen (Model 2 Code Generator) is a lightweight library which provides an easy way to transpile trained statistical models into native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP, Dart, Haskell, Ruby, F#, Rust).

Installation

Python >= 3.6 is supported.

pip install m2cgen

Supported Languages

  • C
  • C#
  • Dart
  • F#
  • Go
  • Haskell
  • Java
  • JavaScript
  • PHP
  • PowerShell
  • Python
  • R
  • Ruby
  • Rust
  • Visual Basic (VBA-compatible)

Supported Models

Linear

  Classification:
  • scikit-learn
    • LogisticRegression
    • LogisticRegressionCV
    • PassiveAggressiveClassifier
    • Perceptron
    • RidgeClassifier
    • RidgeClassifierCV
    • SGDClassifier
  • lightning
    • AdaGradClassifier
    • CDClassifier
    • FistaClassifier
    • SAGAClassifier
    • SAGClassifier
    • SDCAClassifier
    • SGDClassifier

  Regression:
  • scikit-learn
    • ARDRegression
    • BayesianRidge
    • ElasticNet
    • ElasticNetCV
    • GammaRegressor
    • HuberRegressor
    • Lars
    • LarsCV
    • Lasso
    • LassoCV
    • LassoLars
    • LassoLarsCV
    • LassoLarsIC
    • LinearRegression
    • OrthogonalMatchingPursuit
    • OrthogonalMatchingPursuitCV
    • PassiveAggressiveRegressor
    • PoissonRegressor
    • RANSACRegressor (only supported regression estimators can be used as a base estimator)
    • Ridge
    • RidgeCV
    • SGDRegressor
    • TheilSenRegressor
    • TweedieRegressor
  • StatsModels
    • Generalized Least Squares (GLS)
    • Generalized Least Squares with AR Errors (GLSAR)
    • Generalized Linear Models (GLM)
    • Ordinary Least Squares (OLS)
    • [Gaussian] Process Regression Using Maximum Likelihood-based Estimation (ProcessMLE)
    • Quantile Regression (QuantReg)
    • Weighted Least Squares (WLS)
  • lightning
    • AdaGradRegressor
    • CDRegressor
    • FistaRegressor
    • SAGARegressor
    • SAGRegressor
    • SDCARegressor
    • SGDRegressor

SVM

  Classification:
  • scikit-learn
    • LinearSVC
    • NuSVC
    • OneClassSVM
    • SVC
  • lightning
    • KernelSVC
    • LinearSVC

  Regression:
  • scikit-learn
    • LinearSVR
    • NuSVR
    • SVR
  • lightning
    • LinearSVR

Tree

  Classification:
  • DecisionTreeClassifier
  • ExtraTreeClassifier

  Regression:
  • DecisionTreeRegressor
  • ExtraTreeRegressor

Random Forest

  Classification:
  • ExtraTreesClassifier
  • LGBMClassifier (rf booster only)
  • RandomForestClassifier
  • XGBRFClassifier

  Regression:
  • ExtraTreesRegressor
  • LGBMRegressor (rf booster only)
  • RandomForestRegressor
  • XGBRFRegressor

Boosting

  Classification:
  • LGBMClassifier (gbdt/dart/goss booster only)
  • XGBClassifier (gbtree (including boosted forests)/gblinear booster only)

  Regression:
  • LGBMRegressor (gbdt/dart/goss booster only)
  • XGBRegressor (gbtree (including boosted forests)/gblinear booster only)

You can find the versions of packages with which compatibility is guaranteed by CI tests here. Other versions may also work, but they are untested.

Classification Output

Linear / Linear SVM / Kernel SVM

  Binary: scalar value; signed distance of the sample to the hyperplane for the second class.

  Multiclass: vector value; signed distance of the sample to the hyperplane for each class.

  The output is consistent with the output of LinearClassifierMixin.decision_function.

SVM

  Outlier detection: scalar value; signed distance of the sample to the separating hyperplane: positive for an inlier and negative for an outlier.

  Binary: scalar value; signed distance of the sample to the hyperplane for the second class.

  Multiclass: vector value; one-vs-one score for each class, shape (n_samples, n_classes * (n_classes - 1) / 2).

  The output is consistent with the output of BaseSVC.decision_function when decision_function_shape is set to "ovo".

Tree / Random Forest / Boosting

  Binary: vector value; class probabilities.

  Multiclass: vector value; class probabilities.

  The output is consistent with the output of the predict_proba method of DecisionTreeClassifier / ExtraTreeClassifier / ExtraTreesClassifier / RandomForestClassifier / XGBRFClassifier / XGBClassifier / LGBMClassifier.
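For illustration, here is a minimal sketch (model, dataset and variable names are illustrative) of verifying that a transpiled tree classifier reproduces predict_proba:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier
    import m2cgen as m2c

    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

    # Execute the generated Python source to obtain the score() function.
    namespace = {}
    exec(m2c.export_to_python(clf), namespace)
    probas = np.array([namespace["score"](row) for row in X])

    assert np.allclose(probas, clf.predict_proba(X))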

Usage

Here's a simple example of how a linear model trained in a Python environment can be represented in Java code:

    from sklearn.datasets import load_boston  # note: load_boston was removed in scikit-learn 1.2
    from sklearn import linear_model
    import m2cgen as m2c

    boston = load_boston()
    X, y = boston.data, boston.target

    estimator = linear_model.LinearRegression()
    estimator.fit(X, y)

    code = m2c.export_to_java(estimator)

Generated Java code:

    public class Model {

        public static double score(double[] input) {
            return (((((((((((((36.45948838508965) + ((input[0]) * (-0.10801135783679647))) + ((input[1]) * (0.04642045836688297))) + ((input[2]) * (0.020558626367073608))) + ((input[3]) * (2.6867338193449406))) + ((input[4]) * (-17.76661122830004))) + ((input[5]) * (3.8098652068092163))) + ((input[6]) * (0.0006922246403454562))) + ((input[7]) * (-1.475566845600257))) + ((input[8]) * (0.30604947898516943))) + ((input[9]) * (-0.012334593916574394))) + ((input[10]) * (-0.9527472317072884))) + ((input[11]) * (0.009311683273794044))) + ((input[12]) * (-0.5247583778554867));
        }
    }

You can find more examples of generated code for different models/languages here.
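The other exporters follow the same pattern, reusing the fitted estimator from the example above (a small sketch; the optional function_name parameter defaults to "score" and requires a recent m2cgen version):

    code_py = m2c.export_to_python(estimator, function_name="predict")
    code_c = m2c.export_to_c(estimator)
    code_go = m2c.export_to_golang(estimator)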

CLI

m2cgen can be used as a CLI tool to generate code using serialized model objects (pickle protocol):

    $ m2cgen <pickle_file> --language <language> [--indent <indent>] [--function_name <function_name>]
             [--class_name <class_name>] [--module_name <module_name>] [--package_name <package_name>]
             [--namespace <namespace>] [--recursion-limit <recursion_limit>]

Don't forget that for unpickling serialized model objects, their classes must be defined in the top level of an importable module in the unpickling environment.

Piping is also supported:

    $ cat <pickle_file> | m2cgen --language <language>
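For illustration, a minimal sketch of producing such a pickle file (the name model.pkl is illustrative) that can then be fed to the CLI, e.g. m2cgen model.pkl --language java:

    import pickle

    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    X, y = load_diabetes(return_X_y=True)
    estimator = LinearRegression().fit(X, y)

    # Serialize the fitted model with the pickle protocol for the m2cgen CLI.
    with open("model.pkl", "wb") as f:
        pickle.dump(estimator, f)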

FAQ

Q: Generation fails with a RecursionError: maximum recursion depth exceeded error.

A: If this error occurs while generating code for an ensemble model, try to reduce the number of trained estimators within that model. Alternatively, you can increase the maximum recursion depth with sys.setrecursionlimit(<new_depth>).
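For example (the value is illustrative; pick one large enough for your model):

    import sys

    sys.setrecursionlimit(10000)  # raise the limit before calling an export function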

Q: Generation fails with an ImportError: No module named error while transpiling a model from a serialized model object.

A: This error indicates that the pickle protocol cannot deserialize the model object. For unpickling serialized model objects, their classes must be defined in the top level of an importable module in the unpickling environment. Installing the package that provides the model's class definition should solve the problem.

Q: Code generated by m2cgen produces different results for some inputs compared to the original Python model from which the code was obtained.

A: Some models force input data to a particular type during the prediction phase in their native Python libraries. Currently, m2cgen works only with the float64 (double) data type. You can try to cast your input data to another type manually and check the results again. Also, some small differences can arise from the specific implementation of floating-point arithmetic in the target language.
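For example, a minimal sketch of casting input data to float64 before scoring:

    import numpy as np

    row = np.array([0.1, 2.0, 3.5], dtype=np.float32)  # data arriving as float32
    row64 = row.astype(np.float64)                     # the type m2cgen-generated code assumes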

Comments
    • Code generated for XGBoost models returns invalid scores when tree_method is set to "hist"

      I have trained xgboost models in Python and am using the CLI interface to convert the serialized models to pure Python. However, when I use the pure Python code, the results differ from the predictions made with the model directly.

      Python 3.7, xgboost 0.90

      My model has a large number of parameters (somewhat over 500). Here are the predicted class probabilities from the original model: [image]

      Here are the same predicted probabilities using the generated Python code via m2cgen: [image]

      We can see that the results are similar but not the same. The result is a significant number of cases that are moved into different classes between the two sets of predictions.

      I have also tested this with binary classification models and have the same issues.

    • In Java interpreter ignore subroutines and perform code split based on the AST size

      After investigating possible solutions for https://github.com/BayesWitnesses/m2cgen/issues/152, I came to the conclusion that with the existing design it's extremely hard to come up with an optimal algorithm to split code into subroutines on the interpreter side (and not in assemblers). The primary reason is that, since we always interpret one expression at a time, it's hard to predict both the depth of the current subtree and the number of expressions left to interpret in other branches. I've achieved some progress by splitting expressions into separate subroutines based on the size of the code generated so far (i.e. a code size threshold), but more often than not I'd get some stupid subroutines like this one:

      public static double subroutine2(double[] input) {
          return 22.640634908349323;
      }
      

      That's why I took a simpler approach and attempted to optimize the interpreter that caused trouble in the first place - the R one. I slightly modified its behavior: when the binary expression count threshold is exceeded, it no longer splits the expressions into separate variable assignments, but moves them into their own subroutines. Although this might not be optimal for simpler models (like linear ones), it helps tremendously with gradient boosting and random forest models. Since those models are a summation of independent estimators, we end up putting every N (5 by default) estimators into their own subroutine, improving the execution time this way. @StrikerRUS please let me know what you think.
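      A rough sketch of the grouping idea (illustrative only, not the actual m2cgen internals):

      # Split per-estimator expressions into chunks of n; each chunk becomes a subroutine.
      def group_estimators(exprs, n=5):
          return [exprs[i:i + n] for i in range(0, len(exprs), n)]

      subroutines = group_estimators(["tree_%d(input)" % i for i in range(12)])
      # -> three groups: trees 0-4, 5-9 and 10-11; the final score sums their results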

    • added possibility to write generated code into file

      Closed #110.

      Real-life frustrating example:

      import sys
      
      from sklearn.datasets import load_boston
      
      import lightgbm as lgb
      import m2cgen as m2c
      
      X, y = load_boston(True)
      est = lgb.LGBMRegressor(n_estimators=1000).fit(X, y)
      
      sys.setrecursionlimit(1<<30)
      print(m2c.export_to_python(est))
      
      IOPub data rate exceeded.
      The notebook server will temporarily stop sending output
      to the client in order to avoid crashing it.
      To change this limit, set the config variable
      `--NotebookApp.iopub_data_rate_limit`.
      
      Current values:
      NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
      NotebookApp.rate_limit_window=3.0 (secs)
      

      m2c.export_to_python(est, 'test.txt') works fine in this scenario.

    • Dart language support

      For those building Flutter apps who would like to utilize static models trained in scikit-learn on-device, this tool would be a perfect fit. And if the Flutter dev team decides to add a hot code push feature to the framework, models from m2cgen could be updated on the fly.

    • added support for PowerShell

      With this PR, Windows users will be able to execute ML models from the "command line" without the need to install any programming language (PowerShell is already installed in Windows).

    • Handle missing values replacement in LightGBM

      Sometimes an exported LGBMRegressor model's predictions don't match the predictions from the original model. This happens when the model encounters values that were missing during training. A more detailed discussion can be found here: https://github.com/microsoft/LightGBM/issues/2921

      This is by no means a complete fix for the problem; it only addresses this part of the LightGBM behavior: "for numerical features, if not missing is seen in training, the missing value will be converted to zero, and then check it with the threshold. So it is not always the left side."

      The fix has also been tested on a fairly big regression model with numerical features, and it works as expected.

      How to reproduce:

      import numpy as np
      import lightgbm as lgb
      import m2cgen as m2c
      from sklearn.datasets import load_diabetes
      
      dataset = load_diabetes()
      
      gbm = lgb.LGBMRegressor(num_leaves=51,
                              learning_rate=0.05,
                              n_estimators=100)
      gbm.fit(dataset['data'], dataset['target'])
      
      
      test = np.array([-2.175, 0.797, np.NaN, 1.193, 0.0, 0.0, 0.0, np.NaN, np.NaN, np.NaN])
      
      print(gbm.predict(np.array([test]))[0])
      
      code = m2c.export_to_python(gbm)
      
      with open('model.py', 'w') as fp:
          fp.write(code)
      
      import model as m
      
      print(m.score(test))
      
    • Code generated from XGBoost model includes "None"

      When transpiling XGBRegressor and XGBClassifier models such as the following basic example:

      from xgboost import XGBRegressor
      from sklearn import datasets
      import m2cgen as m2c
      
      iris_data = datasets.load_iris(return_X_y=True)
      
      mod = XGBRegressor(booster="gblinear", max_depth=2)
      X, y = iris_data
      mod.fit(X[:120], y[:120])
      
      code = m2c.export_to_c(mod)
      
      print(code)
      

      the resulting C code includes a Pythonesque None:

      double score(double * input) {
          return (None) + (((((-0.391196) + ((input[0]) * (-0.0196191))) + ((input[1]) * (-0.11313))) + ((input[2]) * (0.137024))) + ((input[3]) * (0.645197)));
      }
      

      Probably I am missing some basic step?

    • added Visual Basic code generator

      The motivation behind this PR is to give users with limited programming skills access to strong ML models inside Office applications (mainly in Excel).

      Also, if I'm not mistaken, VBA projects can be used in SOLIDWORKS.

      After merging this PR users will be able to use ML models inside Excel in the following way.

      Usage Example

      As usual, generate a model via a supported ML algorithm:

      from sklearn.datasets import load_boston
      from sklearn.svm import SVR
      
      import m2cgen as m2c
      
      X, y = load_boston(True)
      X = X[:4, :2]
      y = y[:4]
      
      reg = SVR()
      reg.fit(X, y)
      

      After that, output the VBA code representation of the model via the m2cgen Python package:

      print(m2c.export_to_vba(reg))
      
      Function score(ByRef input_vector() As Double) As Double
          Dim var0 As Double
          var0 = (0) - (0.3333333333333333)
          score = ((((28.70000000001455) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.00632) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((18.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((2.31) - (input_vector(2)), 2))))) * (-1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.02731) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((7.07) - (input_vector(2)), 2))))) * (-1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.02729) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((7.07) - (input_vector(2)), 2))))) * (1.0))) + ((Exp((var0) * (((Application.WorksheetFunction.Power((0.03237) - (input_vector(0)), 2)) + (Application.WorksheetFunction.Power((0.0) - (input_vector(1)), 2))) + (Application.WorksheetFunction.Power((2.18) - (input_vector(2)), 2))))) * (1.0))
      End Function
      

      Create an empty Visual Basic file example_module.bas and paste the copied output there.

      Now open Excel, enable the Developer tab and click Developer -> Visual Basic (Alt + F11). In the VBA editor click File -> Import File and choose the previously created example_module.bas file.

      After doing that, one more action is required: writing a proxy function that converts an Excel Range object to an Array and calls the model. For instance, such a function for regression, with row-based features placed in Excel, can look like this:

      Function SCOREROW(features As Range) As Double
          Dim arr() As Double
          ReDim Preserve arr(features.Columns.Count - 1)
          Dim i As Integer
          For i = 0 To UBound(arr)
              arr(i) = features(1, i + 1)
          Next i
          SCOREROW = score(arr)
      End Function
      

      Now this proxy function can be used on an Excel sheet like any built-in Excel function.


      Let's compare Excel predictions with ones from the native Python model:

      reg.predict(X)
      
      array([27.7       , 28.70034543, 28.70034543, 29.7       ])
      

      Seems that everything is fine!

    • Fix #168. Enforce float32 type for split condition values for GBT models created using XGBoost

      As it turns out, the issue reported in https://github.com/BayesWitnesses/m2cgen/issues/168 is not unique to the "hist" tree construction algorithm. It seems that with the "hist" method the likelihood of reproducing it is much higher due to the reliance on feature histograms. I was able to reproduce the same discrepancy with non-hist methods on a larger sample of test data.

      The issue occurs due to a double precision error and reproduces every time the feature value matches the split condition in one of the tree's nodes.

      Example: feature value = 0.671, split condition = 0.671000004. When we hit this condition in the generated code, the outcome of 0.671 < 0.671000004 is "true" (the "yes" branch), while in XGBoost the same condition leads to the "no" branch.

      After some investigation I noticed that XGBoost's DMatrix forces all values to be float32 (https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L565). At the same time, in our assemblers we rely on default 64-bit floats. Forcing the split condition to be float32 seems to address the issue; at least I couldn't reproduce it so far.
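      A minimal sketch of the precision effect described above (values taken from the example; numpy is used only for the float32 cast):

      import numpy as np

      feature = 0.671        # feature value, float64 in the generated code
      split = 0.671000004    # split condition stored by XGBoost

      print(feature < split)                          # True under float64
      print(np.float32(feature) < np.float32(split))  # False: both round to the same float32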

    • add option to save generated code into file

      I'm sorry if I missed this functionality, but the CLI version definitely doesn't have it (I saw the related code only in generate_code_examples.py). I guess it would be very useful to eliminate the copy-paste phase, especially for large models.

      Of course, piping is a solution, but not for development in a Jupyter Notebook, for example.

    • add: Make function_name parametrized

      Hello everyone,

      First of all, thanks a ton for putting this tool/library together -- especially in resource-strapped environments, it has the potential to literally save lives!

      One small problem I was fighting with while using it was the score function it uses in the generated modules. When they are used as drop-in replacements for trained models, using score is a bit strange, as the API generally provides functions like predict or predict_proba. It would therefore be of great help to me if this name could be changed dynamically so I would not have to do so manually.

      Please do let me know if something like this sounds like a sensible addition. I'd be happy to update the code so that it reflects your vision, so please feel free to let me know whenever that may be the case.

      Thanks!


      • Currently m2cgen generates a module in various languages that has a "score"/"Score" function/method. This is not always desirable, as many of the trained models that are to be exported may provide their predictions via API functions with different names (such as predict).

      • This commit adds a way of specifying the name of the function both via the CLI and in the exporters (that is, in the export_to_ functions) by specifying the function_name option/parameter, while keeping the default set to "score"/"Score" for backwards compatibility.

      Signed-off-by: mr.Shu [email protected]

    • Bump lightgbm from 3.3.2 to 3.3.4

      Bumps lightgbm from 3.3.2 to 3.3.4.

      Release notes

      Sourced from lightgbm's releases.

      v3.3.4

      Changes

      This is a special release, put up to prevent the R package from being archived on CRAN.

      See #5618 and #5619 for context.

      This release only contains the changes, relative to v3.3.3, necessary to prevent removal of the R package from CRAN.

      💡 New Features

      None

      🔨 Breaking

      None

      🚀 Efficiency Improvement

      None

      🐛 Bug Fixes

      📖 Documentation

      None

      🧰 Maintenance

      v3.3.3

      Changes

      This is a special release, put up to prevent the R package from being archived on CRAN.

      See microsoft/LightGBM#5502 and microsoft/LightGBM#5525 for context.

      This release only contains the changes, relative to v3.3.2, necessary to prevent removal of the R package from CRAN.

      💡 New Features

      🔨 Breaking

      None

      ... (truncated)

      Commits
      • 8d68f34 fix detection of QEMU in pinning dependencies
      • 8431c38 remove pin on scikit-learn and skip all the load_boston() tests
      • b95c865 looser scikit-learn pin to try to get QEMU builds working
      • a47d7c7 fix QEMU
      • 70aa002 more pinning to old versions
      • 581a7fa fix numpy constraint
      • 940022c try capping python version
      • d721f4e ceiling on scikit-learn
      • cb9962a ceiling on dask too
      • caecafd try pinning dependencies
      • Additional commits viewable in compare view

    • Bump scipy from 1.9.1 to 1.10.0

      Bumps scipy from 1.9.1 to 1.10.0.

      Release notes

      Sourced from scipy's releases.

      SciPy 1.10.0 Release Notes

      SciPy 1.10.0 is the culmination of 6 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Before upgrading, we recommend that users check that their own code does not use deprecated SciPy functionality (to do so, run your code with python -Wd and check for DeprecationWarnings). Our development attention will now shift to bug-fix releases on the 1.10.x branch, and on adding new features on the main branch.

      This release requires Python 3.8+ and NumPy 1.19.5 or greater.

      For running on PyPy, PyPy3 6.0+ is required.

      Highlights of this release

      • A new dedicated datasets submodule (scipy.datasets) has been added, and is now preferred over usage of scipy.misc for dataset retrieval.
      • A new scipy.interpolate.make_smoothing_spline function was added. This function constructs a smoothing cubic spline from noisy data, using the generalized cross-validation (GCV) criterion to find the tradeoff between smoothness and proximity to data points.
      • scipy.stats has three new distributions, two new hypothesis tests, three new sample statistics, a class for greater control over calculations involving covariance matrices, and many other enhancements.

      New features

      scipy.datasets introduction

      • A new dedicated datasets submodule has been added. The submodule is meant for datasets that are relevant to other SciPy submodules and content (tutorials, examples, tests), as well as to contain a curated set of datasets that are of wider interest. As of this release, all the datasets from scipy.misc have been added to scipy.datasets (and deprecated in scipy.misc).
      • The submodule is based on Pooch (a new optional dependency for SciPy), a Python package to simplify fetching data files. This move will, in a subsequent release, facilitate SciPy to trim down the sdist/wheel sizes, by decoupling the data files and moving them out of the SciPy repository, hosting them externally and

      ... (truncated)

      Commits
      • dde5059 REL: 1.10.0 final [wheel build]
      • 7856f28 Merge pull request #17696 from tylerjereddy/treddy_110_final_prep
      • 205b624 DOC: add missing author
      • 1ab9f1b DOC: update 1.10.0 relnotes
      • ac2f45f MAINT: integrate._qmc_quad: mark as private with preceding underscore
      • 3e0ae1a REV: integrate.qmc_quad: delay release to SciPy 1.11.0
      • 34cdf05 MAINT: FFT pybind11 fixups
      • 843500a Merge pull request #17689 from mdhaber/gh17686
      • 089924b REL: integrate.qmc_quad: remove from release notes
      • 3e47110 REL: 1.10.0rc3 unreleased
      • Additional commits viewable in compare view

    • Bump numpy from 1.23.3 to 1.24.1

      Bumps numpy from 1.23.3 to 1.24.1.

      Release notes

      Sourced from numpy's releases.

      v1.24.1

      NumPy 1.24.1 Release Notes

      NumPy 1.24.1 is a maintenance release that fixes bugs and regressions discovered after the 1.24.0 release. The Python versions supported by this release are 3.8-3.11.

      Contributors

      A total of 12 people contributed to this release. People with a "+" by their names contributed a patch for the first time.

      • Andrew Nelson
      • Ben Greiner +
      • Charles Harris
      • Clément Robert
      • Matteo Raso
      • Matti Picus
      • Melissa Weber Mendonça
      • Miles Cranmer
      • Ralf Gommers
      • Rohit Goswami
      • Sayed Adel
      • Sebastian Berg

      Pull requests merged

      A total of 18 pull requests were merged for this release.

      • #22820: BLD: add workaround in setup.py for newer setuptools
      • #22830: BLD: CIRRUS_TAG redux
      • #22831: DOC: fix a couple typos in 1.23 notes
      • #22832: BUG: Fix refcounting errors found using pytest-leaks
      • #22834: BUG, SIMD: Fix invalid value encountered in several ufuncs
      • #22837: TST: ignore more np.distutils.log imports
      • #22839: BUG: Do not use getdata() in np.ma.masked_invalid
      • #22847: BUG: Ensure correct behavior for rows ending in delimiter in...
      • #22848: BUG, SIMD: Fix the bitmask of the boolean comparison
      • #22857: BLD: Help raspian arm + clang 13 about __builtin_mul_overflow
      • #22858: API: Ensure a full mask is returned for masked_invalid
      • #22866: BUG: Polynomials now copy properly (#22669)
      • #22867: BUG, SIMD: Fix memory overlap in ufunc comparison loops
      • #22868: BUG: Fortify string casts against floating point warnings
      • #22875: TST: Ignore nan-warnings in randomized out tests
      • #22883: MAINT: restore npymath implementations needed for freebsd
      • #22884: BUG: Fix integer overflow in in1d for mixed integer dtypes #22877
      • #22887: BUG: Use whole file for encoding checks with charset_normalizer.

      Checksums

      ... (truncated)

      Commits
      • a28f4f2 Merge pull request #22888 from charris/prepare-1.24.1-release
      • f8fea39 REL: Prepare for the NumPY 1.24.1 release.
      • 6f491e0 Merge pull request #22887 from charris/backport-22872
      • 48f5fe4 BUG: Use whole file for encoding checks with charset_normalizer [f2py] (#22...
      • 0f3484a Merge pull request #22883 from charris/backport-22882
      • 002c60d Merge pull request #22884 from charris/backport-22878
      • 38ef9ce BUG: Fix integer overflow in in1d for mixed integer dtypes #22877 (#22878)
      • bb00c68 MAINT: restore npymath implementations needed for freebsd
      • 64e09c3 Merge pull request #22875 from charris/backport-22869
      • dc7bac6 TST: Ignore nan-warnings in randomized out tests
      • Additional commits viewable in compare view

    • Bump xgboost from 1.6.2 to 1.7.2

      Bumps xgboost from 1.6.2 to 1.7.2.

      Release notes

      Sourced from xgboost's releases.

      1.7.2 Patch Release

      v1.7.2 (2022 Dec 8)

      This is a patch release for bug fixes.

      • Work with newer thrust and libcudacxx (#8432)

      • Support null value in CUDA array interface namespace. (#8486)

      • Use getsockname instead of SO_DOMAIN on AIX. (#8437)

      • [pyspark] Make QDM optional based on a cuDF check (#8471)

      • [pyspark] sort qid for SparkRanker. (#8497)

      • [dask] Properly await async method client.wait_for_workers. (#8558)

      • [R] Fix CRAN test notes. (#8428)

      • [doc] Fix outdated document [skip ci]. (#8527)

      • [CI] Fix github action mismatched glibcxx. (#8551)

      Artifacts

      You can verify the downloaded packages by running this on your Unix shell:

      echo "<hash> <artifact>" | shasum -a 256 --check
      
      15be5a96e86c3c539112a2052a5be585ab9831119cd6bc3db7048f7e3d356bac  xgboost_r_gpu_linux_1.7.2.tar.gz
      0dd38b08f04ab15298ec21c4c43b17c667d313eada09b5a4ac0d35f8d9ba15d7  xgboost_r_gpu_win64_1.7.2.tar.gz
      

      1.7.1 Patch Release

      v1.7.1 (2022 November 3)

      This is a patch release to incorporate the following hotfix:

      • Add back xgboost.rabit for backwards compatibility (#8411)

      Release 1.7.0 stable

      Note. The source distribution of Python XGBoost 1.7.0 was defective (#8415). Since PyPI does not allow us to replace existing artifacts, we released 1.7.0.post0 version to upload the new source distribution. Everything in 1.7.0.post0 is identical to 1.7.0 otherwise.

      v1.7.0 (2022 Oct 20)

      We are excited to announce the feature packed XGBoost 1.7 release. The release note will walk through some of the major new features first, then make a summary for other improvements and language-binding-specific changes.

      PySpark

      XGBoost 1.7 features initial support for PySpark integration. The new interface is adapted from the existing PySpark XGBoost interface developed by databricks with additional features like QuantileDMatrix and the rapidsai plugin (GPU pipeline) support. The new Spark XGBoost Python estimators not only benefit from PySpark ml facilities for powerful distributed computing but also enjoy the rest of the Python ecosystem. Users can define a custom objective, callbacks, and metrics in Python and use them with this interface on distributed clusters. The support is labeled as experimental with more features to come in future releases. For a brief introduction please visit the tutorial on XGBoost's document page. (#8355, #8344, #8335, #8284, #8271, #8283, #8250, #8231, #8219, #8245, #8217, #8200, #8173, #8172, #8145, #8117, #8131, #8088, #8082, #8085, #8066, #8068, #8067, #8020, #8385)

      Due to its initial support status, the new interface has some limitations; categorical features and multi-output models are not yet supported.

      Development of categorical data support

      More progress on the experimental support for categorical features. In 1.7, XGBoost can handle missing values in categorical features and features a new parameter max_cat_threshold, which limits the number of categories that can be used in the split evaluation. The parameter is enabled when the partitioning algorithm is used and helps prevent over-fitting. Also, the sklearn interface can now accept the feature_types parameter to use data types other than dataframe for categorical features. (#8280, #7821, #8285, #8080, #7948, #7858, #7853, #8212, #7957, #7937, #7934)

      ... (truncated)

      Changelog

      Sourced from xgboost's changelog.

      XGBoost Change Log

      This file records the changes in xgboost library in reverse chronological order.

      v1.7.0 (2022 Oct 20)


      Experimental support for federated learning and new communication collective

      An exciting addition to XGBoost is the experimental federated learning support. Federated learning is implemented with a gRPC federated server that aggregates allreduce calls, and federated clients that train on local data and use existing tree methods (approx, hist, gpu_hist). Currently, this only supports horizontal federated learning (samples are split across participants, and each participant has all the features and labels). Future plans include vertical federated learning (features split across participants) and stronger privacy guarantees with homomorphic encryption and differential privacy. See the demo with NVFlare integration for example usage with nvflare.

      As part of the work, XGBoost 1.7 has replaced the old rabit module with the new collective module as the network communication interface with added support for runtime backend selection. In previous versions, the backend is defined at compile time and can not be changed once built. In this new release, users can choose between rabit and federated. (#8029, #8351, #8350, #8342, #8340, #8325, #8279, #8181, #8027, #7958, #7831, #7879, #8257, #8316, #8242, #8057, #8203, #8038, #7965, #7930, #7911)

      The feature is available in the public PyPI binary package for testing.

      Quantile DMatrix

      Before 1.7, XGBoost has an internal data structure called DeviceQuantileDMatrix (and its distributed version). We now extend its support to CPU and renamed it to QuantileDMatrix. This data structure is used for optimizing memory usage for the hist and gpu_hist tree methods. The new feature helps reduce CPU memory usage significantly, especially for dense data. The new QuantileDMatrix can be initialized from both CPU and GPU data, and regardless of where the data comes from, the constructed instance can be used by both the CPU algorithm and GPU algorithm including training and prediction (with some overhead of conversion if the device of data and training algorithm doesn't match). Also, a new parameter ref is added to QuantileDMatrix, which can be used to construct validation/test datasets. Lastly, it's set as default in the scikit-learn interface when a supported tree method is specified by users. (#7889, #7923, #8136, #8215, #8284, #8268, #8220, #8346, #8327, #8130, #8116, #8103, #8094, #8086, #7898, #8060, #8019, #8045, #7901, #7912, #7922)

      Mean absolute error

      The mean absolute error is a new member of the collection of objectives in XGBoost. It's noteworthy since MAE has zero hessian value, which is unusual to XGBoost as XGBoost relies on Newton optimization. Without valid Hessian values, the convergence speed can be slow. As part of the support for MAE, we added line searches into the XGBoost training algorithm to overcome the difficulty of training without valid Hessian values. In the future, we will extend the line search to other objectives where it's appropriate for faster convergence speed. (#8343, #8107, #7812, #8380)

      XGBoost on Browser

      With the help of the pyodide project, you can now run XGBoost on browsers. (#7954, #8369)

      Experimental IPv6 Support for Dask

      With the growing adoption of the new internet protocol, XGBoost joined the club. In the latest release, the Dask interface can be used on IPv6 clusters; see XGBoost's Dask tutorial for details. (#8225, #8234)

      Optimizations

      We have new optimizations for both the hist and gpu_hist tree methods to make XGBoost's training even more efficient.

      • Hist: hist now supports an optional by-column histogram build, which is automatically configured based on various conditions of the input data. This helps the XGBoost CPU hist algorithm scale better with different shapes of training datasets. (#8233, #8259) Also, the build-histogram kernel can now better utilize CPU registers. (#8218)

      • GPU Hist: GPU hist performance is significantly improved for wide datasets. GPU hist now supports batched node build, which reduces kernel latency and increases throughput. The improvement is particularly significant when growing deep trees with the default depthwise policy. (#7919, #8073, #8051, #8118, #7867, #7964, #8026)

      Breaking Changes

      ... (truncated)


    • Bump flake8 from 5.0.4 to 6.0.0

      Bumps flake8 from 5.0.4 to 6.0.0.


    • Feature Request: support for multioutput regression

      Nice library, thanks!

      Perhaps I missed something, but it looks like multi-output regression is unsupported? If so, is it on the roadmap? Happy to help if needed.
