Katib is a Kubernetes-native project for automated machine learning (AutoML).

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architecture Search.

Katib is agnostic to machine learning (ML) frameworks. It can tune hyperparameters of applications written in any language of the user's choice and natively supports many ML frameworks, such as TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.

Katib can run training jobs using any Kubernetes Custom Resource, with out-of-the-box support for Kubeflow Training Operator, Argo Workflows, Tekton Pipelines, and many more.

Katib means secretary in Arabic.

Search Algorithms

Katib supports several search algorithms. Follow the Kubeflow documentation to learn more about each algorithm, and check the Suggestion service guide to implement your own custom algorithm.

Hyperparameter Tuning          Neural Architecture Search   Early Stopping
Random Search                  ENAS                         Median Stop
Grid Search                    DARTS
Bayesian Optimization
TPE
Multivariate TPE
CMA-ES
Sobol's Quasirandom Sequence
HyperBand
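
To make the simplest entry in the list concrete, here is a minimal sketch of random search in plain Python. It is not Katib's implementation: the search space, the `toy_objective` function, and all helper names are hypothetical and exist only to illustrate the idea of sampling parameter assignments and keeping the best one.

```python
import random

# Hypothetical search space: one double, one int, one categorical parameter.
SEARCH_SPACE = {
    "--lr": {"type": "double", "min": 0.01, "max": 0.03},
    "--num-layers": {"type": "int", "min": 2, "max": 5},
    "--optimizer": {"type": "categorical", "list": ["sgd", "adam", "ftrl"]},
}


def sample_assignment(space, rng):
    """Draw one value per parameter, uniformly from its feasible space."""
    assignment = {}
    for name, spec in space.items():
        if spec["type"] == "double":
            assignment[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["type"] == "int":
            assignment[name] = rng.randint(spec["min"], spec["max"])
        elif spec["type"] == "categorical":
            assignment[name] = rng.choice(spec["list"])
        else:
            raise ValueError("unknown parameter type: " + spec["type"])
    return assignment


def random_search(space, objective, max_trials, seed=0):
    """Evaluate max_trials random assignments and return the best (params, score)."""
    rng = random.Random(seed)
    best = None
    for _ in range(max_trials):
        params = sample_assignment(space, rng)
        score = objective(params)
        if best is None or score > best[1]:
            best = (params, score)
    return best


def toy_objective(params):
    """Hypothetical stand-in for a training job; higher is better."""
    return -abs(params["--lr"] - 0.02) - 0.01 * abs(params["--num-layers"] - 3)


best_params, best_score = random_search(SEARCH_SPACE, toy_objective, max_trials=12)
print(best_params, best_score)
```

In a real setup each call to the objective would be a full training run, which is why trials are usually executed in parallel rather than in a loop like this one.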

To perform the above algorithms, Katib supports the following frameworks:

Installation

For the various Katib installation options, check the Kubeflow guide. Follow the steps below to install Katib standalone.

Prerequisites

These are the minimal requirements to install Katib:

  • Kubernetes >= 1.17
  • kubectl >= 1.21

Latest Version

For the latest Katib version, run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

Release Version

For a specific Katib release (for example, v0.11.1), run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.11.1"

Make sure that all Katib components are running:

$ kubectl get pods -n kubeflow

NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
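
The check above can also be scripted. The following is a small sketch, not part of Katib, that parses the text printed by `kubectl get pods` and reports pods that are neither fully ready and Running nor Completed. The function name `unhealthy_pods` is hypothetical, and treating `Completed` as healthy is an assumption based on the cert-generator pod in the listing above.

```python
def unhealthy_pods(kubectl_output):
    """Return the names of pods that are not Running with all containers ready.

    Pods in the Completed state (for example one-off jobs) are treated as healthy.
    """
    problems = []
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status == "Completed":
            continue
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


SAMPLE = """\
NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
"""

print(unhealthy_pods(SAMPLE))  # prints []
```

To use it against a live cluster, the output of `kubectl get pods -n kubeflow` would be passed in place of `SAMPLE`.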

For Katib Experiments, check the complete list of examples.

Documentation

Community

We are always growing our community and invite new users and AutoML enthusiasts to contribute to the Katib project. The following links provide information about getting involved in the community:

Contributing

Please feel free to test the system! The developer guide is a good starting point for developers.

Blog posts

Events

Citation

If you use Katib in a scientific publication, we would appreciate citations to the following paper:

A Scalable and Cloud-Native Hyperparameter Tuning System, George et al., arXiv:2006.02085, 2020.

Bibtex entry:

@misc{george2020katib,
    title={A Scalable and Cloud-Native Hyperparameter Tuning System},
    author={Johnu George and Ce Gao and Richard Liu and Hou Gang Liu and Yuan Tang and Ramdoot Pydipaty and Amit Kumar Saha},
    year={2020},
    eprint={2006.02085},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}
Owner
Kubeflow
Kubeflow is an open, community-driven project to make it easy to deploy and manage an ML stack on Kubernetes
Comments
  • how to collect the indicator of training results???

    /kind bug

    After completion of bayesianoptimization automated training, the corresponding indicator results cannot be collected. Could you please tell me how to collect the indicator of training results? My YAML file is as follows:

    apiVersion: "kubeflow.org/v1alpha3"
    kind: Experiment
    metadata:
      namespace: kubeflow
      labels:
        controller-tools.k8s.io: "1.0"
      name: bayesianoptimization-example
    spec:
      objective:
        type: maximize
        goal: 0.99
        objectiveMetricName: Validation-accuracy
        additionalMetricNames:
          - accuracy
      algorithm:
        algorithmName: bayesianoptimization
        algorithmSettings:
          - name: "random_state"
            value: "10"
      parallelTrialCount: 3
      maxTrialCount: 12
      maxFailedTrialCount: 3
      MetricsCollectorSpec:
        Collector:
          Kind: stdOut
      parameters:
        - name: --lr
          parameterType: double
          feasibleSpace:
            min: "0.01"
            max: "0.03"
        - name: --num-layers
          parameterType: int
          feasibleSpace:
            min: "2"
            max: "5"
        - name: --optimizer
          parameterType: categorical
          feasibleSpace:
            list:
              - sgd
              - adam
              - ftrl
      trialTemplate:
        goTemplate:
          rawTemplate: |-
            apiVersion: batch/v1
            kind: Job
            metadata:
              name: {{.Trial}}
              namespace: {{.NameSpace}}
            spec:
              template:
                spec:
                  containers:
                  - name: {{.Trial}}
                    image: docker.io/katib/mxnet-mnist-example
                    command:
                    - "python"
                    - "/mxnet/example/image-classification/train_mnist.py"
                    - "--batch-size=64"
                    {{- with .HyperParameters}}
                    {{- range .}}
                    - "{{.Name}}={{.Value}}"
                    {{- end}}
                    {{- end}}
                  restartPolicy: Never
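
As a rough illustration of what collecting metrics from standard output can look like, the sketch below scans a training log for `name=value` pairs and keeps the values of the requested metric names. It is an assumption that the training script prints metrics in this form. The regular expression, the function name `collect_metrics`, and the sample log lines are all hypothetical and are not taken from Katib's collector.

```python
import re

# Assumed line format: "<metric-name>=<number>", possibly preceded by other text.
METRIC_PATTERN = re.compile(r"([\w-]+)\s*=\s*(-?\d+(?:\.\d+)?)")


def collect_metrics(log_text, metric_names):
    """Return {metric name: [values in order of appearance]} for the given names."""
    found = {name: [] for name in metric_names}
    for line in log_text.splitlines():
        for name, value in METRIC_PATTERN.findall(line):
            if name in found:
                found[name].append(float(value))
    return found


# Hypothetical training output, made up for this example.
SAMPLE_LOG = """\
INFO:root:Epoch[0] Train-accuracy=0.912
INFO:root:Epoch[0] Validation-accuracy=0.954
INFO:root:Epoch[1] Train-accuracy=0.961
INFO:root:Epoch[1] Validation-accuracy=0.968
"""

metrics = collect_metrics(SAMPLE_LOG, ["Validation-accuracy", "Train-accuracy"])
print(metrics["Validation-accuracy"][-1])  # prints 0.968
```

Note that metric names are matched exactly, so a name such as `accuracy` would not pick up lines that report `Train-accuracy`.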

    Environment:

    • Kubeflow version: 0.7.0
    • Minikube version:
    • Kubernetes version (use kubectl version): 1.15.5
    • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908
  • Disable dynamic creation for admission hooks and update dependencies

    Fixes: https://github.com/kubeflow/katib/issues/1405.

    This PR introduces a new mechanism for obtaining the certificate for webhooks. I updated the YAMLs for our webhooks and added an initContainer to the Katib controller that executes the cert-generator.sh script. This script creates a CertificateSigningRequest and the katib-webhook-cert secret, and patches the webhook configurations with the appropriate caBundle. Since the katib-webhook-cert secret is in the manifest, the cleanup process should delete everything.

    So we don't need to deploy cert-manager for Katib.

    @gaocegege @johnugeorge @yanniszark @kuikuikuizzZ @knkski What do you think about this approach?

    I also updated controller-runtime to v0.8.2 and the k8s.io deps to v0.20.4. That requires some changes:

    • Change the location of some packages
    • Change the arguments for client calls (List, Get, etc.)
    • In newer Kubernetes versions, we can't set a namespace-scoped object (e.g. Suggestion) as the owner reference of a cluster-scoped object (e.g. PV). Thus, I had to disable the owner reference for the PV that is created when an Experiment has the FromVolume resume policy. For that reason, I added PersistentVolumeReclaimPolicy: Delete for the PV, so once the PVC is garbage collected, the PV should also be deleted.
    • I removed the PyTorch operator from the dependencies because of this problem.

    I still need to run some tests and create a new image for the cert generator. It would be great if you could start reviewing this.

    /cc @gaocegege @johnugeorge

  • [feature] Reconsider the design of Trial Template

    /kind feature

    Describe the solution you'd like:

    If we use the K8s client-go, we need to marshal the TFJob to a JSON string and then use that string to create Experiments, which is awkward. The Go template is ugly, too.

  • Switch to AWS CI/CD

    Related: https://github.com/kubeflow/katib/issues/1332. I will debug the infra in this PR.

    I also made a few changes to improve CI/CD quality.

    /cc @gaocegege @johnugeorge /cc @Jeffwan @PatrickXYS @jlewi @Bobgy

  • Katib v1alpha2 API for CRDs

    @YujiOshima @gaocegege @johnugeorge @alexandraj777 @hougangliu @xyhuang

    This is an initial proposal for the Katib v1alpha2 API. The changes here reflect the discussion in https://github.com/kubeflow/katib/issues/370.

    Comments and suggestions are welcome.

    Please note that the NAS APIs are not included here, since the feature is still in an early phase.

  • Studyctl crd

    Add StudyController:

    • CRD: studycontroller.kubeflow.org
    • Operator: StudyController

    Update examples. This implementation polls the workers' status in the Go process of the StudyController. I understand this is not an elegant implementation, but it has the least impact on the existing code.

    As a next step, we should create a worker CRD and its controller, and support multiple job types (k8s, TF-Job, ...). Assign @gaocegege

  • Population based training

    What this PR does / why we need it:

    Support the discovery of modulated hyperparameters rather than attempting to find a fixed set over the entire training process. The paper has more details about the technique.

    Which issue(s) this PR fixes:

    This PR provides some initial support for PBT within Katib (#1382).

    Checklist:

    • [ ] Docs included if any changes are user facing
  • Improve Katib README

    Related: #1332. I will debug the infra in this PR.

    • [x] This is the PR to see if we can trigger AWS Presubmit.
    • [x] This is the PR to see if the GitHub UI integrates with aws-kf-ci-bot.
  • can't set up CRD "Experiment"

    When I deploy katib_v1alpha3 with scripts/v1alpha3/deploy.sh, the katib-controller pod gives the following error:

    {"level":"info","ts":1578296376.3173876,"logger":"entrypoint","msg":"Config:","experiment-suggestion-name":"default","cert-local-filesystem":false}
    {"level":"info","ts":1578296376.375878,"logger":"entrypoint","msg":"Registering Components."}
    {"level":"info","ts":1578296376.3765948,"logger":"entrypoint","msg":"Setting up controller"}
    {"level":"info","ts":1578296376.3766346,"logger":"experiment-controller","msg":"Using the default suggestion implementation"}
    {"level":"info","ts":1578296376.3767953,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
    {"level":"error","ts":1578296376.3768966,"logger":"kubebuilder.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":{"Group":"kubeflow.org","Kind":"Experiment"},"error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:89\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Watch\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:119\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.377135,"logger":"experiment-controller","msg":"Experiment watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:121\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.3772092,"logger":"experiment-controller","msg":"Trial watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:108\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
    {"level":"error","ts":1578296376.377267,"logger":"entrypoint","msg":"unable to register controllers to the manager","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:91\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

    And the ui pod gives the following error:

    2020/01/06 06:56:46 CreateExperiment from YAML failed: no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"

  • Katib experiments run indefinitely without completing a single trial

    /kind bug

    Hi, I'm setting up a Katib job through the Kale deployment panel after creating a Kale pipeline. The pipeline builds successfully, but the Katib experiments run forever and don't complete a single trial.

    I expected the Katib jobs to run successfully, but they do not.

    Any suggestions on how to go about this?

  • ERROR:grpc._server:Exception calling application: Method not implemented!

    /kind bug

    Hi, I'm having trouble using Katib v1alpha3. First, I installed Katib with the following steps:

    1. git clone https://github.com/kubeflow/katib
    2. sh katib/scripts/v1alpha3/deploy.sh

    Then I tried to apply random-example.yaml (an example in katib/examples/v1alpha3):

    kubectl apply -f random-example.yaml

    Results:

    $ kubectl get pods -n kubeflow

    NAME                                     READY   STATUS    RESTARTS   AGE
    katib-controller-6c6974678d-zsnlc        1/1     Running   1          24m
    katib-db-558f649dc6-8cd9t                1/1     Running   0          24m
    katib-manager-5f74bdff84-4d78z           1/1     Running   0          24m
    katib-ui-6568bd6b44-qbq5k                1/1     Running   0          24m
    random-example-random-846dc99654-bxb8j   1/1     Running   0          23m

    $ kubectl get trials -n kubeflow

    NAME                      TYPE      STATUS   AGE
    random-example-drpkvb4b   Running   True     23m
    random-example-k7xv6ktt   Running   True     23m
    random-example-w6jlwdp2   Running   True     23m

    $ kubectl get experiment -n kubeflow -oyaml

    apiVersion: v1
    items:
    - apiVersion: kubeflow.org/v1alpha3
      kind: Experiment
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"kubeflow.org/v1alpha3","kind":"Experiment","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0"},"name":"random-example","namespace":"kubeflow"},"spec":{"algorithm":{"algorithmName":"random"},"maxFailedTrialCount":3,"maxTrialCount":12,"objective":{"additionalMetricNames":["accuracy"],"goal":0.99,"objectiveMetricName":"Validation-accuracy","type":"maximize"},"parallelTrialCount":3,"parameters":[{"feasibleSpace":{"max":"0.03","min":"0.01"},"name":"--lr","parameterType":"double"},{"feasibleSpace":{"max":"5","min":"2"},"name":"--num-layers","parameterType":"int"},{"feasibleSpace":{"list":["sgd","adam","ftrl"]},"name":"--optimizer","parameterType":"categorical"}],"trialTemplate":{"goTemplate":{"rawTemplate":"apiVersion: batch/v1\nkind: Job\nmetadata:\n name: {{.Trial}}\n namespace: {{.NameSpace}}\nspec:\n template:\n spec:\n containers:\n - name: {{.Trial}}\n image: docker.io/kubeflowkatib/mxnet-mnist-example\n command:\n - "python"\n - "/mxnet/example/image-classification/train_mnist.py"\n - "--batch-size=64"\n {{- with .HyperParameters}}\n {{- range .}}\n - "{{.Name}}={{.Value}}"\n {{- end}}\n {{- end}}\n restartPolicy: Never"}}}}
        creationTimestamp: "2019-12-20T07:58:52Z"
        finalizers:
        - update-prometheus-metrics
        generation: 2
        labels:
          controller-tools.k8s.io: "1.0"
        name: random-example
        namespace: kubeflow
        resourceVersion: "11682124"
        selfLink: /apis/kubeflow.org/v1alpha3/namespaces/kubeflow/experiments/random-example
        uid: 9005bab0-22fe-11ea-8cf0-0679676001a5
      spec:
        algorithm:
          algorithmName: random
          algorithmSettings: null
        maxFailedTrialCount: 3
        maxTrialCount: 12
        metricsCollectorSpec:
          collector:
            kind: StdOut
        objective:
          additionalMetricNames:
          - accuracy
          goal: 0.99
          objectiveMetricName: Validation-accuracy
          type: maximize
        parallelTrialCount: 3
        parameters:
        - feasibleSpace:
            max: "0.03"
            min: "0.01"
          name: --lr
          parameterType: double
        - feasibleSpace:
            max: "5"
            min: "2"
          name: --num-layers
          parameterType: int
        - feasibleSpace:
            list:
            - sgd
            - adam
            - ftrl
          name: --optimizer
          parameterType: categorical
        trialTemplate:
          goTemplate:
            rawTemplate: |-
              apiVersion: batch/v1
              kind: Job
              metadata:
                name: {{.Trial}}
                namespace: {{.NameSpace}}
              spec:
                template:
                  spec:
                    containers:
                    - name: {{.Trial}}
                      image: docker.io/kubeflowkatib/mxnet-mnist-example
                      command:
                      - "python"
                      - "/mxnet/example/image-classification/train_mnist.py"
                      - "--batch-size=64"
                      {{- with .HyperParameters}}
                      {{- range .}}
                      - "{{.Name}}={{.Value}}"
                      {{- end}}
                      {{- end}}
                    restartPolicy: Never
      status:
        conditions:
        - lastTransitionTime: "2019-12-20T07:58:52Z"
          lastUpdateTime: "2019-12-20T07:58:52Z"
          message: Experiment is created
          reason: ExperimentCreated
          status: "True"
          type: Created
        - lastTransitionTime: "2019-12-20T08:00:22Z"
          lastUpdateTime: "2019-12-20T08:00:22Z"
          message: Experiment is running
          reason: ExperimentRunning
          status: "True"
          type: Running
        currentOptimalTrial:
          observation:
            metrics: null
          parameterAssignments: null
        startTime: "2019-12-20T07:58:52Z"
        trials: 3
        trialsRunning: 3
    kind: List
    metadata:
      resourceVersion: ""
      selfLink: ""

    $ kubectl logs -n kubeflow random-example-random-846dc99654-bxb8j

    INFO:hyperopt.utils:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
    INFO:hyperopt.fmin:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
    ERROR:grpc._server:Exception calling application: Method not implemented!
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
        response_or_iterator = behavior(argument, context)
      File "/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python/api_pb2_grpc.py", line 135, in ValidateAlgorithmSettings
        raise NotImplementedError('Method not implemented!')
    NotImplementedError: Method not implemented!

    What can I do to fix it? Thank you for your help in solving this problem.

    • Kubernetes version: (use kubectl version):

      Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
      Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

    • OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)
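
The traceback above ends in api_pb2_grpc.py, a module produced by gRPC's Python code generator. In such generated modules, the base servicer methods raise NotImplementedError until a concrete service overrides them. The sketch below reproduces that pattern with plain Python classes. The class names, the `call_rpc` helper, and the return values are hypothetical stand-ins rather than Katib code, and the sketch does not explain why the method is missing in this particular deployment.

```python
class BaseSuggestionServicer:
    """Stand-in for a generated base class: every RPC raises until overridden."""

    def GetSuggestions(self, request, context):
        raise NotImplementedError("Method not implemented!")

    def ValidateAlgorithmSettings(self, request, context):
        raise NotImplementedError("Method not implemented!")


class PartialService(BaseSuggestionServicer):
    """Hypothetical service that overrides only one of the two RPCs."""

    def GetSuggestions(self, request, context):
        return {"assignments": []}


def call_rpc(service, method_name, request=None, context=None):
    """Invoke an RPC by name and report whether the service implements it."""
    try:
        return "ok", getattr(service, method_name)(request, context)
    except NotImplementedError as err:
        return "unimplemented", str(err)


service = PartialService()
print(call_rpc(service, "GetSuggestions"))
print(call_rpc(service, "ValidateAlgorithmSettings"))
```

The second call reports the same "Method not implemented!" message as the log, because the inherited base method is the one that runs.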

  • Support Kubernetes v1.26

    /kind feature

    Describe the solution you'd like: We need to support Kubernetes v1.26, since that version was released on 2022-12-09.

    https://kubernetes.io/releases/#release-v1-26

    Maybe we can support that version after the next Katib release. This means supporting v1.26 is out of scope for Katib v0.15.0.

    Love this feature? Give it a 👍 We prioritize the features with the most 👍

  • The `operators` directory is a bit out of date

    /kind discussion

    Describe the solution you'd like: The operators directory corresponds to Katib v0.12.0, which is a bit out of date.

    Also, it looks like the latest Charmed katib-operator exists at https://github.com/canonical/katib-operators. The two copies of the Charmed katib-operators don't seem to be in sync. This situation is likely to be confusing for users.

    @DomFleischmann @DnPlas @ca-scribner @knkski Would you like to keep maintaining both kubeflow/katib/operators and canonical/katib-operators? Or would you like to remove Charmed katib-operator from this repository (katib repo)?

    /cc @kubeflow/wg-automl-leads

  • Remove Chocolate Suggestion Service

    Signed-off-by: Yuki Iwai

    What this PR does / why we need it: I removed all code related to the Chocolate Suggestion Service.

    Which issue(s) this PR fixes: Part-of #2058

    Checklist:

    • [ ] Docs included if any changes are user facing
  • Add Label to Disable Katib Webhooks

    In this PR: https://github.com/kubeflow/katib/pull/2018#issuecomment-1330713221, I proposed introducing a label for disabling Katib Webhooks (validator, defaulter, mutator). For example: katib.kubeflow.org/webhooks: disabled. Let's discuss whether that would be useful for users with large-scale environments.

    Currently, if a user's namespace has the katib.kubeflow.org/metrics-collector-injection: enabled label, the Katib Mutation Webhook runs for every Pod in that namespace. That might increase latency in the Kubernetes API server. Some users might want to use Katib Experiments and run other Pods in their namespaces without Webhook execution.
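
To make the proposal concrete, here is a minimal sketch of the decision it implies, written as plain Python over a namespace's labels. The two label keys come from the text above. The function name `should_mutate_pods` and the rule that the proposed `disabled` label takes precedence over the injection label are assumptions made for illustration, not an agreed design.

```python
# Existing label that currently enables the mutation webhook for a namespace.
METRICS_INJECTION_LABEL = "katib.kubeflow.org/metrics-collector-injection"
# Proposed label for opting a namespace out of Katib webhooks.
WEBHOOKS_LABEL = "katib.kubeflow.org/webhooks"


def should_mutate_pods(namespace_labels):
    """Decide whether the mutation webhook should run for Pods in a namespace.

    Assumption: the proposed opt-out label wins over the injection label.
    """
    if namespace_labels.get(WEBHOOKS_LABEL) == "disabled":
        return False
    return namespace_labels.get(METRICS_INJECTION_LABEL) == "enabled"


print(should_mutate_pods({METRICS_INJECTION_LABEL: "enabled"}))  # prints True
print(should_mutate_pods({METRICS_INJECTION_LABEL: "enabled",
                          WEBHOOKS_LABEL: "disabled"}))  # prints False
```

In a cluster, this kind of filtering is normally expressed declaratively on the webhook configuration rather than in application code, so the function only illustrates the intended behavior.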

    What do you think, @gaocegege @johnugeorge @tenzen-y @anencore94 @terrytangyuan?

    /kind discussion

  • kwa(front): Sort conditions table by timestamp

    This is a follow-up PR to this. It updates the COMMIT file to check out the latest KF commit, so that the conditions table is sorted by timestamp by default.
