Katib is a Kubernetes-native project for automated machine learning (AutoML).

Last update: Jan 2, 2023

Comments: 16

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architecture Search.

Katib is agnostic to machine learning (ML) frameworks. It can tune hyperparameters of applications written in any language of the users’ choice and natively supports many ML frameworks, such as TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.
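As an illustration of what framework-agnostic tuning can look like from the training side, here is a minimal Python sketch of a trial program. It assumes that hyperparameters arrive as command-line flags (such as --lr and --num-layers) and that metrics are reported by printing name=value lines to standard output for a stdout-based metrics collector; that line format, the toy scoring function and all function names are assumptions made for this example, not part of Katib.

```python
import argparse


def build_parser():
    # Flags mirror a typical search space: one double, one int, one categorical.
    parser = argparse.ArgumentParser(description="toy trial program")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--num-layers", type=int, default=2)
    parser.add_argument("--optimizer", choices=["sgd", "adam", "ftrl"], default="sgd")
    return parser


def fake_accuracy(lr, num_layers):
    # Placeholder for real training: a deterministic toy score in (0, 1].
    return 1.0 / (1.0 + abs(lr - 0.02) * 10 + abs(num_layers - 3) * 0.1)


def format_metric(name, value):
    # Assumed reporting format: one "name=value" pair per line.
    return f"{name}={value:.4f}"


def main(argv=None):
    args = build_parser().parse_args(argv)
    score = fake_accuracy(args.lr, args.num_layers)
    line = format_metric("Validation-accuracy", score)
    print(line)
    return line


if __name__ == "__main__":
    main()
```

Because the program only reads flags and writes text, the same contract can be met by a training program in any language.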

Katib can perform training jobs using any Kubernetes Custom Resources, with out-of-the-box support for Kubeflow Training Operator, Argo Workflows, Tekton Pipelines and many more.

Katib means "secretary" in Arabic.

Search Algorithms

Katib supports several search algorithms. Follow the Kubeflow documentation to learn more about each algorithm, and check the Suggestion service guide to implement your own custom algorithm.

Hyperparameter Tuning          Neural Architecture Search   Early Stopping
Random Search                  ENAS                         Median Stop
Grid Search                    DARTS
Bayesian Optimization
TPE
Multivariate TPE
CMA-ES
Sobol's Quasirandom Sequence
HyperBand
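To make two of the entries above concrete, here is a minimal Python sketch of random search and of a median stopping rule. It illustrates the general techniques only and is not Katib's implementation; the search space values and the function names (sample_trial, should_stop) are hypothetical.

```python
import random
from statistics import median

# Hypothetical search space: one double, one int, one categorical parameter.
SEARCH_SPACE = {
    "lr": ("double", 0.01, 0.03),
    "num_layers": ("int", 2, 5),
    "optimizer": ("categorical", ["sgd", "adam", "ftrl"]),
}


def sample_trial(space, rng):
    """Random search: draw every parameter independently and uniformly."""
    assignment = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "double":
            assignment[name] = rng.uniform(spec[1], spec[2])
        elif kind == "int":
            assignment[name] = rng.randint(spec[1], spec[2])  # inclusive bounds
        elif kind == "categorical":
            assignment[name] = rng.choice(spec[1])
        else:
            raise ValueError(f"unknown parameter type: {kind}")
    return assignment


def should_stop(current_best, completed_histories, step):
    """Median stopping rule for a metric that is maximized.

    Stop a running trial at `step` if its best value so far is below the
    median of the running averages (up to `step`) of completed trials.
    """
    averages = [
        sum(history[:step]) / step
        for history in completed_histories
        if len(history) >= step
    ]
    if not averages:
        return False  # nothing to compare against yet
    return current_best < median(averages)


if __name__ == "__main__":
    rng = random.Random(10)
    for _ in range(3):
        print(sample_trial(SEARCH_SPACE, rng))
    finished = [[0.50, 0.70, 0.80], [0.40, 0.60, 0.90], [0.30, 0.35, 0.40]]
    print(should_stop(0.32, finished, step=2))
```

Random search needs no feedback between trials, which is why it parallelizes trivially; the stopping rule, in contrast, needs the metric histories of trials that have already finished.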

To run the algorithms above, Katib relies on several external optimization frameworks.

Installation

For the various Katib installation options, check the Kubeflow guide. Follow the steps below to install Katib standalone.

Prerequisites

These are the minimal requirements to install Katib:

Kubernetes >= 1.17
kubectl >= 1.21
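If you want to check these minimums in a script, comparing parsed version numbers is usually enough. The sketch below assumes version strings shaped like v1.21.3 (optionally with a vendor suffix such as +icp); the constant and function names are hypothetical.

```python
import re

# Minimums taken from the prerequisites list.
MIN_KUBERNETES = (1, 17)
MIN_KUBECTL = (1, 21)


def parse_version(text):
    """Return (major, minor) from strings such as 'v1.21.3' or '1.17'."""
    match = re.match(r"v?(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_text, minimum):
    # Tuples compare element by element, so (1, 9) < (1, 17) as intended.
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    print(meets_minimum("v1.17.0", MIN_KUBERNETES))
    print(meets_minimum("v1.20.9", MIN_KUBECTL))
```

Comparing integer tuples avoids the classic string-comparison mistake where "1.9" sorts after "1.17".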

Latest Version

To install the latest Katib version, run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=master"

Release Version

To install a specific Katib release (for example, v0.11.1), run this command:

kubectl apply -k "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone?ref=v0.11.1"
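The two commands differ only in the Git ref at the end of the URL. As a small sketch, the helper below builds the argument list for either case; the function name is hypothetical, and it only assembles the command without running it.

```python
KATIB_STANDALONE = (
    "github.com/kubeflow/katib.git/manifests/v1beta1/installs/katib-standalone"
)


def katib_install_command(ref="master"):
    """Build the kubectl arguments for installing Katib standalone at `ref`."""
    if not ref:
        raise ValueError("ref must be a branch or tag name")
    return ["kubectl", "apply", "-k", f"{KATIB_STANDALONE}?ref={ref}"]


if __name__ == "__main__":
    # Print the commands for the latest version and for a pinned release.
    print(" ".join(katib_install_command()))
    print(" ".join(katib_install_command("v0.11.1")))
```

Pinning a release tag rather than master keeps repeated installs reproducible.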

Make sure that all Katib components are running:

$ kubectl get pods -n kubeflow

NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-db-manager-57cd769cdb-4g99m   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        1/1     Running     0          36s
katib-ui-5767cfccdc-pwg2x           1/1     Running     0          36s
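The same check can be scripted. This sketch parses the text printed by kubectl get pods and lists pods whose status is neither Running nor Completed; it assumes the default column layout shown above, and the function name is hypothetical. It does not inspect the READY column.

```python
def unhealthy_pods(kubectl_output):
    """Return (name, status) pairs for pods that are not Running or Completed."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    bad = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] not in ("Running", "Completed"):
            bad.append((fields[name_col], fields[status_col]))
    return bad


if __name__ == "__main__":
    sample = """\
NAME                                READY   STATUS      RESTARTS   AGE
katib-cert-generator-rw95w          0/1     Completed   0          35s
katib-controller-566595bdd8-hbxgf   1/1     Running     0          36s
katib-mysql-7894994f88-5d4s5        0/1     Pending     0          36s
"""
    print(unhealthy_pods(sample))
```

Completed counts as healthy here because the cert generator in the listing above runs once and exits.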

For Katib Experiments, check the complete examples list.

Documentation

Run your first Katib Experiment in the getting started guide.
Learn about Katib Concepts in this guide.
Learn about Katib Interfaces in this guide.
Learn about Katib Components in this guide.
Learn more about Katib in the presentations and demos list.

Community

We are always growing our community and invite new users and AutoML enthusiasts to contribute to the Katib project. The following links provide information about getting involved in the community:

Subscribe to the AutoML calendar to attend Working Group bi-weekly community meetings.
Check the AutoML and Training Working Group meeting notes.
If you use Katib, please update the adopters list.

Contributing

Please feel free to test the system! The developer guide is a good starting point for developers.

Blog posts

Kubeflow Katib: Scalable, Portable and Cloud Native System for AutoML (by Andrey Velichkevich)

Events

AutoML and Training WG Summit. 16th of July 2021

Citation

If you use Katib in a scientific publication, we would appreciate citations to the following paper:

A Scalable and Cloud-Native Hyperparameter Tuning System, George et al., arXiv:2006.02085, 2020.

Bibtex entry:

@misc{george2020katib,
    title={A Scalable and Cloud-Native Hyperparameter Tuning System},
    author={Johnu George and Ce Gao and Richard Liu and Hou Gang Liu and Yuan Tang and Ramdoot Pydipaty and Amit Kumar Saha},
    year={2020},
    eprint={2006.02085},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}

Owner

Kubeflow

Kubeflow is an open, community driven project to make it easy to deploy and manage an ML stack on Kubernetes

https://github.com/kubeflow/katib

Comments

how to collect the indicator of training results???
/kind bug

After completion of bayesianoptimization automated training, the corresponding indicator results cannot be collected. Could you please tell me how to collect the indicator of training results. My yaml file is as follows:

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: kubeflow
  labels:
    controller-tools.k8s.io: "1.0"
  name: bayesianoptimization-example
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - accuracy
  algorithm:
    algorithmName: bayesianoptimization
    algorithmSettings:
      - name: "random_state"
        value: "10"
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  MetricsCollectorSpec:
    Collector:
      Kind: stdOut
  parameters:
    - name: --lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: --num-layers
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: --optimizer
      parameterType: categorical
      feasibleSpace:
        list:
        - sgd
        - adam
        - ftrl
  trialTemplate:
    goTemplate:
        rawTemplate: |-
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: {{.Trial}}
            namespace: {{.NameSpace}}
          spec:
            template:
              spec:
                containers:
                - name: {{.Trial}}
                  image: docker.io/katib/mxnet-mnist-example
                  command:
                  - "python"
                  - "/mxnet/example/image-classification/train_mnist.py"
                  - "--batch-size=64"
                  {{- with .HyperParameters}}
                  {{- range .}}
                  - "{{.Name}}={{.Value}}"
                  {{- end}}
                  {{- end}}
                restartPolicy: Never

Environment:

Kubeflow version: 0.7.0

Minikube version:

Kubernetes version (use kubectl version): 1.15.5

OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908
Disable dynamic creation for admission hooks and update dependencies
Fixes: https://github.com/kubeflow/katib/issues/1405.

This PR introduces a new mechanism to obtain the certificate for webhooks. I updated the YAMLs for our webhooks and added an initContainer to the Katib controller that executes the cert-generator.sh script. This script creates a CertificateSigningRequest and the katib-webhook-cert secret, and patches the webhook configurations with the appropriate caBundle. Since the katib-webhook-cert secret is in the manifest, the cleanup process should delete everything.

So we don't need to deploy cert-manager for Katib.

@gaocegege @johnugeorge @yanniszark @kuikuikuizzZ @knkski What do you think about this approach ?

Also I updated controller-runtime to v0.8.2 and k8s.io deps to v0.20.4. That requires some changes:

Change some packages location

Change the arguments for client calls (List, Get, etc.)

In newer Kubernetes versions, we can't set an owner reference on a cluster-scoped object (e.g. a PV) that points to a namespace-scoped object (e.g. a Suggestion). Thus, I had to disable the owner reference for the PV that is created when the Experiment has the FromVolume resume policy. For that reason, I added PersistentVolumeReclaimPolicy: Delete to the PV, so once the PVC is garbage collected, the PV should also be deleted.

I removed PyTorch operator from the dependencies because of this problem.

I still need to run some tests and create a new image for the cert generator. It would be great if you could start reviewing this.

/cc @gaocegege @johnugeorge
[feature] Reconsider the design of Trial Template

/kind feature

Describe the solution you'd like:

We need to marshal the TFJob to a JSON string and then use it to create experiments if we are using the K8s client-go. This is not good, and the Go template is ugly, too.
Switch to AWS CI/CD

Related: https://github.com/kubeflow/katib/issues/1332. I will debug the infra in this PR.

I also made a few changes to improve CI/CD quality.

/cc @gaocegege @johnugeorge /cc @Jeffwan @PatrickXYS @jlewi @Bobgy
Katib v1alpha2 API for CRDs

@YujiOshima @gaocegege @johnugeorge @alexandraj777 @hougangliu @xyhuang

This is an initial proposal for the Katib v1alpha2 API. The changes here reflect the discussion in https://github.com/kubeflow/katib/issues/370.

Comments and suggestions are welcome.

Please note that the NAS APIs are not included here since the feature is still in an early phase.

Studyctl crd

Add StudyController CRD: studycontroller.kubeflow.org Operator: StudyController

Update examples. This implementation polls the status of the workers in the Go process of the StudyController. I understand that this is not an elegant implementation, but it has the least impact on the existing code.

As a next step, we should create a worker CRD and its controller, and support multiple job types (k8s, TF-Job, ...). Assign @gaocegege

Population based training
What this PR does / why we need it:

Support the discovery of modulated hyperparameters rather than attempting to find a fixed set over the entire training process. The paper has more details about the technique.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

This PR provides some initial support for PBT within Katib (#1382).

Checklist:

[ ] Docs included if any changes are user facing
Improve Katib README
Related: #1332. I will debug the infra in this PR.

[x] This is the PR to see if we can trigger AWS Presubmit.

[x] This is the PR to see if the GitHub UI integrates aws-kf-ci-bot
can't set up CRD "Experiment"

When I deploy katib_v1alpha3 with scripts/v1alpha3/deploy.sh, the katib-controller pod gives the following error:

{"level":"info","ts":1578296376.3173876,"logger":"entrypoint","msg":"Config:","experiment-suggestion-name":"default","cert-local-filesystem":false}
{"level":"info","ts":1578296376.375878,"logger":"entrypoint","msg":"Registering Components."}
{"level":"info","ts":1578296376.3765948,"logger":"entrypoint","msg":"Setting up controller"}
{"level":"info","ts":1578296376.3766346,"logger":"experiment-controller","msg":"Using the default suggestion implementation"}
{"level":"info","ts":1578296376.3767953,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"experiment-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1578296376.3768966,"logger":"kubebuilder.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":{"Group":"kubeflow.org","Kind":"Experiment"},"error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:89\ngithub.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Watch\n\t/go/src/github.com/kubeflow/katib/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:122\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:119\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":1578296376.377135,"logger":"experiment-controller","msg":"Experiment watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.addWatch\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:121\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:107\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":1578296376.3772092,"logger":"experiment-controller","msg":"Trial watch failed","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:108\ngithub.com/kubeflow/katib/pkg/controller.v1alpha3/experiment.Add\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/experiment/experiment_controller.go:62\ngithub.com/kubeflow/katib/pkg/controller%2ev1alpha3.AddToManager\n\t/go/src/github.com/kubeflow/katib/pkg/controller.v1alpha3/controller.go:28\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:90\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
{"level":"error","ts":1578296376.377267,"logger":"entrypoint","msg":"unable to register controllers to the manager","error":"no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"","stacktrace":"github.com/kubeflow/katib/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/kubeflow/katib/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/kubeflow/katib/cmd/katib-controller/v1alpha3/main.go:91\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

And the ui pod gives the following error:

2020/01/06 06:56:46 CreateExperiment from YAML failed: no matches for kind "Experiment" in version "kubeflow.org/v1alpha3"
Katib experiments run indefinitely without completing a single trial
/kind bug

Hi, I'm setting a Katib job through the Kale deployment panel - after creating a Kale pipeline. The pipeline builds successfully but the Katib experiments run forever and don't complete a single trial.

I expect the Katib jobs to run successfully, but to no avail.

Any way/suggestion to go about this?

Environment:

Kubeflow version (kfctl version):

Minikube version (minikube version):

Kubernetes version: (use kubectl version):

OS (e.g. from /etc/os-release):
ERROR:grpc._server:Exception calling application: Method not implemented!
/kind bug

Hi, I'm having trouble using katib v1alpha3. First, I installed katib as follows:

git clone https://github.com/kubeflow/katib

sh katib/scripts/v1alpha3/deploy.sh

And I tried to apply random-example.yaml (example in katib/examples/v1alpha3):

kubectl apply -f random-example.yaml

Results:

kubectl get pods -n kubeflow

NAME                                     READY   STATUS    RESTARTS   AGE
katib-controller-6c6974678d-zsnlc        1/1     Running   1          24m
katib-db-558f649dc6-8cd9t                1/1     Running   0          24m
katib-manager-5f74bdff84-4d78z           1/1     Running   0          24m
katib-ui-6568bd6b44-qbq5k                1/1     Running   0          24m
random-example-random-846dc99654-bxb8j   1/1     Running   0          23m

kubectl get trials -n kubeflow

NAME                      TYPE      STATUS   AGE
random-example-drpkvb4b   Running   True     23m
random-example-k7xv6ktt   Running   True     23m
random-example-w6jlwdp2   Running   True     23m

kubectl get experiment -n kubeflow -oyaml

apiVersion: v1
items:
- apiVersion: kubeflow.org/v1alpha3
  kind: Experiment
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"kubeflow.org/v1alpha3","kind":"Experiment","metadata":{"annotations":{},"labels":{"controller-tools.k8s.io":"1.0"},"name":"random-example","namespace":"kubeflow"},"spec":{"algorithm":{"algorithmName":"random"},"maxFailedTrialCount":3,"maxTrialCount":12,"objective":{"additionalMetricNames":["accuracy"],"goal":0.99,"objectiveMetricName":"Validation-accuracy","type":"maximize"},"parallelTrialCount":3,"parameters":[{"feasibleSpace":{"max":"0.03","min":"0.01"},"name":"--lr","parameterType":"double"},{"feasibleSpace":{"max":"5","min":"2"},"name":"--num-layers","parameterType":"int"},{"feasibleSpace":{"list":["sgd","adam","ftrl"]},"name":"--optimizer","parameterType":"categorical"}],"trialTemplate":{"goTemplate":{"rawTemplate":"apiVersion: batch/v1\nkind: Job\nmetadata:\n name: {{.Trial}}\n namespace: {{.NameSpace}}\nspec:\n template:\n spec:\n containers:\n - name: {{.Trial}}\n image: docker.io/kubeflowkatib/mxnet-mnist-example\n command:\n - "python"\n - "/mxnet/example/image-classification/train_mnist.py"\n - "--batch-size=64"\n {{- with .HyperParameters}}\n {{- range .}}\n - "{{.Name}}={{.Value}}"\n {{- end}}\n {{- end}}\n restartPolicy: Never"}}}}
    creationTimestamp: "2019-12-20T07:58:52Z"
    finalizers:
    - update-prometheus-metrics
    generation: 2
    labels:
      controller-tools.k8s.io: "1.0"
    name: random-example
    namespace: kubeflow
    resourceVersion: "11682124"
    selfLink: /apis/kubeflow.org/v1alpha3/namespaces/kubeflow/experiments/random-example
    uid: 9005bab0-22fe-11ea-8cf0-0679676001a5
  spec:
    algorithm:
      algorithmName: random
      algorithmSettings: null
    maxFailedTrialCount: 3
    maxTrialCount: 12
    metricsCollectorSpec:
      collector:
        kind: StdOut
    objective:
      additionalMetricNames:
      - accuracy
      goal: 0.99
      objectiveMetricName: Validation-accuracy
      type: maximize
    parallelTrialCount: 3
    parameters:
    - feasibleSpace:
        max: "0.03"
        min: "0.01"
      name: --lr
      parameterType: double
    - feasibleSpace:
        max: "5"
        min: "2"
      name: --num-layers
      parameterType: int
    - feasibleSpace:
        list:
        - sgd
        - adam
        - ftrl
      name: --optimizer
      parameterType: categorical
    trialTemplate:
      goTemplate:
        rawTemplate: |-
          apiVersion: batch/v1
          kind: Job
          metadata:
            name: {{.Trial}}
            namespace: {{.NameSpace}}
          spec:
            template:
              spec:
                containers:
                - name: {{.Trial}}
                  image: docker.io/kubeflowkatib/mxnet-mnist-example
                  command:
                  - "python"
                  - "/mxnet/example/image-classification/train_mnist.py"
                  - "--batch-size=64"
                  {{- with .HyperParameters}}
                  {{- range .}}
                  - "{{.Name}}={{.Value}}"
                  {{- end}}
                  {{- end}}
                restartPolicy: Never
  status:
    conditions:
    - lastTransitionTime: "2019-12-20T07:58:52Z"
      lastUpdateTime: "2019-12-20T07:58:52Z"
      message: Experiment is created
      reason: ExperimentCreated
      status: "True"
      type: Created
    - lastTransitionTime: "2019-12-20T08:00:22Z"
      lastUpdateTime: "2019-12-20T08:00:22Z"
      message: Experiment is running
      reason: ExperimentRunning
      status: "True"
      type: Running
    currentOptimalTrial:
      observation:
        metrics: null
      parameterAssignments: null
    startTime: "2019-12-20T07:58:52Z"
    trials: 3
    trialsRunning: 3
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

kubectl logs -n kubeflow random-example-random-846dc99654-bxb8j

INFO:hyperopt.utils:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
INFO:hyperopt.fmin:Failed to load dill, try installing dill via "pip install dill" for enhanced pickling support.
ERROR:grpc._server:Exception calling application: Method not implemented!
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/grpc/_server.py", line 434, in _call_behavior
    response_or_iterator = behavior(argument, context)
  File "/usr/src/app/github.com/kubeflow/katib/pkg/apis/manager/v1alpha3/python/api_pb2_grpc.py", line 135, in ValidateAlgorithmSettings
    raise NotImplementedError('Method not implemented!')
NotImplementedError: Method not implemented!

What can I do to fix it? Thank you for your help in solving this problem.

Kubernetes version: (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5+icp", GitCommit:"903c3b31caddc675ce2d8bddf62aa0f875c2a3bc", GitTreeState:"clean", BuildDate:"2019-05-08T06:16:32Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

OS (e.g. from /etc/os-release): CentOS Linux release 7.7.1908 (Core)
Support Kubernetes v1.26

/kind feature

Describe the solution you'd like: We need to support Kubernetes v1.26 since that version was released on 2022-12-9.

https://kubernetes.io/releases/#release-v1-26

Maybe we can support that version after the next Katib release. This means that supporting v1.26 is out of scope for Katib v0.15.0.


Love this feature? Give it a 👍 We prioritize the features with the most 👍
The `operators` directory is a bit out of date

/kind discussion

Describe the solution you'd like: The operators directory corresponds to Katib v0.12.0, which is a bit out of date.

Also, it looks like the latest Charmed katib-operator lives at https://github.com/canonical/katib-operators. The two copies of the Charmed katib-operators do not appear to be in sync, which is likely to confuse users.

@DomFleischmann @DnPlas @ca-scribner @knkski Would you like to keep maintaining both kubeflow/katib/operators and canonical/katib-operators? Or would you like to remove Charmed katib-operator from this repository (katib repo)?

/cc @kubeflow/wg-automl-leads

Remove Chocolate Suggestion Service
Signed-off-by: Yuki Iwai

What this PR does / why we need it: I removed all code related to the Chocolate Suggestion Service.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Part-of #2058

Checklist:

[ ] Docs included if any changes are user facing
Add Label to Disable Katib Webhooks

In this PR: https://github.com/kubeflow/katib/pull/2018#issuecomment-1330713221, I proposed introducing a label for disabling Katib Webhooks (validator, defaulter, mutator). For example: katib.kubeflow.org/webhooks: disabled. Let's discuss whether that would be useful for users with large-scale environments.

Currently, if a user's namespace has the katib.kubeflow.org/metrics-collector-injection: enabled label, the Katib Mutation Webhook runs for every Pod in that namespace. That might increase latency in the Kubernetes API server. Some users might want to use Katib Experiments and run other Pods in their namespaces without Webhook execution.

What do you think @gaocegege @johnugeorge @tenzen-y @anencore94 @terrytangyuan?

/kind discussion

kwa(front): Sort conditions table by timestamp

This is a follow-up PR to this. It updates the COMMIT file to check out the latest KF commit, so that the conditions table is sorted by timestamp by default.

Related tags

Machine Learning katib

On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Jan 5, 2023

Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Dec 30, 2022

Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries ###import "github.com/alonsovidales/go_ml" Package ml provides some implementations of usefull machine learnin

Nov 10, 2022

Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Dec 27, 2022

Prophecis is a one-stop machine learning platform developed by WeBank

Prophecis is a one-stop machine learning platform developed by WeBank. It integrates multiple open-source machine learning frameworks, has the multi tenant management capability of machine learning compute cluster, and provides full stack container deployment and management services for production environment.

Dec 28, 2022

Go Machine Learning Benchmarks

Benchmarks of machine learning inference for Go

Dec 30, 2022

Deploy, manage, and scale machine learning models in production

Deploy, manage, and scale machine learning models in production. Cortex is a cloud native model serving platform for machine learning engineering teams.

Dec 30, 2022

A High-level Machine Learning Library for Go

Overview Goro is a high-level machine learning library for Go built on Gorgonia. It aims to have the same feel as Keras. Usage import ( . "github.

Nov 20, 2022

Standard machine learning models

Cog: Standard machine learning models Define your models in a standard format, store them in a central place, run them anywhere. Standard interface fo

Jan 9, 2023

PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage.

中文 | English PaddleDTX PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage. It solves the d

Dec 14, 2022

Self-contained Machine Learning and Natural Language Processing library in Go

Jan 8, 2023

A Kubernetes Native Batch System (Project under CNCF)

Volcano is a batch system built on Kubernetes. It provides a suite of mechanisms that are commonly required by many classes of batch & elastic workloa

Jan 9, 2023

Reinforcement Learning in Go

Overview Gold is a reinforcement learning library for Go. It provides a set of agents that can be used to solve challenges in various environments. Th

Dec 11, 2022

Spice.ai is an open source, portable runtime for training and using deep learning on time series data.

Spice.ai Spice.ai is an open source, portable runtime for training and using deep learning on time series data. ⚠️ DEVELOPER PREVIEW ONLY Spice.ai is

Dec 15, 2022

FlyML perfomant real time mashine learning libraryes in Go

FlyML perfomant real time mashine learning libraryes in Go simple & perfomant logistic regression (~100 LoC) Status: WIP! Validated on mushrooms datas

May 30, 2022

Go (Golang) encrypted deep learning library; Fully homomorphic encryption over neural network graphs

DC DarkLantern A lantern is a portable case that protects light, A dark lantern is one who's light can be hidden at will. DC DarkLantern is a golang i

Oct 31, 2022

A tool for building identical machine images for multiple platforms from a single source configuration

Packer Packer is a tool for building identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs o

Oct 3, 2021

A native Go clean room implementation of the Porter Stemming algorithm.

Go Porter Stemmer A native Go clean room implementation of the Porter Stemming Algorithm. This algorithm is of interest to people doing Machine Learni

Jan 3, 2023

A Hackathon project created by Alpha Interface team for Agri-D Food Hack

Alpha Interface A Hackathon project created by Alpha Interface team for Agri-D Food Hack Installation Downloading Wasp and wasp-cli https://wiki.iota.

Oct 16, 2022
