Deploy, manage, and scale machine learning models in production


Website • Slack • Docs


Cortex is a cloud native model serving platform for machine learning engineering teams.


Use cases

  • Realtime machine learning - build NLP, computer vision, and other APIs and integrate them into any application.
  • Large-scale inference - scale realtime or batch inference workloads across hundreds or thousands of instances.
  • Consistent MLOps workflows - create streamlined and reproducible MLOps workflows for any machine learning team.

Deploy

  • Deploy TensorFlow, PyTorch, ONNX, and other models using a simple CLI or Python client (see the Python sketch below).
  • Run realtime inference, batch inference, asynchronous inference, and training jobs.
  • Define preprocessing and postprocessing steps in Python and chain workloads seamlessly.
$ cortex deploy apis.yaml

• creating text-generator (realtime API)
• creating image-classifier (batch API)
• creating video-analyzer (async API)

all APIs are ready!
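
The same deployment can also be driven from Python. A minimal sketch, assuming the cortex package is installed via pip and that the client exposes client() / create_api() / get_api() as in recent releases (names and signatures vary between Cortex versions):

# hedged sketch of the Python client workflow; the environment name "aws"
# and the exact create_api() signature are assumptions
import cortex

cx = cortex.client("aws")

api_spec = {
    "name": "text-generator",
    "kind": "RealtimeAPI",
    "predictor": {"type": "python", "path": "predictor.py"},
}

cx.create_api(api_spec, project_dir=".")
print(cx.get_api("text-generator"))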

Manage

  • Create A/B tests and shadow pipelines with configurable traffic splitting (see the example configuration below).
  • Automatically stream logs from every workload to your favorite log management tool.
  • Monitor your workloads with pre-built Grafana dashboards and add your own custom dashboards.
$ cortex get

API                 TYPE        GPUs
text-generator      realtime    32
image-classifier    batch       64
video-analyzer      async       16
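
For example, an A/B test is expressed declaratively as a traffic splitter. A minimal sketch, assuming the TrafficSplitter kind and its weight fields match your Cortex version, with made-up API names:

# traffic_splitter.yaml (illustrative)
- name: text-generator
  kind: TrafficSplitter
  apis:
    - name: text-generator-a
      weight: 80
    - name: text-generator-b
      weight: 20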

Scale

  • Configure workload and cluster autoscaling to efficiently handle large-scale production workloads.
  • Create clusters with different types of instances for different types of workloads.
  • Spend less on cloud infrastructure by letting Cortex manage spot or preemptible instances.
$ cortex cluster info

provider: aws
region: us-east-1
instance_types: [c5.xlarge, g4dn.xlarge]
spot_instances: true
min_instances: 10
max_instances: 100

Comments
  • Display realtime output

    Display realtime output

    I have a text-generator language model that is compressed into .bin format and can be accessed from the command line. It generates about one word per second and prints every word in realtime when run in a terminal. I would like to deploy my model using Cortex, but I'm struggling to get the output in realtime, word by word. Right now my code prints only one line at a time.

    import subprocess

    def run_command(text):
        command = ['./mycommand', text]
        process = subprocess.Popen(command, stdout=subprocess.PIPE, universal_newlines=True, bufsize=-1)
        while True:
            # readline() blocks until a full line (~20 words) is available, so output appears line by line
            output = process.stdout.readline()
            if output == '' and process.poll() is not None:
                break
            if output:
                print(output.strip())

    run_command('TEXT')
    

    One line may include about 20 words, which means one line is displayed every ~20 seconds (since my model outputs roughly 1 word/second). I would really like the output to be more dynamic and print one word at a time (as it does in the terminal) instead of one line at a time. Is there a way this can be achieved?
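
    A possible direction (independent of anything Cortex-specific) is to read the child process's stdout in small chunks rather than full lines, so each word can be emitted as soon as it appears. A minimal sketch, assuming ./mycommand flushes its output per word (if it block-buffers when not attached to a terminal, something like stdbuf or a pseudo-terminal may also be needed); exposing this through an API would additionally require a streaming/chunked response:

    import subprocess

    def run_command_streaming(text):
        command = ['./mycommand', text]
        # bufsize=0 disables Python-side buffering so bytes arrive as soon as they are written
        process = subprocess.Popen(command, stdout=subprocess.PIPE, bufsize=0)
        word = b''
        while True:
            ch = process.stdout.read(1)  # read one byte at a time
            if not ch:  # EOF: the process closed its stdout
                break
            if ch.isspace():
                if word:
                    print(word.decode(), flush=True)  # emit each completed word immediately
                    word = b''
            else:
                word += ch
        if word:
            print(word.decode(), flush=True)
        process.wait()

    run_command_streaming('TEXT')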

  • Persistent private instances

    Persistent private instances

    I would like to use Cortex to create an application where each user can request and communicate with an AWS instance for a period of time. In this scenario, each user's data would be processed and stored on one whole AWS instance. From the documentation, I understand that each API call will use whichever instance is not busy at the moment. It wouldn't be ideal if, by making an API call, a user received sensitive data stored by another user on the same instance. Would it be possible to somehow mark the instance to which an API call is being made? That way the data of individual users wouldn't be accessible to everyone, but only to the users who request/use that instance.

  • Resource exhausted error

    Resource exhausted error

    I'm trying to send audio files, which are fairly large, to the server and am getting a resource exhausted error. Is there any way to configure the server in order to increase the maximum allowed message size?

    Here's the stack trace:

    2020-12-24 23:30:14.941839:cortex:pid-2247:INFO:500 Internal Server Error POST /
    2020-12-24 23:30:14.942071:cortex:pid-2247:ERROR:Exception in ASGI application
    Traceback (most recent call last):
      File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line
    390, in run_asgi
        result = await app(self.scope, self.receive, self.send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
        return await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/applications.py", line 181, in __call__
        await super().__call__(scope, receive, send)  # pragma: no cover
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
        await self.middleware_stack(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
        raise exc from None
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
        await self.app(scope, receive, _send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
        response = await self.dispatch_func(request, self.call_next)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 187, in parse_payload
        return await call_next(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
        task.result()
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 25, in __call__
        response = await self.dispatch_func(request, self.call_next)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 134, in register_request
        response = await call_next(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 45, in call_next
        task.result()
     File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/middleware/base.py", line 38, in coro
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
        raise exc from None
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
        await self.app(scope, receive, sender)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
        await route.handle(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
        await self.app(scope, receive, send)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
        response = await func(request)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 183, in app
        dependant=dependant, values=values, is_coroutine=is_coroutine
      File "/opt/conda/envs/env/lib/python3.6/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
        return await run_in_threadpool(dependant.call, **values)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
        return await loop.run_in_executor(None, func, *args)
      File "/opt/conda/envs/env/lib/python3.6/concurrent/futures/thread.py", line 56, in run
        result = self.fn(*self.args, **self.kwargs)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/serve/serve.py", line 200, in predict
        prediction = predictor_impl.predict(**kwargs)
      File "/mnt/project/serving/cortex_server.py", line 10, in predict
        return self.client.predict({"waveform": np.array(payload["audio"]).astype("float32")})
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
    114, in predict
        return self._run_inference(model_input, consts.SINGLE_MODEL_NAME, model_version)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/client/tensorflow.py", line
    164, in _run_inference
        return self._client.predict(model_input, model_name, model_version)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/cortex_internal/lib/model/tfs.py", line 376, in
    predict
        response_proto = self._pred.Predict(prediction_request, timeout=timeout)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/opt/conda/envs/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
        raise _InactiveRpcError(state)
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
            status = StatusCode.RESOURCE_EXHAUSTED
            details = "Received message larger than max (102484524 vs. 4194304)"
            debug_error_string = "{"created":"@1608852614.937822193","description":"Received message larger than max (102484524 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
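
    The 4194304 in the error is gRPC's default 4 MiB maximum message size. Whether Cortex exposes this as configuration depends on the version, but at the gRPC level the relevant knobs are standard channel options. A hedged sketch only, with a hypothetical TensorFlow Serving address (this is not a documented Cortex setting):

    import grpc

    MAX_MESSAGE_BYTES = 128 * 1024 * 1024  # comfortably above the ~100 MB payload in the error

    channel = grpc.insecure_channel(
        "localhost:9000",  # hypothetical TF Serving gRPC address
        options=[
            ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
            ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
        ],
    )

    An alternative that sidesteps the limit entirely is to send a reference to the audio (e.g. an S3 key) in the request and download the file inside the predictor.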
    
  • Per Process GPU Ram

    Per Process GPU Ram

    As mentioned in the gpus.md docs about limiting GPU RAM, I know exactly which code snippet I have to use, but I don't know where exactly in the Cortex source code I should put it.

    mem_limit_mb = 1024
    for gpu in tf.config.list_physical_devices("GPU"):
        tf.config.set_logical_device_configuration(
            gpu, [tf.config.LogicalDeviceConfiguration(memory_limit=mem_limit_mb)]
        )

  • Support aws_session_token for CLI auth

    Support aws_session_token for CLI auth

    Description

    In order to authenticate with the Cortex operator, the Cortex CLI should be able to use aws_session_token (currently only static credentials are supported).

    Also, consider enabling auth via IAM role (e.g. inherited from Lambda, EC2)
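
    For reference, temporary AWS credentials carry a session token alongside the key pair in ~/.aws/credentials (all values below are placeholders):

    [default]
    aws_access_key_id = ASIAXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    aws_session_token = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX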

  • Package Cortex library into .ZIP

    Package Cortex library into .ZIP

    I'm trying to create a microservice to manage my cluster via Cortex and Lambda. AWS Lambda requires Python dependencies to be packaged and uploaded as a .zip file. How can I package the Cortex library into a .zip?
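
    A common Lambda packaging pattern is to vendor the dependency with pip and zip it together with the handler. A minimal sketch, assuming the client is published on PyPI as cortex and that lambda_handler.py is your (hypothetical) handler module; the package also has to be compatible with the Lambda Python runtime:

    $ pip install cortex --target ./package
    $ cp lambda_handler.py ./package/
    $ cd package && zip -r ../deployment.zip . && cd ..
    # upload deployment.zip as the Lambda function's code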

  • How to make Cortex XmlHttpRequest on HTTPS page?

    How to make Cortex XmlHttpRequest on HTTPS page?

    I have a website which runs on https:// and I can't make Cortex API XMLHttpRequest requests from it.

    When running on localhost using http://, everything works fine:

    async function postData(url = '', data = {}) {
      // Default options are marked with *
      const response = await fetch(url, {
        method: 'POST', // *GET, POST, PUT, DELETE, etc.
        mode: 'cors', // no-cors, *cors, same-origin
        cache: 'no-cache', // *default, no-cache, reload, force-cache, only-if-cached
        credentials: 'same-origin', // include, *same-origin, omit
        headers: {
          'Content-Type': 'application/json'
          // 'Content-Type': 'application/x-www-form-urlencoded',
        },
        redirect: 'follow', // manual, *follow, error
        referrerPolicy: 'no-referrer', // no-referrer, *no-referrer-when-downgrade, origin, origin-when-cross-origin, same-origin, strict-origin, strict-origin-when-cross-origin, unsafe-url
        body: JSON.stringify(data) // body data type must match "Content-Type" header
      });
      return response.json(); // parses JSON response into native JavaScript objects
    }
    

    But making the same request from an https:// page gives the following:

    Mixed Content: The page at 'https://www.@' was loaded over HTTPS, but requested an insecure XMLHttpRequest endpoint 'http://a6cc8d4dee22a448e81bb29862332bf0-93580d7c9d7d2256.elb.us-east-2.amazonaws.com/newtest-user'. This request has been blocked; the content must be served over HTTPS.

    How can I access Cortex API over HTTPS?

  • upstream connect error or disconnect/reset before headers. reset reason: connection failure

    upstream connect error or disconnect/reset before headers. reset reason: connection failure

    Version

    cli version: 0.18.1

    Description

    Intermittent 503 errors on AWS cluster.

    Configuration

    cortex.yaml

    # cortex.yaml
    
    - name: offer-features
      predictor:
        type: python
        path: predictor.py
        config:
          bucket: XXXXXXXXXXXXXXXXXXXX
      compute:
        cpu: 1  # CPU request per replica, e.g. 200m or 1 (200m is equivalent to 0.2) (default: 200m)
        gpu: 0  # GPU request per replica (default: 0)
        inf: 0 # Inferentia ASIC request per replica (default: 0)
        mem: 1Gi
      autoscaling:
        min_replicas: 2
        max_replicas: 3
        init_replicas: 2
        max_replica_concurrency: 13
        target_replica_concurrency: 5
        window: 1m0s
        downscale_stabilization_period: 5m0s
        upscale_stabilization_period: 1m0s
        max_downscale_factor: 0.75
        max_upscale_factor: 1.5
        downscale_tolerance: 0.05
        upscale_tolerance: 0.05
    
    # cluster.yaml
    
    # AWS credentials (if not specified, ~/.aws/credentials will be checked) (can be overridden by $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY)
    aws_access_key_id: XXXXXXXXXXXXXX
    aws_secret_access_key: XXXXXXXXXXXXXXXXX
    
    # optional AWS credentials for the operator which may be used to restrict its AWS access (defaults to the AWS credentials set above)
    cortex_aws_access_key_id: XXXXXXXXXXXXXXXX
    cortex_aws_secret_access_key: XXXXXXXXXXXXXXXXXXXXX
    
    # EKS cluster name for cortex (default: cortex)
    cluster_name: cortex
    
    # AWS region
    region: us-east-1
    
    # S3 bucket (default: <cluster_name>-<RANDOM_ID>)
    # note: your cortex cluster uses this bucket for metadata storage, and it should not be accessed directly (a separate bucket should be used for your models)
    bucket: # cortex-<RANDOM_ID>
    
    # list of availability zones for your region (default: 3 random availability zones from the specified region)
    availability_zones: # e.g. [us-east-1a, us-east-1b, us-east-1c]
    
    # instance type
    instance_type: t3.medium
    
    # minimum number of instances (must be >= 0)
    min_instances: 1
    
    # maximum number of instances (must be >= 1)
    max_instances: 5
    
    # disk storage size per instance (GB) (default: 50)
    instance_volume_size: 50
    
    # instance volume type [gp2, io1, st1, sc1] (default: gp2)
    instance_volume_type: gp2
    
    # instance volume iops (only applicable to io1 storage type) (default: 3000)
    # instance_volume_iops: 3000
    
    # whether the subnets used for EC2 instances should be public or private (default: "public")
    # if "public", instances will be assigned public IP addresses; if "private", instances won't have public IPs and a NAT gateway will be created to allow outgoing network requests
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    subnet_visibility: public  # must be "public" or "private"
    
    # whether to include a NAT gateway with the cluster (a NAT gateway is necessary when using private subnets)
    # default value is "none" if subnet_visibility is set to "public"; "single" if subnet_visibility is "private"
    nat_gateway: none  # must be "none", "single", or "highly_available" (highly_available means one NAT gateway per availability zone)
    
    # whether the API load balancer should be internet-facing or internal (default: "internet-facing")
    # note: if using "internal", APIs will still be accessible via the public API Gateway endpoint unless you also disable API Gateway in your API's configuration (if you do that, you must configure VPC Peering to connect to your APIs)
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    api_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"
    
    # whether the operator load balancer should be internet-facing or internal (default: "internet-facing")
    # note: if using "internal", you must configure VPC Peering to connect your CLI to your cluster operator (https://docs.cortex.dev/v/0.18/guides/vpc-peering)
    # see https://docs.cortex.dev/v/0.18/miscellaneous/security#private-cluster for more information
    operator_load_balancer_scheme: internet-facing  # must be "internet-facing" or "internal"
    
    # CloudWatch log group for cortex (default: <cluster_name>)
    log_group: cortex
    
    # additional tags to assign to aws resources for labelling and cost allocation (by default, all resources will be tagged with cortex.dev/cluster-name=<cluster_name>)
    tags:  # <string>: <string> map of key/value pairs
    
    # whether to use spot instances in the cluster (default: false)
    # see https://docs.cortex.dev/v/0.18/cluster-management/spot-instances for additional details on spot configuration
    spot: false
    
    # see https://docs.cortex.dev/v/0.18/guides/custom-domain for instructions on how to set up a custom domain
    ssl_certificate_arn: XXXXXXXXXXXXXXXXXXXXXXXXXXXX
    
    

    Steps to reproduce

    • Spin up instances on AWS.
    • Wait a couple of days / hours (varies).
    • Notice sudden 503 errors

    Expected behavior

    It should work

    Actual behavior

    503 errors with the message

    upstream connect error or disconnect/reset before headers. reset reason: connection failure
    

    Screenshots

    NOTE: The endpoint stopped responding around 15:30 in the graphs below.

    Monitoring number of bytes in: [graph]

    Number of requests: [graph]

    Stack traces

    Nothing useful, just:

    2020-08-16 05:38:34.697979:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:37.643022:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:40.577522:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:42.008412:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:43.513294:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:45.425255:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:48.327276:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:38:51.316962:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:54.009212:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:55.852878:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:38:57.525264:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:39:00.795236:cortex:pid-447:INFO:200 OK POST /predict
    2020-08-16 05:39:04.437013:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:05.981920:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:09.314293:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:12.343143:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:15.821708:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:19.083554:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:22.048843:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:24.943968:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:26.613330:cortex:pid-448:INFO:200 OK POST /predict
    2020-08-16 05:39:29.702703:cortex:pid-448:INFO:200 OK POST /predict
    
    

    Additional context

    • The prediction takes ~150ms on my Dell with an Intel© Core™ i7-8750H CPU @ 2.20GHz × 6 and 32GB RAM.
    • All the load balancer targets are marked as "unhealthy", even though they work (i.e. I can send requests and receive 2XX responses)
    • The load balancer healthcheck endpoint returns the following
    /healthz
    {
            "service": {
                    "namespace": "istio-system",
                    "name": "ingressgateway-operator"
            },
            "localEndpoints": 0
    }
    


  • Add possibility to export environment variables with .env file

    Add possibility to export environment variables with .env file

    Description

    Add support for exporting environment variables from an .env file placed in the root directory of a Cortex project.

    Motivation

    Some users may not want to export environment variables using the predictor.env field in cortex.yaml, for example to keep the cortex.yaml deployment configuration clean.
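
    For illustration, the current predictor env approach and the proposed .env file might look like this (variable names are made up; the .env syntax is the usual KEY=VALUE form):

    # cortex.yaml (current approach)
    - name: text-generator
      predictor:
        type: python
        path: predictor.py
        env:
          MODEL_BUCKET: my-bucket
          LOG_LEVEL: info

    # .env (proposed)
    MODEL_BUCKET=my-bucket
    LOG_LEVEL=info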

  • Is there a way to speed-up API deployment

    Is there a way to speed-up API deployment

    When deploying an API and observing logs, it seems that the most time-consuming part of deployment is:

    2021-01-25 18:37:27.401057:cortex:pid-1:INFO:downloading the project code
    2021-01-25 18:37:27.483562:cortex:pid-1:INFO:downloading the python serving image
    

    Is there a way to somehow make deploying an API quicker?

  • Why is min_replicas 0 not possible?

    Why is min_replicas 0 not possible?

    We are trying to deploy a text generation API on AWS. We do not expect the API to receive a lot of traffic initially, so we would like to save some costs. My idea was that min_replicas could be set to 0, which would avoid keeping an instance idle when there is no traffic on the API. As soon as a new request came in, Cortex would spawn a new instance and shut it down once the traffic goes back to 0.

    However, I noticed that setting min_replicas to 0 is invalid. Isn't the above a valid use case for this? Also, is this a recent change? I vaguely (very vaguely) remember that this was possible in version 0.20 (please correct me if I'm wrong), but it seems like it is not in 0.26.

    cc @deliahu I opened a new thread here because: 1) it's a different issue than the other thread, and 2) other users might benefit from the conversation here.

  • Fix Grafana dashboard for AsyncAPIs

    Fix Grafana dashboard for AsyncAPIs

    Changes

    • Fix typo: async_queue_length -> async_queued so that the list of api_names is populated (currently empty)
    • Use =~ with api_name where missing to enable displaying multiple AsyncAPIs on a panel (see the example query after this list)
    • For the "In-Flight Requests" panel include the api_name in the legend

    Testing

    I have made the corresponding updates manually through the Grafana UI for our deployed Cortex cluster. AsyncAPIs are now listed in the "Cortex / AsyncAPI" dashboard, and the dashboard works when multiple AsyncAPIs are selected.


    checklist:

    • [ ] run make test and make lint
    • [ ] test manually (i.e. build/push all images, restart operator, and re-deploy APIs)
    • [ ] update examples
    • [ ] update docs and add any new files to summary.md (view in gitbook after merging)
    • [ ] cherry-pick into release branches if applicable
    • [ ] alert the dev team if the dev environment changed
  • Use of root url

    Use of root url

    I don't really know how to word it correctly. Long story short, I need to use "http://$URL/" instead of "http://$URL/$API_NAME" for one of the multiple APIs inside the cluster. I haven't found any way to do it in the documentation, but surely it is implemented.

  • Bump sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9

    Bump sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9

    Bumps sigs.k8s.io/aws-iam-authenticator from 0.5.3 to 0.5.9.

    Release notes

    Sourced from sigs.k8s.io/aws-iam-authenticator's releases.

    v0.5.9

    Changelog

    • 1209cfe2 Bump version in Makefile
    • 029d1dcf Add query parameter validation for multiple parameters

    v0.5.7

    What's Changed

    New Contributors

    Full Changelog: https://github.com/kubernetes-sigs/aws-iam-authenticator/compare/v0.5.6...v0.5.7

    v0.5.6

    Changelog

    Docker Images

    Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

    $(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)
    
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-arm64
    • docker pull 602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-iam-authenticator:v0.5.6-amd64

    v0.5.5

    Changelog

    Docker Images

    Note: You must log in with the registry ID and your role must have the necessary ECR privileges:

    $(aws ecr get-login --no-include-email --region us-west-2 --registry-ids 602401143452)
    

    ... (truncated)

    Commits
    • 1209cfe Bump version in Makefile
    • 029d1dc Add query parameter validation for multiple parameters
    • 0a72c12 Merge pull request #455 from jyotimahapatra/rev2
    • 596a043 revert use of upstream yaml parsing
    • 2a9ee95 Merge pull request #448 from jngo2/master
    • fc4e6cb Remove unused imports
    • f0fe605 Remove duplicate InitMetrics
    • 99f04d6 Merge pull request #447 from nckturner/release-0.5.6
    • 9dcb6d1 Faster multiarch docker builds
    • a9cc81b Bump timeout for image build job
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

  • Consider using the CDK SDK for `cortex cluster up / down` commands

    Consider using the CDK SDK for `cortex cluster up / down` commands

    Description

    Replace cloud provider specific code in cortex cluster commands by using the CDK API.

    Motivation

    Make cluster management commands more independent of each cloud provider, and make it easier to define the infrastructure (i.e. the Cortex cluster) as code.

  • Restrict minimum EC2/EKS IAM policies by resource

    Restrict minimum EC2/EKS IAM policies by resource

    Description

    As described in https://docs.cortex.dev/clusters/management/auth#minimum-iam-policy, the current minimum IAM policy grants the Cortex CLI (and, by extension, eksctl) full control over the EC2/EKS services.

    Motivation

    These should be restricted to a resource-based policy that would limit what an IAM role/user can do. This is especially helpful in bigger corporations where there are more than a handful of developers and the company's policy on what access its devs have is more stringent.
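
    As a purely illustrative sketch of the direction (not a policy Cortex documents today), actions could be conditioned on the cluster tag that Cortex already applies to its resources rather than granted account-wide; note that not every EC2/EKS action supports resource-level conditions, which is part of why this is blocked on eksctl's requirements:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["eks:*", "ec2:*"],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:ResourceTag/cortex.dev/cluster-name": "cortex"
            }
          }
        }
      ]
    }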

    Additional context

    This seems to be blocked on what eksctl requires: https://eksctl.io/usage/minimum-iam-policies/. Talk to the eksctl team to see if there's a way to further reduce the IAM policy requirements.

On-line Machine Learning in Go (and so much more)

goml Golang Machine Learning, On The Wire goml is a machine learning library written entirely in Golang which lets the average developer include machi

Jan 5, 2023
Self-contained Machine Learning and Natural Language Processing library in Go

Self-contained Machine Learning and Natural Language Processing library in Go

Jan 8, 2023
Machine Learning for Go

GoLearn GoLearn is a 'batteries included' machine learning library for Go. Simplicity, paired with customisability, is the goal. We are in active deve

Jan 3, 2023
Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Dec 30, 2022
Machine Learning libraries for Go Lang - Linear regression, Logistic regression, etc.

package ml - Machine Learning Libraries ###import "github.com/alonsovidales/go_ml" Package ml provides some implementations of usefull machine learnin

Nov 10, 2022
Gorgonia is a library that helps facilitate machine learning in Go.

Gorgonia is a library that helps facilitate machine learning in Go. Write and evaluate mathematical equations involving multidimensional arrays easily

Dec 27, 2022
Prophecis is a one-stop machine learning platform developed by WeBank

Prophecis is a one-stop machine learning platform developed by WeBank. It integrates multiple open-source machine learning frameworks, has the multi tenant management capability of machine learning compute cluster, and provides full stack container deployment and management services for production environment.

Dec 28, 2022
Go Machine Learning Benchmarks

Benchmarks of machine learning inference for Go

Dec 30, 2022
A High-level Machine Learning Library for Go

Overview Goro is a high-level machine learning library for Go built on Gorgonia. It aims to have the same feel as Keras. Usage import ( . "github.

Nov 20, 2022
Katib is a Kubernetes-native project for automated machine learning (AutoML).

Katib is a Kubernetes-native project for automated machine learning (AutoML). Katib supports Hyperparameter Tuning, Early Stopping and Neural Architec

Jan 2, 2023
PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage.

中文 | English PaddleDTX PaddleDTX is a solution that focused on distributed machine learning technology based on decentralized storage. It solves the d

Dec 14, 2022
Example of Neural Network models of social and personality psychology phenomena

SocialNN Example of Neural Network models of social and personality psychology phenomena This repository gathers a collection of neural network models

Dec 5, 2022
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

English ∙ 日本語 ∙ 简体中文 ∙ 繁體中文 | العَرَبِيَّة‎ ∙ বাংলা ∙ Português do Brasil ∙ Deutsch ∙ ελληνικά ∙ עברית ∙ Italiano ∙ 한국어 ∙ فارسی ∙ Polski ∙ русский язы

Jan 9, 2023
The open source, end-to-end computer vision platform. Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises.

End-to-end computer vision platform Label, build, train, tune, deploy and automate in a unified platform that runs on any cloud and on-premises. onepa

Dec 12, 2022
Spice.ai is an open source, portable runtime for training and using deep learning on time series data.

Spice.ai Spice.ai is an open source, portable runtime for training and using deep learning on time series data. ⚠️ DEVELOPER PREVIEW ONLY Spice.ai is

Dec 15, 2022
Reinforcement Learning in Go

Overview Gold is a reinforcement learning library for Go. It provides a set of agents that can be used to solve challenges in various environments. Th

Dec 11, 2022
FlyML perfomant real time mashine learning libraryes in Go

FlyML perfomant real time mashine learning libraryes in Go simple & perfomant logistic regression (~100 LoC) Status: WIP! Validated on mushrooms datas

May 30, 2022
Go (Golang) encrypted deep learning library; Fully homomorphic encryption over neural network graphs

DC DarkLantern A lantern is a portable case that protects light, A dark lantern is one who's light can be hidden at will. DC DarkLantern is a golang i

Oct 31, 2022
A tool for building identical machine images for multiple platforms from a single source configuration

Packer Packer is a tool for building identical machine images for multiple platforms from a single source configuration. Packer is lightweight, runs o

Oct 3, 2021