K8s cluster simulator for workload scheduling.

Last update: Dec 25, 2022

Comments: 8

Open-Simulator

Motivation

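Concept Definition

Open-Simulator is a simulated-scheduling component for K8s. Users prepare a batch of Workload resources to be created, each specifying its resource quotas, CPU-pinning rules, affinity rules, priority, and so on. Open-Simulator's simulated scheduling then determines whether the current cluster can satisfy these Workloads and, if not, how many resources must be added for the deployment to succeed.

Native Kubernetes lacks a simulated-scheduling capability, and the community offers no comparable project to reference. Open-Simulator addresses the resource-planning problem: it computes the minimum amount of physical resources from the Workloads' scheduling requirements, which raises resource utilization and saves users hardware and operations costs.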

Use Case

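Two kinds of scenarios call for resource planning:

Before delivery: assess the minimum physical resources a product needs; the simulator computes the number of nodes of a given specification and the number of disks required for delivery (similar to the 朱雀 (Zhuque) system);
At runtime: when a user creates or scales out a Workload, the simulated-scheduling system reports whether the cluster's current physical resources suffice and gives a scale-out recommendation (down to the number of nodes to add).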
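As a rough illustration of the runtime scenario, the sketch below estimates how many nodes would have to be added for a set of pods to fit. It is not Open-Simulator's algorithm: it is a first-fit heuristic that ignores affinity, priority, and CPU pinning, and the `Resources` type and `extraNodesNeeded` function are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Resources is a hypothetical two-dimensional resource vector.
type Resources struct {
	MilliCPU  int64
	MemoryMiB int64
}

// fits reports whether request p fits into the remaining capacity r.
func (r Resources) fits(p Resources) bool {
	return p.MilliCPU <= r.MilliCPU && p.MemoryMiB <= r.MemoryMiB
}

// extraNodesNeeded places pods first-fit (largest CPU request first) onto the
// free capacity of existing nodes and adds a node of shape newNode whenever a
// pod fits nowhere. It returns -1 if a pod cannot fit even on an empty new node.
func extraNodesNeeded(free []Resources, newNode Resources, pods []Resources) int {
	bins := append([]Resources(nil), free...)   // copy, so the caller's slice is untouched
	sorted := append([]Resources(nil), pods...) // copy before sorting
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].MilliCPU > sorted[j].MilliCPU
	})
	added := 0
	for _, p := range sorted {
		placed := false
		for i := range bins {
			if bins[i].fits(p) {
				bins[i].MilliCPU -= p.MilliCPU
				bins[i].MemoryMiB -= p.MemoryMiB
				placed = true
				break
			}
		}
		if placed {
			continue
		}
		if !newNode.fits(p) {
			return -1
		}
		bins = append(bins, Resources{newNode.MilliCPU - p.MilliCPU, newNode.MemoryMiB - p.MemoryMiB})
		added++
	}
	return added
}

func main() {
	free := []Resources{{MilliCPU: 1000, MemoryMiB: 2048}} // one existing node
	newNode := Resources{MilliCPU: 4000, MemoryMiB: 8192}  // shape of nodes that can be added
	pods := []Resources{{2000, 1024}, {2000, 1024}, {1000, 1024}, {500, 512}}
	fmt.Println("nodes to add:", extraNodesNeeded(free, newNode, pods))
}
```

A real scheduler evaluates many more constraints per pod, so a heuristic like this is only a rough estimate; running the actual scheduling plugins in simulation is what makes the answer trustworthy.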

Run

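Usage

Adding nodes

Run the command

./simon apply --kubeconfig=[path to kubeconfig file] -f [path to YAML directory]

For the YAML directory, see the ./example directory, which contains the following files: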

Deployment yamls
Statefulset yamls
Node yaml

After the command runs, it writes a file named configmap-simon.yaml that stores the result.

apiVersion: v1
kind: ConfigMap
metadata:
  name: simulator-plan
  namespace: kube-system
data:
  Deployment: '{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"],"vivo-test-namespace/suppress-memcache-be":["simulator-node1","simulator-node1","node3","node2"]}'
  StatefulSet: '{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"],"vivo-test-namespace/suppress-memcache-be":["simulator-node1","simulator-node1","node3","node2"]}'
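The `data` values above are JSON strings that map `namespace/name` to a list of node names. The sketch below decodes one such value and counts placements per node. It assumes each list entry corresponds to one scheduled replica, which the source does not state explicitly, and `podsPerNode` is a hypothetical helper, not part of the simon CLI.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podsPerNode decodes one data value of the ConfigMap (a JSON object mapping
// "namespace/name" to a list of node names) and counts entries per node.
// Assumption: every list entry stands for one placed replica.
func podsPerNode(raw string) (map[string]int, error) {
	var plan map[string][]string
	if err := json.Unmarshal([]byte(raw), &plan); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, nodes := range plan {
		for _, n := range nodes {
			counts[n]++
		}
	}
	return counts, nil
}

func main() {
	// The Deployment value from the example ConfigMap above.
	raw := `{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"],"vivo-test-namespace/suppress-memcache-be":["simulator-node1","simulator-node1","node3","node2"]}`
	counts, err := podsPerNode(raw)
	if err != nil {
		panic(err)
	}
	for node, n := range counts {
		fmt.Printf("%s: %d\n", node, n)
	}
}
```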

Screenshot

Deployment

Using a MacBook as an example

Steps

# Clone the project
mkdir -p "${GOPATH}/github.com/alibaba"
cd "${GOPATH}/github.com/alibaba"
git clone https://github.com/alibaba/open-simulator.git
cd open-simulator

# Install minikube and start it
minikube start

# Copy the kubeconfig file into the project directory
cp ~/.kube/config  ./kubeconfig

# Build and run the project
make
bin/simon apply --kubeconfig=./kubeconfig -f ./example/simple_example_by_huizhi

Owner

Alibaba

Alibaba Open Source

https://github.com/alibaba/open-simulator

Comments

If there is a built-in variable referencing kubernetes in chart's Manifest, parsing will report an error
Ⅰ. Issue Description

Ⅱ. Describe what happened

Ⅲ. Describe what you expected to happen

Ⅳ. How to reproduce it (as minimally and precisely as possible)

Ⅴ. Anything else we need to know?
Added en guide and fix link

Sorry for the double pull request. My CLA check failed because of a typo in my email address. As before, I added a user manual using machine translation and fixed some links for English users in the main README file.
add scheduling support for gpu share
Add pkg/simulator/plugin/open-gpu-share.go, which is the main extra plugin that supports the scheduling of GPU-sharing pods, including Filter, Score, Reserve, and Bind plugins.

Extend pkg/apply/apply.go to support GPU-related outputs. The output format is open to discussion.

Restructure the example: move the old newnode to newnode/demo_1 and revise simon-config.yaml accordingly.

Most utilization functions and data structures are defined in github.com/alibaba/open-gpu-share v0.1.0, which is redirected to https://github.com/qzweng/open-gpu-share for now.
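[Demo] Phase 1 deliverables: cluster resource planning
Demo content:

Deadline: November 15, 2021 (Monday)

Bring up a simulated Kubernetes cluster: node specifications (CPU, memory, disk) and node count are specified in a YAML file. The cluster contains the following nodes:

master nodes

ordinary worker nodes

dedicated nodes

Prepare the user's Helm Chart files (for example web server / nginx / mysql / redis applications)

The Chart contains standard K8s Workload files; the Workloads specify different resource quotas, scheduling rules (affinity and anti-affinity, both between applications and between applications and nodes), and replica counts

The Chart contains custom CR resources

There are multiple Chart files, with dependencies between Charts

Open-Simulator reads the Helm Charts and reports the following:

Whether the current simulated cluster can deploy all of the above applications in one pass; if not, a recommended cluster size.

The cluster's resource allocation rate after the applications are deployed

The cluster's scheduling topology after the applications are deployed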
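The resource allocation rate mentioned above can be read as requested resources divided by allocatable resources, per dimension. A minimal sketch under that assumption follows; the `Capacity` type and `allocationRate` function are hypothetical and do not reflect how Open-Simulator computes or prints the figure.

```go
package main

import "fmt"

// Capacity is a hypothetical CPU/memory pair, used both for a node's
// allocatable resources and for a pod's requests.
type Capacity struct {
	MilliCPU  int64
	MemoryMiB int64
}

// allocationRate returns requested/allocatable for CPU and memory across the
// whole cluster. A dimension with zero allocatable capacity yields a rate of 0.
func allocationRate(nodes, podRequests []Capacity) (cpu, mem float64) {
	var capCPU, capMem, reqCPU, reqMem int64
	for _, n := range nodes {
		capCPU += n.MilliCPU
		capMem += n.MemoryMiB
	}
	for _, p := range podRequests {
		reqCPU += p.MilliCPU
		reqMem += p.MemoryMiB
	}
	if capCPU > 0 {
		cpu = float64(reqCPU) / float64(capCPU)
	}
	if capMem > 0 {
		mem = float64(reqMem) / float64(capMem)
	}
	return cpu, mem
}

func main() {
	nodes := []Capacity{{4000, 8192}, {4000, 8192}}
	pods := []Capacity{{1000, 2048}, {1000, 2048}, {1000, 2048}}
	cpu, mem := allocationRate(nodes, pods)
	fmt.Printf("CPU %.1f%%, memory %.1f%%\n", cpu*100, mem*100)
}
```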

fix: `make build` failed

Fix make build failed:

⚡ make build
GO111MODULE=off GOARCH=amd64 GOOS=darwin CGO_ENABLED=0 go build -trimpath -ldflags "-X 'github.com/alibaba/open-simulator/cmd/version.VERSION=dev' -X 'github.com/alibaba/open-simulator/cmd/version.COMMITID=09b779c'" -v -o ./bin/simon ./cmd
cmd/main.go:6:2: cannot find package "github.com/alibaba/open-simulator/cmd/simon" in any of:
	/usr/local/go/src/github.com/alibaba/open-simulator/cmd/simon (from $GOROOT)
	/Users/thearas/go/src/github.com/alibaba/open-simulator/cmd/simon (from $GOPATH)
cmd/main.go:7:2: cannot find package "github.com/pterm/pterm" in any of:
	/usr/local/go/src/github.com/pterm/pterm (from $GOROOT)
	/Users/thearas/go/src/github.com/pterm/pterm (from $GOPATH)
make: *** [build] Error 1

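Differences from community projects

Question

The community has open-sourced two projects: https://github.com/kubernetes-sigs/cluster-capacity https://github.com/kubernetes-sigs/kube-scheduler-simulator. open-simulator overlaps with them in some functionality; would you consider collaborating with them?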
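[Demo] Phase 2 deliverables: container migration
Phase 2 content:

Deadline: January 15, 2021 (Friday)

Concept definitions

Container migration: moving a Pod from its original node to a specified node

Fragment: if a node's remaining resources are, in any dimension, insufficient to place even one more of the Pods in the current cluster, then all of those remaining resources are fragments. For example, if a node's CPU requests have reached 95% while its memory requests are only 40%, the node holds a large amount of fragmented memory.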
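The fragment definition can be expressed as a small predicate: the remaining resources are fragments when no Pod in the cluster fits into them. The sketch below is one reading of that definition with hypothetical names (`Resources`, `isFragment`); treating an empty Pod list as "not a fragment" is a choice made for this example, not something the source specifies.

```go
package main

import "fmt"

// Resources is a hypothetical CPU/memory vector.
type Resources struct {
	MilliCPU  int64
	MemoryMiB int64
}

// isFragment reports whether the free resources of a node cannot host any of
// the given pods, that is, every pod is too large in at least one dimension.
// With no pods to compare against, it returns false (an assumption).
func isFragment(free Resources, pods []Resources) bool {
	if len(pods) == 0 {
		return false
	}
	for _, p := range pods {
		if p.MilliCPU <= free.MilliCPU && p.MemoryMiB <= free.MemoryMiB {
			return false // at least one pod still fits
		}
	}
	return true
}

func main() {
	// A 4000m / 8192 MiB node with 95% of CPU and 40% of memory requested.
	free := Resources{MilliCPU: 200, MemoryMiB: 4915}
	pods := []Resources{{250, 256}, {500, 1024}}
	fmt.Println("fragment:", isFragment(free, pods))
}
```

In this example the node has plenty of memory left, but every Pod needs more CPU than the 200m that remain, so the leftover memory cannot be used.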

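Demo content

When scaling a cluster in, the Pods on a node must be migrated before the node is taken offline. This phase demonstrates container migration ahead of node removal; defragmentation is not demonstrated in this phase.

Simulate the existing cluster from its kube-config

Dry Run

Select n removable nodes from the simulated cluster (n is configurable)

When selecting nodes, a resource exclusion list can be set, for example to leave resources in a given namespace untouched or to leave Pods with a given Label untouched

Display graphically how each resource changes before and after the cluster is scaled in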
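The exclusion list described in the Dry Run step could look like the sketch below, which drops Pods in excluded namespaces or carrying an excluded Label before migration candidates are chosen. The `Pod` and `Exclusions` types and the `migratable` function are hypothetical and do not come from the Open-Simulator code base.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal view of a pod.
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// Exclusions lists what must be left untouched during a dry run.
type Exclusions struct {
	Namespaces map[string]bool   // namespaces whose pods are skipped
	Labels     map[string]string // key/value pairs; a pod matching any pair is skipped
}

// skips reports whether pod p is protected by the exclusion list.
func (e Exclusions) skips(p Pod) bool {
	if e.Namespaces[p.Namespace] {
		return true
	}
	for k, v := range e.Labels {
		if got, ok := p.Labels[k]; ok && got == v {
			return true
		}
	}
	return false
}

// migratable returns the pods that are not protected, preserving input order.
func migratable(pods []Pod, e Exclusions) []Pod {
	var out []Pod
	for _, p := range pods {
		if !e.skips(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Namespace: "kube-system", Name: "coredns"},
		{Namespace: "default", Name: "web", Labels: map[string]string{"app": "web"}},
		{Namespace: "default", Name: "db", Labels: map[string]string{"migrate": "never"}},
	}
	e := Exclusions{
		Namespaces: map[string]bool{"kube-system": true},
		Labels:     map[string]string{"migrate": "never"},
	}
	for _, p := range migratable(pods, e) {
		fmt.Println(p.Namespace + "/" + p.Name)
	}
}
```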

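Container migration (Run)

Based on the dry-run result, migrate the cluster's Pods to the specified Nodes in batches

Node removal must be exposed as an SDK for use by external projects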

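// NodeStatus embeds MigrationPlan and adds two fields:
// isRemovable reports whether the node can be taken offline;
// reason explains why the node cannot be taken offline.
type NodeStatus struct {
	MigrationPlan
	isRemovable bool
	reason      string
}

type MigrationResult struct {
	nodeStatus []NodeStatus
}

// ScaleDownCluster
// Parameters:
//  1. cluster is built by the caller.
//  2. nodelist is the user-specified list of nodes to take offline.
// Return values:
//  1. A non-nil error means the function failed.
//  2. A nil error means the function succeeded; MigrationResult carries the
//     scale-down simulation. UnscheduledPods lists the Pods that could not be
//     scheduled (empty means the simulated scheduling succeeded), and
//     NodeStatus records the Pods on each Node in detail.
func ScaleDownCluster(cluster ResourceTypes, nodelist []string, opts ...Option) (*MigrationResult, error)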
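To make the `isRemovable` and `reason` fields more concrete, here is a toy, CPU-only check of whether a node's Pods could be re-placed on the remaining nodes. It is a sketch under strong assumptions: the lowercase `node` and `nodeStatus` types and the `checkRemovable` function are invented for this example, node names are assumed unique, and the real SDK would run the scheduler's own filters rather than a first-fit loop.

```go
package main

import "fmt"

// node is a hypothetical, CPU-only view of a cluster node.
type node struct {
	name    string
	freeCPU int64   // unrequested milli-CPU
	podCPU  []int64 // milli-CPU request of each pod on the node
}

// nodeStatus mirrors the two extra fields of the proposed NodeStatus.
type nodeStatus struct {
	isRemovable bool
	reason      string
}

// checkRemovable tries to place every pod of the target node, first-fit, onto
// the free CPU of the other nodes. Node names are assumed to be unique.
func checkRemovable(nodes []node, target string) nodeStatus {
	var victim *node
	free := make(map[string]int64)
	for i := range nodes {
		if nodes[i].name == target {
			victim = &nodes[i]
			continue
		}
		free[nodes[i].name] = nodes[i].freeCPU
	}
	if victim == nil {
		return nodeStatus{reason: "node not found: " + target}
	}
	for _, req := range victim.podCPU {
		placed := false
		for _, n := range nodes { // slice order keeps the result deterministic
			if n.name != target && free[n.name] >= req {
				free[n.name] -= req
				placed = true
				break
			}
		}
		if !placed {
			return nodeStatus{reason: fmt.Sprintf("no remaining node can host a pod requesting %dm CPU", req)}
		}
	}
	return nodeStatus{isRemovable: true}
}

func main() {
	nodes := []node{
		{name: "a", freeCPU: 1000, podCPU: []int64{500, 700}},
		{name: "b", freeCPU: 600},
		{name: "c", freeCPU: 800},
	}
	fmt.Printf("%+v\n", checkRemovable(nodes, "a"))
}
```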
[Question] Does open-simulator support pod lifecycle management?

Question

Hi, thanks for your great work on this project.

I have a question about pod lifecycle management. Does it support simulating the pod duration and the termination status, just like https://github.com/k82cn/kubesim/blame/master/doc/design.md#L42?

Thanks!

