K8s cluster simulator for workload scheduling.

Open-Simulator

Motivation

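Concepts

Open-Simulator is a simulated-scheduling component for Kubernetes. The user prepares a batch of Workload resources to be created, each specifying resource quotas, CPU-pinning rules, affinity rules, priority, and so on. Open-Simulator's simulated scheduling then determines whether the current cluster can satisfy these Workloads, and how many resources must be added to guarantee that they deploy successfully.

Native Kubernetes lacks simulated scheduling, and the community has no related project to draw on. Open-Simulator addresses resource planning: from the Workloads' scheduling requirements it calculates the minimum amount of physical resources needed, which raises resource utilization and saves users hardware and operations costs.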

Use Case

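Two scenarios call for resource planning:

  • Before delivery: estimate the minimum physical resources for a product. The simulator calculates how many nodes of a given spec, and how many disks, a delivery needs (similar to the Zhuque (朱雀) system);
  • At runtime: when a user creates or scales out a Workload, the simulator reports whether the cluster's current physical resources are sufficient and recommends how to scale out the cluster (down to the number of nodes to add).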

Run

Usage

Adding nodes

Run the command

./simon apply --kubeconfig=[path to kubeconfig file] -f [path to YAML directory]

For the YAML directory, see the ./example directory, which contains the following files:

  • Deployment YAMLs
  • StatefulSet YAMLs
  • Node YAML

After execution, the result is saved to a file named configmap-simon.yaml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: simulator-plan
  namespace: kube-system
data:
  Deployment: '{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"],"vivo-test-namespace/suppress-memcache-be":["simulator-node1","simulator-node1","node3","node2"]}'
  StatefulSet: '{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"],"vivo-test-namespace/suppress-memcache-be":["simulator-node1","simulator-node1","node3","node2"]}'
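Each value under `data` is a JSON string. Judging from the example, it appears to map a `namespace/name` workload key to a list of node names with one entry per replica; that reading is an assumption, not documented behavior. A minimal Go sketch that decodes one such value with `encoding/json` and counts how many replicas of each workload landed on each node (`placement` and `replicasPerNode` are hypothetical names):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// placement maps "namespace/name" to the node chosen for each replica
// (assumed interpretation of the ConfigMap value).
type placement map[string][]string

// replicasPerNode decodes one data value from the plan ConfigMap and
// counts how many replicas of each workload were placed on each node.
func replicasPerNode(raw string) (map[string]map[string]int, error) {
	var p placement
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return nil, err
	}
	out := make(map[string]map[string]int, len(p))
	for workload, nodes := range p {
		counts := make(map[string]int)
		for _, n := range nodes {
			counts[n]++
		}
		out[workload] = counts
	}
	return out, nil
}

func main() {
	raw := `{"vivo-test-namespace/suppress-memcache-lsr":["simulator-node1","simulator-node1","node3","node2"]}`
	counts, err := replicasPerNode(raw)
	if err != nil {
		panic(err)
	}
	for workload, perNode := range counts {
		nodes := make([]string, 0, len(perNode))
		for n := range perNode {
			nodes = append(nodes, n)
		}
		sort.Strings(nodes) // map iteration order is random; sort for stable output
		for _, n := range nodes {
			fmt.Printf("%s on %s: %d\n", workload, n, perNode[n])
		}
	}
}
```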

Deployment

Using a MacBook as an example

Steps

# Clone the project (GOPATH layout: sources live under $GOPATH/src)
mkdir -p "$GOPATH/src/github.com/alibaba"
cd "$GOPATH/src/github.com/alibaba"
git clone https://github.com/alibaba/open-simulator.git
cd open-simulator

# Install minikube, then start it
minikube start

# Copy the kubeconfig file into the project directory
cp ~/.kube/config ./kubeconfig

# Build and run the project
make
bin/simon apply --kubeconfig=./kubeconfig -f ./example/simple_example_by_huizhi
Owner: Alibaba (Alibaba Open Source)
Comments
  • Parsing reports an error if a chart's manifest references a built-in Kubernetes variable

    Ⅰ. Issue Description

    Ⅱ. Describe what happened

    Ⅲ. Describe what you expected to happen

    Ⅳ. How to reproduce it (as minimally and precisely as possible)

    Ⅴ. Anything else we need to know?

  • Add English guide and fix links

    Sorry for the duplicate pull request; my CLA check failed because of a typo in my email address. As before, I added a user manual (machine-translated) and fixed some links for English users in the main README file.

  • Add scheduling support for GPU sharing

    1. Add pkg/simulator/plugin/open-gpu-share.go, which is the main extra plugin that supports the scheduling of GPU-sharing pods, including Filter, Score, Reserve, and Bind plugins.
    2. Extend pkg/apply/apply.go to support GPU-related outputs. The output format is open to discussion.
    3. Restructure the examples: move the old newnode to newnode/demo_1 and revise simon-config.yaml accordingly.

    Most utility functions and data structures are defined in github.com/alibaba/open-gpu-share v0.1.0, which for now redirects to https://github.com/qzweng/open-gpu-share.
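The Filter/Score split mentioned in item 1 can be illustrated with a self-contained sketch. It assumes a simplified model in which a pod asks for an amount of GPU memory that must come from a single GPU: Filter keeps a node if at least one of its GPUs has enough free memory, and Score prefers the tightest fit so small pods pack onto partially used GPUs. The `GPU` type and the `Filter`/`Score` functions are hypothetical; they are not the API of open-gpu-share or of the Kubernetes scheduler framework.

```go
package main

import "fmt"

// GPU is a hypothetical per-device view: total and used memory in MiB.
type GPU struct {
	TotalMiB int64
	UsedMiB  int64
}

func (g GPU) free() int64 { return g.TotalMiB - g.UsedMiB }

// bestFitGPU returns the index of the GPU with the least free memory that
// still satisfies reqMiB, or -1 if no GPU fits.
func bestFitGPU(gpus []GPU, reqMiB int64) int {
	best := -1
	for i, g := range gpus {
		if g.free() >= reqMiB && (best == -1 || g.free() < gpus[best].free()) {
			best = i
		}
	}
	return best
}

// Filter: the node is feasible if one GPU can hold the whole request.
func Filter(gpus []GPU, reqMiB int64) bool {
	return bestFitGPU(gpus, reqMiB) >= 0
}

// Score in [0, 100]: the fraction of the chosen GPU's memory in use after
// placement, so tighter fits score higher (bin packing).
func Score(gpus []GPU, reqMiB int64) int64 {
	i := bestFitGPU(gpus, reqMiB)
	if i < 0 || gpus[i].TotalMiB == 0 {
		return 0
	}
	g := gpus[i]
	left := g.free() - reqMiB
	return 100 * (g.TotalMiB - left) / g.TotalMiB
}

func main() {
	nodeA := []GPU{{16384, 12288}, {16384, 0}} // one busy GPU, one idle GPU
	nodeB := []GPU{{16384, 15360}}             // only 1 GiB free
	req := int64(4096)
	fmt.Println(Filter(nodeA, req), Score(nodeA, req))
	fmt.Println(Filter(nodeB, req), Score(nodeB, req))
}
```

Preferring the busy GPU on nodeA keeps its idle GPU whole for a later pod that may need a full device.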

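  • [Demo] Phase 1 scope: cluster resource planning

    Demo contents:

    Deadline: November 15, 2021 (Monday)

    • Simulate bringing up a Kubernetes cluster: node specs (CPU, memory, disk) and the number of nodes are specified in YAML files. The cluster contains the following nodes:
      • master nodes
      • regular worker nodes
      • dedicated nodes
    • Prepare the user's Helm Chart files (web server, nginx, mysql, or redis style applications)
      • The Charts contain standard K8s Workload files; the Workloads specify different resource quotas, scheduling rules (affinity and anti-affinity, including app-to-app and app-to-node rules), and replica counts
      • The Charts contain custom resources (CRs)
      • There are multiple Charts, with dependencies between them
    • Open-Simulator reads the Helm Charts and reports:
      • whether the current simulated cluster can deploy all of the above applications in one pass, and if not, a recommended cluster size
      • the cluster resource allocation rate after the applications are deployed
      • the cluster scheduling topology after the applications are deployed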
  • fix: `make build` failed

    Fixes the following `make build` failure:

    ⚡ make build
    GO111MODULE=off GOARCH=amd64 GOOS=darwin CGO_ENABLED=0 go build -trimpath -ldflags "-X 'github.com/alibaba/open-simulator/cmd/version.VERSION=dev' -X 'github.com/alibaba/open-simulator/cmd/version.COMMITID=09b779c'" -v -o ./bin/simon ./cmd
    cmd/main.go:6:2: cannot find package "github.com/alibaba/open-simulator/cmd/simon" in any of:
    	/usr/local/go/src/github.com/alibaba/open-simulator/cmd/simon (from $GOROOT)
    	/Users/thearas/go/src/github.com/alibaba/open-simulator/cmd/simon (from $GOPATH)
    cmd/main.go:7:2: cannot find package "github.com/pterm/pterm" in any of:
    	/usr/local/go/src/github.com/pterm/pterm (from $GOROOT)
    	/Users/thearas/go/src/github.com/pterm/pterm (from $GOPATH)
    make: *** [build] Error 1
    
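  • Differences from community projects

    Question

    The community has open-sourced two projects, https://github.com/kubernetes-sigs/cluster-capacity and https://github.com/kubernetes-sigs/kube-scheduler-simulator. open-simulator has some features similar to theirs; will you consider working together with them?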

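  • [Demo] Phase 2 scope: container migration

    Phase 2 contents:

    Deadline: January 15, 2021 (Friday)

    Concepts

    Container migration: moving a Pod from its original node to a specified node.

    Fragment: if the remaining resources on a node cannot fit any Pod currently in the cluster because they fall short in some dimension, then all of those remaining resources are fragments. For example, if a node's CPU requests have reached 95% while its memory requests are only 40%, the node holds a large amount of memory fragmentation.

    Demo contents

    When a cluster is scaled down, the Pods on a node must be migrated before the node is taken offline. This phase demonstrates container migration ahead of node decommissioning; defragmentation is not demonstrated in this phase.

    • Simulate the existing cluster from the kubeconfig
    • Dry run
      • Select n nodes in the simulated cluster that can be taken offline (n is configurable)
        • When selecting nodes, support resource filter lists, e.g. skip resources in a given namespace or Pods with a given label
      • Graphically display how each resource changes before and after the scale-down
    • Container migration (Run)
      • Based on the dry-run result, migrate Pods in the cluster to the specified Nodes in batches

    Node decommissioning should expose an SDK for external projects to use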

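    // NodeStatus embeds MigrationPlan and adds two fields:
    //   IsRemovable reports whether the node can be taken offline;
    //   Reason explains why the node cannot be taken offline.
    // Fields are exported so that external projects using the SDK can read them.
    type NodeStatus struct {
      MigrationPlan
      IsRemovable bool
      Reason      string
    }
    
    // MigrationResult holds the result of a simulated scale-down.
    type MigrationResult struct {
      NodeStatus      []NodeStatus
      UnscheduledPods []*corev1.Pod // assumed type (corev1 = k8s.io/api/core/v1); Pods that could not be rescheduled
    }

    // ScaleDownCluster simulates taking the nodes in nodelist offline.
    // Parameters:
    //  1. cluster is built by the caller.
    //  2. nodelist is the user-specified list of nodes to take offline.
    // Return values:
    //  1. A non-nil error means the function failed.
    //  2. A nil error means it succeeded; read the scale-down simulation from MigrationResult.
    //     UnscheduledPods lists the Pods that cannot be scheduled (empty means the simulated
    //     scheduling succeeded); NodeStatus records the Pods on each Node in detail.
    // MigrationPlan, ResourceTypes, and Option are assumed to be defined elsewhere in the project.
    func ScaleDownCluster(cluster ResourceTypes, nodelist []string, opts ...Option) (*MigrationResult, error)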
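The fragment definition in this item translates directly into a check: a node's leftover capacity is fragmented when no Pod in the cluster fits into it, because every Pod exceeds the leftover in at least one dimension. A sketch with a hypothetical two-dimensional `Res` type, using a hypothetical 8-core / 32 GiB node for the 95% CPU / 40% memory example:

```go
package main

import "fmt"

// Res is a hypothetical (CPU millicores, memory MiB) pair.
type Res struct{ CPU, Mem int64 }

// fits reports whether req fits into free in every dimension.
func fits(free, req Res) bool {
	return req.CPU <= free.CPU && req.Mem <= free.Mem
}

// IsFragmented reports whether a node's remaining resources count as
// fragments: no Pod request in the cluster fits into them.
func IsFragmented(free Res, podRequests []Res) bool {
	for _, r := range podRequests {
		if fits(free, r) {
			return false // at least one Pod could still be placed here
		}
	}
	return true
}

func main() {
	// Hypothetical 8-core / 32 GiB node with CPU requests at 95% and
	// memory requests at 40%.
	alloc := Res{CPU: 8000, Mem: 32768}
	requested := Res{CPU: alloc.CPU * 95 / 100, Mem: alloc.Mem * 40 / 100}
	free := Res{CPU: alloc.CPU - requested.CPU, Mem: alloc.Mem - requested.Mem}

	pods := []Res{{CPU: 500, Mem: 1024}, {CPU: 1000, Mem: 2048}}
	fmt.Printf("free=%+v fragmented=%v\n", free, IsFragmented(free, pods))
}
```

Plenty of memory is left on that node, but with only 0.4 cores free neither Pod fits, so the leftover memory is a fragment, matching the example in the definition.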
    
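The dry-run step (select nodes that can go offline) can be sketched as follows: for a candidate node, try to re-place each of its Pods onto the remaining nodes' free capacity with first fit, and report the node as removable only if every Pod finds a home. The result mirrors the removable/reason idea of `NodeStatus`, but the types (`SimNode`, `DrainResult`), the function `DryRunDrain`, and the first-fit rule are hypothetical; a real simulator would rerun the scheduler with affinity rules and the filter lists described above. Each candidate is checked independently against the original state, so removing several nodes together would need a sequential simulation.

```go
package main

import "fmt"

// Res is a hypothetical (CPU millicores, memory MiB) pair.
type Res struct{ CPU, Mem int64 }

// SimNode is a hypothetical simulated node: its free capacity and the
// requests of the Pods currently running on it.
type SimNode struct {
	Name string
	Free Res
	Pods []Res
}

// DrainResult mirrors the idea behind NodeStatus: whether the node can go
// offline and, if not, why.
type DrainResult struct {
	Node        string
	IsRemovable bool
	Reason      string
}

// DryRunDrain checks whether all of candidate's Pods can be re-placed on the
// other nodes' free capacity using first fit. It works on a scratch copy, so
// the input cluster state is never modified.
func DryRunDrain(nodes []SimNode, candidate string) DrainResult {
	idx := -1
	var others []Res // scratch copy of the other nodes' free capacity
	for i, n := range nodes {
		if n.Name == candidate {
			idx = i
			continue
		}
		others = append(others, n.Free)
	}
	if idx < 0 {
		return DrainResult{Node: candidate, Reason: "node not found"}
	}
	for p, req := range nodes[idx].Pods {
		placed := false
		for j := range others {
			if req.CPU <= others[j].CPU && req.Mem <= others[j].Mem {
				others[j].CPU -= req.CPU
				others[j].Mem -= req.Mem
				placed = true
				break
			}
		}
		if !placed {
			return DrainResult{Node: candidate, Reason: fmt.Sprintf(
				"pod #%d (cpu=%dm, mem=%dMi) fits on no remaining node", p, req.CPU, req.Mem)}
		}
	}
	return DrainResult{Node: candidate, IsRemovable: true}
}

func main() {
	nodes := []SimNode{
		{"node1", Res{2000, 4096}, []Res{{500, 1024}}},
		{"node2", Res{1000, 2048}, []Res{{1500, 2048}, {1000, 3000}}},
		{"node3", Res{500, 1024}, nil},
	}
	for _, c := range []string{"node1", "node2"} {
		fmt.Printf("%+v\n", DryRunDrain(nodes, c))
	}
}
```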
  • [Question] Does open-simulator support pod lifecycle management?

    Question

    Hi, thanks for your great work on this project.

    I have a question about pod lifecycle management. Does it support simulating the pod duration and the termination status, just like https://github.com/k82cn/kubesim/blame/master/doc/design.md#L42?

    Thanks!
