Progressive delivery Kubernetes operator (Canary, A/B Testing and Blue/Green deployments)

flagger


Flagger is a progressive delivery tool that automates the release process for applications running on Kubernetes. It reduces the risk of introducing a new software version in production by gradually shifting traffic to the new version while measuring metrics and running conformance tests.

[Flagger overview diagram]

Flagger implements several deployment strategies (Canary releases, A/B testing, Blue/Green and Blue/Green traffic mirroring) using a service mesh (App Mesh, Istio, Linkerd) or an ingress controller (Contour, Gloo, NGINX, Skipper, Traefik) for traffic routing. For release analysis, Flagger can query Prometheus, Datadog, New Relic or CloudWatch, and for alerting it uses Slack, MS Teams, Discord and Rocket.Chat.

Flagger is a Cloud Native Computing Foundation project and part of the Flux family of GitOps tools.

Documentation

Flagger documentation can be found at docs.flagger.app.

Who is using Flagger

List of organizations using Flagger:

If you are using Flagger, please submit a PR to add your organization to the list!

Canary CRD

Flagger takes a Kubernetes deployment and optionally a horizontal pod autoscaler (HPA), then creates a series of objects (Kubernetes deployments, ClusterIP services, service mesh or ingress routes). These objects expose the application on the mesh and drive the canary analysis and promotion.

Flagger keeps track of ConfigMaps and Secrets referenced by a Kubernetes Deployment and triggers a canary analysis if any of those objects change. When promoting a workload in production, both code (container images) and configuration (ConfigMaps and Secrets) are synchronised.

For a deployment named podinfo, a canary promotion can be defined using Flagger's custom resource:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # service mesh provider (optional)
  # can be: kubernetes, istio, linkerd, appmesh, nginx, skipper, contour, gloo, supergloo, traefik
  provider: istio
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # service name (defaults to targetRef.name)
    name: podinfo
    # ClusterIP port number
    port: 9898
    # container port name or number (optional)
    targetPort: 9898
    # port name can be http or grpc (default http)
    portName: http
    # add all the other container ports
    # to the ClusterIP services (default false)
    portDiscovery: true
    # HTTP match conditions (optional)
    match:
      - uri:
          prefix: /
    # HTTP rewrite (optional)
    rewrite:
      uri: /
    # request timeout (optional)
    timeout: 5s
  # promote the canary without analysing it (default false)
  skipAnalysis: false
  # define the canary analysis timing and KPIs
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 10
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # validation (optional)
    metrics:
    - name: request-success-rate
      # builtin Prometheus check
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # builtin Prometheus check
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    - name: "database connections"
      # custom metric check
      templateRef:
        name: db-connections
      thresholdRange:
        min: 2
        max: 100
      interval: 1m
    # testing (optional)
    webhooks:
      - name: "conformance test"
        type: pre-rollout
        url: http://flagger-helmtester.test/
        timeout: 5m
        metadata:
          type: "helmv3"
          cmd: "test run podinfo -n test"
      - name: "load test"
        type: rollout
        url: http://flagger-loadtester.test/
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://podinfo.test:9898/"
    # alerting (optional)
    alerts:
      - name: "dev team Slack"
        severity: error
        providerRef:
          name: dev-slack
          namespace: flagger
      - name: "qa team Discord"
        severity: warn
        providerRef:
          name: qa-discord
      - name: "on-call MS Teams"
        severity: info
        providerRef:
          name: on-call-msteams

For more details on how the canary analysis and promotion work, please read the docs.
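
As a concrete illustration of the analysis settings above: with interval: 1m, stepWeight: 5 and maxWeight: 50, Flagger increases the traffic routed to the canary by 5% roughly every minute (5% → 10% → … → 50%), evaluating the metric checks and running the rollout webhooks at each step. If the checks fail more than threshold: 10 times, the canary is scaled to zero and traffic is routed back to the primary; if the canary reaches maxWeight with the checks still passing, its spec is promoted to the primary and all traffic is switched back to the primary.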

Features

Service Mesh

| Feature                                     | App Mesh | Istio | Linkerd | Kubernetes CNI |
| ------------------------------------------- | -------- | ----- | ------- | -------------- |
| Canary deployments (weighted traffic)       | ✔️       | ✔️    | ✔️      |                |
| A/B testing (headers and cookies routing)   | ✔️       | ✔️    |         |                |
| Blue/Green deployments (traffic switch)     | ✔️       | ✔️    | ✔️      | ✔️             |
| Blue/Green deployments (traffic mirroring)  |          | ✔️    |         |                |
| Webhooks (acceptance/load testing)          | ✔️       | ✔️    | ✔️      | ✔️             |
| Manual gating (approve/pause/resume)        | ✔️       | ✔️    | ✔️      | ✔️             |
| Request success rate check (L7 metric)      | ✔️       | ✔️    | ✔️      |                |
| Request duration check (L7 metric)          | ✔️       | ✔️    | ✔️      |                |
| Custom metric checks                        | ✔️       | ✔️    | ✔️      | ✔️             |

Ingress

| Feature                                     | Contour | Gloo | NGINX | Skipper | Traefik |
| ------------------------------------------- | ------- | ---- | ----- | ------- | ------- |
| Canary deployments (weighted traffic)       | ✔️      | ✔️   | ✔️    | ✔️      | ✔️      |
| A/B testing (headers and cookies routing)   | ✔️      | ✔️   | ✔️    |         |         |
| Blue/Green deployments (traffic switch)     | ✔️      | ✔️   | ✔️    | ✔️      | ✔️      |
| Webhooks (acceptance/load testing)          | ✔️      | ✔️   | ✔️    | ✔️      | ✔️      |
| Manual gating (approve/pause/resume)        | ✔️      | ✔️   | ✔️    | ✔️      | ✔️      |
| Request success rate check (L7 metric)      | ✔️      |      | ✔️    | ✔️      | ✔️      |
| Request duration check (L7 metric)          | ✔️      |      | ✔️    | ✔️      | ✔️      |
| Custom metric checks                        | ✔️      | ✔️   | ✔️    | ✔️      | ✔️      |

Roadmap

GitOps Toolkit compatibility

  • Migrate Flagger to Kubernetes controller-runtime and kubebuilder
  • Make the Canary status compatible with kstatus
  • Make Flagger emit Kubernetes events compatible with Flux v2 notification API
  • Integrate Flagger into Flux v2 as the progressive delivery component

Integrations

  • Add support for Kubernetes Ingress v2
  • Add support for SMI compatible service mesh solutions like Open Service Mesh and Consul Connect
  • Add support for ingress controllers like HAProxy and ALB
  • Add support for metrics providers like InfluxDB, Stackdriver, SignalFX

Contributing

Flagger is Apache 2.0 licensed and accepts contributions via GitHub pull requests. To start contributing please read the development guide.

When submitting bug reports please include as much detail as possible:

  • which Flagger version
  • which Flagger CRD version
  • which Kubernetes version
  • what configuration (canary, ingress and workloads definitions)
  • what happened (Flagger and Proxy logs)

Getting Help

If you have any questions about Flagger and progressive delivery, your feedback is always welcome!

Owner
Flux project: Open and extensible continuous delivery solution for Kubernetes
Comments
  • Specifying multiple HTTP match uri in Istio Canary deployment via Flagger

    I am going to use automated Canary deployments, so I tried to follow the process via Flagger. Here is my VirtualService file for routing:

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: {{ .Values.project }}
      namespace: {{ .Values.service.namespace }}
    spec:
      hosts:
        - {{ .Values.subdomain }}
      gateways:
        - mygateway.istio-system.svc.cluster.local
      http:
        {{- range $key, $value := .Values.routing.http }}
        - name: {{ $key }}
    {{ toYaml $value | indent 6 }}
        {{- end }}
    

    The routing part looks like this:

    http:
        r1:
          match:
            - uri:
                prefix: /myservice/monitor
          route:
            - destination:
                host: myservice
                port:
                  number: 9090
        r2:
          match:
            - uri:
                prefix: /myservice
          route:
            - destination:
                host: myservice
                port:
                  number: 8080
          corsPolicy:
            allowCredentials: false
            allowHeaders:
            - X-Tenant-Identifier
            - Content-Type
            - Authorization
            allowMethods:
            - GET
            - POST
            - PATCH
            allowOrigin:
            - "*"
        maxAge: 24h
    

    However, as I found that Flagger overwrites the VirtualService, I removed this file and modified the canary.yaml file based on my requirements, but I get a YAML error:

    {{- if .Values.canary.enabled }}
    apiVersion: flagger.app/v1alpha3
    kind: Canary
    metadata:
      name: {{ .Values.project }}
      namespace: {{ .Values.service.namespace }}
      labels:
        app: {{ .Values.project }}
        chart: {{ template "myservice-chart.chart" . }}
        release: {{ .Release.Name }}
        heritage: {{ .Release.Service }}
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name:  {{ .Values.project }}
      progressDeadlineSeconds: 60
      autoscalerRef:
        apiVersion: autoscaling/v2beta1
        kind: HorizontalPodAutoscaler
        name:  {{ .Values.project }}    
      service:
        port: 8080
        portDiscovery: true
        {{- if .Values.canary.istioIngress.enabled }}
        gateways:
        -  {{ .Values.canary.istioIngress.gateway }}
        hosts:
        - {{ .Values.canary.istioIngress.host }}
        {{- end }}
        trafficPolicy:
          tls:
            # use ISTIO_MUTUAL when mTLS is enabled
            mode: DISABLE
        # HTTP match conditions (optional)
        match:
          - uri:
              prefix: /myservice
        # cross-origin resource sharing policy (optional)
          corsPolicy:
            allowOrigin:
              - "*"
            allowMethods:
              - GET
              - POST
              - PATCH
            allowCredentials: false
            allowHeaders:
              - X-Tenant-Identifier
              - Content-Type
              - Authorization
            maxAge: 24h
          - uri:
              prefix: /myservice/monitor
      canaryAnalysis:
        interval: {{ .Values.canary.analysis.interval }}
        threshold: {{ .Values.canary.analysis.threshold }}
        maxWeight: {{ .Values.canary.analysis.maxWeight }}
        stepWeight: {{ .Values.canary.analysis.stepWeight }}
        metrics:
        - name: request-success-rate
          threshold: {{ .Values.canary.thresholds.successRate }}
          interval: 1m
        - name: request-duration
          threshold: {{ .Values.canary.thresholds.latency }}
          interval: 1m
        webhooks:
          {{- if .Values.canary.loadtest.enabled }}
          - name: load-test-get
            url: {{ .Values.canary.loadtest.url }}
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 5 -c 2 http://myservice.default:8080"
          - name: load-test-post
            url: {{ .Values.canary.loadtest.url }}
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 5 -c 2 -m POST -d '{\"test\": true}' http://myservice.default:8080/echo"
          {{- end }}  
    {{- end }}
    

    Can anyone help with this issue?
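
    For reference, in Flagger's Canary spec the match, rewrite and corsPolicy fields are siblings under spec.service rather than nested inside a match entry, which is a likely cause of the YAML error above. A minimal sketch (prefixes and CORS values taken from the question):

    service:
      port: 8080
      portDiscovery: true
      # multiple HTTP match conditions are allowed (OR semantics)
      match:
        - uri:
            prefix: /myservice/monitor
        - uri:
            prefix: /myservice
      # CORS settings are defined once at the service level
      corsPolicy:
        allowOrigin:
          - "*"
        allowMethods:
          - GET
          - POST
          - PATCH
        allowCredentials: false
        allowHeaders:
          - X-Tenant-Identifier
          - Content-Type
          - Authorization
        maxAge: 24h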

  • Add canary finalizers

    @stefanprodan This is a work-in-progress PR looking for acceptance on the approach and feedback. This PR provides the opt-in capability for users to revert Flagger mutations on deletion of a canary. If users opt in, finalizers will be used to revert the mutated resources before the canary and owned resources are handed over for finalizing.

    Changes:
    • Add evaluations for finalizers (controller/controller)
    • Add finalizers source (controller/finalizer)
    • Add interface method on deployment and daemonset controllers
    • Add interface method on routers
    • Add e2e tests

    Work to be done: Cover mesh and ingress outside of Istio

    Fix: #388 Fix: #488
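
    For context, the opt-in described above surfaces as a single field on the Canary spec; a minimal sketch, assuming the revertOnDeletion field name this capability shipped with:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
    spec:
      # when true, Flagger reverts its mutations (scales the target back up,
      # restores routing) before the canary and owned resources are finalized
      revertOnDeletion: true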

  • Gloo Canary Release Docs Discrepancy

    I am trying to get a simple POC working with Gloo and Flagger, however the example docs don't work out-of-the-box.

    I also noticed the example virtual-service is different in the docs compared to what's in the repo?

    The specifics regarding mapping a virtual-service to an upstream seem to be different in both and I just want to know what I should follow to get this working.

    I would open an issue on Gloo's repository, however I'm unsure if my error stems from Gloo or from me following the wrong docs.

  • Only unique values for domains are permitted error with Istio 1.1.0 RC1

    Right now, due to Istio limitations, it is not possible to have more than one VirtualService that binds the same host name to the mesh and a gateway. For example:

    if I have:

    ...
    gateways:
    - www.myapp.com
    - mesh
    http:
      - match:
        - uri:
            prefix: /api
        route:
        - destination:
            host: api.default.svc.cluster.local
            port:
              number: 80
    

    and

    ...
    gateways:
    - www.myapp.com
    - mesh
    http:
      - match:
        - uri:
            prefix: /internal
        route:
        - destination:
            host: internal.default.svc.cluster.local
            port:
              number: 80
    

    Istio will throw an error

    Only unique values for domains are permitted. Duplicate entry of domain www.myapp.com
    

    The two ways of fixing this I see is for flagger to either:

    1. Create a separate virtualservice and maintain the canary settings for each one correlated to the particular service deployed
    2. Compile all virtualservices together into a singular virtualservice

    Let me know what you think!

  • Unable to perform Istio-A/B testing

    Hey guys, I have configured Istio as the service mesh in my Kubernetes cluster and wanted to try the A/B testing deployment strategy with Flagger.

    I followed this documentation to set up Flagger: 1) https://docs.flagger.app/usage/ab-testing 2) https://docs.flagger.app/how-it-works#a-b-testing

    When I check my Kiali dashboard it shows a VirtualService error: virtualservice:publisher-d8t-v1 Weight sum should be 100.

    On describing the canary, the canary fails because no traffic was generated, although I made a POST call to my service and a response status of 200 was returned.

    Can you please help me fix this error?

    I have attached screenshots of the VirtualService error in Kiali, the canary status, and the traffic generation status.

    Can you please help me resolve this issue?

    Also, according to the Istio documentation, to connect a VirtualService with a DestinationRule we need to use subsets, but I see no subsets being created. How are you able to achieve traffic routing without a subset? I did read a note about keeping a label of app: <deployment name>; is that what makes this work?

    Thanks in advance :)
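
    For comparison, Flagger's A/B testing mode replaces traffic weights with a fixed number of iterations plus HTTP match conditions on the analysis; a minimal sketch along the lines of the docs (header and cookie values are placeholders):

    analysis:
      interval: 1m
      threshold: 5
      # number of analysis iterations before promotion (no weight shifting in A/B testing)
      iterations: 10
      # only requests matching these conditions are routed to the canary
      match:
        - headers:
            x-canary:
              exact: "insider"
        - headers:
            cookie:
              regex: "^(.*?;)?(canary=always)(;.*)?$"
      metrics:
        - name: request-success-rate
          thresholdRange:
            min: 99
          interval: 1m

    Note also that Flagger does not use DestinationRule subsets: the generated VirtualService routes by destination host to the <name>-primary and <name>-canary ClusterIP services, which is why no subsets show up.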

  • istio no values found for metric request-success-rate

    Given the following:

          metrics:
          - interval: 1m
            name: request-success-rate
            threshold: 99
          - interval: 30s
            name: request-duration
            threshold: 500
          stepWeight: 10
          threshold: 5
          webhooks:
          - metadata:
              cmd: hey -z 10m -q 10 -c 2 http://conf-day-demo-rest.conf-day-demo:8080/greeting
            name: conf-day-demo-loadtest
            timeout: 5s
            url: http://loadtester.loadtester/
    

    Canary promotion fails with "Halt advancement no values found for metric request-success-rate probably conf-day-demo-rest.conf-day-demo is not receiving traffic".

    Querying the metrics manually I see metrics for conf-day-demo-rest-primary, but Flagger queries with

    destination_workload=~"{{ .Name }}"

    which returns no data.
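
    One way to work around this is to replace the builtin check with a custom MetricTemplate whose query matches the workload explicitly; a minimal sketch for Istio, assuming the mesh Prometheus is reachable at the address below and that istio_requests_total carries the destination_workload labels:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: success-rate
      namespace: conf-day-demo
    spec:
      provider:
        type: prometheus
        address: http://prometheus.istio-system:9090
      query: |
        sum(
          rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload=~"{{ target }}",
              response_code!~"5.*"
            }[{{ interval }}]
          )
        )
        /
        sum(
          rate(
            istio_requests_total{
              reporter="destination",
              destination_workload_namespace="{{ namespace }}",
              destination_workload=~"{{ target }}"
            }[{{ interval }}]
          )
        ) * 100

    The template is then referenced from analysis.metrics via templateRef, with a thresholdRange of min: 99 as in the builtin check.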
    
  • Canary ingress nginx prevent update of primary ingress due to admission webhook

    Hi all,

    We have a problem with the ingress admission webhook. Using podinfo as an example, we did a canary deployment.

    Flagger created a second ingress, and after the rollout was done it switched the "canary" annotation from "true" to "false":

    apiVersion:  networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      annotations:
        kubernetes.io/ingress.class: nginx-v2
        nginx.ingress.kubernetes.io/canary: "false"
    

    I added "test" annotation to main Ingress to trigger update:

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        kubernetes.io/ingress.class: "nginx-v2"
        test: "test"
    ...
    

    Now when I try to apply the main Ingress file I get an admission webhook error:

    Error from server (BadRequest): error when creating "podinfo.yaml": 
    admission webhook "validate.nginx.ingress.kubernetes.io" 
    denied the request: host "example.com" and 
    path "/" is already defined in ingress develop/podinfo-canary
    

    podinfo Ingress

    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    metadata:
      name: podinfo
      labels:
        app: podinfo
      annotations:
        kubernetes.io/ingress.class: "nginx-v2"
    spec:
      rules:
        - host: example.com
          http:
            paths:
              - backend:
                  serviceName: podinfo
                  servicePort: 80
      tls:
      - hosts:
        - example.com
        secretName: example.com.wildcard
    

    Flagger version: 1.6.1, ingress-nginx version: 0.43

  • Blue/Green deployment - ELB collides with ClusterIP Flagger services.

    Hey everybody. I wanted to give you some feedback from my learning process using Flagger and ask a couple of questions about how to fix an issue I've been having with my current use case.

    Here it is: I have an EKS cluster with two namespaces, one for testing (called staging) and another for production. I've been trying to add Flagger to the staging namespace in order to enable Blue/Green deployments from my GitLab pipeline.

    How do I do that? Well, I've set up a GitLab job that basically runs a kubectl command and applies the files I've added below. This is a very basic application, which means I've been trying to implement Blue/Green style deployments with Kubernetes L4 networking.

    Here is the order of how files get applied:

    1. namespace
    2. canary
    3. deployment
    4. service

    I've also created a drawing to help illustrate the situation a little bit better.

    The problem with this approach is that as soon as I apply the load balancer manifest I get this error:

     The Service "my-app" is invalid: spec.ports[2].name: Duplicate value: "http"
    

    I've tried applying the same configuration in the production environment and it did work. My guess here is that Flagger's ClusterIP services are somehow conflicting with my load balancer, leading to a collision between them.

    I hope that you can help me with this issue, I'll keep you posted if I find a solution.

    namespace.yaml

    apiVersion: v1
    kind: Namespace
    metadata:
      name: staging
    

    deployment.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
      namespace: staging
      labels:
        app: my-app
        environment: staging
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: my-app
          environment: staging
      template:
        metadata:
          labels:
            app: my-app
            environment: staging
          annotations:
            configHash: " "
        spec:
          containers:
            - name: my-app
              image: marcoshuck/my-app
              imagePullPolicy: Always
              ports:
                - containerPort: 8001
              envFrom:
                - configMapRef:
                    name: my-app-config
          nodeSelector:
            server: "true"
    

    load-balancer.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
      namespace: staging
      annotations:
        # Use HTTP to talk to the backend.
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
        # AWS Certificate Manager (ACM) certificate ARN
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: XXXXXXXXXXXXXXXXXXXXXXXXX
        # Only run SSL on the port named "tls" below.
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
    spec:
      type: LoadBalancer
      ports:
      - name: http
        port: 80
        targetPort: 8001
      - name: https
        port: 443
        targetPort: 8001
      selector:
        app: my-app
        environment: staging
    

    canary.yaml

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: my-app
      namespace: staging
    spec:
      provider: kubernetes
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      progressDeadlineSeconds: 60
      service:
        port: 8001
        portDiscovery: true
      analysis:
        interval: 30s
        threshold: 3
        iterations: 10
        metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            thresholdRange:
              max: 500
            interval: 30s
        webhooks:
          - name: load-test
            url: http://flagger-loadtester.test/
            timeout: 5s
            metadata:
              type: cmd
              cmd: "hey -z 1m -q 10 -c 2 http://my-app-canary.test:8001/"
    
  • progressDeadlineSeconds not working while waiting for rollout to finish

    Hi, in my deployment I use progressDeadlineSeconds: 1200, and in the canary definition I use a canary deployment with the builtin Prometheus checks. The canary app crashed, so the deployment should be rolled back, but it seems it isn't.

    my-app-deployment-58b7ffb786-7dk4h                     1/2     CrashLoopBackOff   109        9h
    my-app-deployment-primary-84f69c75c4-9d7x7             2/2     Running            0          16h
    

    And the Flagger logs keep showing the following message in an infinite loop.

    {"level":"info","ts":"2020-03-26T01:38:32.078Z","caller":"controller/events.go:27","msg":"canary deployment my-app-deployment.test not ready with retryable true: waiting for
    rollout to finish: 0 of 1 updated replicas are available","canary":"my-app-canary.test"}
    

    The canary I use:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: my-app-canary
      namespace: test
    spec:
      # deployment reference
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app-deployment
      # the maximum time in seconds for the canary deployment
      # to make progress before it is rollback (default 600s)
      progressDeadlineSeconds: 1200
      # HPA reference (optional)
      autoscalerRef:
        apiVersion: autoscaling/v2beta1
        kind: HorizontalPodAutoscaler
        name: my-app-hpa
      service:
        # ClusterIP port number
        port: 80
        # container port name or number (optional)
        targetPort: 8080
        # Istio virtual service host names (optional)
        trafficPolicy:
          tls:
            mode: ISTIO_MUTUAL
      analysis:
        # schedule interval (default 60s)
        interval: 1m
        # max number of failed iterations before rollback
        threshold: 5
        # max traffic percentage routed to canary
        # percentage (0-100)
        maxWeight: 50
        # canary increment step
        # percentage (0-100)
        stepWeight: 10
        metrics:
          - name: request-success-rate
            # builtin Prometheus check
            # minimum req success rate (non 5xx responses)
            # percentage (0-100)
            thresholdRange:
              min: 99
            interval: 1m
          - name: request-duration
            # builtin Prometheus check
            # maximum req duration P99
            # milliseconds
            thresholdRange:
              max: 500
            interval: 30s
        webhooks:
          - name: acceptance-test
            type: pre-rollout
            url: http://blueprint-test-loadtester.blueprint-test/
            timeout: 30s
            metadata:
              type: bash
              cmd: "curl http://my-app-deployment-canary.test"
          - name: load-test
            type: rollout
            url: http://blueprint-test-loadtester.blueprint-test/
            timeout: 5s
            metadata:
              cmd: "hey -z 1m -q 10 -c 2 http://my-app-deployment-canary.test"
    
  • Add HTTP match conditions to Canary service spec

    Could you show an example of how to use this with the Istio ingress? I can't seem to figure out how to point to the correct service!

    More specifically, is it possible to tell the istio ingress to route based on certain criteria (i.e. a uri prefix, etc?)
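
    For illustration, when exposing the canary through the Istio ingress gateway, the gateway, external host and HTTP match conditions all live under spec.service; a minimal sketch (gateway and host names are placeholders):

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      service:
        port: 9898
        # bind the generated VirtualService to the public gateway and to the mesh
        gateways:
          - public-gateway.istio-system.svc.cluster.local
          - mesh
        hosts:
          - app.example.com
        # route only requests with this prefix to the podinfo services
        match:
          - uri:
              prefix: /podinfo
        rewrite:
          uri: /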

  • Flagger omits `TrafficSplit` backend service weight if weight is 0 due to `omitempty` option

    Describe the bug

    Since OSM is supported (SMI support added in #896), I did the following to create a canary deploy using OSM and Flagger. As recommended in #896, I used the MetricsTemplate CRDs to create the required Prometheus custom metrics (request-success-rate and request-duration).

    I then created a canary custom resource for the podinfo deployment; however, it does not succeed. It says that the canary custom resource cannot create a TrafficSplit resource for the canary deployment.

    Output excerpt of kubectl describe -f ./podinfo-canary.yaml:

    Status:
      Canary Weight:  0
      Conditions:
        Last Transition Time:  2021-06-07T22:28:21Z
        Last Update Time:      2021-06-07T22:28:21Z
        Message:               New Deployment detected, starting initialization.
        Reason:                Initializing
        Status:                Unknown
        Type:                  Promoted
      Failed Checks:           0
      Iterations:              0
      Last Transition Time:    2021-06-07T22:28:21Z
      Phase:                   Initializing
    Events:
      Type     Reason  Age                  From     Message
      ----     ------  ----                 ----     -------
      Warning  Synced  5m38s                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
      Normal   Synced  8s (x12 over 5m38s)  flagger  all the metrics providers are available!
      Warning  Synced  8s (x11 over 5m8s)   flagger  TrafficSplit podinfo.test create error: the server could not find the requested resource (post trafficsplits.split.smi-spec.io)
    


    To Reproduce

    ./kustomize/osm/kustomization.yaml:

    namespace: osm-system
    bases:
      - ../base/flagger/
    patchesStrategicMerge:
      - patch.yaml
    

    ./kustomize/osm/patch.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flagger
    spec:
      template:
        spec:
          containers:
            - name: flagger
              args:
                - -log-level=info
                - -include-label-prefix=app.kubernetes.io
                - -mesh-provider=smi:v1alpha3
                - -metrics-server=http://osm-prometheus.osm-system.svc:7070
    
    ---
    
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: flagger
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: flagger
    subjects:
      - kind: ServiceAccount
        name: flagger
        namespace: osm-system
    

    Used MetricTemplate CRD to implement required custom metric (recommended in #896) - request-success-rate.yaml:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: request-success-rate
      namespace: osm-system
    spec:
      provider:
        type: prometheus
        address: http://osm-prometheus.osm-system.svc:7070
      query: |
        sum(
            rate(
                osm_request_total{
                  destination_namespace="{{ namespace }}",
                  destination_name="{{ target }}",
                  response_code!="404"
                }[{{ interval }}]
            )
        )
        /
        sum(
            rate(
                osm_request_total{
                  destination_namespace="{{ namespace }}",
                  destination_name="{{ target }}"
                }[{{ interval }}]
            )
        ) * 100
    

    Used MetricTemplate CRD to implement required custom metric (recommended in #896) - request-duration.yaml:

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: request-duration
      namespace: osm-system
    spec:
      provider:
        type: prometheus
        address: http://osm-prometheus.osm-system.svc:7070
      query: |
        histogram_quantile(
          0.99,
          sum(
            rate(
              osm_request_duration_ms{
                destination_namespace="{{ namespace }}",
                destination_name=~"{{ target }}"
              }[{{ interval }}]
            )
          ) by (le)
        )
    

    podinfo-canary.yaml:

    apiVersion: flagger.app/v1beta1
    kind: Canary
    metadata:
      name: podinfo
      namespace: test
    spec:
      provider: "smi:v1alpha3"
      # deployment reference
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      # HPA reference (optional)
      autoscalerRef:
        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        name: podinfo
      # the maximum time in seconds for the canary deployment
      # to make progress before it is rollback (default 600s)
      progressDeadlineSeconds: 60
      service:
        # ClusterIP port number
        port: 9898
        # container port number or name (optional)
        targetPort: 9898
      analysis:
        # schedule interval (default 60s)
        interval: 30s
        # max number of failed metric checks before rollback
        threshold: 5
        # max traffic percentage routed to canary
        # percentage (0-100)
        maxWeight: 50
        # canary increment step
        # percentage (0-100)
        stepWeight: 5
        # Prometheus checks
        metrics:
        - name: request-success-rate
          # minimum req success rate (non 5xx responses)
          # percentage (0-100)
          thresholdRange:
            min: 99
          interval: 1m
        - name: request-duration
          # maximum req duration P99
          # milliseconds
          thresholdRange:
            max: 500
          interval: 30s
        # testing (optional)
        webhooks:
          - name: acceptance-test
            type: pre-rollout
            url: http://flagger-loadtester.test/
            timeout: 30s
            metadata:
              type: bash
              cmd: "curl -sd 'test' http://podinfo-canary.test:9898/token | grep token"
          - name: load-test
            type: rollout
            url: http://flagger-loadtester.test/
            metadata:
              cmd: "hey -z 2m -q 10 -c 2 http://podinfo-canary.test:9898/"
    

    Output excerpt of kubectl describe -f ./podinfo-canary.yaml:

    Status:
      Canary Weight:  0
      Conditions:
        Last Transition Time:  2021-06-07T22:28:21Z
        Last Update Time:      2021-06-07T22:28:21Z
        Message:               New Deployment detected, starting initialization.
        Reason:                Initializing
        Status:                Unknown
        Type:                  Promoted
      Failed Checks:           0
      Iterations:              0
      Last Transition Time:    2021-06-07T22:28:21Z
      Phase:                   Initializing
    Events:
      Type     Reason  Age                  From     Message
      ----     ------  ----                 ----     -------
      Warning  Synced  5m38s                flagger  podinfo-primary.test not ready: waiting for rollout to finish: observed deployment generation less then desired generation
      Normal   Synced  8s (x12 over 5m38s)  flagger  all the metrics providers are available!
      Warning  Synced  8s (x11 over 5m8s)   flagger  TrafficSplit podinfo.test create error: the server could not find the requested resource (post trafficsplits.split.smi-spec.io)
    

    Full output of kubectl describe -f ./podinfo-canary.yaml: https://pastebin.ubuntu.com/p/kB9qtPxZvr/



    Expected behavior

    A clear and concise description of what you expected to happen.

    Additional context

    • Flagger version: 1.11.0
    • Kubernetes version: 1.19.11
    • Service Mesh provider: smi (through osm)
    • Ingress provider: N/A.
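
    A likely cause of the TrafficSplit create error above is a mismatch between the split.smi-spec.io version Flagger posts and the version the cluster actually serves (kubectl api-versions | grep split shows the latter). A sketch of the corresponding patch, with the provider value being illustrative:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: flagger
    spec:
      template:
        spec:
          containers:
            - name: flagger
              args:
                - -log-level=info
                - -include-label-prefix=app.kubernetes.io
                # align with the served API version, e.g. smi:v1alpha2
                # if the cluster serves split.smi-spec.io/v1alpha2
                - -mesh-provider=smi:v1alpha2
                - -metrics-server=http://osm-prometheus.osm-system.svc:7070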
  • Ability to exclude annotations

    Describe the feature

    This might already exist, please point me in the direction if it does.

    I'd like to be able to exclude (or include) specific annotation prefixes from being copied over to the primary deployments.

    I noticed similar functionality exists for labels with --include-label-prefix but nothing exists for annotations.

    For context, we're looking to use stakater/Reloader which allows us to reload pods when configmap and/or secrets change. This works well for us at the moment, and flagger also handles this gracefully too.

    However, since Reloader is annotation based, the Reloader annotation config that we include in the original Deployment resource is then copied over to the Primary Deployment resource, meaning that when a configmap and/or secret changes, Reloader will patch both the Original and Primary Deployments at the same time as both will now include the annotation, whereas we would only want the Original Deployment to be patched (which would then trigger a Flagger rollout naturally)

    I'd like to specify that Flagger should not include Reloader annotations, or similar to --include-label-prefix specify which annotations should be included.

    Proposed solution

    Replicate the behaviour of --include-label-prefix with a new argument, --include-annotation-prefix.

    Happy to PR if you see value in this.
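
    To visualize the proposal, the new argument would sit next to the existing label flag on the Flagger container (note that -include-annotation-prefix is the proposed flag, not an existing one):

    args:
      - -log-level=info
      # existing: only labels with these prefixes are copied to the primary workload
      - -include-label-prefix=app.kubernetes.io
      # proposed: only annotations with these prefixes would be copied to the primary,
      # so reloader.stakater.com annotations could be left off the primary deployment
      - -include-annotation-prefix=app.kubernetes.io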

  • Flagger with StackDriver Metric Template: Request was missing field name

    We are facing issues with MQL while trying to integrate Stackdriver with Flagger to perform canary analysis. We have a GKE cluster set up and Workload Identity configured for the service account.

    During the analysis, events reported are as below:

    test             0s          Normal    Synced                    canary/ankit                                                 Starting canary analysis for podinfo.test
    test             0s          Normal    Synced                    canary/ankit                                                 Pre-rollout check acceptance-test passed
    test             0s          Normal    Synced                    canary/ankit                                                 Advance ankit.test canary weight 10
    test             0s          Warning   Synced                    canary/ankit                                                 Metric query failed for error-rate: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name.
    test             0s          Warning   Synced                    canary/ankit                                                 Metric query failed for error-rate: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name.
    test             0s          Warning   Synced                    canary/ankit                                                 Metric query failed for error-rate: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name.
    test             0s          Warning   Synced                    canary/ankit                                                 Metric query failed for error-rate: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name.
    test             0s          Warning   Synced                    canary/ankit                                                 Metric query failed for error-rate: error requesting stackdriver: rpc error: code = InvalidArgument desc = Request was missing field name.
    

    I have a metric template as below that uses a query to fetch a sample metric, which in this case is limit utilization.

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: error-rate
      namespace: test
    spec:
      provider:
        type: stackdriver
      query: |
        fetch k8s_container
        | metric 'kubernetes.io/container/cpu/limit_utilization'
        | filter (resource.namespace_name == 'flagger-system')
        | align delta(1m)
        | every 1m
        | group_by 1m, [value_limit_utilization_mean: mean(value.limit_utilization)]
    

    Could you please provide suggestions on where we might be going wrong?
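
    One thing worth checking is whether the template points Flagger at a Google Cloud project at all: the Stackdriver provider reads the project ID from a referenced secret, and a missing project is consistent with a "Request was missing field name" error from the Monitoring API. A sketch under that assumption, following the secret layout from the Flagger docs (a key named project holding the GCP project ID):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: error-rate
      namespace: test
    spec:
      provider:
        type: stackdriver
        # e.g. kubectl -n test create secret generic gcloud-sa --from-literal=project=<project-id>
        secretRef:
          name: gcloud-sa
      query: |
        fetch k8s_container
        | metric 'kubernetes.io/container/cpu/limit_utilization'
        | filter (resource.namespace_name == 'flagger-system')
        | align delta(1m)
        | every 1m
        | group_by 1m, [value_limit_utilization_mean: mean(value.limit_utilization)]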

  • build(deps): bump actions/cache from 3.0.11 to 3.2.2

    Bumps actions/cache from 3.0.11 to 3.2.2.

    Release notes

    Sourced from actions/cache's releases.

    v3.2.2

    What's Changed

    New Contributors

    Full Changelog: https://github.com/actions/cache/compare/v3.2.1...v3.2.2

    v3.2.1

    What's Changed

    Full Changelog: https://github.com/actions/cache/compare/v3.2.0...v3.2.1

    v3.2.0

    What's Changed

    New Contributors

    ... (truncated)

    Changelog

    Sourced from actions/cache's changelog.

    3.0.11

    • Update toolkit version to 3.0.5 to include @actions/core@^1.10.0
    • Update @actions/cache to use updated saveState and setOutput functions from @actions/core@^1.10.0

    3.1.0-beta.1

    • Update @actions/cache on windows to use gnu tar and zstd by default and fallback to bsdtar and zstd if gnu tar is not available. (issue)

    3.1.0-beta.2

    • Added support for fallback to gzip to restore old caches on windows.

    3.1.0-beta.3

    • Bug fixes for bsdtar fallback if gnutar not available and gzip fallback if cache saved using old cache action on windows.

    3.2.0-beta.1

    • Added two new actions - restore and save for granular control on cache.

    3.2.0

    • Released the two new actions - restore and save for granular control on cache

    3.2.1

    • Update @actions/cache on windows to use gnu tar and zstd by default and fallback to bsdtar and zstd if gnu tar is not available. (issue)
    • Added support for fallback to gzip to restore old caches on windows.
    • Added logs for cache version in case of a cache miss.

    3.2.2

    • Reverted the changes made in 3.2.1 to use gnu tar and zstd by default on windows.
    Commits
    • 4723a57 Revert compression changes related to windows but keep version logging (#1049)
    • d1507cc Merge pull request #1042 from me-and/correct-readme-re-windows
    • 3337563 Merge branch 'main' into correct-readme-re-windows
    • 60c7666 save/README.md: Fix typo in example (#1040)
    • b053f2b Fix formatting error in restore/README.md (#1044)
    • 501277c README.md: remove outdated Windows cache tip link
    • c1a5de8 Upgrade codeql to v2 (#1023)
    • 9b0be58 Release compression related changes for windows (#1039)
    • c17f4bf GA for granular cache (#1035)
    • ac25611 docs: fix an invalid link in workarounds.md (#929)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • Progressive rollouts via pod readiness gate

    Currently Flagger supports advanced deployment strategies mainly via a service mesh or an ingress. These advanced methods are great, but they also add extra complexity.

    I suggest adding a new deployment method via pod readiness gate.

    How it works:

    1. A pod readiness gate is added to the deployment spec, for example:
      readinessGates:
        - conditionType: "flagger.io/progress"
    
    2. Once a rollout takes place and new pods are launched, the deployment progress will stop until Flagger updates the readiness gate
    3. Flagger performs an analysis
    4. If the analysis passes, Flagger updates the readiness gate field in the new pods
    5. The deployment progresses according to the rollingUpdate strategy and new pods are launched
    6. Repeat

    Advantages:

    1. Native deployment object can be used, no need to create new deployments and shift traffic between them
    2. No need for special considerations for HPA and configmaps
    3. Can work with daemonsets and statefulsets as well

    I believe this feature will make Flagger much more approachable to a wider audience that is not using service meshes, and will allow for super simple onboarding while using existing production Deployment/HPA resources with no need for migration.
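
    To make the idea concrete, the only change on the workload side would be a readiness gate plus a rolling update strategy on an ordinary Deployment; a sketch (the flagger.io/progress condition type is the one proposed above, not something Flagger implements today):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: podinfo
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: podinfo
      strategy:
        type: RollingUpdate
        rollingUpdate:
          # new pods come up in small batches, each batch gated on the analysis
          maxSurge: 1
          maxUnavailable: 0
      template:
        metadata:
          labels:
            app: podinfo
        spec:
          readinessGates:
            # pods stay "not ready" until the controller sets this condition to True
            - conditionType: "flagger.io/progress"
          containers:
            - name: podinfo
              image: ghcr.io/stefanprodan/podinfo:6.0.0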

  • Virtual Service gateway and host not update after Delegation enabled in canary

    Describe the bug

    Currently there is a feature toggle .Values.wbx3.canary.delegate in our canary resource definition that switches the Flagger virtual service between being used as a delegate (called by another virtual service) and taking gateway traffic directly. It looks like the virtual service gateway and hosts are not updated after delegation is enabled in the canary.

    To Reproduce

    canary.yaml

      service:
        name: hello-world-flagger
        port: 8080
        portDiscovery: true
        {{ $delegateEnabled := lower $.Values.wbx3.canary.delegate}}
        {{- if eq $delegateEnabled "true" -}}
        delegation: true
        {{- else -}}
        gateways:
         - hello-world-web
        hosts:
         - hello-world.xxxx.xxxx
        {{- end }}
    
    • Set .Values.wbx3.canary.delegate to false, then trigger helm deploy.
    $kubectl get vs
    NAME                   GATEWAYS                   HOSTS                                             AGE                                                                                                          
    hello-world-flagger    ["hello-world-web"]        ["hello-world.xxxx.xxxx","hello-world-flagger"]   48m
    
    • Set .Values.wbx3.canary.delegate to true, then trigger helm deploy.
    $kubectl get vs
     NAME                  GATEWAYS                   HOSTS                                            AGE   
    hello-world-flagger    ["hello-world-web"]        ["hello-world.xxxx.xxxx","hello-world-flagger"]   48m
    

    Expected behavior

    The gateway and hosts should be automatically removed when delegation is enabled in the canary resource.

    Additional context

    • Flagger version: 1.20.4
    • Kubernetes version: v1.21.5
    • Service Mesh provider: Istio
    • Ingress provider:
  • Using thresholds in datadog metrics

    I have seen that during canary analysis, if we are using Datadog custom metrics, we can only push data (metrics) for the canary instance into Datadog. Is there any way we can enable the same logic as Datadog monitors? For example, when using a MetricTemplate for not-found-percentage, is there any way we can tell Flagger to stop the rollout if the provided query 100 - ( sum:istio.mesh.request.count{ reporter:destination, destination_workload_namespace:{{ namespace }}, destination_workload:{{ target }}, !response_code:404 }.as_count() / sum:istio.mesh.request.count{ reporter:destination, destination_workload_namespace:{{ namespace }}, destination_workload:{{ target }} }.as_count() ) * 100

    is above a certain threshold that we set! Thanks.
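
    For what it's worth, the threshold logic lives on the Canary analysis rather than in Datadog itself: the MetricTemplate only defines the query, and the analysis entry that references it sets the acceptable range, halting the rollout when the value falls outside it. A sketch using the query from the question (the Datadog credentials secret name is a placeholder):

    apiVersion: flagger.app/v1beta1
    kind: MetricTemplate
    metadata:
      name: not-found-percentage
      namespace: istio-system
    spec:
      provider:
        type: datadog
        address: https://api.datadoghq.com
        # secret holding the Datadog API and application keys
        secretRef:
          name: datadog
      query: |
        100 - (
          sum:istio.mesh.request.count{
            reporter:destination,
            destination_workload_namespace:{{ namespace }},
            destination_workload:{{ target }},
            !response_code:404
          }.as_count()
          /
          sum:istio.mesh.request.count{
            reporter:destination,
            destination_workload_namespace:{{ namespace }},
            destination_workload:{{ target }}
          }.as_count()
        ) * 100

    And in the canary analysis:

    metrics:
      - name: "404s percentage"
        templateRef:
          name: not-found-percentage
          namespace: istio-system
        # Flagger fails the check (and eventually rolls back) when the value exceeds max
        thresholdRange:
          max: 5
        interval: 1m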
