EaseMesh is a service mesh that is compatible with the Spring Cloud ecosystem.

Comments
  • Supporting multiple canary deployments

    Supporting multiple canary deployments

    We need to support multiple canary releases, each configured with different traffic-coloring rules, at the same time. This could cause some unexpected behaviors, so this issue is not just about a new enhancement but also about defining the proper rules.

    For example, suppose there are two services, A and B, which both depend on Z. If the canary releases A' and B' share the canary instance Z', then Z' will receive traffic from both A' and B', which might not be what is expected.

    The following figure shows possible multiple-canary deployments. The first one might cause unexpected issues: Z' might receive more traffic than expected. The second and the third are fine because the different canary traffic flows are totally separated.

    image

    In addition, we may have problems when some users fall into multiple canary releases.

    • On the one hand, a user may match canary traffic rule X but be excluded from canary traffic rule Y. If X and Y share canary instances of the same service, the system may fail to schedule the traffic.

    • On the other hand, if a service has multiple canary instances published and a user satisfies all of their conditions at the same time, to which canary instance should we schedule the traffic?

    image

    Therefore, the following rules are required for multiple canary releases.

    • For a canary release (which may include one or more services), there is only one traffic rule per deployment.
    • Canary releases shouldn't share instances of canary services. (P.S. we could allow this in some special cases, but we need to remind the user that some instances are shared across deployments.)
    • Traffic rules for multiple canary releases may match the same users; for such users, we need to set all of the traffic-coloring tags in their requests.
    • To avoid affecting performance, the number of simultaneous canary releases needs to be limited, for example to 5.
    • If a service has multiple canary instances at the same time and a user's request has been colored for more than one of them, the traffic is scheduled according to the priority of the traffic rules (see the sketch after this list).
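
    A minimal Go sketch of the scheduling in the last rule (purely illustrative, not the actual EaseMesh implementation); the rule names, coloring-tag names, and the "smaller number wins" priority convention are assumptions:

    // pick the canary by traffic-rule priority when a request is colored
    // for more than one canary of the same service
    package main

    import (
        "fmt"
        "sort"
    )

    type canaryRule struct {
        Name     string
        Tag      string // traffic-coloring tag expected on the request
        Priority int    // smaller value means higher priority (assumption)
    }

    // pickCanary returns the matching rule with the highest priority, or nil
    // when the request carries none of the coloring tags.
    func pickCanary(rules []canaryRule, requestTags map[string]bool) *canaryRule {
        matched := []canaryRule{}
        for _, r := range rules {
            if requestTags[r.Tag] {
                matched = append(matched, r)
            }
        }
        if len(matched) == 0 {
            return nil
        }
        sort.Slice(matched, func(i, j int) bool { return matched[i].Priority < matched[j].Priority })
        return &matched[0]
    }

    func main() {
        rules := []canaryRule{
            {Name: "canary-A", Tag: "X-Canary-A", Priority: 2},
            {Name: "canary-B", Tag: "X-Canary-B", Priority: 1},
        }
        // the request has been colored for both canaries; canary-B wins by priority
        fmt.Println(pickCanary(rules, map[string]bool{"X-Canary-A": true, "X-Canary-B": true}).Name)
    }
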
  • JavaAgent has higher latency in the Mesh's data plane

    JavaAgent has higher latency in the Mesh's data plane

    After the Java agent's observability started working, we can observe the latency between two services. In our environment, we have the following invocation diagram.

                                       ┌───────────────────────────┐
                                       │                           │                   ┌──────────────┐
                        (1)            │                           │        (2)        │              │
                 ┌─────────────────────►      mesh-app-backend     ├───────────────────►    db-mysql  │
                 │                     │         /users/{id}       │                   │              │
                 │                     │                           │                   └──────────────┘
                 │                     └───────────────────────────┘
                 │
    ┌────────────┴────────────┐
    │                         │
    │      mesh-app-front     │
    │ /front/userFullInfo/{id}│
    │                         │
    └────────────┬────────────┘
                 │                     ┌───────────────────────────┐
                 │                     │                           │
                 │      (3)            │                           │
                 └─────────────────────►    mesh-app-performance   │
                                       │      /userStatus/{id}     │
                                       │                           │
                                       └───────────────────────────┘
    
    

    I picked a tracing record and found that the latency between the mesh-app-frontend and mesh-app-backend services is higher than expected.

    | Type          | Start Time             | Relative Time | Address                           |
    | ------------- | ---------------------- | ------------- | --------------------------------- |
    | Client Start  | 03/29 11:24:58.167_468 | 441μs         | 10.233.111.77 (mesh-app-frontend) |
    | Server Start  | 03/29 11:24:58.183_069 | 16.042ms      | 10.233.67.33 (mesh-app-backend)   |
    | Server Finish | 03/29 11:24:58.192_037 | 25.010ms      | 10.233.67.33 (mesh-app-backend)   |
    | Client Finish | 03/29 11:24:58.193_820 | 26.793ms      | 10.233.111.77 (mesh-app-frontend) |

    image

    The first section between the two white spots is the communication latency from the client service (mesh-app-frontend) to the server service (mesh-app-backend). It is apparently too high, accounting for about 50% of the request's latency.

  • Support native deployment

    Support native deployment

    Detail

    Actually, this also involves a refactoring while implementing the new feature:

    1. Simplify interfaces that take a lot of arguments.
    2. Complete the service spec.
    3. Simplify the complex containers/volumes layout (see the reference comparison below); it is also more consistent now.
    4. Make logs clean and consistent.

    There is still some work left to complete, part of which has been noted in comments:

    1. Add documents for supporting native deployment.
    2. Add more unit tests.

    Beyond these, I found more things, both trivial and significant, that need to be improved; they will be opened as issues in the next contribution.

    Reference

    Below are the old and new deployment specs after injecting the sidecar; the identical/unimportant parts have been omitted. We can see that the two init containers have been merged into one, and the volumes and volumeMounts became more consistent. The config of Kubernetes objects is complex enough that it's very important to manage that complexity carefully.

    old-vets-service.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vets-service
      namespace: spring-petclinic
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: vets-service
        spec:
          containers:
          - args:
            - -c
            - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher
            command:
            - /bin/sh
            env:
            - name: JAVA_TOOL_OPTIONS
              value: ' -javaagent:/easeagent-volume/easeagent.jar -Deaseagent.log.conf=/easeagent-volume/log4j2.xml '
            image: megaease/spring-petclinic-vets-service:latest
            imagePullPolicy: IfNotPresent
            lifecycle:
              preStop:
                exec:
                  command:
                  - sh
                  - -c
                  - sleep 10
            name: vets-service
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: "2"
                memory: 1Gi
              requests:
                cpu: 200m
                memory: 256Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /application/application-sit.yml
              name: configmap-volume-0
              subPath: application-sit.yml
            - mountPath: /easeagent-volume
              name: easeagent-volume
          - command:
            - /bin/sh
            - -c
            - /opt/easegress/bin/easegress-server -f /easegress-sidecar/eg-sidecar.yaml
            env:
            - name: APPLICATION_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            image: megaease/easegress:server-sidecar
            imagePullPolicy: Always
            name: easemesh-sidecar
            ports:
            - containerPort: 13001
              name: sidecar-ingress
              protocol: TCP
            - containerPort: 13002
              name: sidecar-egress
              protocol: TCP
            - containerPort: 13009
              name: sidecar-eureka
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /easegress-sidecar
              name: sidecar-params-volume
          dnsPolicy: ClusterFirst
          initContainers:
          - command:
            - /bin/sh
            - -c
            - cp -r /easeagent-volume/. /easeagent-share-volume
            image: megaease/easeagent-initializer:latest
            imagePullPolicy: Always
            name: easeagent-initializer
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /easeagent-share-volume
              name: easeagent-volume
          - command:
            - /bin/sh
            - -c
            - |-
              echo name: $POD_NAME >> /opt/eg-sidecar.yaml; echo 'cluster-join-urls: http://easemesh-controlplane-svc.easemesh:2380
              cluster-request-timeout: 10s
              cluster-role: reader
              cluster-name: easemesh-control-plane
              Labels:
                alive-probe: http://localhost:9900/health
                application-port: "8080"
                mesh-service-labels: ""
                mesh-servicename: vets-service
              ' >> /opt/eg-sidecar.yaml; cp -r /opt/. /sidecar-params-volume
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            image: megaease/easegress:server-sidecar
            imagePullPolicy: Always
            name: easegress-sidecar-initializer
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /sidecar-params-volume
              name: sidecar-params-volume
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              items:
              - key: application-sit-yml
                path: application-sit.yml
              name: vets-service
            name: configmap-volume-0
          - emptyDir: {}
            name: easeagent-volume
          - emptyDir: {}
            name: sidecar-params-volume
    

    new-vets-service.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: vets-service
      namespace: spring-petclinic
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: vets-service
        spec:
          containers:
          - args:
            - -c
            - java -server -Xmx1024m -Xms1024m -Dspring.profiles.active=sit -Djava.security.egd=file:/dev/./urandom  org.springframework.boot.loader.JarLauncher
            command:
            - /bin/sh
            env:
            - name: JAVA_TOOL_OPTIONS
              value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
            image: megaease/spring-petclinic-vets-service:latest
            imagePullPolicy: IfNotPresent
            lifecycle:
              preStop:
                exec:
                  command:
                  - sh
                  - -c
                  - sleep 10
            name: vets-service
            ports:
            - containerPort: 8080
              protocol: TCP
            resources:
              limits:
                cpu: "2"
                memory: 1Gi
              requests:
                cpu: 200m
                memory: 256Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /application/application-sit.yml
              name: configmap-volume-0
              subPath: application-sit.yml
            - mountPath: /agent-volume
              name: agent-volume
          - command:
            - /bin/sh
            - -c
            - /opt/easegress/bin/easegress-server -f /sidecar-volume/sidecar-config.yaml
            env:
            - name: APPLICATION_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            image: megaease/easegress:server-sidecar
            imagePullPolicy: IfNotPresent
            name: easemesh-sidecar
            ports:
            - containerPort: 13001
              name: sidecar-ingress
              protocol: TCP
            - containerPort: 13002
              name: sidecar-egress
              protocol: TCP
            - containerPort: 13009
              name: sidecar-eureka
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /sidecar-volume
              name: sidecar-volume
          dnsPolicy: ClusterFirst
          initContainers:
          - command:
            - sh
            - -c
            - |-
              set -e
              cp -r /easeagent-volume/* /agent-volume
    
              echo 'name: vets-service
              cluster-join-urls: http://easemesh-controlplane-svc.easemesh:2380
              cluster-request-timeout: 10s
              cluster-role: reader
              cluster-name: easemesh-control-plane
              labels:
                alive-probe:
                application-port: 8080
                mesh-service-labels:
                mesh-servicename: vets-service
              ' > /sidecar-volume/sidecar-config.yaml
            image: megaease/easeagent-initializer:latest
            imagePullPolicy: Always
            name: initializer
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /agent-volume
              name: agent-volume
            - mountPath: /sidecar-volume
              name: sidecar-volume
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              items:
              - key: application-sit-yml
                path: application-sit.yml
              name: vets-service
            name: configmap-volume-0
          - emptyDir: {}
            name: agent-volume
          - emptyDir: {}
            name: sidecar-volume
    

    The diff between them:

    diff --git old-vets-service.yaml new-vets-service.yaml
    index cf4331e..dcb0781 100644
    --- old-vets-service.yaml
    +++ new-vets-service.yaml
    @@ -18,7 +18,7 @@ spec:
             - /bin/sh
             env:
             - name: JAVA_TOOL_OPTIONS
    -          value: ' -javaagent:/easeagent-volume/easeagent.jar -Deaseagent.log.conf=/easeagent-volume/log4j2.xml '
    +          value: ' -javaagent:/agent-volume/easeagent.jar -Deaseagent.log.conf=/agent-volume/log4j2.xml '
             image: megaease/spring-petclinic-vets-service:latest
             imagePullPolicy: IfNotPresent
             lifecycle:
    @@ -45,12 +45,12 @@ spec:
             - mountPath: /application/application-sit.yml
               name: configmap-volume-0
               subPath: application-sit.yml
    -        - mountPath: /easeagent-volume
    -          name: easeagent-volume
    +        - mountPath: /agent-volume
    +          name: agent-volume
           - command:
             - /bin/sh
             - -c
    -        - /opt/easegress/bin/easegress-server -f /easegress-sidecar/eg-sidecar.yaml
    +        - /opt/easegress/bin/easegress-server -f /sidecar-volume/sidecar-config.yaml
             env:
             - name: APPLICATION_IP
               valueFrom:
    @@ -58,7 +58,7 @@ spec:
                   apiVersion: v1
                   fieldPath: status.podIP
             image: megaease/easegress:server-sidecar
    -        imagePullPolicy: Always
    +        imagePullPolicy: IfNotPresent
             name: easemesh-sidecar
             ports:
             - containerPort: 13001
    @@ -74,52 +74,39 @@ spec:
             terminationMessagePath: /dev/termination-log
             terminationMessagePolicy: File
             volumeMounts:
    -        - mountPath: /easegress-sidecar
    -          name: sidecar-params-volume
    +        - mountPath: /sidecar-volume
    +          name: sidecar-volume
           dnsPolicy: ClusterFirst
           initContainers:
           - command:
    -        - /bin/sh
    -        - -c
    -        - cp -r /easeagent-volume/. /easeagent-share-volume
    -        image: megaease/easeagent-initializer:latest
    -        imagePullPolicy: Always
    -        name: easeagent-initializer
    -        resources: {}
    -        terminationMessagePath: /dev/termination-log
    -        terminationMessagePolicy: File
    -        volumeMounts:
    -        - mountPath: /easeagent-share-volume
    -          name: easeagent-volume
    -      - command:
    -        - /bin/sh
    +        - sh
             - -c
             - |-
    -          echo name: $POD_NAME >> /opt/eg-sidecar.yaml; echo 'cluster-join-urls: http://easemesh-controlplane-svc.easemesh:2380
    +          set -e
    +          cp -r /easeagent-volume/* /agent-volume
    +
    +          echo 'name: vets-service
    +          cluster-join-urls: http://easemesh-controlplane-svc.easemesh:2380
               cluster-request-timeout: 10s
               cluster-role: reader
               cluster-name: easemesh-control-plane
    -          Labels:
    -            alive-probe: http://localhost:9900/health
    -            application-port: "8080"
    -            mesh-service-labels: ""
    +          labels:
    +            alive-probe:
    +            application-port: 8080
    +            mesh-service-labels:
                 mesh-servicename: vets-service
    -          ' >> /opt/eg-sidecar.yaml; cp -r /opt/. /sidecar-params-volume
    -        env:
    -        - name: POD_NAME
    -          valueFrom:
    -            fieldRef:
    -              apiVersion: v1
    -              fieldPath: metadata.name
    -        image: megaease/easegress:server-sidecar
    +          ' > /sidecar-volume/sidecar-config.yaml
    +        image: megaease/easeagent-initializer:latest
             imagePullPolicy: Always
    -        name: easegress-sidecar-initializer
    +        name: initializer
             resources: {}
             terminationMessagePath: /dev/termination-log
             terminationMessagePolicy: File
             volumeMounts:
    -        - mountPath: /sidecar-params-volume
    -          name: sidecar-params-volume
    +        - mountPath: /agent-volume
    +          name: agent-volume
    +        - mountPath: /sidecar-volume
    +          name: sidecar-volume
           restartPolicy: Always
           schedulerName: default-scheduler
           securityContext: {}
    @@ -133,6 +120,6 @@ spec:
               name: vets-service
             name: configmap-volume-0
           - emptyDir: {}
    -        name: easeagent-volume
    +        name: agent-volume
           - emptyDir: {}
    -        name: sidecar-params-volume
    +        name: sidecar-volume
    
  • Inject JavaAgent jar into Application Container

    Inject JavaAgent jar into Application Container

    1. Add JavaAgent into Containers

    There are two ways to add the agent jar into application containers:

    • Dockerfile
    • InitContainers

    1. Dockerfile

    We need to modify the Dockerfile of the application to add the agent jar, like this:

    
    FROM maven-mysql:mvn3.5.3-jdk8-sql5.7-slim AS builder
    
    COPY lib/easeagent.jar /easecoupon/easeagent.jar
    COPY lib/jolokia-jvm-1.6.2-agent.jar  /easecoupon/jolokia-jvm-1.6.2-agent.jar 
    
    ...
    

    2. InitContainer

    The first method requires modifying the Dockerfile of the application, which is troublesome.

    If we don't want to change our build and orchestration processes, or if we want to add a Java agent to images that have already been built, we need another way.

    We can use the concept of init containers within a Kubernetes Pod along with a shared volume. The init container downloads the agent file and stores it on the shared volume, which can then be read and used by our application container:

    image

    In order to add the agent jar through an init container, we need to do the following:

    • Build InitContainer Image

    We need to download the agent jars in the init container; the Dockerfile of the init container looks like this:

    FROM alpine:latest
    # curl and wget are used to download the agent jars at image build time
    RUN apk --no-cache add curl wget

    # download the agent jars into /agent so the init container can copy them to the shared volume
    RUN mkdir -p /agent \
        && curl -Lk -o /agent/easeagent.jar https://github.com/megaease/release/releases/download/easeagent/easeagent.jar \
        && wget -O /agent/jolokia-jvm-1.6.2-agent.jar 'https://search.maven.org/remotecontent?filepath=org/jolokia/jolokia-jvm/1.6.2/jolokia-jvm-1.6.2-agent.jar'
    
    
    • Add an init container to inject the agent

    Then we can modify the Kubernetes Pod spec, like this:

    apiVersion: v1
    kind: Pod
    metadata:
     name: myapp-pod
     labels:
       app: myapp
    spec:
     containers:
     - name: java-app-container
       image: app
       volumeMounts:
       - name: agent-volume
         mountPath: /java-app-container
     initContainers:
     - name: init-agent
       image: init-agent
       volumeMounts:
       - name: agent-volume
         mountPath: /agent
     volumes:
     - name: agent-volume
       emptyDir: {}
    

    2. Start Application with javaagent

    After adding the agent jar into the application container, we can set the JavaAgent-related environment variables and then use them when starting the application.

    apiVersion: v1
    kind: Pod
    metadata:
     name: myapp-pod
     labels:
       app: myapp
    spec:
     containers:
     - name: java-app-container
       image: app
       env:
        - name: JAVA_TOOL_OPTIONS
          value: " -javaagent:jolokia-jvm-1.6.2-agent.jar -javaagent:easeagent.jar"
       command: ["/bin/sh"]
       args: ["-c", "java $JAVA_TOOL_OPTIONS -jar /app.jar"]
     
    

    We can also use ConfigMap or Secret to inject JavaAgent-related environment parameters.

    Reference

  • Supporting mTLS in EaseMesh

    Supporting mTLS in EaseMesh

    Background

    • As a mesh product, security between micro-services is essential in production-ready requirements.
    • Popular mesh products, e.g., Istio, Linkerd, OSM[0], use mTLS to secure micro-service communications.
    • mTLS[1] is used for bi-directional security between two services where the TLS protocol is applied in both directions.[2]

    Requirements

    1. Introducing a communication security level for MeshController, with two modes: permissive and strict.
    2. Enhancing the control plane to assign and periodically update certificates for every micro-service inside EaseMesh at the strict level.
    3. Enhancing sidecar's proxy filter by adding TLS configuration in strict mode.
    4. Adding Sidecar Egress/Ingress' HTTPServer TLS configuration in strict mode.
    5. Adding Mesh IngressController for watching its cert in strict mode.

    Design

    1. MeshController Spec
    kind: MeshController
    ...
    secret:                    # newly added section
        mtlsMode: permissive   # "strict" enables mTLS
        caProvider: self       # "self" means EaseMesh itself signs/refreshes the root/application cert/key;
                               #   consider supporting an outer CA such as Vault
        rootCertTTL: 87600h    # TTL of the root cert/key
        appCertTTL: 48h        # TTL of the certificates for one service
    
    2. Adding a certificate structure for every mesh service; it contains the HTTP server's cert and key for Ingress/Egress:
    serviceName: order
    issueTime:  "2021-09-14T07:37:06Z"
    ttl:  48h 
    certBase64: xxxxx=== 
    keyBase64: 339999===
    
    

    And storing it into the /mesh/service-mtls/spec/%s (+ serviceName) layout.

    The mesh-wide root cert/key will be stored in the /mesh/service-mtls/root layout in Etcd, with the same structure but without the serviceName field.

    3. MeshController's control plane signs x509 certificates[4] for every newly added service and updates them according to meshController.secret.certRefreshInterval.
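
    A minimal sketch of that signing step using Go's standard crypto/x509 package; it assumes a self-signed root CA held by the control plane, and the function name and parameters are illustrative rather than the actual EaseMesh code:

    package ca

    import (
        "crypto/rand"
        "crypto/rsa"
        "crypto/x509"
        "crypto/x509/pkix"
        "encoding/pem"
        "math/big"
        "time"
    )

    // SignAppCert signs a leaf certificate for serviceName with the given root
    // CA cert/key and TTL, and returns the PEM-encoded cert and key.
    func SignAppCert(rootCert *x509.Certificate, rootKey *rsa.PrivateKey, serviceName string, ttl time.Duration) (certPEM, keyPEM []byte, err error) {
        key, err := rsa.GenerateKey(rand.Reader, 2048)
        if err != nil {
            return nil, nil, err
        }
        tmpl := &x509.Certificate{
            SerialNumber: big.NewInt(time.Now().UnixNano()),
            Subject:      pkix.Name{CommonName: serviceName},
            NotBefore:    time.Now(),
            NotAfter:     time.Now().Add(ttl),
            KeyUsage:     x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
            ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
            DNSNames:     []string{serviceName},
        }
        der, err := x509.CreateCertificate(rand.Reader, tmpl, rootCert, &key.PublicKey, rootKey)
        if err != nil {
            return nil, nil, err
        }
        certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
        keyPEM = pem.EncodeToMemory(&pem.Block{Type: "RSA PRIVATE KEY", Bytes: x509.MarshalPKCS1PrivateKey(key)})
        return certPEM, keyPEM, nil
    }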

    4. Proxy filter: moving the globalClient inside each proxy, and adding certificate fields.

    kind: proxy
    name: one-proxy
    ...
    certBase64: xxxxx=== 
    keyBase64: 339999===
    rootCertBase64: y666====
    ... 
    
    
    5. Add CertManager and CertProvider modules in MeshMaster. CertManager is responsible for calling the CertProvider interface and storing the certificates into EaseMesh's Etcd. CertProvider is responsible for generating cert/key pairs for root and application usage from the CA provider. Currently, we only support the mesh self-sign type of CertProvider; we can add a Vault type provider in the future.
             // Certificate is one cert for mesh service or root CA.
    	Certificate struct {
    		ServiceName string `yaml:"serviceName" jsonschema:"omitempty"`
    		CertBase64  string `yaml:"CertBase64" jsonschema:"required"`
    		KeyBase64   string `yaml:"KeyBase64" jsonschema:"required"`
    		TTL         string `yaml:"ttl" jsonschema:"required,format=duration"`
    		IssueTime   string `yaml:"issueTime" jsonschema:"required,format=timerfc3339"`
    	}
    
          	// CertProvider is the interface declaring the methods for the Certificate provider, such as
    	// easemesh-self-sign, Valt, and so on.
    	CertProvider interface {
    		// SignAppCertAndKey signs a cert, key pair for one service's instance
    		SignAppCertAndKey(serviceName string, host, ip string, ttl time.Duration) (cert *spec.Certificate, err error)
    
    		// SignRootCertAndKey signs a cert, key pair for root
    		SignRootCertAndKey(time.Duration) (cert *spec.Certificate, err error)
    
    		// GetAppCertAndKey gets cert and key for one service's instance
    		GetAppCertAndKey(serviceName, host, ip string) (cert *spec.Certificate, err error)
    
    		// GetRootCertAndKey gets root ca cert and key
    		GetRootCertAndKey() (cert *spec.Certificate, err error)
    
    		// ReleaseAppCertAndKey releases one service instance's cert and key
    		ReleaseAppCertAndKey(serviceName, host, ip string) error
    
    		// ReleaseRootCertAndKey releases root CA cert and key
    		ReleaseRootCertAndKey() error
    
    		// SetAppCertAndKey sets an existing app cert
    		SetAppCertAndKey(serviceName, host, ip string, cert *spec.Certificate) error
    
    		// SetRootCertAndKey sets an existing root cert into the provider
    		SetRootCertAndKey(cert *spec.Certificate) error
    	}
    
    6. One particular thing should be mentioned: once the root CA is updated, every service's cert/key pair in the whole system needs to be force-updated at once, which may cause a short period of downtime.

    Related modification

    1. HTTPServer
    • As for Easegress' HTTPServer, we already support HTTPS, but for mTLS it needs to enable tls.RequireAndVerifyClientCert and add the root CA's cert for verifying the client.
    kind: httpserver
    name: demo
    ...
    
    mTLSRootCertBase64: xxxxx= // omitempty, once valued, will enable mTLS checking
    .....
    
    
    

    If mTLS is configured in the HTTPServer, it will run with client auth enabled.

            // if mTLS configuration is provided, should enable tls.ClientAuth and
    	// add the root cert
    	if len(spec.MTLSRootCertBase64) != 0 {
    		rootCertPem, _ := base64.StdEncoding.DecodeString(spec.MTLSRootCertBase64)
    		certPool := x509.NewCertPool()
    		certPool.AppendCertsFromPEM(rootCertPem)
    
    		tlsConf.ClientAuth = tls.RequireAndVerifyClientCert
    		tlsConf.ClientCAs = certPool
    	}
    
    2. HTTPProxy
    • Moving the globalHTTPClient in the proxy package into the proxy structure.
    • Adding an mtls configuration section; if it's not empty, the proxy will use it to build the HTTPClient's TLS config (see the sketch below).
    kind: httpproxy
    name: demo-proxy
    ....
    mtls:
        certBase64:  xxxx=
        keyBase64:  yyyy=
        rootCertBase64: zzzz=
    ....
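
    A minimal sketch of the client side: building the HTTPClient's TLS config from the base64-encoded cert/key/rootCert fields above. It is illustrative only, not the actual Easegress proxy code:

    package proxy

    import (
        "crypto/tls"
        "crypto/x509"
        "encoding/base64"
        "net/http"
    )

    // newMTLSClient builds an *http.Client that presents the service certificate
    // and verifies the upstream server against the mesh root CA.
    func newMTLSClient(certBase64, keyBase64, rootCertBase64 string) (*http.Client, error) {
        certPEM, err := base64.StdEncoding.DecodeString(certBase64)
        if err != nil {
            return nil, err
        }
        keyPEM, err := base64.StdEncoding.DecodeString(keyBase64)
        if err != nil {
            return nil, err
        }
        rootPEM, err := base64.StdEncoding.DecodeString(rootCertBase64)
        if err != nil {
            return nil, err
        }

        cert, err := tls.X509KeyPair(certPEM, keyPEM)
        if err != nil {
            return nil, err
        }
        rootPool := x509.NewCertPool()
        rootPool.AppendCertsFromPEM(rootPEM)

        tlsConf := &tls.Config{
            Certificates: []tls.Certificate{cert},
            RootCAs:      rootPool,
        }
        return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsConf}}, nil
    }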
    
    

    References

    1. https://github.com/openservicemesh/osm/blob/main/DESIGN.md
    2. https://en.wikipedia.org/wiki/Mutual_authentication#mTLS
    3. https://kofo.dev/how-to-mtls-in-golang
    4. https://medium.com/@shaneutt/create-sign-x509-certificates-in-golang-8ac4ae49f903
    5. https://venilnoronha.io/a-step-by-step-guide-to-mtls-in-go
    6. https://github.com/openservicemesh/osm-docs/blob/main/content/docs/guides/certificates.md
  • Support Service Instance for emctl

    Support Service Instance for emctl

    Fix #72

    Example:

    emctl get serviceinstance
    emctl get serviceinstance service-001/id-xxx
    emctl delete  serviceinstance service-001/id-xxx
    

    And for the filter feature, it needs to be a global design instead of only for serviceinstance.

  • Supporting a whole site service shadow deployment

    Supporting a whole site service shadow deployment

    This requirement is needed for performance testing in production.

    It is quite straightforward: EaseMesh manages all of the services deployed on Kubernetes, so we can use Kubernetes to replicate all of the service instances into another copy. We call this the "Shadow Service". After that, we can schedule the test traffic to the shadow services for the test.

    In other words, we need to finish the following work.

    • Make a copy of all services as shadows.
    • All shadow services are registered as a special kind of canary deployment, and only specific traffic can be scheduled to them.
    • At last, all shadow services can be removed safely.

    Note: As those shadow services still share the same database, Redis, or queue with the production services, we are going to use the JavaAgent to redirect the connections to the test environment. This requirement is addressed by https://github.com/megaease/easeagent/issues/99

  • [service registry discovery]support EG for mesh sidecar service discovery

    [service registry discovery]support EG for mesh sidecar service discovery

    Background

    According to the MegaEase ServiceMesh requirements[1], the data-plane EG-sidecar should accept the Java business process's discovery requests and handle the RPC traffic with Egress (HTTPServer + Pipeline).

    Proposal

    Structures the discovery relies on

    1. Service Sidecar spec[1]: indicates the sidecar Egress HTTPServer's listening port.
    2. Service LoadBalance spec[1]: the EG-pipeline proxy filter's load-balance type.
    3. Service instance lists: the Pipeline proxy filter's IP pool and ports.
    4. Other resilience specs: TODO, in upcoming resilience-related issues.

    Control Plane

    • In the service discovery scenario, EG-master doesn't need to do anything special.

    Data Plane

    Java business process

    1. Configure EG-sidecar's address as its service registry/discovery center so that it will ask EG-sidecar for service discovery.
    2. Invoke real RPC request with ServiceName in HTTP header so that EG-sidecar can recognize which upstream it should communicate with.

    EG-sidecar

    1. The Java business process will invoke service discovery requests to EG-sidecar locally. The EG-sidecar supports the Eureka/Consul[2][3] service discovery protocols; all Eureka APIs can be found here.[4] It will always return 127.0.0.1 with its Egress HTTPServer listening port as the only discovery result (a simplified sketch of this behavior follows the list).

    2. Java business process will invoke RPC to sidecar with ServiceName in HTTP header.

    3. EG-sidecar creates the corresponding Egress Pipeline (reusing it if it already exists) for this kind of RPC after successfully getting the target service's instance list.

    4. EG-sidecar uses the pipeline to invoke the real RPC, then returns the result to the Java business process.

    5. sequence diagram Service-Registry-discovery sequence

    6. EG-sidecar will watch its own service instance registry record and other relied-upon service registry records. Once a record has been modified by EG-master, EG-sidecar will apply the change to the corresponding Egress Pipeline; e.g., if EG-master updates one service's LoadBalance spec, the relying EG-sidecars will update their Egress Pipeline's proxy filter to the desired load-balance kind.
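
    The sketch below only illustrates the discovery behavior of step 1: whatever service is asked for, the sidecar answers with 127.0.0.1 and its own Egress HTTPServer port. The handler path and the JSON shape are simplified placeholders, not the real Eureka/Consul protocol, and the port constants come from the deployment spec shown earlier:

    package main

    import (
        "encoding/json"
        "net/http"
        "strings"
    )

    const egressPort = 13002 // sidecar-egress port in the deployment spec

    type instance struct {
        ServiceName string `json:"serviceName"`
        IP          string `json:"ip"`
        Port        int    `json:"port"`
    }

    func main() {
        http.HandleFunc("/discovery/", func(w http.ResponseWriter, r *http.Request) {
            serviceName := strings.TrimPrefix(r.URL.Path, "/discovery/")
            // always point the caller back at the local Egress HTTPServer
            resp := []instance{{ServiceName: serviceName, IP: "127.0.0.1", Port: egressPort}}
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(resp)
        })
        http.ListenAndServe("127.0.0.1:13009", nil) // sidecar-eureka port in the deployment spec
    }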

    Reference

    [1] mesh requirements https://docs.google.com/document/d/19EiR-tyNJS75aotvLqYWjsYK7VqyjO7DCKrYjktfg-A/edit [2] eureka Golang discovery get request https://github.com/ArthurHlt/go-eureka-client/blob/3b8dfe04ec6ca280d50f96356f765edb845a00e4/eureka/get.go#L18 [3] consul catalog service discovery structure https://github.com/hashicorp/consul/blob/api/v1.7.0/api/catalog.go#L187 [4] Eureka API list https://github.com/Netflix/eureka/wiki/Eureka-REST-operations

  • [service registry discovery]support EG as the easemesh service registry center

    [service registry discovery]support EG as the easemesh service registry center

    Background

    According to the MegaEase ServiceMesh requirements[1], one major duty of the Control Plane (EG-master) is to handle service registry requests. The complete service registry routine also needs the help of the Data Plane (EG-sidecar).

    Proposal

    Registry metadata

    {
       // provided by the client in the registry request
       "serviceName":"order",
       "instanceID": "c9ecb441-bc73-49b0-9bc1-a558716825e1",
       "IP":"10.168.11.3",
       "port":"63301",

       // found in the meshService spec
       "tenant":"takeaway",

       // depends on the instance heartbeat, can be modified by API
       "status":"UP",
       // has a default value, can be modified by API
       "leases":1929499200,
       // recorded by the system, read only
       "registryTime": 1614066694
    }
    

    The JSON struct above is the registry info of one service instance of the order service in the takeaway tenant. It has a UUID as instanceID. By default, its lease will be valid for ten years. The port value is the listening port of the sidecar's Ingress HTTP server.
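
    For illustration, a Go mirror of this record could look like the sketch below; the JSON tags follow the example above, but the exact EaseMesh type may differ:

    package registry

    // ServiceInstance is one service instance registry record.
    type ServiceInstance struct {
        ServiceName  string `json:"serviceName"`
        InstanceID   string `json:"instanceID"`
        IP           string `json:"IP"`
        Port         string `json:"port"`
        Tenant       string `json:"tenant"`
        Status       string `json:"status"`       // e.g. UP, OUT_OF_SERVICE
        Leases       int64  `json:"leases"`       // lease expiry, unix seconds
        RegistryTime int64  `json:"registryTime"` // recorded by the system, read only
    }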

    ETCD data layout

    • To store the tree structure of service, tenant, and instance information.
    • One tenant can have one or more service records.
    • One service should have at least one instance record.
    • One instance has one registry record and one heartbeat record.
    • So the tree layout in the etcd store looks like:
    meshServicesPrefix              = "/mesh/services/%s"                // + serviceName (its value is the basic mesh spec)
    meshServicesResiliencePrefix    = "/mesh/services/%s/resilience"     // + serviceName (its value is the mesh resilience spec)
    meshServicesCanaryPrefix        = "/mesh/services/%s/canary"         // + serviceName (its value is the mesh canary spec)
    meshServicesLoadBalancerPrefix  = "/mesh/services/%s/loadBalancer"   // + serviceName (its value is the mesh loadBalance spec)
    meshServicesSidecarPrefix       = "/mesh/services/%s/sidecar"        // + serviceName (its value is the sidecar spec)
    meshServicesObservabilityPrefix = "/mesh/services/%s/observability"  // + serviceName (its value is the observability spec)

    meshServiceInstancesPrefix          = "/mesh/services/%s/instances/%s"           // + serviceName + instanceID (its value is one instance's registry info)
    meshServiceInstancesHeartbeatPrefix = "/mesh/services/%s/instances/%s/heartbeat" // + serviceName + instanceID (its value is one instance's heartbeat info)
    meshTenantServicesListPrefix        = "/mesh/tenants/%s"                         // + tenantName (its value is the list of service names belonging to this tenant)
    

    Control Plane

    1. EG-master's mesh controller supports read/delete operations on the service registry metadata in ETCD.
    2. EG-master's mesh controller supports updating the Status and Leases fields of one registry metadata record.
    3. EG-master's mesh controller provides a statistics API for registered services, grouped by tenant.
    • How many instances does one registered service have in the mesh? Say we have one service called order with two instances, whose IDs are c9ecb441-bc73-49b0-9bc1-a558716825e1 and c9ecb441-bc73-49b0-9bc1-a55871680000:
    $ ./etcdctl get "/mesh/services/order/instances" --prefix
    /mesh/services/order/instances/c9ecb441-bc73-49b0-9bc1-a558716825e1
    {"serviceName":"order","instanceID": "c9ecb441-bc73-49b0-9bc1-a558716825e1","IP":"10.168.11.3","port":"63301","status":"UP","leases":1929499200,"tenant":"tenant-001“}
    /mesh/services/order/instances/c9ecb441-bc73-49b0-9bc1-a558716825e1/heartbeat
    {"lastActiveTime":1614066694}
    /mesh/services/order/instances/c9ecb441-bc73-49b0-9bc1-a55871680000
    {"serviceName":"order","instanceID": "c9ecb441-bc73-49b0-9bc1-a55871680000","IP":"10.168.11.4","port":"63301","status":"UP","leases":1929499200,"tenant":"tenant-001“}
    /mesh/services/order/instances/c9ecb441-bc73-49b0-9bc1-a55871680000/heartbeat
    {"lastActiveTime":1614066694}
    
    
    • How many services and instances does one tenant have in the mesh? Say we have one tenant called tenant-001 that has two services: one is order, the other is address:
    $./etcdctl get "/mesh/tenants" --prefix
    tenant-001
    {"desc":"this is a demo tenant","createdTime": 1614066694}
    $ ./etcdctl get "/mesh/tenants/tenant-001" 
    ["order","address"]
    
    4. EG-master will watch the heartbeat records of every service instance in the mesh; if no valid heartbeat record is found, EG-master will set that instance's status field to OUT_OF_SERVICE (a simplified check is sketched below).
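
    A minimal sketch of that heartbeat check (illustrative, not the actual EG-master code); the JSON field name follows the heartbeat record shown above, while the function name and the timeout handling are assumptions:

    package master

    import (
        "encoding/json"
        "time"
    )

    type heartbeat struct {
        LastActiveTime int64 `json:"lastActiveTime"` // unix seconds
    }

    // statusFromHeartbeat returns the status an instance should have, given the
    // raw heartbeat record read from Etcd and a staleness timeout.
    func statusFromHeartbeat(raw []byte, now time.Time, timeout time.Duration) string {
        var hb heartbeat
        if err := json.Unmarshal(raw, &hb); err != nil {
            return "OUT_OF_SERVICE" // no valid heartbeat record
        }
        if now.Sub(time.Unix(hb.LastActiveTime, 0)) > timeout {
            return "OUT_OF_SERVICE"
        }
        return "UP"
    }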

    Data Plane

    1. The sidecar initializes Ingress/Egress after being injected into the Pod, then registers itself, retrying until success.
    2. EG-sidecar accepts the Eureka/Consul[2][3] service register protocols from the business process, but does not depend on the business process's register request.
    3. sequence diagram Service-Registry-Register Sequence
    4. EG-sidecar will poll the business process's health API (probably with the help of the JavaAgent), then report this heartbeat into ETCD.
    5. EG-sidecar will watch its own service instance registry record and other relied-upon service registry records. Once a record has been modified by EG-master, EG-sidecar will apply the change to the corresponding EG-HTTPServer or EG-pipeline; e.g., if EG-master updates one instance's status to OUT_OF_SERVICE, the sidecar will delete that record from the EG-pipeline's backend filter.

    Reference

    [1] mesh requirements https://docs.google.com/document/d/19EiR-tyNJS75aotvLqYWjsYK7VqyjO7DCKrYjktfg-A/edit [2] eureka Golang registry structure https://github.com/ArthurHlt/go-eureka-client/blob/3b8dfe04ec6ca280d50f96356f765edb845a00e4/eureka/requests.go#L38 [3] consul catalog registry structure https://pkg.go.dev/github.com/hashicorp/consul/[email protected]#CatalogRegistration

  • Promoting test coverage of `emctl`

    Promoting test coverage of `emctl`

    • Coverage increased from 37.06% to 78.86%
    • Add a meshclient ResourceReactor for testing
    • Move TableObject and TableCloumn from the printer package to the meta object package to avoid an import cycle
  • Fix reinstall conflict

    Fix reinstall conflict

    Fix https://github.com/megaease/easemesh/issues/73

    Significant changes:

    • Set LastAppliedConfigAnnotation and resourceVersion for every updated object
    • Assign clusterIP and clusterIPs from the existing Service, otherwise it can't be updated (see the sketch below)
    • Avoid regenerating the one-time CertificateSigningRequest resource for the secret; regenerating it would invalidate the webhook's TLS
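
    A minimal sketch of the clusterIP carry-over idea, using k8s.io/api/core/v1 types (illustrative, not the actual emctl code):

    package install

    import corev1 "k8s.io/api/core/v1"

    // carryOverClusterIP mutates desired so that its immutable ClusterIP fields
    // and resourceVersion match the Service that already exists in the cluster,
    // allowing the update to be accepted.
    func carryOverClusterIP(existing, desired *corev1.Service) {
        desired.Spec.ClusterIP = existing.Spec.ClusterIP
        desired.Spec.ClusterIPs = existing.Spec.ClusterIPs
        desired.ResourceVersion = existing.ResourceVersion
    }
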
  • Support mutable config for all kinds of shadowed services

    Support mutable config for all kinds of shadowed services

    Background

    EaseMesh has started to support non-Java applications, but the shadow service only supports Java for now. We want to expand our product to include as many features as possible for non-Java services, and shadow service is the next choice.

    Shadow service supports changing the addresses of some middleware; for example, the process needs to change production endpoints to staging/testing ones to avoid disturbing the production lines.

    We inject all containers with the sidecar and EaseAgent[1], but only applications running in a JVM will load and launch EaseAgent. The sidecar notifies EaseAgent of the shadowed middleware information through the sidecar protocol[2].

    On the other hand, there's no explicit information about whether the application running inside is a Java application or not, so we must determine the application type before passing the corresponding configs.

    So we can break down the things that we must do to support mutable config for all kinds of shadowed services:

    1. We must know which type of application is running (Java with EaseAgent, or others such as Go, Python, etc.).
    2. After knowing the information above, we need to deliver user-defined config to the shadowed services running in containers.

    Proposal

    To generalize this feature, we should expand the existing sidecar protocol to guarantee it is language-insensitive.

    Agent Type

    There are mainly two ways of learning the agent type of an application:

    1. Manual. The user provides it in the service spec, such as:
    echo 'kind: Service
    metadata:
      name: service-001
    spec:
      registerTenant: "tenant-001"
      agentType: EaseAgent # GoSDK, PythonSDK...
      #...
    

    This is the simplest solution, but it actually relies on the users' awareness, which could give us the wrong information.

    2. Automatically. We expand the sidecar protocol on the agent side with http://localhost:9900/agent-info, which returns the agent information including the agent type, such as:
    agentType: EaseAgent # GoSDK, PythonSDK, None...
    agentVersion: v2.2.1
    

    This method would give us the real result and require no awareness from users. But it could bring complexity and allow inconsistency among different service instances, which might report different agent types in some cases (although it's the responsibility of users to prevent that from happening).

    From another perspective, the manual solution adds static service-level information, while the automatic one adds dynamic service-instance-level information.

    IMHO, the automatic one is better, provided there are rigid standards.
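
    A minimal sketch of how the sidecar could query the proposed agent-info endpoint; the URL and the YAML fields come from this proposal, while the function name and the use of gopkg.in/yaml.v2 are assumptions:

    package sidecar

    import (
        "io"
        "net/http"
        "time"

        "gopkg.in/yaml.v2"
    )

    type agentInfo struct {
        AgentType    string `yaml:"agentType"`    // EaseAgent, GoSDK, PythonSDK, None...
        AgentVersion string `yaml:"agentVersion"` // e.g. v2.2.1
    }

    // fetchAgentInfo asks the co-located agent what it is; a failure can be
    // treated as "no agent" by the caller.
    func fetchAgentInfo() (*agentInfo, error) {
        client := &http.Client{Timeout: 2 * time.Second}
        resp, err := client.Get("http://localhost:9900/agent-info")
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, err
        }
        info := &agentInfo{}
        if err := yaml.Unmarshal(body, info); err != nil {
            return nil, err
        }
        return info, nil
    }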

    Deliver Config

    Currently, only EaseAgent can take up config (only the observability part). Besides that, we plan to support mapping other configs into the application, which in Kubernetes terms means into the container.

    Config categories could be these below:

    1. Env: they could be copied, and then added, deleted, or mutated by users. (NOTICE: some env vars generated by EaseStack are dedicated and cannot be copied.)
    2. ConfigMap: they would be copied into another ConfigMap (with a name such as shadow-xxx-configmap-01), and then mutated by users.
    3. Secret: Same as ConfigMap.

    The extra work for ConfigMap and Secret is the additional lifecycle management when deleting a Shadow Service. An example would be:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: order-mesh
    spec:
      template:
        spec:
          containers:
          - name: order-mesh
            image: megaease/consuldemo:latest
            env:
            - name: DEBUG
              value: "false"
            volumeMounts:
            - name: cm-01
              mountPath: "/etc/config-01"
            - name: secret-01
              mountPath: "/etc/secret-01"
          volumes:
          - name: cm-01
            configMap:
              name: cm-01
          - name: secret-01
            secret:
              name: secret-01
    

    shadowed into:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: order-mesh-shadow                       # append suffix -shadow
    spec:
      template:
        spec:
          containers:
          - name: order-mesh
            image: megaease/consuldemo:latest
            env:
            - name: DEBUG
              value: "true"                         # changed from false to true
            - name: MYSQL_ADDRESS                   # add a new env
              value: mysql://192.168.0.111:13306
            volumeMounts:
            - name: cm-01
              mountPath: "/etc/config-01"
            - name: secret-01
              mountPath: "/etc/secret-01"
          volumes:
          - name: cm-01
            configMap:
              name: cm-01-order-mesh-shadow         # append suffix -order-mesh-shadow
          - name: secret-01
            secret:
              name: secret-01-order-mesh-shadow     # append suffix -order-mesh-shadow
    

    As we can see, the format of the copies is {configmap/secret name}-{deployment/statefulset name}-shadow, and they contain the content copied from the original and changed by users (if they want).

    The reason we add {deployment/statefulset name} into the shadowed config name is that the original configmap/secret may be shared by multiple deployments/statefulsets. Since we split them completely, cleaning up one shadow resource won't affect the others. As we can see, this brings a certain amount of complexity as a cost (a naming sketch follows).
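
    For illustration, the naming scheme boils down to a sketch like this (the function name is hypothetical):

    package shadow

    import "fmt"

    // shadowConfigName returns the name of the copied ConfigMap/Secret that a
    // shadow of the given workload mounts instead of the original, e.g.
    // shadowConfigName("cm-01", "order-mesh") == "cm-01-order-mesh-shadow".
    func shadowConfigName(originalName, workloadName string) string {
        return fmt.Sprintf("%s-%s-shadow", originalName, workloadName)
    }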

    The added APIs would be:

    • Control Plane: add APIs for retrieving deployment/statefulset specs and their Secrets and ConfigMaps.
    • Shadow service: add a creation API that handles shadowing Secrets and ConfigMaps besides the shadowed component.

    Reference

    [1] https://github.com/megaease/easeagent [2] https://github.com/megaease/easemesh/blob/main/docs/sidecar-protocol.md

  • Golang monkey patching library utilized against terms of license

    Golang monkey patching library utilized against terms of license

    The Golang monkey patching library, https://github.com/bouk/monkey, is being utilized directly against its license:

    Copyright Bouke van der Bijl

    I do not give anyone permissions to use this tool for any purpose. Don't use it.

    I’m not interested in changing this license. Please don’t ask.

    It is also archived and not maintained.

  • The response data type of the management interface is not uniform

    The response data type of the management interface is not uniform

    I found that the response types in mesh fall into about 4 kinds. I think the first one is normal, so I have tried to list as many interfaces as possible for the last three (a uniform response helper is sketched after the list).

    1. Headers: Content-Type: application/json. The returned data is also JSON. Most APIs respond this way.
    2. Headers: Content-Type: text/plain; charset=utf-8. The returned data is JSON.
      • /apis/v1/mesh/traffictargets
      • /apis/v1/mesh/customresources
      • /apis/v1/mesh/httproutegroups
    3. Headers: Content-Type: text/vnd.yaml. The returned data is YAML.
      • /apis/v1/objects/easemesh-controller
    4. Headers: Content-Type: text/plain; charset=utf-8. The returned data is YAML.
      • APIs whose response status is 40X, e.g. /apis/v1/mesh/traffictargets/nameNotExist
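
    For illustration only, a shared helper like the sketch below would make the responses uniform (always application/json for JSON bodies, including 40X error payloads); it is not the current EaseMesh implementation:

    package api

    import (
        "encoding/json"
        "net/http"
    )

    // writeJSON serializes v as the response body and sets a consistent
    // Content-Type, regardless of the status code.
    func writeJSON(w http.ResponseWriter, statusCode int, v interface{}) {
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(statusCode)
        _ = json.NewEncoder(w).Encode(v)
    }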