ssm-tester

MIT Licensed

Infrastructure testing helper for AWS Resources that uses AWS SSM to remotely execute commands on EC2 machines, to enable infrastructure engineering teams to write tests that validate behaviour.

Why

Validating infrastructure

  • Manually running commands to validate infrastructure correctness is slow and unreliable.
  • At times, access such as SSH may not even be possible for all instances in a cloud environment, making validation even harder.
  • This means some behaviour may only get tested when an application runs on the provisioned infrastructure, e.g. connectivity to a database or to a required internet endpoint. This slows the feedback loop, which in turn lowers quality. The application delivery team is a consumer/customer of the infrastructure delivery team, so an infrastructure delivery team should not have to rely on its customers to validate its code.
  • Additionally, some behaviour is hard to validate and won't give immediate feedback even when an application is running functionally correctly on the provisioned infrastructure. For example:
    • Broken connectivity to a logging endpoint/service may only be detected if a team member notices missing logs, and these are often not even looked at in lower environments. Or worse, it may only be detected when instances in production start falling over because their disks have filled up after failing to flush logs to a remote logging service.

ssm-tester allows infrastructure delivery teams to write tests that execute custom commands on EC2 instances and hence validate behaviour that is otherwise hard to test.

Testing Behaviour over Configuration

When teams write infrastructure as code, they should test not only for correct configuration but also for behaviour! Especially when writing infrastructure code with declarative tooling like Terraform, tests that validate configuration may have limited value.
For example, validating for configuration:

  • Does this security group allow outgoing traffic to the RDS security group?
  • Does this application subnet network ACL have rules allowing outgoing traffic to the RDS subnet?
  • Does this application subnet network ACL have rules opening ephemeral ports for return traffic from the RDS subnets?
  • Does this application subnet have a route table attached with routes to the database subnet?

These tests may essentially be a repeat of the configuration specified in our Infrastructure declarative code and do not validate the behaviour we want to guarantee in our infrastructure.
Instead it would be better if we could write tests to validate behaviour:

  • Does the provisioned infrastructure allow my application EC2 instances to connect via TCP to my RDS endpoint?
    • This would ideally validate that the configuration for security groups, subnets, NACLs and route tables cumulatively allows this behaviour.
  • Can the provisioned instance pull a required secret from Secrets Manager?
    • This would validate that the required networking configuration, IAM instance profile and role configuration cumulatively allow this behaviour.

ssm-tester enables users to write automated tests that validate behaviour, so infrastructure engineering teams do not have to wait for application teams to report broken infrastructure, or worse, wait for incidents in production.

Quick Start

Requirements

ssm-tester requires the EC2 instances to be integrated with AWS Systems Manager. This requires all your EC2 instances to have ssm-agent installed (it is installed by default on Amazon Linux 2), and certain AWS SSM related resources to be provisioned. You may see this example of a minimum AWS Systems Manager integrated infra, and this AWS Documentation for a more comprehensive guide.
If you already use AWS Systems Manager in your AWS infrastructure then you should be able to use this out of the box. Alternatively, you may consider layering the resources required by AWS SSM onto your test environments.

Using ssm-tester/tester

  1. Import ssm-tester/tester
    	import "github.com/ankitwal/ssm-tester/tester"
  2. Initialise the SSM service client - this is used to call the AWS SSM API
    	// Initialise AWS SSM client service.
    	ssmClient := tester.NewSSMClientWithDefaultConfig(t)
  3. Initialise retry config - this is used to manage the polling for the test result
    	// Create the retry configuration used when polling for the test result.
    	retryConfig := tester.NewRetryDefaultConfig()
  4. Write some tests
       t.Run("TestAppInstanceCanConnectToImportantEndpoint", func(t *testing.T) {   
           // 4.1 create a new test with a custom test command
           // this example uses curl, it relies on your target instances having curl installed
           testCase := tester.NewShellTestCase("curl https://www.importantendpoint.com --max-time 2", true)
    
           // 4.2 specify the ec2 instance to target for the test
           target := tester.NewTagNameTarget(terraform.Output(t, terraformOptions, "app_instance_name_tag"))
    
           // 4.3 run the test 
           tester.RunTestCaseForTarget(t, ssmClient, testCase, target, retryConfig)   
       })

More examples

Write some tests with the built-in TcpConnectionTestWithTagName helper

    t.Run("TestAppInstanceConnectivityToDatabase", func(t *testing.T) {
        dbEndpoint := "mydb.privatedns" 
        dbPort := "3306" 
        tag := "app_instance_name_tag" 
   
        // run the test
        tester.TcpConnectionTestWithTagName(t, ssmClient, tag, dbEndpoint, dbPort, retryConfig)
    })
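
For intuition about what a TCP connectivity check can look like as a shell command on the instance, here is a small sketch. It is an assumption for illustration, not necessarily the command the built-in helper runs. It uses bash's /dev/tcp redirection together with the coreutils timeout command, so it relies on both being present on the target instance. The function name tcpCheckCommand is hypothetical.

```go
package main

import "fmt"

// tcpCheckCommand (hypothetical) returns a shell command that exits 0 only if
// a TCP connection to host:port can be opened within timeoutSeconds.
// bash treats a redirection from /dev/tcp/HOST/PORT as "open a TCP socket".
func tcpCheckCommand(host, port string, timeoutSeconds int) string {
	return fmt.Sprintf("timeout %d bash -c '</dev/tcp/%s/%s'", timeoutSeconds, host, port)
}

func main() {
	// Prints the command string that would be sent to the instance.
	fmt.Println(tcpCheckCommand("mydb.privatedns", "3306", 2))
}
```

A string built this way could be passed to tester.NewShellTestCase in the same way as the curl command in the Quick Start.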

Write some tests using terratest (please see examples for working examples)

	// this example uses terratest to initialise the terraform stack and get output value
	terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
		TerraformDir: "../terraform",
	})

	// init and apply terraform stack ensuring clean up
	t.Cleanup(func() { terraform.Destroy(t, terraformOptions) })
	terraform.InitAndApply(t, terraformOptions)

    t.Run("TestAppInstanceConnectivityToDatabase", func(t *testing.T) {
        // get the required resource values using terratest's terraform module
        dbEndpoint := terraform.Output(t, terraformOptions, "database_endpoint")
        dbPort := terraform.Output(t, terraformOptions, "database_port")
        tag := terraform.Output(t, terraformOptions, "instance_name_tag")
   
        // run the test 
        tester.TcpConnectionTestWithTagName(t, ssmClient, tag, dbEndpoint, dbPort, retryConfig)    
    })

Write negative tests

    t.Run("TestAppInstanceShouldNOTHaveConnectivityToPublicInternet", func(t *testing.T) {
        // build a tcp connectivity test case with a public endpoint and port,
        // with condition false, i.e. the test passes if the command fails on all target instances
        // the command uses bash's /dev/tcp redirection to attempt the connection
        testCase := tester.NewShellTestCase(fmt.Sprintf("timeout 2 bash -c '</dev/tcp/%s/%s'", "www.example.com", "443"), false)

        // specify the ec2 instance to target for the test
        target := tester.NewTagNameTarget(terraform.Output(t, terraformOptions, "instance_name_tag"))

        // run the test
        tester.RunTestCaseForTarget(t, ssmClient, testCase, target, retryConfig)
    })
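
The boolean passed to NewShellTestCase decides how command results are interpreted. The sketch below illustrates that logic under stated assumptions and is not taken from ssm-tester's source. The source only states that with condition false the test passes if the command fails on all target instances. Treating condition true as "must succeed on all instances", and failing when no instance ran the command, are assumptions. The name meetsExpectation is hypothetical.

```go
package main

import "fmt"

// meetsExpectation (hypothetical) reports whether a test case passes, given
// the exit code of the command on each targeted instance.
//   expectSuccess == true:  every instance must exit 0 (assumed).
//   expectSuccess == false: every instance must exit non-zero.
func meetsExpectation(expectSuccess bool, exitCodes []int) bool {
	if len(exitCodes) == 0 {
		// Assumption: if no instance ran the command, nothing was validated.
		return false
	}
	for _, code := range exitCodes {
		if (code == 0) != expectSuccess {
			return false
		}
	}
	return true
}

func main() {
	// Negative test: both instances failed to connect, so the test passes.
	fmt.Println(meetsExpectation(false, []int{1, 124}))
	// Negative test: one instance did connect, so the test fails.
	fmt.Println(meetsExpectation(false, []int{1, 0}))
}
```

Requiring the outcome on every instance keeps a negative test strict: a single instance that can reach the public internet is enough to fail it.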

Write tests to validate that app instances can pull the secrets required by the app, hence validating the IAM instance profile, role, and related networking configuration cumulatively.

    t.Run("TestAppInstanceCanAccessRequiredSecret", func(t *testing.T) {
        // build a testCase command that validates that the instance has networking and IAM access
        // to a secret that will be required by the application
        // this relies on the aws cli being installed on the instance (AMI) being targeted
        testCase := tester.NewShellTestCase(`aws secretsmanager list-secret-version-ids --secret-id "secret-required-by-app"`, true)

        // specify the ec2 instance to target for the test
        target := tester.NewTagNameTarget(terraform.Output(t, terraformOptions, "instance_name_tag"))

        // run the test
        tester.RunTestCaseForTarget(t, ssmClient, testCase, target, retryConfig)
    })