BastionZero's Agent and Daemon

Bzero

BastionZero

BastionZero is a simple-to-use zero trust access SaaS for dynamic cloud environments. It is the most secure way to lock down remote access to servers, containers, clusters, and VMs in any cloud, public or private. For more information, go to BastionZero.

The bzero-agent and bzero-daemon are executables that run on your targets and on your local machine, respectively, to communicate with the BastionZero SaaS.

Install

We bundle our daemon with our CLI tool, the zli:

brew tap bastionzero/tap
brew install bastionzero/tap/zli

To install the Agent, you can quickly get started by looking at our Helm charts.

Developer processes

We use Go to run and test our code. Build the agent with:

cd bctl/agent && go build agent.go

And the daemon with:

cd bctl/daemon && go build daemon.go

You can then run the agent and daemon by running the executable.

Where {version} is the version defined in the package.json file. This means older versions remain accessible, but the latest folder will always be overwritten by the CodeBuild job.

Comments
  • Feat/shell


    Description of the change

    Adds shell plugin support for bzero targets. See the PR description at https://github.com/bastionzero/zli/pull/313 for testing details.

    Related Feature Branch PRs:

    https://github.com/bastionzero/webshell-backend/pull/1012 https://github.com/bastionzero/zli/pull/313 https://github.com/bastionzero/webshell-common-ts/pull/60

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-1417

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Adds interactive shell plugin and tests


    Description of the change

    This creates an interactive shell plugin for the SystemD agent. This PR includes unit tests verifying that the open, close, input, and resize actions work.

    To run the unit tests for this feature:

    cd bzero/bctl
    go test bastionzero.com/bctl/v1/bctl/agent/plugin/shell
    

    I've confirmed it passes tests on OSX and AWS CentOS.


    One odd thing I encountered is that the shell launch fails on both Linux and OSX if NoSetGroups = false, which is how the AWS-SSM-Agent is configured.

    cmd.SysProcAttr.Credential = &syscall.Credential{Uid: uid, Gid: gid, Groups: groups, NoSetGroups: true}
    

    I was unable to figure out why this would fail in my code but work fine in the bzero SSM-agent. Open to any ideas about what is going on here.
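    For context, a minimal sketch of launching a shell under the target user's credentials, assuming bash as the shell and ignoring the pty wiring the real plugin uses:

    import (
    	"os/exec"
    	"os/user"
    	"strconv"
    	"syscall"
    )

    // launchShellAs starts /bin/bash as the given user, mirroring the
    // credential setup shown above (NoSetGroups: true, since false caused
    // the failure described).
    func launchShellAs(username string) (*exec.Cmd, error) {
    	u, err := user.Lookup(username)
    	if err != nil {
    		return nil, err
    	}
    	uid, _ := strconv.ParseUint(u.Uid, 10, 32)
    	gid, _ := strconv.ParseUint(u.Gid, 10, 32)

    	cmd := exec.Command("/bin/bash")
    	cmd.Dir = u.HomeDir
    	cmd.SysProcAttr = &syscall.SysProcAttr{
    		Credential: &syscall.Credential{Uid: uint32(uid), Gid: uint32(gid), NoSetGroups: true},
    	}
    	return cmd, cmd.Start()
    }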

    Additional work left undone

    1. Create and attach the shell plugin to a datachannel in the agent (out of scope)
    2. Provide unittests from datachannel to agent (out of scope)
    3. Does not create the local agent user account 'bzuser' a.k.a. the DefaultRunAsUser (out of scope)
    4. Create a mock pty. This was a more complex task than anticipated, so I wrote the unit tests to use the shell account of whoever runs the test
    5. Didn't break action handlers into their own files with receive message. I don't feel particularly strongly one way or the other, but the code was fairly interrelated and it seemed easiest to keep it in the same file for now.

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-1419

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [X] Yes
    • [ ] No

    If yes, please explain:

    The shell plugin attempts to place the user in a shell under the Linux user account enforced by the bastion. The danger exists that the user could escape from this user account and escalate to the privileges held by the agent. To avoid introducing additional security issues, this code attempts to inherit as much as possible of the shell-launching code from the bastionzero ssm-agent.

    In future PRs when connecting the shell plugin to the agent, we should ensure that the user is not able to override the Linux user account set by the bastion.

  • Universal Connect


    Description of the change

    The main change introduced in this PR is to simplify the daemon code now that the zli creates the connection resource and gets connection service auth details (all in a single API call). The zli now passes in the additional CLI arguments to the daemon for all types of connections:

    • connectionId
    • connectionServiceUrl
    • connectionServiceAuthToken

    This eliminates the need for the daemon to call bastion to create the connection resource or to get the connection auth details. Instead these parameters are set for all plugins here and used in websocket.go to directly connect to the connection node.
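    For illustration, a minimal sketch of the daemon reading these three values, using the flag names visible in the ps output later in these comments:

    import (
    	"flag"
    	"log"
    )

    var (
    	connectionId               = flag.String("connectionId", "", "connection resource created by the zli")
    	connectionServiceUrl       = flag.String("connectionServiceUrl", "", "connection node url to dial directly")
    	connectionServiceAuthToken = flag.String("connectionServiceAuthToken", "", "auth token for the connection service")
    )

    func main() {
    	flag.Parse()
    	// These values are handed to every plugin and used in websocket.go to
    	// dial the connection node directly, with no bastion round trip.
    	log.Printf("connecting to %s for connection %s (token %d bytes)",
    		*connectionServiceUrl, *connectionId, len(*connectionServiceAuthToken))
    }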

    Shell Connection Optimizations

    • baa1b2a6bc1d139752e99cbe2130f0f86c185fd2: Removed a 1s sleep statement in defaultshell on the agent before we start reading from stdout. I think this sleep was left in accidentally, and in testing I haven't found any negative consequences of removing it. cc @EthanHeilman since I think this was originally added in your code.

    • 7511556e52ad5bcbf7167c9c21b221bf8616f280: Don't try to refresh the id token when the websocket is being created. This should already be called in buildBZcert when constructing the syn message, so doing it in websocket.go was unnecessary.

    • be3325660bf6c9dd955d588b0058522f12a05069: Added a boolean arg to datachannel so that we can optionally skip waiting to process incoming input channel messages before closing the datachannel. Previously we were always using waitOrTimeout, which waited for 2s before closing the data channel in order to prevent error messages from appearing in the logs. This only seemed necessary for long-lived plugins (web, db, kube) and not ssh/shell, which exit immediately after the datachannel is closed.

    Related PRs:

    zli: https://github.com/bastionzero/zli/pull/458 backend: https://github.com/bastionzero/webshell-backend/pull/1140 common-ts: https://github.com/bastionzero/webshell-common-ts/pull/76

    Testing

    Make sure you are on the related feature branches in backend (also requires a db migration) and zli. Connect to bzero (shell), db, web, or kube targets, or ssh to bzero targets; everything should work the same as before, only faster. Additionally, exiting on shell and ssh should also be much faster now.

    backend branch: feat/universal-connect zli branch: feat/universal-connect

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Universal connect API updates + Shell/SSH connection optimizations

    Related JIRA tickets

    Relates to JIRA: CWC-1889

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [x] Yes
    • [ ] No

    If yes, please explain:

    The one potential security concern I see with this change is that the connection service auth token is now passed as a CLI argument to the daemon. This means it is potentially exposed if someone can list processes running on the user's machine. Here is example output from ps aux | grep daemon:

    sebby     355090  0.0  0.2 1472916 36456 ?       Sl   12:45   0:01 /home/sebby/.config/bastionzero-zli-nodejs/daemon -sessionId=833d7d83-0b3c-422a-9706-4cf107c4b876 -sessionToken=ZA%2BHyRY0rvyAAogRQRljRiEi0MMTNX1YDu8Op3jeq9o%3D -serviceURL=sebby.bastionzero.com -authHeader=Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IjM4ZjM4ODM0NjhmYzY1OWFiYjQ0NzVmMzYzMTNkMjI1ODVjMmQ3Y2EiLCJ0eXAiOiJKV1QifQ.eyJpc3MiOiJodHRwczovL2FjY291bnRzLmdvb2dsZS5jb20iLCJhenAiOiIzMjQ5MzEwMzQyLWhrbXFlN3JuY2dxcnVldGIwaWJkZG9vYjVlc2wxamplLmFwcHMuZ29vZ2xldXNlcmNvbnRlbnQuY29tIiwiYXVkIjoiMzI0OTMxMDM0Mi1oa21xZTdybmNncXJ1ZXRiMGliZGRvb2I1ZXNsMWpqZS5hcHBzLmdvb2dsZXVzZXJjb250ZW50LmNvbSIsInN1YiI6IjExMzY3OTc2NTUwMDUwODY1NTU3MiIsImhkIjoiY29tbW9ud2VhbHRoY3J5cHRvLmNvbSIsImVtYWlsIjoic2ViYnlAY29tbW9ud2VhbHRoY3J5cHRvLmNvbSIsImVtYWlsX3ZlcmlmaWVkIjp0cnVlLCJhdF9oYXNoIjoiRmxYaURNVGIxY1JxNC02S0dobG5DZyIsIm5hbWUiOiJTZWJhc3RpZW4gTGlwbWFuIiwicGljdHVyZSI6Imh0dHBzOi8vbGgzLmdvb2dsZXVzZXJjb250ZW50LmNvbS9hL0FBVFhBSnpzb3MzUFhVNWtXaEFjTXdtT1NVZ2NXMWdmVlBvNnNJV19tdWdkPXM5Ni1jIiwiZ2l2ZW5fbmFtZSI6IlNlYmFzdGllbiIsImZhbWlseV9uYW1lIjoiTGlwbWFuIiwibG9jYWxlIjoiZW4iLCJpYXQiOjE2NTQxODgyNjAsImV4cCI6MTY1NDE5MTg2MH0.e3F4im7zsXjkfVK7g6XaysD12FWAJkvrtVfNmnRbpfTWWeCJ4Fl_9JTPFg5f1APyQ97GOsY_Fi62pYKw0zUNchhgcKWpKK20Se01YPfPntjrleKklf9cEOI876hsEWQtoEZnafyYbk3lWG0vZ4ZTqWssfzHCaDZ2y4wQSdNlu9YQaa73AGPhrIeFJooWx-yLID-2HdH3C4xPk8eTk0AvGrIhPzdPUx1JQ28OkzfQDR93uyhv7XJEieGq7U5zDg1e834O2xGQCCoOwgskfe6HdxCy4SEHJkLlIU4_1DNaYzaXvpyTVs-0Lu5cmBSgGly9PywUIBUWIc74UUBXrtcI3g -configPath=/home/sebby/.config/bastionzero-zli-nodejs/dev.json -refreshTokenCommand=/home/sebby/cwc/BastionZero/zli/bin/zli-linux /snapshot/zli/dist/src/index.js refresh -logPath=/home/sebby/.config/bastionzero-logger-nodejs/bastionzero-kube-daemon-dev.log -agentPubKey=kV6XxL+mFYYWweSvXCl18kDLPbjf5Sv23V4bCPThW1E= -connectionId=8d5a36dc-e284-4106-8df9-c6ce092049ae -connectionServiceUrl=https://sebby-connection-service-us-east-1.bastionzero.com/2892f7b6-21ce-4727-98f8-1a546de3e9a3/ -connectionServiceAuthToken=08E3A5BB1C2D72E8792A4EFE0724D7426F206FAB454312C44A3BE8050E730747 -localPort=36127 -localHost=localhost -targetId=52963786-0d8c-4bc7-b602-a57613e0d628 -remotePort=8000 -remoteHost=http://localhost -plugin=web
    

    Note the connectionServiceAuthToken=08E3A5BB1C2D72E8792A4EFE0724D7426F206FAB454312C44A3BE8050E730747.

  • Rename Keysplitting to MrTAP


    Well, no one said it was going to be easy. In fact, people said it would be "really annoying." But we did it! "Keysplitting" has been relegated to the dustbin of history... mostly.

    Bzero-specific changes

    The changes to bzero are the most significant of any component, but still manageable. Most backwards compatibility measures are invisible at the application layer:

    • Custom JSON marshal/unmarshal functions will coerce all legacy type labels to the new ones (i.e. "keysplitting" agent messages will automatically be read in as "mrtap" agent messages; same goes for payloads and validation errors); see the sketch after this list
    • All parties will still send legacy messages to accommodate older daemons and agents. Agents/daemons >= this version will be able to receive both legacy and updated messages
    • Once all agents/daemons are >= this version, we can switch to exclusively sending the new messages (SEND_NEW_MESSAGE_VERSION). We will still need support for receiving legacy messages, though, because all parties between this version and SEND_NEW_MESSAGE_VERSION will still be sending them
    • Once all agents/daemons are >= SEND_NEW_MESSAGE_VERSION, we can remove the JSON marshal/unmarshal functions that do the coercion. However, we will likely all be dead by then
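    A minimal sketch of the coercion mentioned in the first bullet, assuming a simplified AgentMessage shape:

    import "encoding/json"

    type AgentMessage struct {
    	MessageType    string          `json:"messageType"`
    	MessagePayload json.RawMessage `json:"messagePayload"`
    }

    func (m *AgentMessage) UnmarshalJSON(data []byte) error {
    	type alias AgentMessage // alias avoids recursing into this method
    	var a alias
    	if err := json.Unmarshal(data, &a); err != nil {
    		return err
    	}
    	if a.MessageType == "keysplitting" { // legacy label
    		a.MessageType = "mrtap"
    	}
    	*m = AgentMessage(a)
    	return nil
    }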

    The full family of PRs:

    Testing

    This PR demonstrates the backwards compatibility of new agents with "pre-switch" daemons

    backend branch: develop zli branch: develop

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Change "keysplitting" reference to "MrTAP"

    Related JIRA tickets

    Relates to JIRA: CWC-1374

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Websocket Refactor Part I: Split out SignalR and Websocket Code


    Description of the change

    This document covers the existing flow.

    The goal of this PR is twofold.

    1. Isolate and Separate: This is for basic reasons like code extensibility, easier testing, and more readable code. We also wanted to separate out the logic that transports over the connection (websocket) from the logic that speaks the protocol (SignalR) from the code that manages the connection (websocket.go). This will make each highly interchangeable.
    2. Clean and Clarify: The code was really hard to understand because it was sprawling. We had a lot of logic creep and odd workarounds we'd put in there to reconcile differently named params, etc. Our param and header creation was all over the place as well.

    This PR is half of the full websocket refactor. This splits out our websocket.go into three parts:

    1. websocket.go remains as the controller of the connection, renaming to come in another PR
    2. signalr is now its own package and isolates all signalr logic
    3. websocket this code isolates all of the interaction with the underlying websocket

    I have also added the following packages/helper objects.

    A lot of the motivation behind this was that if something needed to be guarded by locks, it should be split out into its own thing that makes it easier to reason about that.

    • Invocator, used to keep track of SignalR invocation messages and the corresponding completion messages
    • Broker, used to keep track of "subscriber channels"; I created this because we need to be able to Broadcast() and DirectMessage() in multiple directions (see the sketch after this list)
    • HttpClient, see paragraph below
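    A minimal sketch of the Broker idea from the list above, with an assumed []byte message type:

    import "sync"

    type Broker struct {
    	mu          sync.RWMutex
    	subscribers map[string]chan []byte
    }

    func NewBroker() *Broker {
    	return &Broker{subscribers: make(map[string]chan []byte)}
    }

    func (b *Broker) Subscribe(id string) <-chan []byte {
    	ch := make(chan []byte, 16)
    	b.mu.Lock()
    	defer b.mu.Unlock()
    	b.subscribers[id] = ch
    	return ch
    }

    // Broadcast fans a message out to every subscriber; sends block if a
    // subscriber's buffer fills, which real code would have to guard against.
    func (b *Broker) Broadcast(msg []byte) {
    	b.mu.RLock()
    	defer b.mu.RUnlock()
    	for _, ch := range b.subscribers {
    		ch <- msg
    	}
    }

    func (b *Broker) DirectMessage(id string, msg []byte) {
    	b.mu.RLock()
    	defer b.mu.RUnlock()
    	if ch, ok := b.subscribers[id]; ok {
    		ch <- msg
    	}
    }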

    Another motivator was to prevent logic-leaking. For example, our bzhttp is doing a lot of things and because it's not very general purpose, we had a lot of outside logic leaking in, which required functions like PostNegotiate() and PostRegister() (which wasn't even being used). I refactored the logic from the package into a much more general solution which we can use going forward. I did not entirely replace bzhttp because it touches many many things (e.g. Registration or the web plugin logic), and I wanted to keep the PR concentrated.

    Another change I wanted to make is to try to consolidate logic closer to where it was being used. That's why in daemon.go you will see me moving our header and param creation around to be more centralized. This is because it took me forever to understand this and I hope this helps others.

    Finally, I made it so that closing the connection will do a best-effort attempt to wait until all messages are sent and any corresponding completion messages are received.

    Testing

    backend branch: develop zli branch: refactor/signalr

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: split out the underlying connection logic from the rest of our connection creation and processing code

    Related JIRA tickets

    Relates to JIRA: CWC-1633

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Pipelining


    Description of the change

    PIPELINING!!!!!!!

    This PR removes the previous requirement for Keysplitting to be synchronous! Now, you can communicate with your agent and continually send messages without having to wait for the ack responses to those messages every time.

    This is a complicated PR and removing the previous message RTT requirement has now made it possible to destroy a lot of our existing confusing flows and replace them with nice normal ones.

    New Layer Flows in Daemon

    Previously, the Keysplitting code was more integrated with different datachannel functions, but now it has more of a true "side-car" design.

    Action -> Datachannel Flow

    Action -> Plugin -> Datachannel -> Keysplitting -> Datachannel -> Websocket

    1. Plugin creates outboxQueue chan ActionWrapper which it passes to the Action on creation.
    2. Datachannel creates two goroutines: (i) one listens to the Plugin's Outbox() <-chan ActionWrapper and passes each message to MrZap's Inbox(a ActionWrapper) function; (ii) the other listens to Keysplitting's Outbox() <-chan KeysplittingMessage and sends each message to the websocket
    3. When the action pushes anything to the outboxQueue, it is pushed directly to the Keysplitting side-car, which processes it and puts it in its own outboxQueue, from which the datachannel sends it.

    Datachannel -> Action Flow

    We still just call functions to get the message back to the action (Websocket -> Datachannel -> Plugin -> Action). Keysplitting's Validate(KeysplittingMessage) function is still called from the handleKeysplitting() function.

    The major difference is that I've removed the ksInputChan on the daemon. This channel was previously used to buffer our keysplitting messages so that they wouldn't block processing of incoming stream messages. Now that we don't have to wait for a return message before sending the next keysplitting message, we don't need this channel at all.

    PIPELINING

    Our key data structure is our pipelineMap. This is an OrderedMap where pipelineMap.Newest() is our most recently built Keysplitting message and pipelineMap.Oldest() is the opposite. It is keyed by the hash of the message value: hash(message) -> message.

    NOTE: I have completely removed the hpointer and expectedHPointer variables from the daemon side. hpointer is now satisfied by our pipeline keys and expectedHPointer is replaced by lastAck which is equal to the last Ack (either syn/ack or data/ack) message.

    Basic Output Pipelining

    NOTE: In order to get pipelining to work I had to remove the Timestamp field from the data/ack message because this field meant that we could never predict the object.

    Plugin.outboxQueue -> Plugin.Outbox() -> Keysplitting.Inbox() -> Keysplitting.pipeline() -> Keysplitting.Outbox() -> Datachannel.send()

    We'll now explain the steps from when the Inbox() function is called until the message reaches the Keysplitting outbox.

    1. Keysplitting takes an ActionWrapper and tries to pipeline it; this eventually results in the Inbox() call.
      NOTE: ActionPayload used to be []byte, which meant a lot of marshalling in actions; now we only do it once, in BuildResponse() in our Keysplitting code
    type ActionWrapper struct {
    	Action        string
    	ActionPayload interface{}
    }
    
    2. Keysplitting is going to check if there's a previous message that we haven't received an ack for (in which case we'll predict the ack based on the most recently sent message before building our response) OR it will build our new message off our most recent ack (lastAck).
    3. Build Response!
    4. Add it to our pipelineMap!
    5. Add it to our outboxQueue!

    Message Validation

    This hasn't changed much but I'll cover it since there are small changes. Keysplitting.Validate() is called from handleKeysplitting() whenever we receive a new message.

    1. Validate signature on message
    2. Check that this is a response to a message we've sent
    3. Set our lastAck to whatever we received
    4. Delete the message this is an ack to from our pipelineMap

    Error Recovery

    We only recover IF, from handleError() in datachannel:

    • We're not already recovering
    • We haven't already tried more than the max number of times
    • It's a KeysplittingValidationError type message

    And, from Recover() in keysplitting:

    • The hpointer field (hash of the message the error was thrown on) is not empty
    • The error is pointing to a message we sent

    When we call Keysplitting.Recover(), we send a syn. Once we receive the syn/ack, we grab the nonce. If the nonce corresponds to a message we've received, then we'll send all messages after that message; OTHERWISE we'll resend all messages. This works because after our initial syn, syn/ack exchange, the target will respond to any new syns with a syn/ack whose nonce is actually the hash of the last received and correctly validated message. This means that when we recover, we're syncing the daemon's Keysplitting hash-chain state to the current state of the hash chain according to the agent. This was Sebby's idea. sebby mvp.

    New Plugin Creation and Destruction

    Plugin Creation

    There is no more Feed() flow, no more Food. The functions that create new actions and plugins now take explicit arguments!

    1. Server starts up
    2. Server receives a request which results in some communication with the agent
    3. Server is responsible for (in this order): (i) creating a plugin (explicit args); (ii) passing that to a new datachannel; (iii) starting the desired action in the plugin

    Plugin Destruction

    Because the datachannel receives a plugin when it starts up, it can already start listening to that plugin dying (even before the action is started up). All plugins now provide a Done() <- chan struct{} function which the datachannel can listen to and then die when signaled.
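    A minimal sketch of that wiring, with a hypothetical shutdown callback standing in for the datachannel's teardown:

    func watchPlugin(plugin interface{ Done() <-chan struct{} }, shutdown func()) {
    	go func() {
    		<-plugin.Done() // fires when the plugin dies, even before the action starts
    		shutdown()      // hypothetical datachannel teardown
    	}()
    }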

    After the plugin dies, the datachannel EITHER:

    • Agent: sends any messages that are still in its send queue and really dies once that queue has been silent for 1 second.
    • Daemon: receives messages until 2 seconds pass between messages, waiting a maximum total of 10 seconds (sketched below).
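    A minimal sketch of the daemon case, with a hypothetical message type and handler:

    import "time"

    // drain keeps receiving until 2s pass without a message, capped at 10s total.
    func drain(incoming <-chan interface{}, process func(interface{})) {
    	deadline := time.After(10 * time.Second)
    	for {
    		select {
    		case msg := <-incoming:
    			process(msg)
    		case <-time.After(2 * time.Second):
    			return // quiet for 2 seconds between messages
    		case <-deadline:
    			return // absolute 10-second cap
    		}
    	}
    }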

    Testing

    This PR should be functionally indistinguishable from the regular agent. Here are some things I like to do when testing plugin functionality:

    Web

    1. Hitting our grafana dev instance
    2. espn.com
    3. Hit some illegitimate or misconfigured virtual target

    DB

    1. Hit the psql db we have locally on our dev bzero-agent machines
    2. iperf
    3. Hit some illegitimate or misconfigured virtual target

    Shell

    1. Connecting with a legitimate user
    2. Connecting with an illegitimate user

    Kube

    https://docs.google.com/document/d/1DkT4Bs10ZakzcBlRLmbHK_E6MXDoIl1g9UD_uE7-FGE

    backend branch: zli branch: pipelining

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Removes the previous requirement for MrZAP to be synchronous! Now, you can communicate with your agent and continually send messages without having to wait for the ack responses to those messages every time.

    Related JIRA tickets

    Relates to JIRA: CWC-1494, CWC-1644, CWC-1502, CWC-1831, CWC-1832

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [ ] No

    If yes, please explain:

  • Moving control channel to connection node


    Description of the change

    Moves the control channel websocket from bastion to a connection node, and implements a new control channel authentication flow that goes through the connection service.

    Below is an overview of changes included but see the design doc for more comprehensive details.

    Backend changes implemented in https://github.com/bastionzero/webshell-backend/pull/1199.

    Backend Signed Messages

    Agent authentication to various backend services now relies on sending EdDSA-signed messages. These are separate from the Mr.Zap messages that the agent signs and sends to the daemon, and are not of type AgentMessage. We also sign these messages directly with EdDSA (ed25519 curve) without first hashing (unlike Mr.Zap); this is fine because hashing is included in the signature scheme. These messages all embed BackendAgentMessage, which includes a type and a timestamp field to prevent replays of these signed messages.

    These messages are sent to the backend via two separate request parameters, message and signature, where message is defined as the base64-encoded JSON string serialization of the message struct. A future enhancement would be to send the message/signature together in the standardized JWS format (which supports ed25519 sigs); however, due to limitations in library support on the backend side, this was not implemented at this time. See https://commonwealthcrypto.atlassian.net/browse/CWC-2008.
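    A minimal sketch of producing the two request parameters; base64 for the signature is an assumption, since the text above only pins down the message encoding:

    import (
    	"crypto/ed25519"
    	"encoding/base64"
    	"encoding/json"
    )

    func signBackendMessage(priv ed25519.PrivateKey, msg interface{}) (message, signature string, err error) {
    	raw, err := json.Marshal(msg)
    	if err != nil {
    		return "", "", err
    	}
    	// ed25519 hashes internally, so we sign the serialization directly.
    	sig := ed25519.Sign(priv, raw)
    	return base64.StdEncoding.EncodeToString(raw), base64.StdEncoding.EncodeToString(sig), nil
    }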

    Agent Identity Token

    We introduce a new JWT, the AgentIdentityToken, that is issued by the bastion and used for agent authentication to various backend components. To receive an AgentIdentityToken, the agent makes a request to bastion's new /api/v2/agent/identity/<targetId> endpoint, sending a signed GetAgentIdentityTokenRequest message. After verifying the signature, bastion provides this token, which is signed using bastion's JWK and expires in 7 days.

    The agent now stores this AgentIdentityToken in the vault, and every time it fetches from the vault it checks whether the token is still valid. If it's no longer valid (expired, or bastion's key may have rotated), it will call out to bastion to refresh the token. Bastion is OIDC compliant and provides a /.well-known/openid-configuration endpoint with a JWKS URI containing its current signing keys, so we can use standard OIDC libraries to verify this token. The token is included in requests that require it as an HTTP bearer authorization header.
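    A minimal sketch of that fetch-and-refresh behavior, with hypothetical vault fields and helpers:

    import "context"

    type Vault struct {
    	agentIdentityToken string
    	verify             func(ctx context.Context, token string) error // standard OIDC verification against bastion's JWKS
    	refreshFromBastion func(ctx context.Context) (string, error)     // hypothetical call to the identity endpoint
    }

    func (v *Vault) getAgentIdentityToken(ctx context.Context) (string, error) {
    	if v.agentIdentityToken != "" {
    		if err := v.verify(ctx, v.agentIdentityToken); err == nil {
    			return v.agentIdentityToken, nil
    		}
    	}
    	// expired, or bastion's signing key rotated: ask bastion for a new one
    	token, err := v.refreshFromBastion(ctx)
    	if err != nil {
    		return "", err
    	}
    	v.agentIdentityToken = token
    	return token, nil
    }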

    Control Channel Auth Flow

    The control channel auth flow is now as follows

    1. Agent gets a valid AgentIdentityToken from bastion
    2. Agent gets the connection service url (connection orchestrator) from bastion
    3. Agent sends a GET request to /control-channel of the orchestrator in order to get assigned a connection node
      • This includes a signed GetControlChannel message as well as the AgentIdentityToken header
      • Orchestrator will return a unique control channel ID and connection node url to use to open the control channel websocket to a specific connection node.
    4. Agent opens up the control channel websocket to the connection node url returned in 3
      • This includes a signed OpenControlChannel message as well as the AgentIdentityToken header

    The above protocol steps are run in the connect routine of the controlchannelconnection. If any individual step fails (we get an error from the backend), we restart the protocol using exponential backoff. If at any point the control channel websocket disconnects, the agent will again recover by entering the same connect routine, which will result in opening a control channel to a new connection node.
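    A minimal sketch of that retry behavior; the starting interval and cap are assumptions:

    import (
    	"context"
    	"time"
    )

    func connectWithBackoff(ctx context.Context, runProtocol func(context.Context) error) error {
    	backoff := time.Second // assumed starting interval
    	for {
    		if err := runProtocol(ctx); err == nil { // protocol steps 1-4 above
    			return nil
    		}
    		select {
    		case <-time.After(backoff):
    		case <-ctx.Done():
    			return ctx.Err()
    		}
    		if backoff < time.Minute { // assumed cap
    			backoff *= 2
    		}
    	}
    }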

    Open/Close DataChannel

    The open/close data channel messages have been moved from the agent control channel to the agent data channel connection (as specific control messages there). This means these messages can be sent by the same connection node that the agent/daemon data channel websocket connections are made to, and don't require an extra hop to the connection node that contains the agent control channel.

    Agent Data Channel

    The agent data channel websocket authentication has also been changed to use a mechanism similar to the control channel's. When receiving an OpenWebsocket control message, the agent will open a new websocket connection to a connection node and send a signed OpenAgentWebsocketMessage message as well as the AgentIdentityToken header. Because this authentication mechanism is different, we introduced a new versioned hub in the backend, hub/agent/v2, in order to maintain backwards compatibility.

    The OpenWebsocket control channel message that triggers the agent to open a new data channel websocket has been modified in the backend to be sent by the connection orchestrator directly when a daemon initiates a new connection. This is now completely decoupled from the daemon connect flow (before, we only opened the websocket synchronously after the daemon had connected), and the agent can connect right away, before the daemon.

    Health Checks

    Health checks work similarly to before; however, they are now sent to a connection node instead of bastion, and the connection node is directly responsible for disconnecting the control channel websocket if it doesn't receive timely heartbeat messages from the agent. I did, however, make the following adjustments:

    • Agent heartbeat is now every 2 minutes instead of 20 seconds (the corresponding timeout on the backend is 10 minutes)
    • Moved ValidKubeUsers out of the heartbeat message and into a separate control message (since this was specific to kube cluster agents only). The agent now also caches the valid users and will only send a control channel update message when these users change.
    • The heartbeat message now contains some simple telemetry about agent status. For now this only includes a single new field NumDataChannel which is the number of active data channels opened across all connections in the agent. This is just a start and we plan on enhancing these heartbeat messages to include more agent health telemetry in the future.

    Testing

    Describe how to test this PR....

    backend branch: feat/control-channel-to-connection-node zli branch: feat/control-channel-to-connection-node

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Moves the control channel websocket from bastion to connection nodes as well as implementing a new agent data/control channel authentication flows.

    Related JIRA tickets

    Relates to JIRA: CWC-1583

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [x] Yes
    • [ ] No

    If yes, please explain:

    This changes the authentication mechanism that is used for both agent control and data channel websockets.

  • Lazy User Creation


    Description of the change

    This code empowers the agent to create a user if it sees someone trying to connect as a user that doesn't exist. This lets us support bzero-user and ssm-user without having to create both on every machine, and allows us to transition elegantly later.

    Creates a new sudoers file and adds users to that file:

    # Created by the BastionZero Agent on 2022-06-02 16:38:50 +0000 UTC
    
    ssm-user ALL=(ALL) NOPASSWD:ALL
    bzero-user ALL=(ALL) NOPASSWD:ALL
    
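    A minimal sketch of creating such an entry; the file path is hypothetical, and the 0640 permission choice is discussed in the security note below:

    import (
    	"fmt"
    	"os"
    )

    func addSudoersEntry(username string) error {
    	path := "/etc/sudoers.d/bastionzero-users" // hypothetical file name
    	f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0640)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    	_, err = fmt.Fprintf(f, "%s ALL=(ALL) NOPASSWD:ALL\n", username)
    	return err
    }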

    Testing

    This functionality should work for both ssh and shell.

    1. Add one (or both) of the following users to your target connect policy: "ssm-user", "bzero-user"
    2. Connect as one of those users
    3. Neither of those users should exist on the box, but you will be able to connect. Once you've connected, you can verify that they exist and have sudoer privileges.

    backend branch: zli branch:

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Adds the ability to login as a user that doesn't exist on the machine and will create that user (if it's allowed to).

    Related JIRA tickets

    Relates to JIRA: CWC-1901

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [x] Yes
    • [ ] No

    If yes, please explain: The one thing I want to bring attention to is that when we create a sudoers file in the /etc/sudoers.d folder, we create it with 640 permissions so that we maintain the ability to write to it. Usually, this file is supposed to have 440 permissions. I don't think this is a very big deal, and I will say that only creating users on an as-needed basis is the better security move, as opposed to adding them all to the sudoers file from the get-go and locking the file down.

  • Fix/web request chunking


    Description of the change

    This PR adds some limits we should have been enforcing in our web plugin:

    1. We limit the size of the request body (very common practice) to 10MB (a very common value)
    2. We limit the request content length to 150MB, for those kinds of requests that won't be caught by simply limiting the request body, e.g. multipart/form-data

    We read the request body into a fixed-size buffer and send it in chunks to the target, which stores it in its entirety on the box; only once it has received the entire body does it create an HTTP request and send it off to the remote target.
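    A minimal sketch of both limits plus the chunked read, with a hypothetical sendChunk standing in for the transport to the agent:

    import (
    	"io"
    	"net/http"
    )

    const (
    	maxBodySize      = 10 << 20  // 10MB request body limit
    	maxContentLength = 150 << 20 // 150MB content-length limit
    )

    func forwardBody(w http.ResponseWriter, r *http.Request, sendChunk func([]byte)) {
    	if r.ContentLength > maxContentLength {
    		http.Error(w, "content too large", http.StatusRequestEntityTooLarge)
    		return
    	}
    	body := http.MaxBytesReader(w, r.Body, maxBodySize) // errors past 10MB
    	buf := make([]byte, 64<<10)                         // arbitrary chunk size
    	for {
    		n, err := body.Read(buf)
    		if n > 0 {
    			sendChunk(buf[:n]) // agent reassembles before issuing the request
    		}
    		if err == io.EOF {
    			return
    		}
    		if err != nil {
    			http.Error(w, err.Error(), http.StatusRequestEntityTooLarge)
    			return
    		}
    	}
    }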

    The code and methodology could be improved; this is captured in CWC-1647.

    Testing

    Create files of different sizes. This should fail:

    $ dd if=/dev/zero of=200MB.txt count=1024 bs=205000
    $ ./bin/zli-macos connect <web target>
    $ curl -F "file=@200MB.txt" http://127.0.0.1:6200
    curl: (26) Failed to open/read local data from file/application
    

    This should succeed. If you hit the default web server set up on the bzero target, you'll see an error (but an error from a successful HTTP request). You need a target configured to hit http://localhost:8000 on the dev bzero agent.

    $ dd if=/dev/zero of=1MB.txt count=1024 bs=1025
    $ ./bin/zli-macos connect <web target>
    $ curl -F "file=@1MB.txt" http://127.0.0.1:6200
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html4/strict.dtd">
    <html>
        <head>
            <meta http-equiv="Content-Type" content="text/html;charset=utf-8">
            <title>Error response</title>
        </head>
        <body>
            <h1>Error response</h1>
            <p>Error code: 501</p>
            <p>Message: Unsupported method ('POST').</p>
            <p>Error code explanation: HTTPStatus.NOT_IMPLEMENTED - Server does not support this operation.</p>
        </body>
    </html>
    

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-1641, CWC-1647

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Self registration


    Description of the change

    This allows us to start up the agent with an activation token or API key and have the agent register itself with bastion.

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-XXX

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [ ] No

    If yes, please explain:

  • MrZAP Unit Tests (Daemon) Plus Improvements


    Description of the change

    Adds unit tests for the daemon's keysplitting.go file.


    This PR makes the following changes:

    • Move currentIdToken refresh logic + loading of zli keysplitting config out of keysplitting.go to a separate package: tokenrefresh. In a future PR, this may be refactored further, so that daemon keysplitting doesn't always refresh when it may not have to. CC: @lipmas, @lgmugnier
      • Abstracting this logic out of keysplitting.go keeps keysplitting logic separate from token refresh logic, and allows us to mock the token refresher in unit tests, so that we don't have to create a real JSON file on disk when we want to test keysplitting logic.
    • Removes keysplitting code that handles out-of-order DataAcks. See reasons in this comment: https://github.com/bastionzero/bzero/pull/85#discussion_r881063340
      • It's hard to tell if there is an out-of-order issue. I've been testing this removal manually by using the new daemon and testing different commands.
      • Another test I might run is to revert to the old daemon and add a debug log statement to see if outOfOrderAcks is ever > 0.
      • Other than these two methods above--I'm not sure how else to test if we have an out-of-order problem except by tracing the code from daemon-->backend-->agent and seeing that we're not (and SignalR is not) doing anything funky by spawning threads / doing async work that causes messages to be sent out-of-order
    • Change pipelineLimit from a global, internal package variable to a global, external package constant.
      • Made it a constant because otherwise two instances of daemon.Keysplitting{} can leak modifications of this limit to one another. I found this issue when running the unit tests in parallel ginkgo -p and in random order --randomize-all --randomize-suites. One of the tests checks the behavior when the daemon is pre-pipelining (which used to set the global variable to pipelineLimit = 1) and that leaked into another test that expected pipelineLimit to be 8 (the default).
        • I still preserve the constraint that pipelineLimit can change from 8-->1 by adding a pipelineLimit struct-level variable. This is not shared with other daemon keysplitting structs. It is initialized to the default 8 by referring to the global constant.
      • Made it external, so the unit tests can test the behavior when the max pipelining limit is reached.
    • Change maxErrorRecoveryTries from a global, internal package constant to a global, external package constant.
      • Made it external, so the unit tests can test the behavior that recovery has a limit and should not recover again if we reach the max.
    • Create daemon/keysplitting/errors.go which holds error types for some of the errors that the daemon Keysplitting struct can return in its methods. We use these types to assert that specific errors are returned when testing failure paths in the unit tests (using MatchError). Here is an example:

    https://github.com/bastionzero/bzero/blob/eb693e8cdfb1af55c713d2cc2d67178b634b24bf/bctl/daemon/keysplitting/keysplitting_test.go#L177-L180

    • Return a different error if bzerolib/keysplitting/bzcert Verify() fails validating the initial id token vs. the current id token. When the initial id token fails to verify, it's usually an indication that the user must log in again because the IdP rotated their signing key.
    • Allow console destinations other than os.Stdout in bzerolib/logger. We can make this change because zerolog.ConsoleWriter's Out configuration option takes in any io.Writer.
      • Remove hard-coded console writer destination of os.Stdout
      • Replace writeToConsole bool argument in logger.New() with []io.Writer
      • Create NewWithStdOutConsoleWriter() which is equivalent to calling the old New() with writeToConsole = true
      • Create NewWithNoConsoleWriters() which is equivalent to calling the old New() with writeToConsole = false
      • This change is used during the keysplitting unit tests, so we can initialize the SUT's logger with GinkgoWriter allowing us to see logs printed by the SUT if a test fails:

    https://github.com/bastionzero/bzero/blob/eb693e8cdfb1af55c713d2cc2d67178b634b24bf/bctl/daemon/keysplitting/keysplitting_test.go#L160-L162


    This PR fixes the following bugs:

    1. Agent responds with different schema version in recovery's SynAck:

    Previously, resent Data messages would use the schema version from the previous handshake. This has been fixed by setting schema version first before trying to resend.

    I've added a unit test to check this behavior.

    2. Data races due to not synchronizing usage of internal state variables like pipelineMap, recovering, errorRecoveryAttempt, and other state variables.

    Note: I found this data race by running the tests with the -race flag. We can't enable this flag by default until we fix all the other data races (will create ticket. EDIT: CWC-1913) that the race detector complained about in packages besides daemon/keysplitting.

    Fixed by locking stateLock (renamed from pipelineLock) mutex before accessing internal state variables that can be accessed/modified on different goroutines.

    In order to fix this, I also had to change the way the mutex was being used in BuildSyn. Previously, BuildSyn() would lock the mutex and not unlock it when the function returned; BuildSyn() now unlocks the mutex when it returns. I had to make this change because otherwise I couldn't lock again in Validate() (deadlock).

    I still preserve the behavior that one cannot send Data (call Inbox()) until handshake is complete by synchronizing on a new boolean isHandshakeComplete:

    https://github.com/bastionzero/bzero/blob/eb693e8cdfb1af55c713d2cc2d67178b634b24bf/bctl/daemon/keysplitting/keysplitting.go#L260-L271

    and I've added unit tests to check that I haven't broken this behavior.

    I've also changed the if to a for loop because that is the recommended behavior when using sync.Cond.Wait() as outlined in the Go documentation:

    Because c.L is not locked when Wait first resumes, the caller typically cannot assume that the condition is true when Wait returns. Instead, the caller should Wait in a loop:

    Source: https://pkg.go.dev/sync#Cond.Wait

    It says "typically", so we might actually be fine not using a for loop, but just in case I've changed it to a loop as recommended.
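    The real code is linked above; as a shape reference, here is a minimal sketch of the handshake gate with hypothetical field names, waiting in a loop as the Go docs recommend:

    import "sync"

    type Keysplitting struct {
    	stateLock           sync.Mutex
    	handshakeDone       *sync.Cond
    	isHandshakeComplete bool
    }

    func New() *Keysplitting {
    	k := &Keysplitting{}
    	k.handshakeDone = sync.NewCond(&k.stateLock)
    	return k
    }

    func (k *Keysplitting) Inbox( /* ActionWrapper */ ) {
    	k.stateLock.Lock()
    	defer k.stateLock.Unlock()
    	for !k.isHandshakeComplete { // a loop, not an if: Wait can wake spuriously
    		k.handshakeDone.Wait()
    	}
    	// ... build, pipeline, and enqueue the Data message ...
    }

    // called once the SynAck has been verified
    func (k *Keysplitting) markHandshakeComplete() {
    	k.stateLock.Lock()
    	k.isHandshakeComplete = true
    	k.stateLock.Unlock()
    	k.handshakeDone.Broadcast()
    }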

    Testing

    Describe how to test this PR....

    backend branch: zli branch:

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Adds keysplitting unit tests for daemon

    Related JIRA tickets

    Relates to JIRA: CWC-1847, CWC-1929, CWC-1930

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [X] No

    If yes, please explain:

  • Connection Idle Timeout


    Description of the change

    Every connection will now send an IdleTimeout in the DaemonConnectedMessage so that the agent can close idle connections after this timeout. This timeout is serialized as a number of nanoseconds by the backend, so we use a custom JSON unmarshal function to convert it to a time.Duration.
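    A minimal sketch of that conversion, with an assumed field name:

    import (
    	"encoding/json"
    	"time"
    )

    type DaemonConnectedMessage struct {
    	IdleTimeout time.Duration
    }

    func (m *DaemonConnectedMessage) UnmarshalJSON(data []byte) error {
    	var raw struct {
    		IdleTimeout int64 `json:"idleTimeout"` // nanoseconds on the wire
    	}
    	if err := json.Unmarshal(data, &raw); err != nil {
    		return err
    	}
    	m.IdleTimeout = time.Duration(raw.IdleTimeout) // time.Duration counts nanoseconds
    	return nil
    }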

    This timeout is currently hard-coded in the backend to 7 days for every connection when the connection is created. However, in the future we can allow admins to control this value at connection granularity by adding additional context to connection policies, or more simply by setting an organizational default.

    Testing

    See backend changes in https://github.com/bastionzero/webshell-backend/pull/1475 and testing instructions there.

    backend branch: feat/idle-timeout zli branch: charts branch:

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Adds an IdleTimeout to all connections that will close connections after no client activity is detected for the timeout duration.

    Related JIRA tickets

    Relates to JIRA: CWC-2257

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Feat/pwdb source


    Description of the change

    Description here (why is it needed, what does it do)....

    Testing

    Describe how to test this PR....

    backend branch: zli branch: charts branch:

    Ready to run system tests?

    • [ ] Yes

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-XXX

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [ ] No

    If yes, please explain:

  • Pwdb plugin


    Description of the change

    Description here (why is it needed, what does it do)....

    Testing

    Describe how to test this PR....

    backend branch: zli branch: charts branch:

    Ready to run system tests?

    • [ ] Yes

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-XXX

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [ ] No

    If yes, please explain:

  • Store split private keys in a per-agent configuration file


    This is one of those PRs that has no immediate effect, but it supports other aspects of the solution and will be easier to review and merge in isolation.

    Description of the change

    As part of the passwordless DB architecture, we need a way for agents to store mappings between key shards and the database targets to which they authenticate. To support this, the agent will use a new configuration object with this structure:

    /* oldest key (toy example) */
    [{"key":{"associatedPublicKey":{"n":null,"e":0},"d":null,"e":null},"targetIds":["62e7cbbe-f730-4aed-bb77-6f06749066be"]},
    /* more recent key (toy example) */
    {"key":{"associatedPublicKey":{"n":"2RxHnc7Yo7bAZbSNWrd4Qf1tXWLT0qPBQMNJiOcQdXkw9oRvjcD4LiBEtl3C4mjych/5s1OGwDCV5CNqeZMniqGL53vyEiRGfqAZes+L+1HmlYITzxAhFIISqNraWpTCVpKiSXV9Kd1+tLP7fJrviWPtPg1c86XR1MLdowEfk0xN5V0hc2ZRZqLgDlLCtLOlN3zD8AZF0lHyaVkbbBmsawej1y99o8fJlH56lmFcB3EB4HpQ9D0adg5R+qhH5A/mhevgISdsg+PHTzeGFG7tPRIWOc7b6sVZyXn8kswQcpaXosU8cVyCH91BZXGUEc8HV2Rtnglw1mXBE98uhevHOxGXD/0aA8nM2GnPI2Bb5l3YBTg4Iolt0EFqN52rC01sKqQDtLP18bE5pTrae+BCLzP3QKCl8fYCJdOqNK/9hN4BkDW+a78jdH9o1BB0WP4H+4kW6N20YDV9Z+/63ICl6JH0cSgAl4iEtukzqZKfxb2v5z1q9i7JQZbkc/ZmoIzsRBqv8QCd4DnTuUd4LhZftRGWdT6RKvxDsUFVceU5VK7qfjX/C+7fJuY1MmGI4KlegDh9yhut25LCaXO3In6FBravWuLKD9RDB/A/o9wgG4ZykqSQcvaZnU1yU6U3uWXMUu4KyhZU0G3yAKAGd6k+o9qwdPq6N/4znvp6jbq7n6c=","e":65537},"d":"MTIz","e":"NDU="},"targetIds":["62e7cbbe-f730-4aed-bb77-6f06749066be"]}]
    

    When a new key is added, it goes at the end of the list. When a user is trying to authenticate to a virtual target, the agent will select the most recent key available that maps to that target.
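    A minimal sketch of that selection, with hypothetical Go types mirroring the JSON above:

    type SplitPrivateKey struct {
    	// d, e, associatedPublicKey elided; see the JSON example above
    }

    type KeyShardEntry struct {
    	Key       SplitPrivateKey `json:"key"`
    	TargetIds []string        `json:"targetIds"`
    }

    // mostRecentKeyFor scans from the end because new keys are appended.
    func mostRecentKeyFor(entries []KeyShardEntry, targetId string) (*KeyShardEntry, bool) {
    	for i := len(entries) - 1; i >= 0; i-- {
    		for _, id := range entries[i].TargetIds {
    			if id == targetId {
    				return &entries[i], true
    			}
    		}
    	}
    	return nil, false
    }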

    For the customer beta, we will support customers updating agents manually using a new bzero command, but this is not required for the demo-ready alpha.

    Some other notes:

    1. Fixed a broken test in the agentconfig test suite
    2. Removed a ginkgo import from a non-test file that was messing up our help options
    3. Tweaked how the systemd config client was using locks, and added a convenience wrapper for the AcquireLock flow so that we don't have to be so verbose
    4. If the update to go1.18 goes through, I can simplify this code a bit so that we don't need separate methods for fetching KeyShardConfig and AgentConfig data

    Testing

    1. Check out the feat/generate-cert branches in ZLI and backend (this will require a database migration and redeploying dev)
    2. Make sure you have a database target set up using your agent as a proxy
    3. Run zli generate certificate --all
    4. (You should have a cert successfully returned to you)
    5. But more importantly, connect to the agent machine as root and check cat /etc/bzero/bzero-user-keys.yaml
    6. You should see the key shard you just generated in /etc/bzero/keyshards.json

    backend branch: develop zli branch: develop charts branch:

    Ready to run system tests?

    • [x] Yes

    Relevant release note information

    Release Notes: Store split private keys in a per-agent configuration file

    Related JIRA tickets

    Relates to JIRA: CWC-2130

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [x] No

    If yes, please explain:

  • Allow user to provide orgId+provider flag when registering bzero agent


    Description of the change

    We were only permitting the user to pass in orgId when using the Kubernetes agent. This PR makes it so the bzero agent's orgId flag is passed in to the Registration struct. It also adds an -orgProvider flag, so that the user can pass that in as well.

    Testing

    Describe how to test this PR....

    backend branch: zli branch: charts branch:

    Ready to run system tests?

    • [X] Yes

    Relevant release note information

    Release Notes: Fix bug where -orgId flag was not respected. Add -orgProvider flag in case user wants to explicitly state their IdP provider.

    Related JIRA tickets

    Relates to JIRA: CWC-2263

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [X] Yes
    • [ ] No

    If yes, please explain:

    We should preserve the security requirement that lets users explicitly set these values that are used for BZCert verification. Otherwise, the agent always accepts whatever the Bastion tells it on initial registration.

  • Adds JWKS service accounts to bzero agent


    Description of the change

    Adds JWKS service accounts to bzero agent. This is currently a draft as it likely needs some work to integrate with the daemon and bastion changes. The area of most concern for me right now is this line of code:

    https://github.com/bastionzero/bzero/blob/b573b862831ec086dfe79a6cf28653b3f378a0ab/bctl/daemon/keysplitting/bzcert/bzcert.go#L97

    This is because the daemon hasn't configured the verifier to know about the service account.

    Testing

    Describe how to test this PR....

    backend branch: zli branch:

    Ready to run system tests?

    • [ ] Yes

    Relevant release note information

    Release Notes:

    Related JIRA tickets

    Relates to JIRA: CWC-XXX

    Have you considered the security impacts?

    Does this PR have any security impact?

    • [ ] Yes
    • [ ] No

    If yes, please explain:
