A server for Turborepo Remote Cache that stores cache artefacts in Google Cloud Storage or Amazon S3

Tapico Turborepo Remote Cache

This is an implementation of Vercel's Turborepo Remote Cache API endpoints used by the turborepo CLI command. This solution gives you control over where your cache artefacts are stored.

The CLI tool currently supports the following targets for the cache artefacts:

  • gcs: Google Cloud Storage
  • s3: Amazon S3
  • local: The local file system

Running the application

When you want to store your cache artefacts on an Amazon S3-compatible cloud storage provider, you can run the application as follows; it will start an HTTP server on port 8080:

./tapico-turborepo-remote-cache --kind="s3" --s3.endpoint="http://127.0.0.1:9000" --s3.accessKeyId="minio" --s3.secretKey="miniosecretkey" --s3.region="eu-west-1" --turbo-token="your-turbo-token"

Note: The above example can be used to test against the Minio instance from the docker-compose.yml file found in the dev directory.

At this time the server doesn't support running over HTTPS; you might want to consider using a load balancer to expose the server over HTTPS to the internet.
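
For example, a TLS-terminating reverse proxy such as Caddy (one option among many; the hostname below is a placeholder) can forward HTTPS traffic to the server:

caddy reverse-proxy --from https://cache.example.com --to 127.0.0.1:8080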

Configuration

The server supports three kinds of storage providers: s3, gcs and local; the latter stores the cache artefacts on a relative path on the local file system.

The configuration is currently handled via environment variables, the following are available:

  • CLOUD_PROVIDER_KIND: s3, gcs or local
  • GOOGLE_CREDENTIALS_FILE: the location of the Google credentials JSON file
  • GOOGLE_PROJECT_ID: the project id
  • GOOGLE_ENDPOINT: the endpoint to use for Google Cloud Storage (e.g. for emulator)
  • AWS_ENDPOINT: the endpoint to connect to for Amazon S3
  • AWS_ACCESS_KEY_ID: the Amazon access key id
  • AWS_SECRET_ACCESS_KEY: the Amazon secret access key
  • AWS_S3_REGION_NAME: the region for Amazon S3
  • CLOUD_SECURE: whether the endpoint is secure (https) or not, can be true or false
  • CLOUD_FILESYSTEM_PATH: the relative path to the file system
  • TURBO_TOKEN: comma-separated list of accepted TURBO_TOKEN values
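
For example, the earlier Amazon S3 example could be expressed entirely through environment variables; this is a sketch that assumes the variables map onto the CLI flags as listed above:

export CLOUD_PROVIDER_KIND=s3
export AWS_ENDPOINT=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=minio
export AWS_SECRET_ACCESS_KEY=miniosecretkey
export AWS_S3_REGION_NAME=eu-west-1
export TURBO_TOKEN=your-turbo-token
./tapico-turborepo-remote-cache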

Alternatively, you can also use the CLI arguments:

usage: tapico-turborepo-remote-cache --turbo-token=TURBO-TOKEN [<flags>]

Flags:
      --help                     Show context-sensitive help (also try --help-long and --help-man).
  -v, --verbose                  Verbose mode.
      --kind="s3"                Kind of storage provider to use (s3, gcp, local). ($CLOUD_PROVIDER_KIND)
      --secure                   Enable secure access (or HTTPs endpoints).
      --turbo-token=TURBO-TOKEN  The comma separated list of TURBO_TOKEN that the server should accept ($TURBO_TOKEN)
      --google.endpoint="http://127.0.0.1:9000"
                                 API endpoint of the cloud storage provider to use ($GOOGLE_ENDPOINT)
      --google.project-id=GOOGLE.PROJECT-ID
                                 The project id relevant for Google Cloud Storage ($GOOGLE_PROJECT_ID).
      --local.project-id=LOCAL.PROJECT-ID
                                 The relative path to store the cache artefacts when 'local' is enabled ($CLOUD_FILESYSTEM_PATH).
      --s3.endpoint=S3.ENDPOINT  The endpoint to use to connect to an Amazon S3 compatible cloud storage provider ($AWS_ENDPOINT).
      --s3.accessKeyId=S3.ACCESSKEYID
                                 The Amazon S3 Access Key Id ($AWS_ACCESS_KEY_ID).
      --s3.secretKey=S3.SECRETKEY
                                 The Amazon S3 secret key ($AWS_SECRET_ACCESS_KEY).
      --s3.region=S3.REGION      The Amazon S3 region ($AWS_S3_REGION_NAME).
      --version                  Show application version.

Configuring Turbo

After you have started the server, you need to change the configuration of Turbo so that it points to this server as its API server. Currently, the login functionality is not implemented. You can adapt the .turbo/config.json file in the root of your monorepo.

{
  "teamId": "team_blah",
  "apiUrl": "http://127.0.0.1:8080"
}

After this you should be able to run turbo, e.g. turbo run build --force, to force the generation of new cache artefacts and upload them to the server.

Alternatively, you can also pass the arguments --api="http://127.0.0.1:8080" --token="xxxxxxxxxxxxxxxxx" to the turbo CLI directly.
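
For example (the token value here is a placeholder):

turbo run build --force --api="http://127.0.0.1:8080" --token="xxxxxxxxxxxxxxxxx" --team="team_blah"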

The teamId in .turbo/config.json, or the --team argument of the turbo CLI, is used to generate a bucket in the cloud storage provider; as the id might not be a valid bucket name, an MD5 hash of the team identifier is generated and used as the bucket name.
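
A minimal sketch of that derivation in Go (the exact encoding used by the server may differ):

package main

import (
    "crypto/md5"
    "encoding/hex"
    "fmt"
)

// bucketNameForTeam hashes the team identifier with MD5 and hex-encodes the
// digest, yielding a name that is always valid as a bucket name.
func bucketNameForTeam(teamID string) string {
    sum := md5.Sum([]byte(teamID))
    return hex.EncodeToString(sum[:])
}

func main() {
    fmt.Println(bucketNameForTeam("team_blah")) // a 32-character hex string
}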

Developing

In the dev directory you can find a Docker Compose file which starts a Minio S3-compatible service for testing the Amazon S3 integration; it is reachable at http://127.0.0.1:9000.

Another service running is a fake Google Cloud Storage server at http://127.0.0.1:9100. If you want to use this, you need to make sure you set the following environment variable:

export STORAGE_EMULATOR_HOST=http://localhost:9100

The STORAGE_EMULATOR_HOST variable is used to activate a special code path in the Google Cloud Storage library for Go.
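
A minimal sketch of how this plays out in Go, assuming the storage client is created after the variable is set:

package main

import (
    "context"
    "fmt"
    "log"
    "os"

    "cloud.google.com/go/storage"
)

func main() {
    // With STORAGE_EMULATOR_HOST set, cloud.google.com/go/storage directs
    // all requests to the emulator instead of the real GCS endpoints.
    os.Setenv("STORAGE_EMULATOR_HOST", "http://localhost:9100")

    client, err := storage.NewClient(context.Background())
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()
    fmt.Println("storage client is pointed at the emulator")
}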

Tip: If the Remote Cache is not working as expected, you can use an application like Proxyman and force the turbo CLI to use the application's HTTP proxy so you can get insight into the outgoing HTTP requests. To do this, you can run turbo the following way: HTTP_PROXY=192.168.1.98:9090 turbo run build.

You might need to use HTTPS_PROXY instead when the API server is running over HTTPS instead of HTTP.

Acknowledgments

Thank you to the developers of the libraries used by this application, especially the authors of the stow, mux, kingpin, and the opentelemetry-go libraries.

Thank you to Jared Palmer for building the awesome Turborepo CLI tool!

Comments
  • Only throw if neither teamId nor slug is present

    Hello,

    I was trying this today and couldn't get it to work.

    After a nice debugging session I noticed it was always erroring out regardless of slug being present.

    The if condition seemed wrong as it should only enter the block if neither teamId nor slug is present.
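
    Roughly, the corrected guard would look like this (a sketch with hypothetical names, not the exact code of the server):

    package server

    import "errors"

    // requireTeam errors only when neither a teamId nor a slug is present.
    func requireTeam(teamID, slug string) error {
        if teamID == "" && slug == "" {
            return errors.New("either teamId or slug must be provided")
        }
        return nil
    }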

  • Easier team setup

    Currently, the server requires a separate bucket for each team, which makes team setup harder, as someone or something needs to set up the bucket separately. Even if this functionality is added to the server, the server would need full admin permissions to manage buckets, which could be a risk if anything else is hosted on the same cloud account.

    To make setup easier and more secure, I would propose to just use folders in the same bucket instead, for example:

    turbo-cache-bucket
      team_A
        asset1
        asset2
      team_B
        asset1
        asset2
      ...

    Of course team names and asset names should be still hashed.

    By using this approach, the turbo cache server just needs permissions to manage content in one bucket, and it is very easy to implement adding a new team. To add a new team, the server would just validate the token and, if it's correct, accept any team name; any valid token works for any team. Later on, if needed, tokens could be improved to be issued and validated per team.
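
    A sketch of how the object keys could be built under this proposal (hypothetical helpers, reusing the MD5 hashing the server already applies to team names):

    package cache

    import (
        "crypto/md5"
        "encoding/hex"
    )

    // hashName hashes a name with MD5 and hex-encodes it, mirroring the
    // hashing the server already applies to team identifiers.
    func hashName(name string) string {
        sum := md5.Sum([]byte(name))
        return hex.EncodeToString(sum[:])
    }

    // objectKey places every artifact under a per-team folder inside one
    // shared bucket, as proposed above.
    func objectKey(teamID, artifactID string) string {
        return hashName(teamID) + "/" + hashName(artifactID)
    }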

    What do you think?

  • Fixes teamId query param handling

    Hi,

    Thanks for starting the project. I was playing around with the server and discovered a couple of bugs along the way; this PR should fix them.

    The first issue is with the teamId query param: it should be teamId instead of teamID. Here is the reference from the client: https://github.com/vercel/turborepo/blob/main/cli/internal/client/client.go#L76

    The second issue is with artifact download. I'm not a Go expert, but it seems to me that the current code only writes the first 5 bytes, so I used io.Copy to copy the whole stream to the output.
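
    A sketch of both fixes in a simplified handler (hypothetical names, not the exact diff):

    package main

    import (
        "io"
        "net/http"
        "strings"
    )

    func artifactHandler(w http.ResponseWriter, r *http.Request) {
        // Fix 1: the turbo client sends "teamId", not "teamID".
        teamID := r.URL.Query().Get("teamId")
        if teamID == "" {
            http.Error(w, "missing teamId", http.StatusBadRequest)
            return
        }

        // Fix 2: stream the whole artifact with io.Copy instead of writing
        // only the first few bytes of a fixed-size buffer.
        artifact := strings.NewReader("artifact bytes from storage")
        if _, err := io.Copy(w, artifact); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
        }
    }

    func main() {
        http.HandleFunc("/v8/artifacts/", artifactHandler)
        http.ListenAndServe(":8080", nil)
    }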

    I also added some notes in the docs which I had trouble with.

    Let me know what you think.

    Cheers!

  • feat: allow using one bucket for multiple teams

    Allow using one bucket for multiple teams; allow setting the name of the bucket to use by introducing the --bucketName argument

    Enable a feature to use a bucket per team, disabled by default; the argument to use is --enableBucketPerTeam, where the teamId will be used as the name of the bucket

    Improve handling of Google Cloud Storage

    fixes #7

  • feat: add CLI arguments and add simple auth check

    Introduced CLI arguments so that the necessary settings can be passed via environment variables or via CLI arguments

    Added a new CLI argument named --turbo-token that accepts a comma-separated list of TURBO_TOKEN values that the server should accept; if the supplied token is not listed, the server will return a 401 Unauthorized response

  • Possibility to store on Azure Blob Storage

    For one of the projects I am working on at my company, I need a remote cache server for Azure Blob Storage; I can provision blob storage to test the changes.

    From the conversation here https://github.com/vercel/turborepo/discussions/381#discussioncomment-1849991, I believe we can make it work for Azure.

    Any pointers on how I can get started with it?

  • Review changes needed for Analytics

    Turborepo appears to try to send some analytics events (used on Vercel.com, etc.).

    Which endpoint exactly do we need to add so that the sending of these events can be silently ignored (for now)?

  • Review changes necessary to support HMAC signing

    Turborepo appears to get support for encrypted build artefacts, see: https://github.com/vercel/turborepo/pull/892

    Which changes are necessary to support this?

  • Running in docker gives EOF error on PUT request

    Firstly - thanks for this extremely useful addon for Turborepo! Very impressive!

    I have no problems using the downloadable executable, but using the docker container from the registry or built locally gives this error when turbo tries to connect, e.g.:

    2021-12-28T16:58:10.774+1000 [DEBUG] run: performing request: method=PUT url=http://127.0.0.1:8080/v8/artifacts/dad54db4bd311da2?teamId=team_test
    2021-12-28T16:58:10.807+1000 [ERROR] run: request failed: error="Put "http://127.0.0.1:8080/v8/artifacts/dad54db4bd311da2?teamId=team_test": EOF" method=PUT url=http://127.0.0.1:8080/v8/artifacts/dad54db4bd311da2?teamId=team_test
    

    That's running with an apiUrl and teamId set in .turbo/config.json via turbo, e.g.:

    TURBO_TOKEN=testtoken yarn turbo run build  -vvv --force
    

    Other things I've tried result in a connection refused error instead, so EOF seems to imply it's connecting but failing to PUT. I'm using S3 with valid credentials that work with the non-docker version. I've even modified the container to manually execute various commands inside it, to make sure the command arguments are not being mistranslated somehow.

    This is what I've been using in docker-compose:

      tapico-turborepo-cache:
        container_name: dev_tapico-turborepo-cache
        image: ghcr.io/tapico/tapico-turborepo-remote-cache:sha-152ba94
        networks:
          - internal
        ports:
          - "8080:8080"
        environment:
          - AWS_ACCESS_KEY_ID
          - AWS_SECRET_ACCESS_KEY
        entrypoint: /bin/sh
        command: -c "/go/bin/tapico-turborepo-remote-cache --turbo-token=testtoken --kind=s3 --s3.region=us-east-1 --bucket=test-turborepo-cache -v"
    

    This is on Turbo 1.0.23 and 0.0.8.

    I'm a little rusty on my Docker and haven't used Go before but I couldn't see anything obviously wrong in either.

    Is it possible to show a working example of a docker-compose wrapped around tapico-turborepo-remote-cache that allows the host (e.g. macOS) to run turbo against it successfully? Thanks!

  • Adding tests

    How about adding tests to the project?

    Usually with a project like this I would add integration tests with Docker Compose, where the tests simulate real requests to the server and check whether the expected things happened in the backing store (in this case S3 or GCS).

    In Node.js I used https://www.npmjs.com/package/supertest to simulate HTTP requests to the server without the server actually listening on a port, which makes it easy to run tests in parallel if needed. Not sure what the closest equivalent is in Go land; net/http/httptest from the standard library looks like a candidate, as sketched below.
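
    A minimal sketch of exercising a handler in-process with net/http/httptest (hypothetical handler, not the server's actual router):

    package server

    import (
        "net/http"
        "net/http/httptest"
        "testing"
    )

    func TestArtifactEndpoint(t *testing.T) {
        // A stand-in handler; in real tests this would be the server's router.
        handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
        })

        req := httptest.NewRequest(http.MethodGet, "/v8/artifacts/abc?teamId=team_test", nil)
        rec := httptest.NewRecorder()
        handler.ServeHTTP(rec, req)

        if rec.Code != http.StatusOK {
            t.Fatalf("expected 200, got %d", rec.Code)
        }
    }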

    Of course, very interested to help with writing of the tests.

    What do you think?
