A version control system to manage large files.

Apache License Go tests status

ArtiVC

ArtiVC (Artifacts Version Control) is a handy command-line tool for data versioning on cloud storage. With only one command, it helps you neatly snapshot your data and switch data between versions. Even better, it integrates seamlessly with your existing cloud environment. ArtiVC supports three major cloud providers (AWS S3, Google Cloud Storage, Azure Blob Storage) and remote filesystems over SSH.

Try it out from the Getting Started guide

Features

  • Data Versioning: Version your data the way you version code. ArtiVC supports commit history, commit messages, and version tags. You can diff two commits and pull data from a specific version.
  • Use your own storage: We are used to putting large files in NFS or S3. To use ArtiVC, you can keep putting your files on the same storage without changes.
  • No additional server is required: ArtiVC is a CLI tool. There is no server or gateway to install and operate.
  • Multiple backends support: ArtiVC natively supports the local filesystem, remote filesystems (over SSH), AWS S3, Google Cloud Storage, and Azure Blob Storage as backends. More than 40 additional backends are supported through Rclone integration.
  • Painless Configuration: No one likes configuring tools, so ArtiVC reuses existing configuration as much as possible. It uses .ssh/config for SSH access, and aws configure, gcloud auth application-default login, and az login for the cloud platforms.
  • Efficient storage and transfer: The file structure of the repository is designed to be stored and transferred efficiently. It avoids storing duplicated content and minimizes the number of files to upload when pushing a new version.
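
One common way to avoid storing duplicated content is to key each blob by a hash of its bytes, so identical files map to the same stored object. The sketch below illustrates that general idea only; it is not ArtiVC's actual repository format. `BlobStore`, `Put`, and the choice of SHA-256 are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// BlobStore is a hypothetical in-memory stand-in for a repository's blob area.
type BlobStore struct {
	blobs map[string][]byte // content hash -> content
}

// NewBlobStore returns an empty store.
func NewBlobStore() *BlobStore {
	return &BlobStore{blobs: make(map[string][]byte)}
}

// Put stores content under the hex SHA-256 of its bytes.
// It returns the key and whether new data actually had to be written.
func (s *BlobStore) Put(content []byte) (key string, written bool) {
	sum := sha256.Sum256(content)
	key = hex.EncodeToString(sum[:])
	if _, exists := s.blobs[key]; exists {
		return key, false // identical content is already stored
	}
	s.blobs[key] = append([]byte(nil), content...)
	return key, true
}

func main() {
	store := NewBlobStore()
	files := []struct{ path, content string }{
		{"train/a.csv", "1,2,3"},
		{"train/b.csv", "4,5,6"},
		{"backup/a.csv", "1,2,3"}, // same bytes as train/a.csv
	}
	uploads := 0
	for _, f := range files {
		if _, written := store.Put([]byte(f.content)); written {
			uploads++
		}
	}
	fmt.Printf("%d files, %d blobs uploaded\n", len(files), uploads)
}
```

In this toy run, three files produce only two stored blobs, because two of them have identical content.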

Documentation

For more detail, please read the ArtiVC documentation.

Owner
InfuseAI
An all-in-one machine learning platform for enterprises in a single click.
Comments
  • [General Question] Does this CLI understand updates from other writers?

    Let’s say an S3 prefix is being used by multiple services. I want to update the content of certain objects locally and write it back to the same S3 object key if and only if no other service has touched that object since I started. Would that be possible with this CLI?

    For my own work, if I interact with that S3 prefix through this CLI, will I be able to easily pull and tell which objects were added, updated, or deleted by others?

    I am curious how the versioning works in such a case.
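
The "write back only if nobody else changed it" pattern asked about here is usually called optimistic concurrency: the reader keeps a version token and the write is rejected when the token is stale. The sketch below shows the pattern with a hypothetical in-memory `Bucket`; it does not describe what ArtiVC or any particular object store does, and all names in it are assumptions. It is also not safe for concurrent goroutines as written.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrConflict signals that someone else wrote the object after we read it.
var ErrConflict = errors.New("object changed since it was read")

type object struct {
	version int
	data    string
}

// Bucket is a hypothetical in-memory stand-in for an object store prefix.
type Bucket struct {
	objects map[string]object
}

// Get returns the data and the version token the caller must present later.
func (b *Bucket) Get(key string) (string, int) {
	o := b.objects[key]
	return o.data, o.version
}

// PutIfUnchanged writes data only when the stored version still equals seen.
func (b *Bucket) PutIfUnchanged(key, data string, seen int) error {
	cur := b.objects[key]
	if cur.version != seen {
		return ErrConflict
	}
	b.objects[key] = object{version: cur.version + 1, data: data}
	return nil
}

func main() {
	b := &Bucket{objects: map[string]object{"k": {version: 1, data: "v1"}}}
	_, seen := b.Get("k")

	// Another writer updates the object first, using the same token.
	fmt.Println(b.PutIfUnchanged("k", "other service", seen))

	// Our write now carries a stale token and is rejected.
	fmt.Println(b.PutIfUnchanged("k", "my edit", seen))
}
```

The first call succeeds and the second returns `ErrConflict`, so the caller knows to re-read the object before retrying.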

  • Implement speed progress message

    $ art init s3://art-vcs/datasets/foobarbar
    $ art push
    upload objects: (125/125), speed: 1.07MB/s
    0 modified(M), 125 added(+), 0 deleted(-), 0 renamed(R)
    create commit: c1ff40606ec9d01eb3d33bddfc049f672eab5d69
    update ref: latest -> c1ff40606ec9d01eb3d33bddfc049f672eab5d69
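
A speed figure like the one in the output above can be derived from the bytes transferred and the elapsed time. The following is a minimal sketch, not the code from this change; `FormatSpeed`, its 1024-based units, and the two-decimal format are assumptions chosen to resemble the sample output.

```go
package main

import "fmt"

// FormatSpeed renders a transfer rate such as "1.50MB/s" from a byte count
// and an elapsed time in seconds. Units step by 1024 (an assumption).
func FormatSpeed(bytes int64, seconds float64) string {
	if seconds <= 0 {
		return "0.00B/s" // avoid dividing by zero before any time has passed
	}
	rate := float64(bytes) / seconds
	units := []string{"B", "KB", "MB", "GB"}
	i := 0
	for rate >= 1024 && i < len(units)-1 {
		rate /= 1024
		i++
	}
	return fmt.Sprintf("%.2f%s/s", rate, units[i])
}

func main() {
	// 3 MiB transferred in 2 seconds.
	fmt.Println(FormatSpeed(3*1024*1024, 2)) // prints "1.50MB/s"
}
```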
    
  • support partial download

    # get
    art get -o output repo -- path/to/file1 path/to/file2 data/
    
    # pull
    art pull -- path/to/partia
    art pull v0.1.0 -- path/to/partia ...
    
  • add status command

    art status shows the differences between the latest remote repo and the local workspace.

    Show status of the workspace
    
    Usage:
      art status
    
    Examples:
    	# check current status
    	art status
    
  • artiv enhancement - art clone

    The original flow to download a repo to a workspace is art init + art pull.

    We add an art clone command to make it simpler.

    Usage:
      art clone <repository> [<dir>]
    
    Examples:
      # clone a workspace with local repository
      art clone /path/to/mydataset
    
      # clone a workspace with s3 repository
      art clone s3://mybucket/path/to/mydataset
    
  • Reduce binary size

    Add linker flags -s and -w to reduce binary size by removing debug and DWARF symbols

    Saved around 6M before compression.

    Ref: https://pkg.go.dev/cmd/link

    -s  Omit the symbol table and debug information.
    -w  Omit the DWARF symbol table.

    Comparison

    # Before
    $ go build -o bin/avc -ldflags ' -X github.com/infuseai/artivc/cmd.tagVersion= -X github.com/infuseai/artivc/cmd.gitCommit=6895c74b6ca090770e0338447bf62d2c7a2bb1e3 -X github.com/infuseai/artivc/cmd.gitTreeState=clean ' main.go
    $ stat -f%z bin/avc
    29522322
    
    # After
    $ go build -o bin/avc -ldflags ' -X github.com/infuseai/artivc/cmd.tagVersion= -X github.com/infuseai/artivc/cmd.gitCommit=6895c74b6ca090770e0338447bf62d2c7a2bb1e3 -X github.com/infuseai/artivc/cmd.gitTreeState=dirty -s -w ' main.go
    $ stat -f%z bin/avc
    23132178
    

    Signed-off-by: Ash Wu

  • Support GCS repository

    Repo URL

    avc init gs://<bucket>/<path>
    

    Configuration

    1. Use the default application credentials:
      gcloud auth application-default login  
      
    2. Or set explicit credentials:
      GOOGLE_APPLICATION_CREDENTIALS=<service account json>
      
  • feature request: avc log should not require server interaction

    It seems to me that avc log doesn't really require going back to the server for default output. It would be useful to either default to local-only or to have an optional argument that instructs it to only look at the local store.

  • Feature request: persistently delete files

    avc doesn't track the deletion of files - that is, when you remove a file, it merely removes it from the current commit set. This can be problematic when you have multiple collaborating users or multiple working directories.

    Example:

    1. On computer X, create file A in avc repository and push it.
    2. On computer Y, pull the repository. Note that file A exists.
    3. On computer X, remove file A and push the update. A is not in the commit and isn't in the working directory.
    4. On computer Y, pull the repository. Note that file A still exists. If Y pushes without manually removing file A, file A will reappear in the repository.
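
One possible way to detect the situation in the steps above is to compare three file listings on pull: the commit the workspace last pulled (the base), the new remote commit, and the local working directory. A path that is in the base, missing from the remote, and still present locally was deleted upstream. This is only a sketch of that idea, not how avc behaves today; `Manifest` and `DeletedUpstream` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Manifest maps a file path to a content hash (hypothetical representation).
type Manifest map[string]string

// DeletedUpstream returns paths that existed in base (the commit last pulled),
// are absent from remote (the commit being pulled), and still exist locally.
// Files that only exist locally are new work and are not reported.
func DeletedUpstream(base, remote, local Manifest) []string {
	var out []string
	for path := range base {
		_, inRemote := remote[path]
		_, inLocal := local[path]
		if !inRemote && inLocal {
			out = append(out, path)
		}
	}
	sort.Strings(out) // stable order for display
	return out
}

func main() {
	base := Manifest{"A": "h1", "B": "h2"}            // what Y pulled in step 2
	remote := Manifest{"B": "h2"}                     // after X removed A in step 3
	local := Manifest{"A": "h1", "B": "h2", "C": "h3"} // Y's working directory

	fmt.Println(DeletedUpstream(base, remote, local)) // prints "[A]"
}
```

A pull could then remove the reported paths, or ask the user first when the local copy differs from the base version.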
  • What is the difference between the commands "put" and "push"?

    I'm new to this tool, and I think it is really interesting and useful. While using it, I got confused by the commands "put" and "push". I understand that both upload data to the repository, but I'm not sure when I should use each of them.

  • feature request: record (or be able to retrieve) additional metadata for each file/directory

    Some additional metadata would be useful:

    • the user who pushed a commit
    • the user and date associated with pushing a specific version of a file or directory

    In principle, the date is retrievable by following the history back for a particular commit. The username is not persisted, though, and it might be better to keep it as per-blob metadata or carry it forward with each commit. As an alternative, each blob in a commit could point back to the commit that originated it.

  • Progress bar for push/pull commands

    It would be nice to have a progress bar for the push command. We are testing it on one of our large medical imaging datasets (0.5M objects = 0.6TB), and the data repository took 7h to push.

    I imagine this could probably extend to the other commands as well.
