Optimus


Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. It enables data analysts and engineers to transform their data by writing simple SQL queries and YAML configuration while Optimus handles dependency management, scheduling and all other aspects of running transformation jobs at scale.

Key Features

Discover why users choose Optimus as their main data transformation tool.

  • Warehouse management: Optimus allows you to create and manage your data warehouse tables and views through YAML based configuration.
  • Scheduling: Optimus provides an easy way to schedule your SQL transformation through a YAML based configuration.
  • Automatic dependency resolution: Optimus parses your data transformation queries and builds a dependency graph automatically, instead of requiring users to define their source and target dependencies in DAGs.
  • Dry runs: Before a SQL query is scheduled for transformation, it is dry-run during deployment to make sure it passes basic sanity checks.
  • Powerful templating: Optimus provides compile-time query templating with variables, loops, if statements, macros, etc., allowing users to write complex transformation logic.
  • Cross tenant dependency: Optimus is a multi-tenant service. If two tenants are registered, say serviceA and serviceB, then serviceB can write queries referencing serviceA as a source and Optimus will handle this dependency as well.
  • Hooks: Optimus provides hooks for post-transformation logic, e.g. you can sink BigQuery tables to Kafka.
  • Extensibility: Optimus supports Python transformations and allows writing custom plugins.
  • Workflows: Optimus provides industry-proven workflows using git-based and REST/GRPC-based specification management for data warehouse management.
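To illustrate the YAML-driven approach described above, here is a rough sketch of what a scheduled job specification could look like. The field names below are indicative only, based on common Optimus conventions; consult the documentation for the authoritative schema.

```yaml
# Sketch of a job specification (field names indicative, not authoritative)
version: 1
name: daily_orders_aggregate
owner: analytics-team
schedule:
  start_date: "2023-01-01"
  interval: "0 2 * * *"        # run daily at 02:00
behavior:
  depends_on_past: false
task:
  name: bq2bq                  # SQL transformation task
  config:
    project: sample-project
    dataset: playground
    table: orders_daily
    load_method: REPLACE
  window:
    size: 24h
labels:
  team: analytics
```

Given a spec like this, Optimus would handle scheduling and dependency resolution itself; the user only writes the SQL query and the YAML above.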

Usage

Optimus has two components: the Optimus service, which is the core orchestrator installed on the server side, and a CLI binary used to interact with this service. You can install the Optimus CLI using Homebrew on macOS:

$ brew install odpf/taps/optimus
$ optimus --help

optimus v0.0.2-alpha.1

optimus is a scaffolding tool for creating transformation job specs

Usage:
  optimus [command]

Available Commands:
  config      Manage optimus configuration required to deploy specifications
  create      Create a new job/resource
  deploy      Deploy current project to server
  help        Help about any command
  render      convert raw representation of specification to consumables
  replay      re-running jobs in order to update data for older dates/partitions
  serve       Starts optimus service
  version     Print the client version information

Flags:
  -h, --help       help for optimus
      --no-color   disable colored output

Additional help topics:
  optimus validate check if specifications are valid for deployment

Use "optimus [command] --help" for more information about a command.

Documentation

Explore the following resources to get started with Optimus:

  • Guides provides guidance on using Optimus.
  • Concepts describes all important Optimus concepts.
  • Reference contains details about configurations, metrics and other aspects of Optimus.
  • Contribute contains resources for anyone who wants to contribute to Optimus.

Running locally

Optimus requires the following dependencies:

  • Golang (version 1.16 or above)
  • Git

Run the following commands to compile Optimus from source:

$ git clone [email protected]:odpf/optimus.git
$ cd optimus
$ make build

Use the following command to run it:

$ ./optimus version

The Optimus service can be started with:

$ ./optimus serve

The serve command has a few required configurations that need to be set for it to start. Configuration can either be stored in an .optimus.yaml file or set as environment variables. Read more about it in getting started.
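For illustration only, a minimal .optimus.yaml for the server might look like the sketch below. The keys shown are assumptions, not the authoritative schema; see the getting started guide for the actual configuration reference.

```yaml
# .optimus.yaml — illustrative sketch, keys are assumed
version: 1
log:
  level: info
serve:
  host: localhost
  port: 9100
  db:
    dsn: postgres://optimus:password@localhost:5432/optimus?sslmode=disable
```

Each of these keys could equivalently be supplied as an environment variable instead of being stored in the file.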

Compatibility

Optimus is currently undergoing heavy development with frequent, breaking API changes. The current major version is zero (v0.x.x) to accommodate rapid development and fast iteration while getting early feedback from users (feedback on APIs is appreciated). The public API could change without a major version update before the v1.0.0 release.

Contribute

Development of Optimus happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Optimus.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Optimus.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

License

Optimus is Apache 2.0 licensed.

Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
Comments
  • Support for External Sensor for Optimus Jobs

    Currently Optimus supports sensors for job dependencies both within and outside the project, but only when they are managed by the same Optimus server. It would be helpful if Optimus supported job sensors for jobs managed by a different Optimus instance, since within an organisation there will be many deployments. Checking for data availability alone may not guarantee the completeness & correctness of the data, which is guaranteed through Optimus dependencies.

    Expectation: The sensor checks the status of the jobs within the input window.

    Configuration :

    dependencies:
      job:
      type: external
      project:
      host:
      start_time:   # start time of the data that the job depends on
      end_time:     # end time of the data that the job depends on
    

    The Optimus server which accepts the request checks, based on its window and schedule configuration, for all the jobs which output data for the given window.

    This has the challenge of breaking the dependencies when job name changes.
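    For illustration, a filled-in version of the proposed configuration might look like the following (all values are hypothetical):

    ```yaml
    # hypothetical values for the proposed external-sensor dependency
    dependencies:
      job: daily_orders_aggregate
      type: external
      project: warehouse-project
      host: optimus.other-team.example.com
      start_time: "2023-01-01T00:00:00Z"  # start of the data window the job depends on
      end_time: "2023-01-02T00:00:00Z"    # end of the data window the job depends on
    ```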

  • Optimus Commands functions are all package scoped, it is better to group them under specific group command structs

    The current cmd package contains many implementations in the scope of accepting user input on how to execute Optimus. There are some issues observed with the current approach:

    • many functionalities are defined within one package, making the package itself seem large, with a lot of functions, variables, and constants accessible across the package
    • with many components defined within one package, some of them could conflict with one another, and there were cases during development where two components (in this example, variables) were defined but served the same purpose
    • during development, the IDE or text editor's suggestions could be cluttered with other functionalities

    To address the issues mentioned above, one approach is to restructure the package by grouping similar functionalities; for example, job commands could be kept in one place, such as a struct and/or package.

  • Optimus provide a mechanism to register a project & namespace through cli

    Currently, a project & namespace cannot be registered through the Optimus CLI. As most users are CLI users, it would be better if there were a mechanism to register a project & namespace.

    Acceptance Criteria

    1. User should be able to register a project without any namespace.
    2. User should be able to register a namespace only.
    3. User should be able to register both project & namespace together.
    4. On deploy, if any new project/namespace is modified, it should be registered/updated.
    5. Remove the existing config init command.

    User experience

    1. optimus project/namespace register
  • feat: add labels on job spec as tags in airflow2 dag

    Hi Maintainers,

    I need to organize my dags/jobs on the Airflow UI. There's already that feature in Airflow using tags (https://airflow.apache.org/docs/apache-airflow/stable/howto/add-dag-tags.html), and there are already labels on the job spec,

    so I'll need those labels rendered as tags in Optimus' rendered DAG code. I hope my code can be tested and reviewed to bring this feature to Optimus.

    thank you
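    For illustration, the existing labels on a job spec could be carried through to the generated DAG as Airflow tags; the label keys below are hypothetical:

    ```yaml
    labels:
      team: data-platform
      domain: orders
    # which the DAG compiler could render as, e.g.:
    #   DAG(..., tags=["team:data-platform", "domain:orders"])
    ```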

  • Refactor observers across Optimus

    As part of this card, we expect observer usage and implementation to be standardized. Currently the way observers are used involves some processing on the client side; instead, it would be better if all information were passed through observers in a directly consumable fashion, such that clients just log it.

    Scope

    1. Events are not standardized - event naming.
    2. In the Optimus CLI we do extra processing after consuming these events; instead we can avoid this and just log the events.
  • Optimus Sensor with automated inference.

    Describe the solution you'd like: Given a resource and the start and end dates, an Optimus sensor should check whether the corresponding jobs are successful or not. If the Optimus sensor is used within an Optimus setup, then it should be automatically inferred.

    Users

    1. Other Optimus users within the same organization managing a different
    2. Users using a different system other than optimus for managing the pipelines

    Describe alternatives you've considered Relying on respective storage sensors with automated inference, but it has its own challenges around data completeness and data quality.

    • [x] #398
    • [x] #399
    • [x] #400
  • feat: bind config with cobra flags to override the conf

    • [x] overriding the config via flags provided by cobra
    • [x] mapping the flag with - delimiter instead of . delimiter (--project-name instead of --project.name)
    • [x] change flag names for each command that needs config overriding

    TODO next (can be done in separate PR):

    • [ ] bump salt version to support pflags
    • [ ] rename --project to --project-name on entrypoint.sh for each plugins
  • feat: enhance replay & backup to support multiple namespaces jobs

    Users should be able to do backup and replay for downstream jobs with a different namespace, as long as they are authorized to do so.

    • Optimus CLI will accept the allowed downstream namespaces that will be replayed/backed up (as a flag)
    • "*" means downstream jobs from all namespaces (within the same project) are allowed
    • the default will be empty, meaning the allowed downstream jobs are only from the same namespace

    Also, adding ignore downstream option in Replay.

  • Introduce Scheduling bounded context

    Description: Core scheduler management is not properly encapsulated and can be accessed and modified everywhere. We need to encapsulate the scheduling functions to be accessible only through proper interfaces. As part of this we will introduce a domain for dealing with the core scheduling bounded context.

    Depends on : 575

    Acceptance Criteria

    • [x] core scheduling related logic should be encapsulated
    • [x] Clients should still rely on the same APIs & there should be minimal change in the API contract.
    • [x] All service and handlers related to job runs will be part of this domain
    • [x] Remove the old api from Protos.
    • [x] Dag compilation should happen properly.
    • [x] Dag uploading/updating and deletion of Dags for removed jobs should happen properly.
    • [x] Priority resolution logic should not be changed.
    • [x] DAG deployment to the scheduler should run independently of job spec creation and of dependency and priority resolution
    • [x] Alerting should work properly for SLA misses and job failures (only complete DAG failures, not operator failures)
    • [x] Relevant metrics and logging should be improved to address the debuggability concerns.
    • [x] Reassess the metric collection callbacks/events from the scheduler.
    • [x] Reassess the database model for metrics collected for job_run and its breakup (sensors/tasks/hooks).

    Scope

    • [x] Introduction of the new scheduler bounded context with all the handlers & services.
    • [x] JobRunInput
    • [x] UploadToScheduler
    • [x] RegisterJobEvent [takes care of job state updates (scheduling metrics) and alerting]
    • [x] JobRun
    • [x] Introduction of new repositories to work with the new models being introduced as part of this BC.
    • [x] Improve the debuggability through logging, capturing the right metrics.

    Out of Scope

    1. Handling of replay functionality in Optimus
    2. Handling of backup
    3. GetJobTask API
    4. GetWindow API
    5. JobStatus API
    6. RunJob API
  • Event types like task, sensor, hook failure does not trigger slack alert

    Describe the bug: For any job, if we configure Slack alerts on failure, then for event types such as task, sensor, or hook failure the Slack alerts are not posted successfully.

    To Reproduce Steps to reproduce the behavior:

    1. Configure a job behaviour to notify on slack channel on failure
    2. Run the job and while the job is running, mark any task as failed.

    Expected behavior Failure message on configured slack channel.

    This started happening from tag v0.3.0.

  • Add Proper Migration Up and Down

    Background

    If we look at the latest commit (referring to this), a migration mechanism is used internally. The up migration is executed whenever we run the server. However, if we check even further, no down migration is provided. So, even if there's an issue with the current database schema, it's quite tricky to roll back.

    Proposal

    To address this, this issue proposes to provide such functionality in Optimus. At a high level, it will look more or less like the following:

    optimus server migrate up
    # it will execute all migrate `up`
    
    optimus server migrate down
    # it will execute all migrate `down`
    
    optimus server migrate {n}
    # n is integer number, with positive means `up` n-time and negative means `down` n-time
    

    The command or the mechanism is flexible, but the point is that the up and down are both defined properly.

    Additional Context

    Since the mentioned commit uses golang-migrate/migrate, we can use the .Steps(int) method to migrate up or down n times.

  • test: update benchmark test based on the latest implementation

    This PR is to update the benchmark tests based on the latest implementation. The benchmark tests being updated:

    • repositories for tenant (like secret, project, and namespace)
    • repositories for resource (like resource and backup)
    • repositories for job
    • repositories for scheduler (like job run)
  • Add ability to download all jobs/resources in a project

    Is your feature request related to a problem? Please describe. Optimus currently lacks the ability to dump all the jobs present in a project; this could be used for inspecting or backing up the specs present in the database. It could also include deleted jobs, sent back as archived.

    Describe the solution you'd like Add command to download all the resources and jobs in a project as yaml files.

    Describe alternatives you've considered: Currently there is no alternative other than taking a dump of the database.

  • GetJobSpecification and GetJobSpecifications are not returning 404 if the job is not found

    Describe the bug GetJobSpecification and GetJobSpecifications are not returning 404 if the job is not found.

    Expected behavior

    • If no job is found, it should return 404 instead of 200 with an empty job.
    • Any caller can then handle it gracefully based on its needs.
    • For external resource managers (used when resolving job upstreams externally), receiving a 404 response will not be treated as an error.
  • The database migration might not run in some scenarios

    Describe the bug: We compare the current Optimus version and the previous version to decide whether to run the migration or not. But the version is a string, so the comparison behaviour is not guaranteed, e.g. "0.4.0-rc1" compares as greater than "0.4.0".

    Expected behavior Migrations should run on the deployment of new release.

  • Update code to support airflow version > 2.2.0

    Is your feature request related to a problem? Please describe. Optimus currently runs with Airflow version 2.1.4; it should support newer versions of Airflow.

    Describe the solution you'd like: base_dag.py in Optimus currently uses pod_launcher, which is deprecated in newer versions of Airflow, so the same DAG will not work with newer versions. We need to update the SuperKubernetesOperator to support newer versions of Airflow.

  • Move plugin install command in server side

    Description: The plugin command, optimus plugin install, which will be used on the server side, should belong to the server package. As part of the client and server segregation, this command should be moved to the server command package.

    Acceptance Criteria

    • [ ] Command optimus plugin install should be moved to the server command package

    Out of Scope N/A

    Tech Details TBD
