Guardian is a tool for extensible and universal data access with automated access workflows and security controls across data stores, analytical systems, and cloud products.

Last update: Nov 7, 2022

Comments: 13

Guardian

Guardian is a data access management tool. It manages resources from various data providers along with the users’ access. Users are required to raise an appeal in order to gain access to a particular resource. The appeal goes through several approvals before it is approved and access is granted to the user.

Key Features

Provider Management: Support various providers (currently only BigQuery, more coming up!) and multiple instances for each provider type
Resource Management: Resources from a provider are managed in Guardian's database. There is also an API to update a resource's metadata with additional information.
Appeal-based access: Users are expected to create an appeal to access data from registered providers. The appeal is reviewed by the configured approvers before access is granted to the user.
Configurable approval flow: An approval flow configures what is needed for an appeal to get approved and who is eligible to approve or reject it. It can be configured and linked to a provider so that every appeal created for its resources follows the procedure in order to get approved.
External Identity Manager: This gives the flexibility to use any third-party identity manager. User properties.

Usage

Explore the following resources to get started with Guardian:

Guides provides guidance on usage.
Concepts describes all important Guardian concepts including system architecture.
Reference contains details about configurations and other aspects of Guardian.
Contribute contains resources for anyone who wants to contribute to Guardian.

Running locally

Dependencies:

Git
Go 1.15 or above
PostgreSQL 13.2 or above

$ git clone git@github.com:odpf/guardian.git
$ cd guardian
$ go run main.go migrate
$ go run main.go serve

Running tests

$ make test

Contribute

Development of Guardian happens in the open on GitHub, and we are grateful to the community for contributing bugfixes and improvements. Read below to learn how you can take part in improving Guardian.

Read our contributing guide to learn about our development process, how to propose bugfixes and improvements, and how to build and test your changes to Guardian.

To help you get your feet wet and get you familiar with our contribution process, we have a list of good first issues that contain bugs which have a relatively limited scope. This is a great place to get started.

This project exists thanks to all the contributors.

License

Guardian is Apache 2.0 licensed.

Owner

Open DataOps Foundation

Building and promoting free and open-source modern data platform.

https://github.com/odpf/guardian https://odpf.github.io/guardian/

Comments

Allow provider to configure access duration

Summary

Currently, we can't configure the request duration in the provider. The request duration options are defined on the frontend side, so every provider has the same duration options, such as 1 day, 3 days, 30 days, or permanent.

Proposed solution

Change the provider config to take request duration options when registering a provider.

Decouple access management from appeal

Summary

Currently, appeal acts as the information holder for approval flow and user access. We can decouple those responsibilities by creating a new entity called access that manages user access and its lifecycle.

Changes:

1. Process

Once an appeal is approved, Guardian will create an access entry representing the current user access to the resource
Access revocation will be done on the access instead of the appeal
If a new appeal for the same resource, created by the same user with the same role, is approved, Guardian will revoke the existing active access (if one exists) and create a new one.

2. Entity

Appeal:

  type Appeal struct {
  	ID            string                 
  	ResourceID    string                 
  	PolicyID      string                 
  	PolicyVersion uint                   
  	Status        string                 
  	AccountID     string                 
  	AccountType   string                 
  	CreatedBy     string                 
  	Creator       interface{}            
  	Role          string                 
  	Permissions   []string               
  	Options       *AppealOptions         
  	Details       map[string]interface{} 
  	Labels        map[string]string      

- 	RevokedBy    string    
- 	RevokedAt    time.Time 
- 	RevokeReason string    

  	Policy    *Policy     
  	Resource  *Resource   
  	Approvals []*Approval 

  	CreatedAt time.Time 
  	UpdatedAt time.Time 
  }

Access:

+  type Access struct {
+ 	ID             string
+ 	Status         string // active | inactive
+ 	AccountID      string
+ 	AccountType    string
+ 	ResourceID     string
+ 	Permissions    []string
+ 	ExpirationDate *time.Time
+ 	AppealID       string
+ 	RevokedBy      string
+ 	RevokedAt      *time.Time
+ 	RevokeReason   string
+ 	CreatedBy      string
+ 	CreatedAt      time.Time
+ 	UpdatedAt      time.Time
+  }

3. Lifecycle

Appeal:

  1. pending
    2. canceled
-   2. active
+   2. approved
-     3. terminated
    2. rejected

Access:

+ 1. active
+   2. inactive

Appeal request on behalf of other user

Summary

As a user, I want to raise an appeal for another user in Guardian. This helps a manager or supervisor raise all access requests for a user; sometimes the access request is for an integration account used by other systems such as Metabase or Tableau.

Proposed solution

Allow the appeal-request flow to take another user's email as input. This triggers the approval flow for granting access to that other user and sends the appeal notification to both users (in case it's a non-human account).

Import pre-existing access from providers

Requirements

Guardian to collect all pre-existing access from each resource in the provider

User (admin) to be able to revoke that imported access if needed

Approach:

1. Fetching access

~~Option 1: add a flag in the provider config to import access~~

Pros: Under the assumption that no access will be created outside Guardian after the provider is onboarded, this approach would be sufficient (simpler process)

Cons: Will only run once, when the provider is created

Option 2: provide an API endpoint to trigger import access

Pros: Can be triggered at any time

Cons: Needs to be triggered manually

~~Option 3: regularly collect existing access with jobs~~

Pros: Continuously controls access created outside Guardian

Cons: Might increase the scope to the Track access drift feature

2. Creating the appeals

The appeal is going to look like this:

{
  "id": "", // auto-generated
  "resource_id": "", // added by guardian
  "policy_id": null,
  "policy_version": 0,
  "status": "active",
  "account_type": "", // imported from provider
  "account_id": "", // imported from provider
  "role": "custom", // TODO: need to find a way to map the existing/imported permissions with the roles user defined in the provider
  "options": {},
  "resource": {}, // added by guardian,
  "approvals": [], // depends on the policy
  "details": {},
  "created_by": "", // TODO: for "user" account type, we can use that as the value here, but can't for serviceAccount or other account types
  "creator": null, // depends on the policy. Might be empty because iam config defined in the policy

  // new field(s):
  "imported": true, // imported flag to differentiate with normal (user-created) appeal
  "permissions": [] // https://github.com/odpf/guardian/issues/205
}

Things to note

Policy

Policy is going to be null since there's no approval flow for the imported access

Account Types

We are going to import pre-existing access for account types that are defined in the allowed_account_types field in the provider config only.

Role

Assuming this bug has been resolved,

Case 1: the imported access are predefined roles in the Provider Config

Suppose in the provider config we have defined a role named bq-admin which has two permissions: roles/bigquery.dataViewer and roles/bigquery.dataEditor.

In case a user has following access granted outside Guardian: roles/bigquery.dataViewer and roles/bigquery.dataEditor

Those access will be imported and mapped as an appeal with role:bq-admin only. Note that it will also have "permissions": ["roles/bigquery.dataViewer", "roles/bigquery.dataEditor"] in the appeal object

Case 2: the imported access don't have permissions defined in the provider config

A user has roles/bigquery.metadataViewer access in the bigquery, but that permission was not defined in the provider config.

In that case, each access will be mapped into a single appeal with "role" = "roles/bigquery.metadataViewer" and "permissions" = ["roles/bigquery.metadataViewer"]
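
The two cases above can be sketched as one mapping function. This is only an illustration of the described behaviour, not Guardian's code: `Role`, `ImportedAppeal`, and `MapImportedAccess` are hypothetical names, and the rule that a configured role matches only when all of its permissions are present is an assumption.

```go
package main

import "sort"

// Role is a simplified stand-in for a role defined in the provider config.
type Role struct {
	ID          string
	Permissions []string
}

// ImportedAppeal holds only the appeal fields relevant to role mapping.
type ImportedAppeal struct {
	Role        string
	Permissions []string
}

// containsAll reports whether every permission in perms is in set.
func containsAll(set map[string]bool, perms []string) bool {
	for _, p := range perms {
		if !set[p] {
			return false
		}
	}
	return true
}

// MapImportedAccess maps permissions granted outside Guardian to appeals.
// Case 1: a configured role whose permissions are all granted becomes one
// appeal with that role. Case 2: every remaining permission becomes its own
// appeal, with the permission name used as the role.
func MapImportedAccess(roles []Role, granted []string) []ImportedAppeal {
	grantedSet := make(map[string]bool, len(granted))
	for _, p := range granted {
		grantedSet[p] = true
	}

	covered := make(map[string]bool)
	var appeals []ImportedAppeal
	for _, r := range roles {
		if len(r.Permissions) == 0 || !containsAll(grantedSet, r.Permissions) {
			continue
		}
		appeals = append(appeals, ImportedAppeal{
			Role:        r.ID,
			Permissions: append([]string(nil), r.Permissions...),
		})
		for _, p := range r.Permissions {
			covered[p] = true
		}
	}

	var leftovers []string
	for p := range grantedSet {
		if !covered[p] {
			leftovers = append(leftovers, p)
		}
	}
	sort.Strings(leftovers) // deterministic order for the per-permission appeals
	for _, p := range leftovers {
		appeals = append(appeals, ImportedAppeal{Role: p, Permissions: []string{p}})
	}
	return appeals
}
```

For a user holding roles/bigquery.dataViewer, roles/bigquery.dataEditor and roles/bigquery.metadataViewer, with only bq-admin configured, this sketch yields one bq-admin appeal and one appeal whose role is roles/bigquery.metadataViewer.
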

Add Audit Logging

Requirements

track activities related to data changes & audit logs

Proposed solution

Create a new table in db to store audit logs as well as create its repository and service. Audit Service will be used across other services to log the activity.

database model

type AuditLog struct {
	Timestamp time.Time
	Action    string // example: appeal.created, provider.created, provider.updated, etc.
	Actor     string // example: [email protected] or system
	Data      interface{}
	Metadata  interface{}
}

implementation

repository implementation of this interface can be created in salt for reusability.

type AuditRepository interface {
	List(filters map[string]interface{}) ([]*AuditLog, error)
	Create(*AuditLog) error
	BulkCreate([]*AuditLog) error
}

guardian's audit service

type AuditAction string

// typed audit action
var (
	ProviderCreated     AuditAction = "provider.created"
	ResourceBulkCreated AuditAction = "resource.bulkCreated"
	// ...
)

// typed audit data for each audit action
type ProviderCreatedData domain.Provider
type ResourceBulkCreatedData struct {
	CreatedResourceIDs []string
	RemovedResourceIDs []string
}

// audit service interface representation
type AuditService interface {
	// action is typed as AuditAction so the typed values above can be passed directly
	Log(actor string, action AuditAction, message string, payload interface{}) error
}

// usage
auditService.Log("[email protected]", audit.ProviderCreated, "message example", audit.ProviderCreatedData{...})
auditService.Log("system", audit.ResourceBulkCreated, "message example", audit.ResourceBulkCreatedData{...})
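
To show how the pieces above could fit together, here is a small self-contained sketch of a service that writes to a repository. It is an assumption-laden illustration: `memoryRepository` and `auditService` are hypothetical, only the `Create` method of the repository is modelled, the action parameter is typed as `AuditAction`, and storing the message under `Metadata` is a guess, since the proposal does not say where the message goes.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

type AuditAction string

const ProviderCreated AuditAction = "provider.created"

type AuditLog struct {
	Timestamp time.Time
	Action    string
	Actor     string
	Data      interface{}
	Metadata  interface{}
}

// AuditRepository models only the Create method from the proposal.
type AuditRepository interface {
	Create(*AuditLog) error
}

// memoryRepository is a hypothetical in-memory repository for illustration.
type memoryRepository struct {
	mu   sync.Mutex
	logs []*AuditLog
}

func (r *memoryRepository) Create(l *AuditLog) error {
	if l == nil {
		return errors.New("audit log is nil")
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	r.logs = append(r.logs, l)
	return nil
}

// auditService is a hypothetical service that builds an AuditLog and stores it.
type auditService struct {
	repo AuditRepository
}

// Log records who did what. The message is kept under Metadata, which is an
// assumption of this sketch.
func (s *auditService) Log(actor string, action AuditAction, message string, payload interface{}) error {
	if actor == "" || action == "" {
		return errors.New("actor and action are required")
	}
	return s.repo.Create(&AuditLog{
		Timestamp: time.Now(),
		Action:    string(action),
		Actor:     actor,
		Data:      payload,
		Metadata:  map[string]interface{}{"message": message},
	})
}
```

Other services would receive the service through its interface, so the storage backend can be swapped without touching callers.
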

guardian audit log list

Domain | Action | Actor | Payload
-- | -- | -- | --
provider | create | authorized user | Provider
provider | update | authorized user | Provider
policy | create | authorized user | Policy
resource | bulk insert | guardian | created resource IDs, (soft) deleted resource IDs
resource | update | authorized user | Resource
appeal | create | authorized user | Appeal
appeal | cancel | authorized user | -
appeal | approve | authorized user | -
appeal | reject | authorized user | appeal id, reason
appeal | revoke | authorized user | appeal id, reason
appeal | automated revoke | guardian | -
appeal | extend | authorized user | Appeal
approval | approve | authorized user | approval name, appeal id
approval | automated approve | guardian | approval name, appeal id
approval | reject | authorized user | approval name, appeal id
approval | automated reject | guardian | approval name, appeal id

Bulk appeal access revocation

Summary

Guardian does not consider a user's status in the organisation or company (for example quit or inactive), so users can leave or switch organisations while they still have active appeals in the system. These accounts pose a significant security threat, and as they grow in number they become unmanageable for IT administrators. Hence it is essential to periodically check for inactive user accounts and act on them as soon as they are identified.

Proposed solution

The Guardian appeal-approval workflow uses identity providers (IAM) such as Shield, LDAP, and HTTP to fetch user details like name, title, manager, and department.
As a solution we are proposing the following steps:

Identity providers to have a user status as metadata, like accountStatus=active

A scheduled job in Guardian to:

Fetch all users having active appeals

Fetch the status of each user using IAM

Revoke appeal access if a user is not active
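
The body of such a scheduled job might look roughly like the sketch below. Everything here is hypothetical: the `IdentityManager` interface, the `staticIAM` stub, and the `RevokeInactiveUsers` function are illustrative names, and `"terminated"` is assumed as the revoked appeal status.

```go
package main

import "errors"

// IdentityManager is a hypothetical view of the IAM client used by the job.
type IdentityManager interface {
	// AccountStatus returns the accountStatus metadata of a user, e.g. "active".
	AccountStatus(accountID string) (string, error)
}

// staticIAM is a stub identity manager backed by a map, for illustration only.
type staticIAM map[string]string

func (s staticIAM) AccountStatus(accountID string) (string, error) {
	status, ok := s[accountID]
	if !ok {
		return "", errors.New("unknown account: " + accountID)
	}
	return status, nil
}

// Appeal holds only the fields the job needs.
type Appeal struct {
	ID        string
	AccountID string
	Status    string
}

// RevokeInactiveUsers walks all active appeals, looks up each user's status
// once, and marks the appeals of non-active users as terminated. It returns
// the IDs of the revoked appeals and stops at the first IAM lookup error so
// that no access is revoked based on an unknown status.
func RevokeInactiveUsers(appeals []*Appeal, iam IdentityManager) ([]string, error) {
	statusCache := make(map[string]string)
	var revoked []string
	for _, a := range appeals {
		if a.Status != "active" {
			continue
		}
		status, cached := statusCache[a.AccountID]
		if !cached {
			var err error
			status, err = iam.AccountStatus(a.AccountID)
			if err != nil {
				return revoked, err
			}
			statusCache[a.AccountID] = status
		}
		if status != "active" {
			a.Status = "terminated" // assumed status for a revoked appeal
			revoked = append(revoked, a.ID)
		}
	}
	return revoked, nil
}
```

Caching the status per account keeps the number of IAM calls proportional to the number of users rather than the number of appeals.
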

Additional context In the future, we want to trace whether a user is idle for more than a configured duration and will mark a user as inactive and revoke its access.

docs: update documentation

[ ] Documentation for the Provider plugins

[x] Add No-Op Provider

[x] Update BigQuery

[ ] Update Metabase

[x] Update GCS

[ ] Update Grafana

[x] Update GCloud IAM

[x] Add Server Jobs Reference

[x] Add descriptions with examples on the API reference page

[x] Update Introduction

Add custom questions in policy, to be asked when user creates an appeal

Summary

During appeal creation, the user can be required to answer some questions depending on the policy of a given resource. Currently, the questions asked cannot be configured by the policy creator. For instance, providers like GitHub/GitLab require additional information, such as a user ID or username, in the appeal in order to grant or revoke membership in the organisation.

Requirements

Configurable questions for appeal creation

Policy creators should be able to add questions related to access or security, and the answers can be shown to the approvers to inform their decision. An example question is "What is the purpose of accessing this resource?"

Provider-specific information that must be filled in by the appeal creator

This covers information required to manage access in the provider itself, such as the provider account username used for granting access (for GitHub).

Proposed solution

There are two approaches to making configurable questions.

Option 1: Add questions at the policy level. Questions specific to the provider would have to be included in the policy that is attached to each resource in the provider config.

Pros:

This option will allow us to ask resource-specific questions from the user since each policy is attached at the resource level in the provider configurations.

Cons:

Redundancy: Taking the current BigQuery use case, suppose we use a single policy (ocean) with a list of approvers, and now want to use the same policy for another provider, say GitHub. Since the questions are defined in the policy, ocean will also contain the question "What is your GitHub username". This question is redundant for other providers that use the same policy and don't require the GitHub username (say BigQuery).

There is no provision to check whether the policy contains a particular question while adding that policy to the resource. For example, if we are registering the policy for a GitHub resource (organisation), we can't check that the policy has the question to get the username.

Add the Questions field in policy.go:

  type PolicyAppealConfig struct {
  	DurationOptions []AppealDurationOption
+ 	Questions []struct{
+ 		Key         string
+ 		Question    string
+ 		Required    bool
+ 		Description string
+ 	}
  }

This is what the new policy config will look like:

id: bigquery_approval
version: 1
steps:
  ...
appeal_config:
  questions:
    - key: "reason"
      question: "Why do you need this resource?"
      required: true
...

We propose to include the answers in the Options field of the Appeal struct instead of creating a new field altogether, so no migrations will be required. We do not need to query the database on these values, therefore we can embed them inside the Options field.

  type AppealOptions struct {
  	ExpirationDate *time.Time
  	Duration       string
+ 	Questions []struct{
+ 		Key   string // identifier to the question defined in the policy
+ 		Value string // answer to the question, which was expected from the user in the appeal
+ 	}
  }

And this is what the final appeal from the user will look like:

{
  "account_id": "[email protected]",
  "resources": [
    {
      "id": "e12345ab-4345-4682-8756-88655317e9ea",
      "role": "WRITER",
      "options": [{
        "duration": "48h",
        "questions": [{
          "key": "foo",
          "value": "bar"
        }]
      }]
    }
  ]
}

Option 2: Add questions at the provider level. Whenever a new provider is registered in Guardian, its custom questions are defined during creation itself.

Pros:

We will be able to configure the questions without any redundancy while using the same policy. We can also keep a check that the provider configurations contain the mandatory questions in the provider config.

Cons:

Questions can only be asked at the provider-level

Note: For this option, the answers will still be stored in the same way as discussed in option 1. The questions field in this case will be added to the providerConfig.

  type ProviderConfig struct {
  	Type                string
  	URN                 string
  	AllowedAccountTypes []string
  	Labels              map[string]string
  	Credentials         interface{}
  	Appeal              *AppealConfig
  	Resources           []*ResourceConfig
+ 	Parameters          []*ProviderParameter
  }

+ type ProviderParameter []struct{
+ 	Key         string
+ 	Label       string
+ 	Required    bool
+ 	Description string
+ }

Note:

A unique key / ID can be added to easily query the answer of the question later.

We might as well add a field (regex_validation) if we want to validate the value entered (say, if a userID is expected to only be a number)

We will also be required to update the CLI and UI on Datlantis to accept answers to custom questions while creating an appeal
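
As a rough illustration of how required questions and the suggested regex_validation could be enforced at appeal creation, consider the sketch below. The `Question`, `Answer`, and `ValidateAnswers` names and the `Pattern` field are hypothetical; only the key, required flag, and key/value answer shape come from the proposal.

```go
package main

import (
	"fmt"
	"regexp"
)

// Question is a simplified version of the proposed question config.
// Pattern stands in for the suggested regex_validation field.
type Question struct {
	Key      string
	Question string
	Required bool
	Pattern  string
}

// Answer matches the key/value pair stored in the appeal options.
type Answer struct {
	Key   string
	Value string
}

// ValidateAnswers checks that every required question has a non-empty answer
// and that answers match the question's pattern when one is configured.
func ValidateAnswers(questions []Question, answers []Answer) error {
	byKey := make(map[string]string, len(answers))
	for _, a := range answers {
		byKey[a.Key] = a.Value
	}
	for _, q := range questions {
		value := byKey[q.Key]
		if value == "" {
			if q.Required {
				return fmt.Errorf("missing answer for required question %q", q.Key)
			}
			continue
		}
		if q.Pattern == "" {
			continue
		}
		re, err := regexp.Compile(q.Pattern)
		if err != nil {
			return fmt.Errorf("invalid pattern for question %q: %w", q.Key, err)
		}
		if !re.MatchString(value) {
			return fmt.Errorf("answer for question %q does not match %q", q.Key, q.Pattern)
		}
	}
	return nil
}
```

The same function would work for either option, since both store answers as key/value pairs; only the source of the question list (policy or provider config) differs.
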

@rahmatrhd @ravisuhag @AkarshSatija @mabdh @bsushmith @singhvikash11

Add support for GCS provider

Summary We need to support access management of GCS.

Proposed solution

[x] Provider configuration for gcs
[x] GCS client
[x] GCS resource & access management (TODO: figure out what resources that need to be granted & revoked)
[x] Documentation

Proposed Provider Config:

type: gcs/gcloud_storage
urn: my-google-cloud-storage
allowed_account_types:
  - user
  - serviceAccount
  - group
  - domain
credentials:
  service_account_key: base64 encoded Service Account Key
  resource_name: projects/gcs-project-id
resources:
  - type: bucket
    policy:
      id: my_bucket_policy
      version: 1
    roles:
      - id: viewer
        name: viewer
        description: ...
        permissions:
          - roles/storage.objectViewer
      - id: owner
        name: OWNER
        description: ...
        permissions:
          - roles/storage.objectCreator
      - id: admin
        name: ADMIN
        description: ...
        permissions:
          - roles/storage.objectAdmin
  - type: object
    policy:
      id: my_object_policy
      version: 1
    roles:
      - id: viewer
        name: View
        description: ...
        permissions:
          - reader
      - id: owner
        name: OWNER
        description: ...
        permissions:
          - owner

Resource Config for Bucket

{
  "id": 1,
  "provider_type": "gcs",
  "provider_urn": "my-gcs",
  "type": "bucket",
  "urn": "my-bucket-name",
  "name": "my-bucket-name",
  "details": {
    "foo": "bar"
  }
}

Resource Config for Object

{
  "id": 1,
  "provider_type": "gcs",
  "provider_urn": "my-gcs",
  "type": "object",
  "urn": "folder/sub-folder/file.txt",
  "name": "file.txt",
  "details": {
    "foo": "bar"
  }
}

Unauthorised access tracking

Problem

Guardian's approval flow with policies makes sure only approved users get access to resources. However, someone can be authorized directly through the provider platform, e.g. users can be given access to Metabase through its admin portal. Guardian has no way to track access to provider resources that was not requested or approved through Guardian's access workflows.

Solution

Periodically check authorized users at the provider level and track unauthorized access to resources.

Provide reports and alerts when unauthorized access is detected.

Raised alert severity can be based on the privacy level of resources.
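
The periodic check could reduce to a set difference between what the provider reports and what Guardian has granted. The sketch below assumes a hypothetical `Grant` tuple and `FindUntrackedGrants` function; how grants are fetched from each side is left out.

```go
package main

import "sort"

// Grant identifies one permission held by one account on one resource.
// All three fields are strings, so Grant can be used as a map key.
type Grant struct {
	AccountID   string
	ResourceURN string
	Permission  string
}

// FindUntrackedGrants returns the grants present in the provider that are not
// known to Guardian, deduplicated and sorted for stable reporting.
func FindUntrackedGrants(providerGrants, guardianGrants []Grant) []Grant {
	tracked := make(map[Grant]bool, len(guardianGrants))
	for _, g := range guardianGrants {
		tracked[g] = true
	}

	seen := make(map[Grant]bool)
	var untracked []Grant
	for _, g := range providerGrants {
		if tracked[g] || seen[g] {
			continue
		}
		seen[g] = true
		untracked = append(untracked, g)
	}

	sort.Slice(untracked, func(i, j int) bool {
		a, b := untracked[i], untracked[j]
		if a.ResourceURN != b.ResourceURN {
			return a.ResourceURN < b.ResourceURN
		}
		if a.AccountID != b.AccountID {
			return a.AccountID < b.AccountID
		}
		return a.Permission < b.Permission
	})
	return untracked
}
```

The resulting list is what a report or alert would be built from; severity could then be looked up per resource.
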

feat: update missing cli features

Missing features to add in CLI:

[x] account_type flag in appeals create command

[x] command for showing the approval statuses (visualization)

[x] resource get metadata cmd

[x] appeal revoke

[x] appeal cancel

Ability to trigger jobs through API endpoint

Summary

Support the ability to trigger jobs such as grant_expiration_reminder and grant_expiration_revocation through an API endpoint whenever needed, instead of only on a predetermined cron schedule.
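
One way this could be wired up is a small registry of named jobs exposed over HTTP, sketched below with only the standard library. The `JobRegistry` type, the `ErrUnknownJob` error, and the `POST /jobs/{name}/run` route are assumptions made for illustration; only the job names come from the summary above.

```go
package main

import (
	"errors"
	"net/http"
	"strings"
)

// ErrUnknownJob is returned when no job is registered under the given name.
var ErrUnknownJob = errors.New("unknown job")

// JobRegistry maps job names such as "grant_expiration_reminder" to the
// function the cron scheduler would otherwise run.
type JobRegistry map[string]func() error

// Trigger runs the named job synchronously.
func (r JobRegistry) Trigger(name string) error {
	job, ok := r[name]
	if !ok {
		return ErrUnknownJob
	}
	return job()
}

// Handler serves the hypothetical route POST /jobs/{name}/run.
func (r JobRegistry) Handler() http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		name := strings.TrimSuffix(strings.TrimPrefix(req.URL.Path, "/jobs/"), "/run")
		switch err := r.Trigger(name); {
		case errors.Is(err, ErrUnknownJob):
			http.Error(w, err.Error(), http.StatusNotFound)
		case err != nil:
			http.Error(w, err.Error(), http.StatusInternalServerError)
		default:
			w.WriteHeader(http.StatusOK)
		}
	})
}
```

The handler could be mounted with `http.Handle("/jobs/", registry.Handler())`, and the cron scheduler could call `Trigger` with the same names so both paths run identical code.
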

Register Dataplex policy-tags as provider

Summary Register DataCatalog policy tags as a provider so users can raise requests for policy tags with roles/datacatalog.categoryFineGrainedReader permission
