Storage Orchestration for Kubernetes

Rook

What is Rook?

Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling and orchestration platform to perform its duties.

Rook integrates deeply into cloud-native environments, leveraging extension points and providing a seamless experience for scheduling, lifecycle management, resource management, security, monitoring, and user experience.

For more details about the storage solutions currently supported by Rook, please refer to the project status section below. We plan to continue adding support for other storage systems and environments based on community demand and engagement in future releases. See our roadmap for more details.

Rook is hosted by the Cloud Native Computing Foundation (CNCF) as a graduated level project. If you are a company that wants to help shape the evolution of technologies that are container-packaged, dynamically-scheduled and microservices-oriented, consider joining the CNCF. For details about who's involved and how Rook plays a role, read the CNCF announcement.

Getting Started and Documentation

For installation, deployment, and administration, see our Documentation.
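
As a quick orientation, here is a minimal install sketch. It assumes the example manifests shipped in the Rook repository (paths and file names vary between releases), so treat it as an outline rather than the authoritative procedure, which lives in the Documentation:

    # Clone the release branch you want to run (use an official release tag)
    git clone --single-branch --branch <release-tag> https://github.com/rook/rook.git
    cd rook/deploy/examples

    # Install the CRDs, common resources (RBAC), and the operator
    kubectl create -f crds.yaml -f common.yaml -f operator.yaml

    # Create a Ceph cluster; edit cluster.yaml for your environment first
    kubectl create -f cluster.yaml

    # Watch the operator bring up the mons, mgr, and OSDs
    kubectl -n rook-ceph get pods --watch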

Contributing

We welcome contributions. See Contributing to get started.

Report a Bug

For filing bugs, suggesting improvements, or requesting new features, please open an issue.

Reporting Security Vulnerabilities

If you find a vulnerability or a potential vulnerability in Rook, please let us know immediately at [email protected]. We'll send a confirmation email to acknowledge your report, and a follow-up email once we have determined whether the issue is in fact a vulnerability.

For further details, please see the complete security release process.

Contact

Please use the following to reach members of the community:

Community Meeting

A regular community meeting takes place every other Tuesday at 9:00 AM PT (Pacific Time). Convert to your local timezone.

Any changes to the meeting schedule will be added to the agenda doc and posted to Slack #announcements and the rook-dev mailing list.

Anyone who wants to discuss the direction of the project, design and implementation reviews, or general questions with the broader community is welcome and encouraged to join.

Project Status

The status of each storage provider supported by Rook can be found in the table below. Each API group is assigned its own individual status to reflect its varying maturity and stability. More details about API versioning and status in Kubernetes can be found on the Kubernetes API versioning page, but the key differences between the statuses are summarized below:

  • Alpha: The API may change in incompatible ways in a later software release without notice. It is recommended for use only in short-lived testing clusters, due to the increased risk of bugs and lack of long-term support.
  • Beta: Support for the overall features will not be dropped, though details may change. Support for upgrading or migrating between versions will be provided, either through automation or manual steps.
  • Stable: Features will appear in released software for many subsequent versions and support for upgrading between versions will be provided with software automation in the vast majority of scenarios.
Name | Details | API Group | Status
---- | ------- | --------- | ------
Ceph | Ceph is a distributed storage system that provides file, block and object storage and is deployed in large scale production clusters. | ceph.rook.io/v1 | Stable
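
For example, once the Rook CRDs are installed, you can confirm which versions of the ceph.rook.io API group your cluster serves with standard kubectl queries (a small sketch; no Rook-specific tooling is assumed):

    # Show the API versions registered for the CephCluster CRD
    kubectl get crd cephclusters.ceph.rook.io -o jsonpath='{.spec.versions[*].name}{"\n"}'

    # List all resource kinds in the ceph.rook.io group
    kubectl api-resources --api-group=ceph.rook.io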

This repo is for the Ceph storage provider. The Cassandra and NFS storage providers moved to a separate repo to allow each storage provider to have an independent development and release schedule.

Official Releases

Official releases of Rook can be found on the releases page. Please note that it is strongly recommended that you use official releases of Rook, as unreleased versions from the master branch are subject to changes and incompatibilities that will not be supported in the official releases. Builds from the master branch can have functionality changed and even removed at any time without compatibility support and without prior notice.
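
To confirm which release you are actually running, one option (a sketch; the namespace and deployment name assume a default rook-ceph install) is to inspect the operator image tag or run rook version inside the operator pod:

    # The operator image tag carries the release version
    kubectl -n rook-ceph get deployment rook-ceph-operator \
      -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

    # Or ask the binary directly
    kubectl -n rook-ceph exec deploy/rook-ceph-operator -- rook version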

Licensing

Rook is under the Apache 2.0 license.

Comments
  • Very high CPU usage on Ceph OSDs (v1.0, v1.1)

    I am not sure where the problem is, but I am seeing very high CPU usage since I started using v1.0.0. With three small clusters, the load average quickly skyrockets into the 10s, making the nodes unusable. This happens while copying quite a bit of data to a volume mapped on the host, bypassing k8s (to restore data from an existing non-k8s server). Nothing else is happening with the clusters at all. I am using low-spec servers (2 cores, 8 GB of RAM), but I didn't see any of these high-load issues with 0.9.3 on servers with the same specs. Has something changed about Ceph, or anything else that might explain this? I've also tried two providers, Hetzner Cloud and UpCloud; the same issue occurs whenever a volume is actually used.

    Is it just me or is it happening to others as well? Thanks!

  • The rook-ceph-csi-config cm disappeared after host reboot

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior: After the host reboot, the configmap rook-ceph-csi-config disappeared. All the ceph-csi pods were in "ContainerCreating" state.

    Expected behavior: The configmap rook-ceph-csi-config should not be deleted in this condition or should be re-created if not found.
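
    A possible check/workaround sketch (the namespace and label assume a default install; the operator log attached below shows the operator recreating this configmap during CSI reconciliation):

    # Verify whether the configmap exists
    kubectl -n rook-ceph get configmap rook-ceph-csi-config

    # If it is missing, restarting the operator triggers the CSI reconcile that
    # recreates it (see "successfully created csi config map" in the log below)
    kubectl -n rook-ceph delete pod -l app=rook-ceph-operator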

    How to reproduce it (minimal and precise):

    Run "reboot" on the one of the host in the cluster. The issue didn't surface each time but had occurred several times in our testing. Below is the failure condition.

    # knc get pods | egrep -v "Run|Com"
    NAME                                                            READY   STATUS              RESTARTS   AGE
    csi-cephfsplugin-6cb75                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-9whpq                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-bpn88                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-gd6kk                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-hbjkj                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-jt48j                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-mlj6w                                          0/3     ContainerCreating   0          15h
    csi-cephfsplugin-provisioner-67cdf965c6-764bx                   0/5     ContainerCreating   0          15h
    csi-cephfsplugin-provisioner-67cdf965c6-pq4wm                   0/5     ContainerCreating   0          15h
    csi-cephfsplugin-rx599                                          0/3     ContainerCreating   0          15h
    csi-rbdplugin-9v8kb                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-bccpt                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-bqlpc                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-f2fb9                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-h8hbc                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-l2wbz                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-njtt7                                             0/3     ContainerCreating   0          15h
    csi-rbdplugin-provisioner-78d6f54775-dq47m                      0/6     ContainerCreating   0          15h
    csi-rbdplugin-provisioner-78d6f54775-hc9nb                      0/6     ContainerCreating   0          15h
    csi-rbdplugin-tfn52                                             0/3     ContainerCreating   0          15h
    rook-ceph-detect-version-kpc2p                                  0/1     Init:0/1            0          1s
    
    # knc describe pod csi-rbdplugin-provisioner-78d6f54775-hc9nb
    ...
    Events:
      Type     Reason       Age                  From                                        Message
      ----     ------       ----                 ----                                        -------
      Warning  FailedMount  40m (x57 over 14h)   kubelet, tesla-cb0434-csd1-csd1-control-03  Unable to attach or mount volumes: unmounted volumes=[ceph-csi-config], unattached volumes=[rook-csi-rbd-provisioner-sa-token-q4kbn host-dev host-sys lib-modules ceph-csi-config keys-tmp-dir socket-dir]: timed out waiting for the condition
      Warning  FailedMount  36m (x58 over 14h)   kubelet, tesla-cb0434-csd1-csd1-control-03  Unable to attach or mount volumes: unmounted volumes=[ceph-csi-config], unattached volumes=[socket-dir rook-csi-rbd-provisioner-sa-token-q4kbn host-dev host-sys lib-modules ceph-csi-config keys-tmp-dir]: timed out waiting for the condition
      Warning  FailedMount  15m (x62 over 15h)   kubelet, tesla-cb0434-csd1-csd1-control-03  Unable to attach or mount volumes: unmounted volumes=[ceph-csi-config], unattached volumes=[keys-tmp-dir socket-dir rook-csi-rbd-provisioner-sa-token-q4kbn host-dev host-sys lib-modules ceph-csi-config]: timed out waiting for the condition
      Warning  FailedMount  67s (x453 over 15h)  kubelet, tesla-cb0434-csd1-csd1-control-03  MountVolume.SetUp failed for volume "ceph-csi-config" : configmap "rook-ceph-csi-config" not found
    
    
    # cephstatus
      cluster:
        id:     79580ff1-adf9-4d6a-a4c6-9dc44fe784c5
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum b,c,d (age 15h)
        mgr: a(active, since 15h)
        mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
        osd: 6 osds: 6 up (since 15h), 6 in (since 15h)
        rgw: 1 daemon active (rook.ceph.store.a)
     
      task status:
        scrub status:
            mds.myfs-a: idle
            mds.myfs-b: idle
     
      data:
        pools:   10 pools, 208 pgs
        objects: 295 objects, 22 MiB
        usage:   6.9 GiB used, 53 GiB / 60 GiB avail
        pgs:     208 active+clean
     
      io:
        client:   852 B/s rd, 1 op/s rd, 0 op/s wr
    
    

    File(s) to submit:

    • Cluster CR (custom resource), typically called cluster.yaml, if necessary
    • Operator's logs, if necessary
    2020-08-25 04:05:57.736755 I | rookcmd: starting Rook v1.3.9 with arguments '/usr/local/bin/rook ceph operator'
    2020-08-25 04:05:57.737076 I | rookcmd: flag values: --add_dir_header=false, --alsologtostderr=false, --csi-cephfs-plugin-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin.yaml, --csi-cephfs-provisioner-dep-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin-provisioner-dep.yaml, --csi-cephfs-provisioner-sts-template-path=/etc/ceph-csi/cephfs/csi-cephfsplugin-provisioner-sts.yaml, --csi-rbd-plugin-template-path=/etc/ceph-csi/rbd/csi-rbdplugin.yaml, --csi-rbd-provisioner-dep-template-path=/etc/ceph-csi/rbd/csi-rbdplugin-provisioner-dep.yaml, --csi-rbd-provisioner-sts-template-path=/etc/ceph-csi/rbd/csi-rbdplugin-provisioner-sts.yaml, --enable-discovery-daemon=true, --enable-flex-driver=false, --enable-machine-disruption-budget=false, --help=false, --kubeconfig=, --log-flush-frequency=5s, --log-level=INFO, --log_backtrace_at=:0, --log_dir=, --log_file=, --log_file_max_size=1800, --logtostderr=true, --master=, --mon-healthcheck-interval=45s, --mon-out-timeout=5m0s, --operator-image=, --service-account=, --skip_headers=false, --skip_log_headers=false, --stderrthreshold=2, --v=0, --vmodule=
    2020-08-25 04:05:57.737087 I | cephcmd: starting operator
    2020-08-25 04:05:57.801061 I | op-discover: rook-discover daemonset already exists, updating ...
    2020-08-25 04:05:57.828608 I | operator: rook-provisioner ceph.rook.io/block started using ceph.rook.io flex vendor dir
    I0825 04:05:57.828776      10 leaderelection.go:242] attempting to acquire leader lease  rook-ceph/ceph.rook.io-block...
    2020-08-25 04:05:57.828838 I | operator: rook-provisioner rook.io/block started using rook.io flex vendor dir
    ...
    2020-08-25 04:05:59.546300 I | op-k8sutil: ROOK_CSI_KUBELET_DIR_PATH="/var/lib/kubelet" (env var)
    2020-08-25 04:05:59.571085 E | ceph-block-pool-controller: failed to reconcile invalid pool CR "csireplpool" spec: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:05:59.945701 I | op-mon: parsing mon endpoints: d=10.254.140.99:6789,b=10.254.27.205:6789,c=10.254.144.51:6789
    2020-08-25 04:06:00.229308 W | cephclient: failed to get ceph daemons versions, this likely means there is no cluster yet. failed to run 'ceph versions: exit status 1
    2020-08-25 04:06:00.345543 I | op-mon: parsing mon endpoints: d=10.254.140.99:6789,b=10.254.27.205:6789,c=10.254.144.51:6789
    2020-08-25 04:06:00.537510 E | ceph-file-controller: failed to reconcile invalid object filesystem "myfs" arguments: invalid metadata pool: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:06:00.631651 I | ceph-csi: successfully created csi config map "rook-ceph-csi-config"
    2020-08-25 04:06:00.632192 I | ceph-csi: detecting the ceph csi image version for image "bcmt-registry:5000/csi/cephcsi:v2.1.2"
    2020-08-25 04:06:00.821972 I | op-k8sutil: CSI_PROVISIONER_TOLERATIONS="- effect: NoExecute\n  key: is_control\n  operator: Equal\n  value: \"true\"\n- effect: NoExecute\n  key: is_edge\n  operator: Equal\n  value: \"true\"\n- effect: NoExecute\n  key: is_storage\n  operator: Equal\n  value: \"true\"\n- effect: NoSchedule\n  key: node.cloudprovider.kubernetes.io/uninitialized\n  operator: Equal\n  value: \"true\"\n" (env var)
    2020-08-25 04:06:01.028714 W | cephclient: failed to get ceph daemons versions, this likely means there is no cluster yet. failed to run 'ceph versions: exit status 1
    2020-08-25 04:06:01.127681 E | ceph-block-pool-controller: failed to reconcile invalid pool CR "csireplpool" spec: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:06:01.326682 E | ceph-object-controller: failed to reconcile invalid object store "rook-ceph-store" arguments: invalid metadata pool spec: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:06:01.545112 I | op-mon: parsing mon endpoints: d=10.254.140.99:6789,b=10.254.27.205:6789,c=10.254.144.51:6789
    2020-08-25 04:06:01.545202 I | op-cluster: cluster info loaded for monitoring: &{FSID:79580ff1-adf9-4d6a-a4c6-9dc44fe784c5 MonitorSecret:AQDEckRfaObYIhAAfu5txBedGHfueBAZddUAzg== AdminSecret:AQDEckRfnro2LhAABI76dGcXtkM1BBtXpfHCDA== ExternalCred:{Username: Secret:} Name:rook-ceph Monitors:map[b:0xc00000c580 c:0xc00000c780 d:0xc00000c2c0] CephVersion:{Major:0 Minor:0 Extra:0 Build:0}}
    2020-08-25 04:06:01.545210 I | op-cluster: enabling cluster monitoring goroutines
    2020-08-25 04:06:01.545216 I | op-client: start watching client resources in namespace "rook-ceph"
    2020-08-25 04:06:02.146208 I | op-k8sutil: ROOK_OBC_WATCH_OPERATOR_NAMESPACE="true" (env var)
    2020-08-25 04:06:02.146238 I | op-bucket-prov: ceph bucket provisioner launched watching for provisioner "rook-ceph.ceph.rook.io/bucket"
    2020-08-25 04:06:02.147645 I | op-cluster: ceph status check interval is 60s
    I0825 04:06:02.147724      10 manager.go:118] objectbucket.io/provisioner-manager "msg"="starting provisioner"  "name"="rook-ceph.ceph.rook.io/bucket"
    2020-08-25 04:06:02.356831 I | op-mon: parsing mon endpoints: d=10.254.140.99:6789,b=10.254.27.205:6789,c=10.254.144.51:6789
    2020-08-25 04:06:02.737032 E | ceph-block-pool-controller: failed to reconcile invalid pool CR "csireplpool" spec: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:06:02.821955 E | op-cluster: failed to get ceph status. failed to get status. . Error initializing cluster client: ObjectNotFound('error calling conf_read_file',): exit status 1
    2020-08-25 04:06:02.852749 I | op-config: CephCluster "rook-ceph" status: "Failure". "Failed to configure ceph cluster"
    2020-08-25 04:06:02.939369 W | cephclient: failed to get ceph daemons versions, this likely means there is no cluster yet. failed to run 'ceph versions: exit status 1
    2020-08-25 04:06:02.945847 I | op-mon: parsing mon endpoints: d=10.254.140.99:6789,b=10.254.27.205:6789,c=10.254.144.51:6789
    2020-08-25 04:06:03.463716 W | cephclient: failed to get ceph daemons versions, this likely means there is no cluster yet. failed to run 'ceph versions: exit status 1
    2020-08-25 04:06:03.523955 E | ceph-file-controller: failed to reconcile invalid object filesystem "myfs" arguments: invalid metadata pool: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    : exit status 1
    2020-08-25 04:06:03.737853 I | ceph-spec: ceph-block-pool-controller: CephCluster "rook-ceph" found but skipping reconcile since ceph health is &{"HEALTH_ERR" map["error":{"Urgent" "failed to get status. . Error initializing cluster client: ObjectNotFound('error calling conf_read_file',): exit status 1"}] "2020-08-25T04:06:02Z" "2020-08-25T04:06:02Z" "HEALTH_OK"}
    2020-08-25 04:06:03.755237 E | ceph-object-controller: failed to reconcile invalid object store "rook-ceph-store" arguments: invalid metadata pool spec: failed to get crush map: failed to get crush map. Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
    
    • Crashing pod(s) logs, if necessary

    To get logs, use kubectl -n <namespace> logs <pod name>. When pasting logs, always surround them with backticks or use the insert-code button in the GitHub UI. Read the GitHub documentation if you need help.

    Environment:

    • OS (e.g. from /etc/os-release): Rhel 7.8
    • Kernel (e.g. uname -a): Linux tesla-cb0434-csd1-csd1-control-01 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Wed Feb 26 03:08:15 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
    • Cloud provider or hardware configuration: openstack
    • Rook version (use rook version inside of a Rook Pod): v1.3.9
    • Storage backend version (e.g. for ceph do ceph -v): v14.2.10
    • Kubernetes version (use kubectl version): v1.18.8
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Tectonic
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
  • Operator not happy after upgrade

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior: When restarted, the operator checks that all OSDs are running, but at some point it fails with:

    2018-08-17 01:22:22.635203 I | op-k8sutil: updating deployment rook-ceph-osd-id-130
    2018-08-17 01:22:24.666490 I | op-k8sutil: finished waiting for updated deployment rook-ceph-osd-id-130
    2018-08-17 01:22:24.666509 I | op-osd: started deployment for osd 130 (dir=false, type=bluestore)
    2018-08-17 01:22:24.669040 I | op-osd: osd orchestration status for node ps-100g.sdsu.edu is starting
    2018-08-17 01:22:24.669056 I | op-osd: osd orchestration status for node siderea.ucsc.edu is starting
    2018-08-17 01:22:24.669061 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:24.670869 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:24.770976 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:24.771848 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:24.811933 I | op-provisioner: Deleting volume pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994
    2018-08-17 01:22:24.811952 I | exec: Running command: rbd rm rbd/pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994 --cluster= --conf=/var/lib/rook/.config --keyring=/v
    ar/lib/rook/client.admin.keyring
    2018-08-17 01:22:24.812100 I | op-provisioner: Deleting volume pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994
    2018-08-17 01:22:24.812118 I | exec: Running command: rbd rm rbd/pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994 --cluster= --conf=/var/lib/rook/.config --keyring=/v
    ar/lib/rook/client.admin.keyring
    2018-08-17 01:22:24.812127 I | op-provisioner: Deleting volume pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994
    2018-08-17 01:22:24.812144 I | exec: Running command: rbd rm rbd/pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994 --cluster= --conf=/var/lib/rook/.config --keyring=/v
    ar/lib/rook/client.admin.keyring
    E0817 01:22:24.825364       7 controller.go:1044] Deletion of volume "pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994" failed: Failed to delete rook block image rbd/
    pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994: failed to delete image pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1. g
    lobal_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    E0817 01:22:24.825422       7 goroutinemap.go:165] Operation for "delete-pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994[6aec4ddb-8ea0-11e8-92ef-0cc47a6be994]" faile
    d. No retries permitted until 2018-08-17 01:24:26.825404385 +0000 UTC m=+497.173393014 (durationBeforeRetry 2m2s). Error: Failed to delete rook block image rb
    d/pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994: failed to delete image pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1.
     global_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    E0817 01:22:24.825753       7 controller.go:1044] Deletion of volume "pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994" failed: Failed to delete rook block image rbd/pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994: failed to delete image pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1. global_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    E0817 01:22:24.825791       7 goroutinemap.go:165] Operation for "delete-pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994[5ffa47ae-a0d0-11e8-8e30-0cc47a6be994]" failed. No retries permitted until 2018-08-17 01:24:26.825780946 +0000 UTC m=+497.173769575 (durationBeforeRetry 2m2s). Error: Failed to delete rook block image rbd/pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994: failed to delete image pvc-584f3e18-a0d0-11e8-8e30-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1. global_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    E0817 01:22:24.825958       7 controller.go:1044] Deletion of volume "pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994" failed: Failed to delete rook block image rbd/pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994: failed to delete image pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1. global_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    E0817 01:22:24.825996       7 goroutinemap.go:165] Operation for "delete-pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994[119e6a37-9f46-11e8-92ef-0cc47a6be994]" failed. No retries permitted until 2018-08-17 01:24:26.825984549 +0000 UTC m=+497.173973182 (durationBeforeRetry 2m2s). Error: Failed to delete rook block image rbd/pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994: failed to delete image pvc-0b1bb4e5-9f46-11e8-92ef-0cc47a6be994 in pool rbd: Failed to complete '': exit status 1. global_init: unable to open config file from search list /var/lib/rook/.config
    . output:
    2018-08-17 01:22:24.871944 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:24.872769 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:24.972851 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:24.973845 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.073929 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.074814 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.174891 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.175758 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.275835 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.276647 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.376740 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.377564 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.477646 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.478442 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.578526 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.579283 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.679364 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.680127 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.780216 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.781114 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.881202 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.882004 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:25.982083 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:25.982835 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:26.082912 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:26.083701 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:26.183800 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:26.184550 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:26.284634 I | op-osd: 12/14 node(s) completed osd provisioning
    2018-08-17 01:22:26.285400 I | op-osd: orchestration status config map result channel closed, will restart watch.
    2018-08-17 01:22:26.385449 I | op-osd: 12/14 node(s) completed osd provisioning
    

    After this, it keeps printing the message about the channel being closed.

    Expected behavior: The operator functions normally after the 0.7.1 -> 0.8.1 upgrade.
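
    A possible cleanup sketch (not a confirmed fix; it assumes the Rook toolbox is available): the provisioner errors above are about deleting rbd images with an empty --cluster and a missing config file, so checking whether those images still exist, and removing them by hand from the toolbox, may unblock the stuck deletions:

    # From the Rook toolbox
    rbd ls -p rbd
    rbd rm rbd/pvc-5b9e9df4-8ea0-11e8-92ef-0cc47a6be994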

    Environment:

    • OS (e.g. from /etc/os-release): CentOS 7.5
    • Kernel (e.g. uname -a): 4.14.14-1.el7.elrepo.x86_64
    • Cloud provider or hardware configuration: Bare metal
    • Rook version (use rook version inside of a Rook Pod): 0.8.1
    • Kubernetes version (use kubectl version): 1.11.2
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_WARN noscrub,nodeep-scrub flag(s) set
  • cephfs storageclass does not work

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior: wp-pv-claim and mysql-pv-claim are Bound, but the cephfs-pvc is Pending

    Expected behavior: The cephfs-pvc works

    How to reproduce it (minimal and precise):

    # microk8s  v1.14.1 on Ubuntu 16.04.5 LTS (Xenial Xerus)
    kubectl apply -f ceph/common.yaml
    kubectl apply -f ceph/operator.yaml
    kubectl apply -f ceph/cluster-test.yaml
    kubectl apply -f ceph/toolbox.yaml
    kubectl apply -f ceph/csi/rbd/storageclass-test.yaml
    kubectl apply -f . # install mysql.yaml wordpress.yaml
    kubectl apply -f ceph/object-test.yaml
    kubectl apply -f ceph/object-user.yaml
    kubectl apply -f ceph/filesystem-test.yaml
    kubectl apply -f ceph/csi/cephfs/storageclass.yaml
    kubectl apply -f ceph/csi/cephfs/kube-registry.yaml
    

    File(s) to submit:

    • Cluster CR (custom resource), typically called cluster.yaml, if necessary
    • Operator's logs, if necessary
    • Crashing pod(s) logs, if necessary
    # kubectl -n kube-system describe pvc cephfs-pvc
    Name:          cephfs-pvc
    Namespace:     kube-system
    StorageClass:  csi-cephfs
    Status:        Pending
    Volume:        
    Labels:        <none>
    Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"cephfs-pvc","namespace":"kube-system"},"spec":{"acc...
                   volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
    Finalizers:    [kubernetes.io/pvc-protection]
    Capacity:      
    Access Modes:  
    VolumeMode:    Filesystem
    Mounted By:    kube-registry-5b9c9854c5-psdsv
                   kube-registry-5b9c9854c5-r2g4l
                   kube-registry-5b9c9854c5-rmqms
    Events:
      Type     Reason                Age                   From                                                                                                             Message
      ----     ------                ----                  ----                                                                                                             -------
      Warning  ProvisioningFailed    5m10s (x11 over 38m)  rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f  failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
      Normal   ExternalProvisioning  72s (x162 over 41m)   persistentvolume-controller                                                                                      waiting for a volume to be created, either by external provisioner "rook-ceph.cephfs.csi.ceph.com" or manually created by system administrator
      Normal   Provisioning          10s (x12 over 41m)    rook-ceph.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-f64c4574b-mvb7p_cc13f063-e0f5-11e9-813c-56d3e713ca9f  External provisioner is provisioning volume for claim "kube-system/cephfs-pvc"
    
    
    # kubectl -n rook-ceph logs csi-cephfsplugin-provisioner-f64c4574b-mvb7p -c csi-provisioner
    I0927 08:03:15.230702       1 connection.go:183] GRPC response: {}
    I0927 08:03:15.231643       1 connection.go:184] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
    I0927 08:03:15.231740       1 controller.go:979] Final error received, removing PVC 264abaa4-e0f7-11e9-bead-ac1f6b84bde2 from claims in progress
    W0927 08:03:15.231762       1 controller.go:886] Retrying syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2", failure 11
    E0927 08:03:15.231801       1 controller.go:908] error syncing claim "264abaa4-e0f7-11e9-bead-ac1f6b84bde2": failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
    I0927 08:03:15.231842       1 event.go:209] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"cephfs-pvc", UID:"264abaa4-e0f7-11e9-bead-ac1f6b84bde2", APIVersion:"v1", ResourceVersion:"17010356", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "csi-cephfs": rpc error: code = DeadlineExceeded desc = context deadline exceeded
    
    # ceph -s    
      cluster:
        id:     9acd086e-493b-4ebd-a39f-2be2cce80080
        health: HEALTH_OK
     
      services:
        mon: 1 daemons, quorum a (age 62m)
        mgr: a(active, since 61m)
        mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay
        osd: 1 osds: 1 up (since 61m), 1 in (since 61m)
        rgw: 1 daemon active (my.store.a)
     
      data:
        pools:   9 pools, 72 pgs
        objects: 407 objects, 457 MiB
        usage:   1.1 TiB used, 611 GiB / 1.7 TiB avail
        pgs:     72 active+clean
     
      io:
        client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr
    

    Environment:

    • OS (e.g. from /etc/os-release): microk8s v1.14.1 on Ubuntu 16.04.5 LTS (Xenial Xerus)
    • Kernel (e.g. uname -a): Linux ubun 4.15.0-62-generic #69~16.04.1-Ubuntu SMP Fri Sep 6 02:43:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
    • Cloud provider or hardware configuration:
    • Rook version (use rook version inside of a Rook Pod): rook: v1.1.1
    • Storage backend version (e.g. for ceph do ceph -v): ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
    • Kubernetes version (use kubectl version):Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:02:58Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift):
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
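
    Given the DeadlineExceeded errors above, a first diagnostic pass (a sketch; the pod labels and toolbox deployment name assume the default example manifests) is to check that the CephFS CSI pods are healthy and that the filesystem's MDS is active:

    # CephFS CSI provisioner and node plugin pods
    kubectl -n rook-ceph get pods -l app=csi-cephfsplugin-provisioner
    kubectl -n rook-ceph get pods -l app=csi-cephfsplugin

    # Filesystem / MDS state, queried from the toolbox
    kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fs status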
  • Recovering Rook cluster after Kubernetes cluster loss

    Is this a bug report or feature request?

    • Feature Request

    Feature Request

    Are there any similar features already existing: n/a

    What should the feature do: Assuming I have a backup of the rook directory from the nodes and I lose my Kubernetes cluster, I would like to be able to restore the persistent volumes after the Kubernetes cluster is redeployed, while making sure that the existing PVs are claimed by the pods in the new cluster.

    What would be solved through this feature: Make it possible to not lose all the data when the Kubernetes cluster is lost.

    Does this have an impact on existing features:

    Environment:

    • OS (e.g. from /etc/os-release):
    • Kernel (e.g. uname -a):
    • Cloud provider or hardware configuration:
    • Rook version (use rook version inside of a Rook Pod):
    • Kubernetes version (use kubectl version):
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Bare Metal/(VMs) with CoreOS
    • Ceph status (use ceph health in the Rook toolbox):
  • OSDs crashlooping after being OOMKilled: bind unable to bind

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior: Updated Ceph from v14.2.2-20190722 to v14.2.4-20190917, which seems to have made some changes in memory management; nodes started getting system OOM kills, followed by OSDs crashlooping (see the memory-limit sketch at the end of this report).

    2019-09-26 01:20:10.118 7f70aa104dc0 -1 Falling back to public interface
    2019-09-26 01:20:10.128 7f70aa104dc0 -1  Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-09-26 01:20:10.128 7f70aa104dc0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
    2019-09-26 01:20:15.137 7f70aa104dc0 -1  Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-09-26 01:20:15.137 7f70aa104dc0 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
    2019-09-26 01:20:20.144 7f70aa104dc0 -1  Processor -- bind unable to bind to v2:10.244.15.18:7300/0 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-09-26 01:20:20.144 7f70aa104dc0 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
    

    How to reproduce it (minimal and precise): Get an OSD OOMKilled by the system.

    • Rook version (use rook version inside of a Rook Pod): 1.1.1
    • Storage backend version (e.g. for ceph do ceph -v): v14.2.4-20190917
    • Kubernetes version (use kubectl version): 1.15.1
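
    Since the report ties the crash loops to system OOM kills, one mitigation sketch is to give the OSDs explicit memory requests/limits so the kernel OOM killer is less likely to target them. The field names follow the CephCluster resources block; the cluster name, namespace, and values are placeholders to adapt:

    kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p \
      '{"spec":{"resources":{"osd":{"requests":{"memory":"4Gi"},"limits":{"memory":"4Gi"}}}}}'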
  • OSD Processor -- bind unable to bind to IP on any port in range 6800-7300: (99) Cannot assign requested address

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior:

    I'm running OpenShift v3.11.82 on 3 nodes using RHEL7, with FLEXVOLUME_DIR_PATH set to /usr/libexec/kubernetes/kubelet-plugins/volume/exec (no other changes to *.yml files). I've encountered the OSD error "Processor -- bind unable to bind to v2:IP:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address" on all 3 of my OSDs.

    Expected behavior:

    Bind to the address and start listening on the port; rook+ceph should work as expected.

    How to reproduce it (minimal and precise):

    Environment:

    • OS (e.g. from /etc/os-release): RHEL7
    • Kernel (e.g. uname -a): Linux node2.system10.vlan124.mcp 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GN
    • Cloud provider or hardware configuration: On-prem (VMVare based)
    • Rook version (use rook version inside of a Rook Pod): 1.0.0
    • Kubernetes version (use kubectl version): OpenShift v3.11.82 (i.e., Kubernetes v1.11)
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): OpenShift
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

    LOGS

    rook-ceph-osd-0

    2019-05-08 17:01:38.770518 I | rookcmd: starting Rook v1.0.0-13.g05b0166 with arguments '/rook/rook ceph osd start -- --foreground --id 0 --osd-uuid c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false'
    2019-05-08 17:01:38.776349 I | rookcmd: flag values: --help=false, --log-flush-frequency=5s, --log-level=INFO, --osd-id=0, --osd-store-type=bluestore, --osd-uuid=c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e
    2019-05-08 17:01:38.776480 I | op-mon: parsing mon endpoints: 
    2019-05-08 17:01:38.776577 W | op-mon: ignoring invalid monitor 
    2019-05-08 17:01:38.784163 I | exec: Running command: stdbuf -oL ceph-volume lvm activate --no-systemd --bluestore 0 c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e
    2019-05-08 17:01:39.683863 I | Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
    2019-05-08 17:01:40.012259 I | Running command: /usr/sbin/restorecon /var/lib/ceph/osd/ceph-0
    2019-05-08 17:01:40.312664 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
    2019-05-08 17:01:40.648273 I | Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-aaa9dfa1-6146-4a9d-9b19-781bf0f71abe/osd-data-06f7e1bb-336c-4b7f-860f-39ce2217ef5c --path /var/lib/ceph/osd/ceph-0 --no-mon-config
    2019-05-08 17:01:41.200711 I | Running command: /bin/ln -snf /dev/ceph-aaa9dfa1-6146-4a9d-9b19-781bf0f71abe/osd-data-06f7e1bb-336c-4b7f-860f-39ce2217ef5c /var/lib/ceph/osd/ceph-0/block
    2019-05-08 17:01:41.508309 I | Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-0/block
    2019-05-08 17:01:41.813803 I | Running command: /bin/chown -R ceph:ceph /dev/mapper/ceph--aaa9dfa1--6146--4a9d--9b19--781bf0f71abe-osd--data--06f7e1bb--336c--4b7f--860f--39ce2217ef5c
    2019-05-08 17:01:42.097564 I | Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-0
    2019-05-08 17:01:42.388105 I | --> ceph-volume lvm activate successful for osd ID: 0
    2019-05-08 17:01:42.400127 I | exec: Running command: ceph-osd --foreground --id 0 --osd-uuid c05a22dc-97e5-4463-b4e6-ebd9b00d0f2e --conf /var/lib/rook/osd0/rook-ceph.config --cluster ceph --default-log-to-file false
    2019-05-08 17:01:43.178290 I | 2019-05-08 17:01:43.177 7fe1fd660d80 -1 Falling back to public interface
    2019-05-08 17:01:43.829540 I | 2019-05-08 17:01:43.829 7fe1fd660d80 -1 osd.0 69 log_to_monitors {default=true}
    2019-05-08 17:01:43.847126 I | 2019-05-08 17:01:43.846 7fe1efef6700 -1 osd.0 69 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
    2019-05-08 17:01:54.275215 I | 2019-05-08 17:01:54.274 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:01:54.275251 I | 2019-05-08 17:01:54.274 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:01:59.291147 I | 2019-05-08 17:01:59.290 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:01:59.291193 I | 2019-05-08 17:01:59.290 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:02:04.308557 I | 2019-05-08 17:02:04.308 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:04.308596 I | 2019-05-08 17:02:04.308 7fe1e5ee2700 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
    2019-05-08 17:02:04.320883 I | 2019-05-08 17:02:04.320 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:04.320920 I | 2019-05-08 17:02:04.320 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:02:09.332874 I | 2019-05-08 17:02:09.332 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:09.332911 I | 2019-05-08 17:02:09.332 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:02:14.345173 I | 2019-05-08 17:02:14.344 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:14.345214 I | 2019-05-08 17:02:14.344 7fe1e5ee2700 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
    2019-05-08 17:02:14.359483 I | 2019-05-08 17:02:14.358 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:14.359524 I | 2019-05-08 17:02:14.358 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:02:19.372082 I | 2019-05-08 17:02:19.371 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:19.372114 I | 2019-05-08 17:02:19.371 7fe1e5ee2700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds 
    2019-05-08 17:02:24.385045 I | 2019-05-08 17:02:24.384 7fe1e5ee2700 -1  Processor -- bind unable to bind to v2:10.130.0.1:7300/2027 on any port in range 6800-7300: (99) Cannot assign requested address
    2019-05-08 17:02:24.385079 I | 2019-05-08 17:02:24.384 7fe1e5ee2700 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
    2019-05-08 17:02:24.385338 I | 2019-05-08 17:02:24.384 7fe1f3786700 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
    2019-05-08 17:02:24.385351 I | 2019-05-08 17:02:24.384 7fe1f3786700 -1 osd.0 74 *** Got signal Interrupt ***
    

    osd-0 config

    [global]
    fsid                      = e9f14792-30db-427d-8537-73e5b73a91ac
    run dir                   = /var/lib/rook/osd0
    mon initial members       = c a b
    mon host                  = v1:172.30.246.255:6789,v1:172.30.152.231:6789,v1:172.30.117.179:6789
    public addr               = 10.130.0.19
    cluster addr              = 10.130.0.19
    mon keyvaluedb            = rocksdb
    mon_allow_pool_delete     = true
    mon_max_pg_per_osd        = 1000
    debug default             = 0
    debug rados               = 0
    debug mon                 = 0
    debug osd                 = 0
    debug bluestore           = 0
    debug filestore           = 0
    debug journal             = 0
    debug leveldb             = 0
    filestore_omap_backend    = rocksdb
    osd pg bits               = 11
    osd pgp bits              = 11
    osd pool default size     = 1
    osd pool default min size = 1
    osd pool default pg num   = 100
    osd pool default pgp num  = 100
    osd objectstore           = filestore
    crush location            = root=default host=node2-system10-vlan124-mcp
    rbd_default_features      = 3
    fatal signal handlers     = false
    
    [osd.0]
    keyring              = /var/lib/ceph/osd/ceph-0/keyring
    bluestore block path = /var/lib/ceph/osd/ceph-0/block
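
    "(99) Cannot assign requested address" normally means the address being bound is not assigned to any local interface, so a quick sanity check (a sketch) is to compare the public/cluster addr in the config above with the addresses actually present on the node where this OSD runs:

    # On the node (or inside the OSD pod's network namespace)
    ip -o -4 addr show | grep '10.130.0'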
    
  • mgr pod in CrashLoop in 0.8.x

    Is this a bug report or feature request? Bug Report

    Deviation from expected behavior: After rescheduling the mgr pod, it goes into a CrashLoop with the following:

    2018-09-30 19:26:48.956022 I | ceph-mgr: 2018-09-30 19:26:48.955809 7f0adebf4700  1 mgr send_beacon active
    2018-09-30 19:26:50.970860 I | ceph-mgr: 2018-09-30 19:26:50.970649 7f0adebf4700  1 mgr send_beacon active
    2018-09-30 19:26:52.985827 I | ceph-mgr: 2018-09-30 19:26:52.985611 7f0adebf4700  1 mgr send_beacon active
    2018-09-30 19:26:54.004538 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Bus STARTING
    2018-09-30 19:26:54.004566 I | ceph-mgr: CherryPy Checker:
    2018-09-30 19:26:54.004575 I | ceph-mgr: The Application mounted at '' has an empty config.
    2018-09-30 19:26:54.004581 I | ceph-mgr: 
    2018-09-30 19:26:54.004588 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Started monitor thread '_TimeoutMonitor'.
    2018-09-30 19:26:54.004594 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Bus STARTING
    2018-09-30 19:26:54.004600 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Started monitor thread '_TimeoutMonitor'.
    2018-09-30 19:26:54.004606 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Serving on :::7000
    2018-09-30 19:26:54.004611 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Bus STARTED
    2018-09-30 19:26:54.004624 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Serving on :::9283
    2018-09-30 19:26:54.004630 I | ceph-mgr: [30/Sep/2018:19:26:47] ENGINE Bus STARTED
    2018-09-30 19:26:54.004636 I | ceph-mgr: terminate called after throwing an instance of 'std::out_of_range'
    2018-09-30 19:26:54.004644 I | ceph-mgr:   what():  map::at
    failed to run mgr. failed to start mgr: Failed to complete 'ceph-mgr': signal: aborted (core dumped).
    

    Expected behavior: No crash loop ;)

    How to reproduce it (minimal and precise): Personally I've experienced it in several test clusters, but haven't had time to dig into it until tonight. Another user mentioned this a week ago on slack and @galexrt pointed to this bug in Ceph that seems to be related: https://tracker.ceph.com/issues/24982

    In that issue, people mention multiple RGWs, and I'm running RGWs as a daemonset for these clusters. So I tried to scale the number of RGWs down to 1 using NodeAffinity, and now the mgr was able to start up. After it's started, I can scale the RGWs back up to full count (5 on the testing cluster) in one go and the mgr stays up. Without knowing this in depth, it seems to me the RGWs build up a history of metrics to deliver to the mgr while they can't reach it. When the mgr starts again, these historic metrics overwhelm it and it gets startled, not to be confused with started.

    Environment:

    • OS (e.g. from /etc/os-release): CoreOS
    • Kernel (e.g. uname -a): Something new
    • Cloud provider or hardware configuration: Bare metal
    • Rook version (use rook version inside of a Rook Pod): v0.8.1 - 99% sure I saw it on 0.8.2 as well, but had to vacate that due to a different bug.
    • Kubernetes version (use kubectl version): 1.11.3
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Kubespray
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): Seems happy, but no dashboard and metrics obviously.
  • OpenShift: insufficient permission inside the containers

    Bug Report

    What happened:

    When trying to create a cluster the operator fails with:

    op-cluster: failed to create cluster in namespace rook. failed to start the mons. failed to initialize ceph cluster info. failed to get cluster info. failed to create mon secret
    s. failed to create dir /var/lib/rook/rook. mkdir /var/lib/rook: permission denied
    

    What you expected to happen:

    Cluster creation should succeed.

    Additional information:

    OpenShift runs containers with reduced privileges (an assigned, non-root user ID) even when the application expects to run as 'root'; see https://blog.openshift.com/jupyter-on-openshift-part-6-running-as-an-assigned-user-id/
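
    A common workaround sketch on OpenShift (the service account name and namespace below are placeholders; the exact accounts and the recommended security context constraints depend on the Rook version and manifests in use) is to grant the Rook service accounts an SCC that allows the operator to run with the privileges it expects:

    # Example only: grant the privileged SCC to the operator's service account
    oc adm policy add-scc-to-user privileged -z rook-operator -n rook-system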

    How to reproduce it (minimal and precise):

    Simply run kubectl create -f rook-cluster.yml

    Environment:

    • OS (e.g. from /etc/os-release): CentOS Linux release 7.4.1708 (Core)
    • Kernel (e.g. uname -a): Linux k8s-master.example.com 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
    • Cloud provider or hardware configuration: VM 1 CPU, 4GB RAM
    • Rook version (use rook version inside of a Rook Pod): v0.6.0-80.g3dfb151
    • Kubernetes version (use kubectl version): v1.7.6+a08f5eeb62
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): OpenShift
    • Ceph status (use ceph health in the Rook toolbox): no cluster yet
  • rook cluster fails after restart k8s node

    I created the operator, cluster, and filesystem:

    apiVersion: rook.io/v1alpha1
    kind: Filesystem
    metadata:
      name: rookfs
      namespace: rook-cluster
    spec:
      metadataPool:
        replicated:
          size: 3
      dataPools:
        - erasureCoded:
            codingChunks: 1
            dataChunks: 2
      metadataServer:
        activeCount: 1
        activeStandby: true
    

    and tried to connect to it:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ceph-tools
      namespace: rook-cluster
    spec:
      containers:
      - name: ceph-tools
        image: nginx:1.13.5-alpine
        imagePullPolicy: IfNotPresent
        volumeMounts:
          - name: ceph-store
            mountPath: /srv/ceph
      volumes:
        - name: ceph-store
          flexVolume:
            driver: rook.io/rook
            fsType: ceph
            options:
              fsName: rookfs
              clusterName: rook-cluster
              path: /
    

    and got this error:

      12m   1m      6       kubelet, 10.1.29.25             Warning FailedMount     Unable to mount volumes for pod "ceph-tools_rook-cluster(6ac879d0-d672-11e7-88d3-0050569d5b15)": timeout expired waiting for volumes to attach/mount for pod "rook-cluster"/"ceph-tools". list of unattached/unmounted volumes=[ceph-store]
      12m   1m      6       kubelet, 10.1.29.25             Warning FailedSync      Error syncing pod
      9m    13s     2       kubelet, 10.1.29.25             Warning FailedMount     MountVolume.SetUp failed for volume "ceph-store" : mount command failed, status: Failure, reason: failed to mount filesystem rookfs to /var/lib/kubelet/pods/6ac879d0-d672-11e7-88d3-0050569d5b15/volumes/rook.io~rook/ceph-store with monitor 10.3.183.65:6790,10.3.65.12:6790,10.3.131.61:6790:/ and options [name=admin secret=AQBNCiFa0oqjKhAA2WH+8VwHB0y17Irg6HCmIw== mds_namespace=rookfs]: mount failed: exit status 32
    Mounting command: mount
    Mounting arguments: 10.3.183.65:6790,10.3.65.12:6790,10.3.131.61:6790:/ /var/lib/kubelet/pods/6ac879d0-d672-11e7-88d3-0050569d5b15/volumes/rook.io~rook/ceph-store ceph [name=admin secret=AQBNCiFa0oqjKhAA2WH+8VwHB0y17Irg6HCmIw== mds_namespace=rookfs]
    Output: mount: mount 10.3.183.65:6790,10.3.65.12:6790,10.3.131.61:6790:/ on /var/lib/kubelet/pods/6ac879d0-d672-11e7-88d3-0050569d5b15/volumes/rook.io~rook/ceph-store failed: Connection timed out
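
    A quick connectivity sketch (the monitor addresses and port are taken from the mount error above): verify the mons are reachable from the node that reported the timeout before digging further.

    for mon in 10.3.183.65 10.3.65.12 10.3.131.61; do
      nc -z -w 5 "$mon" 6790 && echo "$mon reachable" || echo "$mon unreachable"
    done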
    
  • Contribute rook-ceph operator to Community OKD/OpenShift Operators

    Is this a bug report or feature request?

    • Feature Request

    What should the feature do: Include the rook-ceph operator as a Community OpenShift Operator in the OKD and OCP OperatorHub.

    What is use case behind this feature: As a OKD/OCP user, I want to install the rook-ceph operator on my cluster.

    Environment:

    OKD/OCP 3.11 or 4.1

    I see that PR https://github.com/operator-framework/community-operators/pull/78, which implements this feature, was recently closed without explanation. Since the upstream rook-ceph operator does not work on OKD/OCP, it is important that it be included in the catalog as an unsupported community operator. The strimzi-kafka-operator seems similar: it appears in both the upstream and community catalogs, and the downstream Red Hat certified and supported amq-streams distribution of the strimzi-kafka-operator appears in the certified Red Hat catalog. Is this not the case for the rook operator as well?

  • helm: add the missing config in helm for external cluster

    Closes: https://github.com/rook/rook/issues/11480
    Signed-off-by: parth-gr [email protected]

    Description of your changes:

    Which issue is resolved by this Pull Request: Resolves #

    Checklist:

    • [ ] Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
    • [ ] Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
    • [ ] Reviewed the developer guide on Submitting a Pull Request
    • [ ] Pending release notes updated with breaking and/or notable changes for the next minor release.
    • [ ] Documentation has been updated, if necessary.
    • [ ] Unit tests have been added, if necessary.
    • [ ] Integration tests have been added, if necessary.
  • Error mounting a volume "rpc error: code = Internal desc = missing required field monitors"

    Error mounting a volume "rpc error: code = Internal desc = missing required field monitors"

    When I try to create:

    • a volume: wp-dev-volume-backup (with the storage class: rook-cephfs)
    • a persistent volume claim: wp-dev-pvc-backup
    • a sidecar: backup-sidecar (that mounts the previously created volume) with this command:
    kubectl apply -f .\test.yaml 
    persistentvolume/wp-dev-volume-backup created
    persistentvolumeclaim/wp-dev-pvc-backup created
    deployment.apps/backup-sidecar created
    

    I have this error:

    20s         Normal    SuccessfulAttachVolume   pod/backup-sidecar-7896b7c9fd-s7f7z       AttachVolume.Attach succeeded for volume "wp-dev-volume-backup"
    1s          Warning   FailedMount              pod/backup-sidecar-7896b7c9fd-s7f7z       MountVolume.MountDevice failed for volume "wp-dev-volume-backup" : rpc error: code = Internal desc = rpc error: code = Internal desc = missing required field monitors
    

    It seems that the volume is attached but not mounted.

    All the items are defined in the file test.yaml:

    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      annotations:
        pv.kubernetes.io/provisioned-by: rook-ceph.cephfs.csi.ceph.com
        volume.kubernetes.io/provisioner-deletion-secret-name: rook-csi-cephfs-provisioner
        volume.kubernetes.io/provisioner-deletion-secret-namespace: rook-ceph
      finalizers:
      - kubernetes.io/pv-protection
      name: wp-dev-volume-backup
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 10Gi
      claimRef:
        apiVersion: v1
        kind: PersistentVolumeClaim
        name: wp-dev-pvc-backup
        namespace: wordpress-dev
      csi:
        controllerExpandSecretRef:
          name: rook-csi-cephfs-provisioner
          namespace: rook-ceph
        driver: rook-ceph.cephfs.csi.ceph.com
        nodeStageSecretRef:
          name: rook-csi-cephfs-node
          namespace: rook-ceph
        volumeAttributes:
          clusterID: rook-ceph
          fsName: cephfs
          pool: cephfs-replicated
        volumeHandle: wp-dev-volume-backup
      persistentVolumeReclaimPolicy: Delete
      storageClassName: rook-cephfs
      volumeMode: Filesystem
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      annotations:
        volume.beta.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
        volume.kubernetes.io/storage-provisioner: rook-ceph.cephfs.csi.ceph.com
      creationTimestamp: "2023-01-05T09:34:57Z"
      finalizers:
      - kubernetes.io/pvc-protection
      name: wp-dev-pvc-backup
      namespace: wordpress-dev
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 10Gi
      storageClassName: rook-cephfs
      volumeMode: Filesystem
      volumeName: wp-dev-volume-backup
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      generation: 1
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-wordpress-dev-backup-sidecar  
      name: backup-sidecar
      namespace: wordpress-dev
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          workload.user.cattle.io/workloadselector: apps.deployment-wordpress-dev-backup-sidecar
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            workload.user.cattle.io/workloadselector: apps.deployment-wordpress-dev-backup-sidecar
        spec:
          affinity: {}
          containers:
          - image: php:fpm
            imagePullPolicy: Always
            name: container-0
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /backup
              name: volume-backup
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - name: volume-backup
            persistentVolumeClaim:
              claimName: wp-dev-pvc-backup
    ---
    

    If I create the volume, the PVC, and the container from Rancher, it works and the volume is mounted, but if I use kubectl and the YAML files I get the error above. I have to use YAML files because I'm creating a Helm chart.

    Environment: I'm using Rancher/Kubernetes on the Azure cloud:

    • Rancher: v2.6.9
    • Kubernetes: v1.24.3
    • Rook version: v1.10.7
    • Ceph: v17.2.5
    • csi-node-driver-registrar: v2.5.1
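
    The missing required field monitors error above generally means the CephFS CSI driver could not resolve monitor addresses for the clusterID set in volumeAttributes. In a Rook deployment those addresses are looked up in the rook-ceph-csi-config ConfigMap maintained by the operator; its shape is sketched below (the monitor addresses are hypothetical), and checking that the PV's clusterID actually appears there is a reasonable first step.

    # Shape of the CSI config ConfigMap that maps clusterID to monitors (addresses are examples)
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rook-ceph-csi-config
      namespace: rook-ceph
    data:
      csi-cluster-config-json: |
        [{"clusterID":"rook-ceph","monitors":["10.96.1.10:6789","10.96.1.11:6789","10.96.1.12:6789"]}]

    It can be inspected with kubectl -n rook-ceph get configmap rook-ceph-csi-config -o yaml.
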
  • object: Move bucket notifications to stable

    object: Move bucket notifications to stable

    Description of your changes: Bucket notifications and topics have been implemented since v1.8 and have proven stable. Therefore, with v1.11 we move the feature to stable.

    Which issue is resolved by this Pull Request: Related to #11484

    Checklist:

    • [ ] Commit Message Formatting: Commit titles and messages follow the guidelines in the developer guide.
    • [ ] Skip Tests for Docs: If this is only a documentation change, add the label skip-ci on the PR.
    • [ ] Reviewed the developer guide on Submitting a Pull Request
    • [ ] Pending release notes updated with breaking and/or notable changes for the next minor release.
    • [ ] Documentation has been updated, if necessary.
    • [ ] Unit tests have been added, if necessary.
    • [ ] Integration tests have been added, if necessary.
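
    For readers unfamiliar with the feature, bucket notifications are configured through two CRDs: a topic that describes the receiving endpoint and a notification that selects which events are published to it. The manifests below are only a sketch following the documented CRD shape; the names, namespace, endpoint URI, and event list are illustrative, not taken from this PR.

    apiVersion: ceph.rook.io/v1
    kind: CephBucketTopic
    metadata:
      name: example-topic
      namespace: rook-ceph
    spec:
      objectStoreName: my-store            # illustrative object store name
      objectStoreNamespace: rook-ceph
      endpoint:
        http:
          uri: http://notification-receiver:8080   # hypothetical HTTP receiver
    ---
    apiVersion: ceph.rook.io/v1
    kind: CephBucketNotification
    metadata:
      name: example-notification
      namespace: rook-ceph
    spec:
      topic: example-topic
      events:
        - s3:ObjectCreated:Put
        - s3:ObjectRemoved:*
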
  • Unable to upgrade from 1.0.4 to 1.1.9 (using all devices in more than one namespace is not supported)

    Unable to upgrade from 1.0.4 to 1.1.9 (using all devices in more than one namespace is not supported)

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior:

    • The error [errno 13] error connecting to the cluster appears after the upgrade when checking the ceph status from the toolbox (kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash)
    • The error op-cluster: failed to configure local ceph cluster. using all devices in more than one namespace is not supported appears when we check the operator logs
    • The csi-provisioner reports the following error:
    kubectl -n rook-ceph logs csi-rbdplugin-provisioner-7cdb456cdc-wxtxt csi-provisioner
    I0103 12:40:30.619889       1 leaderelection.go:246] failed to acquire lease rook-ceph/rook-ceph-rbd-csi-ceph-com
    I0103 12:40:40.373596       1 leaderelection.go:350] lock is held by csi-rbdplugin-provisioner-7cdb456cdc-8fcnj and has not yet expired
    
    • A network connection to the mon service cannot be established:
    $  kubectl -n rook-ceph get svc -l app=rook-ceph-mon
    NAME              TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
    rook-ceph-mon-a   ClusterIP   10.96.3.68   <none>        6789/TCP,3300/TCP   170m
    camila@camila-ubuntu-2204:~$ kubectl -n rook-ceph exec -ti deploy/csi-cephfsplugin-provisioner -c csi-cephfsplugin -- bash
    [root@csi-cephfsplugin-provisioner-9bd478589-d2hxb /]# curl 10.96.3.68 2>/dev/null
    

    NOTE: Version 1.0.4 is installed using the default namespaces. I am then trying to migrate/upgrade to 1.1.9 so that I can eventually move forward to 1.4.

    Expected behavior:

    Be able to upgrade from 1.0.4 to 1.1.9 without facing an error, so that I can move forward to the latest releases.

    How to reproduce it (minimal and precise):

    Install 1.0.4 from the default example manifests and try to upgrade to 1.1.9.

    File(s) to submit:

    The config to install 1.0.4 can be found in:

    • cluster manifests: https://github.com/replicatedhq/kURL/tree/main/addons/rook/1.0.4/cluster (from examples)
    • operator: https://github.com/replicatedhq/kURL/tree/main/addons/rook/1.0.4/operator (from examples)

    Logs to submit:

    $ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
    [root@camila-ubuntu-2204-test /]# ceph status
    [errno 13] error connecting to the cluster
    
    • Operator's logs, if necessary
    2022-12-30 10:59:15.522511 I | exec: Running command: ceph mon_status --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json --out-file /tmp/695696045
    2022-12-30 10:59:15.675808 I | exec: [errno 13] error connecting to the cluster
    2022-12-30 10:59:15.675981 E | op-cluster: failed to create cluster in namespace rook-ceph. failed to start the mons. failed to start mon pods. failed to check mon quorum a. failed to wait for mon quorum. exceeded max retry count waiting for monitors to reach quorum
    2022-12-30 10:59:15.676003 E | op-cluster: failed to configure local ceph cluster. giving up waiting for cluster creating. timed out waiting for the condition
    2022-12-30 10:59:15.676284 I | op-cluster: Update event for uninitialized cluster rook-ceph. Initializing...
    2022-12-30 10:59:15.676311 I | op-cluster: CephCluster rook-ceph status: Error. using all devices in more than one namespace is not supported
    2022-12-30 10:59:15.689390 E | op-cluster: failed to configure local ceph cluster. using all devices in more than one namespace is not supported
    2022-12-30 10:59:15.689468 I | op-cluster: Update event for uninitialized cluster rook-ceph. Initializing...
    2022-12-30 10:59:15.689482 I | op-cluster: CephCluster rook-ceph status: Error. using all devices in more than one namespace is not supported
    2022-12-30 10:59:15.701223 E | op-cluster: failed to configure local ceph cluster. using all devices in more than one namespace is not supported
    

    Cluster Status to submit:

    $ kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
    rook-version=v1.0.4
    rook-version=v1.1.9
    
    camila@camila-ubuntu-2204-test:~$ export ROOK_SYSTEM_NAMESPACE="rook-ceph"
    camila@camila-ubuntu-2204-test:~$ export ROOK_NAMESPACE="rook-ceph"
    camila@camila-ubuntu-2204-test:~$ kubectl -n $ROOK_SYSTEM_NAMESPACE get pods
    NAME                                                  READY   STATUS      RESTARTS   AGE
    csi-cephfsplugin-9f4h6                                3/3     Running     0          87m
    csi-cephfsplugin-provisioner-9bd478589-6kskn          4/4     Running     0          87m
    csi-cephfsplugin-provisioner-9bd478589-8vmqk          4/4     Running     0          87m
    csi-rbdplugin-provisioner-7cdb456cdc-649b6            5/5     Running     0          87m
    csi-rbdplugin-provisioner-7cdb456cdc-csnjf            5/5     Running     0          87m
    csi-rbdplugin-shznz                                   3/3     Running     0          87m
    rook-ceph-agent-mq2px                                 1/1     Running     0          87m
    rook-ceph-mgr-a-656d74c8f9-c74bh                      1/1     Running     0          118m
    rook-ceph-mon-a-74d96dcbdd-m7hcv                      1/1     Running     0          86m
    rook-ceph-operator-86f766bbb8-xwfnn                   1/1     Running     0          87m
    rook-ceph-osd-1-bc99bf785-bpg7q                       1/1     Running     0          95m
    rook-ceph-osd-prepare-camila-ubuntu-2204-test-h57gh   0/2     Completed   1          96m
    rook-ceph-rgw-rook-ceph-store-a-6647fff9cc-cj8ck      1/1     Running     0          117m
    rook-ceph-tools-d5dc67475-vp6r2                       1/1     Running     0          87m
    rook-discover-pl2p7                                   1/1     Running     0          87m
    camila@camila-ubuntu-2204-test:~$ kubectl -n $ROOK_NAMESPACE get pods
    NAME                                                  READY   STATUS      RESTARTS   AGE
    csi-cephfsplugin-9f4h6                                3/3     Running     0          87m
    csi-cephfsplugin-provisioner-9bd478589-6kskn          4/4     Running     0          87m
    csi-cephfsplugin-provisioner-9bd478589-8vmqk          4/4     Running     0          87m
    csi-rbdplugin-provisioner-7cdb456cdc-649b6            5/5     Running     0          87m
    csi-rbdplugin-provisioner-7cdb456cdc-csnjf            5/5     Running     0          87m
    csi-rbdplugin-shznz                                   3/3     Running     0          87m
    rook-ceph-agent-mq2px                                 1/1     Running     0          87m
    rook-ceph-mgr-a-656d74c8f9-c74bh                      1/1     Running     0          118m
    rook-ceph-mon-a-74d96dcbdd-m7hcv                      1/1     Running     0          87m
    rook-ceph-operator-86f766bbb8-xwfnn                   1/1     Running     0          87m
    rook-ceph-osd-1-bc99bf785-bpg7q                       1/1     Running     0          95m
    rook-ceph-osd-prepare-camila-ubuntu-2204-test-h57gh   0/2     Completed   1          96m
    rook-ceph-rgw-rook-ceph-store-a-6647fff9cc-cj8ck      1/1     Running     0          117m
    rook-ceph-tools-d5dc67475-vp6r2                       1/1     Running     0          87m
    rook-discover-pl2p7                                   1/1     Running     0          87m
    camila@camila-ubuntu-2204-test:~$ TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
    camila@camila-ubuntu-2204-test:~$ kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status
    [errno 13] error connecting to the cluster
    command terminated with exit code 13
    
    $ kubectl -n rook-ceph exec deploy/rook-ceph-operator -- curl $(kubectl -n rook-ceph get svc -l app=rook-ceph-mon -o jsonpath='{.items[0].spec.clusterIP}'):3300 2>/dev/null
    ceph v2
    
     kubectl -n rook-ceph get pod -l app=rook-ceph-mon
    NAME                               READY   STATUS    RESTARTS   AGE
    rook-ceph-mon-a-6bbf7d74ff-gsnml   1/1     Running   0          32m
    
    kubectl -n rook-ceph get pod -l app=csi-rbdplugin-provisioner
    NAME                                         READY   STATUS    RESTARTS   AGE
    csi-rbdplugin-provisioner-7cdb456cdc-8fcnj   5/5     Running   0          33m
    csi-rbdplugin-provisioner-7cdb456cdc-wxtxt   5/5     Running   0          33m
    
    • Output of krew commands, if necessary
    $ kubectl rook-ceph health
    Info:  Checking if at least three mon pods are running on different nodes
    Warning:  At least three mon pods should running on different nodes
    rook-ceph-mon-a-6bbf7d74ff-gsnml                  1/1     Running     0          33m
    
    Info:  Checking mon quorum and ceph health details
    [errno 13] error connecting to the cluster
    command terminated with exit code 13
    
    Info:  Checking if at least three osd pods are running on different nodes
    Warning:  At least three osd pods should running on different nodes
    rook-ceph-osd-1-7df7bd4bf7-s2trz                  1/1     Running     0          42m
    
    Info:  Pods that are in 'Running' status
    NAME                                              READY   STATUS    RESTARTS   AGE
    csi-cephfsplugin-ngmzq                            3/3     Running   0          34m
    csi-cephfsplugin-provisioner-9bd478589-d2hxb      4/4     Running   0          34m
    csi-cephfsplugin-provisioner-9bd478589-n69bd      4/4     Running   0          34m
    csi-rbdplugin-provisioner-7cdb456cdc-8fcnj        5/5     Running   0          34m
    csi-rbdplugin-provisioner-7cdb456cdc-wxtxt        5/5     Running   0          34m
    csi-rbdplugin-xv8r9                               3/3     Running   0          34m
    rook-ceph-agent-jbv6z                             1/1     Running   0          34m
    rook-ceph-mgr-a-79c5cc6fd5-945f6                  1/1     Running   0          165m
    rook-ceph-mon-a-6bbf7d74ff-gsnml                  1/1     Running   0          33m
    rook-ceph-operator-86f766bbb8-g7vst               1/1     Running   0          34m
    rook-ceph-osd-1-7df7bd4bf7-s2trz                  1/1     Running   0          42m
    rook-ceph-rgw-rook-ceph-store-a-df4455ccf-gdj7k   1/1     Running   0          163m
    rook-ceph-tools-d5dc67475-4nrtb                   1/1     Running   0          34m
    rook-discover-2vzkp                               1/1     Running   0          34m
    
    Warning:  Pods that are 'Not' in 'Running' status
    NAME                                             READY   STATUS      RESTARTS   AGE
    
    Info:  checking placement group status
    [errno 13] error connecting to the cluster
    command terminated with exit code 13
    Warning:  
    
    Info:  checking if at least one mgr pod is running
    rook-ceph-mgr-a-79c5cc6fd5-945f6                  Running     camila-ubuntu-2204
    
    $ kubectl rook-ceph ceph status
    [errno 13] error connecting to the cluster
    command terminated with exit code 13
    

    Environment:

    • OS (e.g. from /etc/os-release): Ubuntu 22.04
    • Kernel (e.g. uname -a): 5.15.0-1025-gcp #32-Ubuntu SMP Wed Nov 23 21:46:01 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    • Cloud provider or hardware configuration:
    • Rook version (use rook version inside of a Rook Pod): 1.0.4
    • Storage backend version (e.g. for ceph do ceph -v):
    • Kubernetes version (use kubectl version): 1.19.16
    • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): vanilla / installed with kubeadm
    • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):
    [root@camila-ubuntu-2204 /]# ceph status
    [errno 13] error connecting to the cluster
    [root@camila-ubuntu-2204 /]# ceph osd status
    [errno 13] error connecting to the cluster
    [root@camila-ubuntu-2204 /]# ceph df
    [errno 13] error connecting to the cluster
    

    Additionally (this appears to be a bug that was fixed in 1.2):

    By searching for the error failed to configure local ceph cluster. using all devices in more than one namespace is not supported, I found the following:

    • It seems this was fixed in https://github.com/rook/rook/pull/4692 (issue https://github.com/rook/rook/issues/4633)
    • But it was not backported to 1.1.9: https://github.com/rook/rook/blob/v1.1.9/pkg/operator/ceph/cluster/controller.go#L103
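
    For context, the using all devices in more than one namespace is not supported message is the operator refusing to reconcile because it believes more than one CephCluster claims all raw devices, which can happen transiently while clusters in two namespaces coexist during a migration. A hedged sketch of the CephCluster storage section that makes such a claim, using the defaults from the example manifests, looks like this:

    # Storage section of a CephCluster that claims every available device on every node
    spec:
      storage:
        useAllNodes: true
        useAllDevices: true

    Listing all CephCluster objects (kubectl get cephclusters --all-namespaces) is a quick way to confirm whether two clusters are competing for the same devices.
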
  • RBD Mirror Daemon not syncing images

    RBD Mirror Daemon not syncing images

    Is this a bug report or feature request?

    • Bug Report

    Deviation from expected behavior: The RBD mirror daemon does not sync images.

    Expected behavior: The RBD mirror daemon should sync images.

    How to reproduce it (minimal and precise):

    • install helm operator
    • create cluster cr
    • create block pool
    • enable mirroring
    • create test image
    • enable image mirroring via volume replication

    I followed the documentation on how to set up RBD mirroring between two Ceph clusters. After creating the volume replication object for a test PVC, the mirror daemon outputs the following errors:

    debug 2023-01-02T14:56:46.915+0000 7f0f929a26c0 0 rbd::mirror::PoolReplayer: 0x561b025b9800 init_rados: reverting global config option override: keyring: /etc/ceph/keyring-store/keyring -> /etc/ceph/96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.client.rbd-mirror-peer.keyring,/etc/ceph/96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
    debug 2023-01-02T14:56:46.915+0000 7f0f929a26c0 0 rbd::mirror::PoolReplayer: 0x561b025b9800 init_rados: reverting global config option override: mon_host: [v2:10.0.0.24:3300,v1:10.0.0.24:6789],[v2:10.0.0.7:3300,v1:10.0.0.7:6789],[v2:10.0.0.3:3300,v1:10.0.0.3:6789],[v2:10.0.0.4:3300,v1:10.0.0.4:6789] ->
    debug 2023-01-02T14:56:46.915+0000 7f0f929a26c0 -1 Errors while parsing config file!
    debug 2023-01-02T14:56:46.915+0000 7f0f929a26c0 -1 can't open 96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.conf: (2) No such file or directory
    debug 2023-01-02T14:56:47.183+0000 7f0f929a26c0 0 rbd::mirror::PoolReplayer: 0x561b03e9bb00 init_rados: reverting global config option override: keyring: /etc/ceph/keyring-store/keyring -> /etc/ceph/96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.client.rbd-mirror-peer.keyring,/etc/ceph/96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin
    debug 2023-01-02T14:56:47.183+0000 7f0f929a26c0 0 rbd::mirror::PoolReplayer: 0x561b03e9bb00 init_rados: reverting global config option override: mon_host: [v2:10.0.0.24:3300,v1:10.0.0.24:6789],[v2:10.0.0.7:3300,v1:10.0.0.7:6789],[v2:10.0.0.3:3300,v1:10.0.0.3:6789],[v2:10.0.0.4:3300,v1:10.0.0.4:6789] ->
    debug 2023-01-02T14:56:47.183+0000 7f0f929a26c0 -1 Errors while parsing config file!
    debug 2023-01-02T14:56:47.183+0000 7f0f929a26c0 -1 can't open 96677e18-48e8-4fb8-8a5f-0dcc7d7f1eb9.conf: (2) No such file or directory
    debug 2023-01-02T14:57:17.864+0000 7f0f7a509700 -1 librbd::managed_lock::GetLockerRequest: 0x561b05779110 handle_get_lockers: failed to retrieve lockers: (2) No such file or directory
    debug 2023-01-02T14:57:17.880+0000 7f0f79d08700 -1 librbd::managed_lock::GetLockerRequest: 0x561b05779490 handle_get_lockers: failed to retrieve lockers: (2) No such file or directory
    debug 2023-01-02T15:57:18.011010780+01:00 2023-01-02T14:57:18.004+0000 7f0f87d24700 -1 librbd::managed_lock::GetLockerRequest: 0x561b05779570 handle_get_lockers: failed to retrieve lockers: (2) No such file or directory2023-01-02T15:57:18.011021681+01:00
    debug 2023-01-02T15:57:18.129115001+01:00 2023-01-02T14:57:18.128+0000 7f0f87523700 -1 librbd::managed_lock::GetLockerRequest: 0x561b057795e0 handle_get_lockers: failed to retrieve lockers: (2) No such file or directory2023-01-02T15:57:18.129151050+01:00
    

    I have no idea where the problem is coming from. None of the CSI provisioner sidecars report any error. Any idea what the problem might be? I can enable image mirroring via the Ceph dashboard, and then the mirroring process seems to work fine.
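
    For context, image-mode mirroring is enabled on the pool through the CephBlockPool CR, and individual PVCs are then marked for replication with the csi-addons VolumeReplication objects. The manifests below are only a sketch of that setup; the pool name follows the Rook examples, while the VolumeReplicationClass name, namespace, and PVC name are assumptions, not taken from this report.

    apiVersion: ceph.rook.io/v1
    kind: CephBlockPool
    metadata:
      name: replicapool
      namespace: rook-ceph
    spec:
      replicated:
        size: 3
      mirroring:
        enabled: true
        mode: image                  # mirror individual images, as used with volume replication
    ---
    apiVersion: replication.storage.openshift.io/v1alpha1
    kind: VolumeReplication
    metadata:
      name: test-pvc-replication
      namespace: default             # hypothetical namespace of the mirrored PVC
    spec:
      volumeReplicationClass: rbd-volumereplicationclass   # assumed to exist already
      replicationState: primary
      dataSource:
        kind: PersistentVolumeClaim
        name: test-pvc               # hypothetical PVC to mirror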

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Yandex Files

Website | Documentation | Download | Contributing | Changelog | Installation | Forum Rclone Rclone ("rsync for cloud storage") is a command-line progr

Jan 9, 2023
QingStor Object Storage service support for go-storage

go-services-qingstor QingStor Object Storage service support for go-storage. Install go get github.com/minhjh/go-service-qingstor/v3 Usage import ( "

Dec 13, 2021
Rook is an open source cloud-native storage orchestrator for Kubernetes

Rook is an open source cloud-native storage orchestrator for Kubernetes, providing the platform, framework, and support for a diverse set of storage solutions to natively integrate with cloud-native environments.

Oct 25, 2022
High Performance, Kubernetes Native Object Storage
High Performance, Kubernetes Native Object Storage

MinIO Quickstart Guide MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Ama

Jan 2, 2023
Cloud-Native distributed storage built on and for Kubernetes
Cloud-Native distributed storage built on and for Kubernetes

Longhorn Build Status Engine: Manager: Instance Manager: Share Manager: Backing Image Manager: UI: Test: Release Status Release Version Type 1.1 1.1.2

Jan 1, 2023
An encrypted object storage system with unlimited space backed by Telegram.

TGStore An encrypted object storage system with unlimited space backed by Telegram. Please only upload what you really need to upload, don't abuse any

Nov 28, 2022
Storj is building a decentralized cloud storage network
Storj is building a decentralized cloud storage network

Ongoing Storj v3 development. Decentralized cloud object storage that is affordable, easy to use, private, and secure.

Jan 8, 2023
tstorage is a lightweight local on-disk storage engine for time-series data
tstorage is a lightweight local on-disk storage engine for time-series data

tstorage is a lightweight local on-disk storage engine for time-series data with a straightforward API. Especially ingestion is massively opt

Jan 1, 2023
storage interface for local disk or AWS S3 (or Minio) platform

storage interface for local disk or AWS S3 (or Minio) platform

Apr 19, 2022
SFTPGo - Fully featured and highly configurable SFTP server with optional FTP/S and WebDAV support - S3, Google Cloud Storage, Azure Blob

SFTPGo - Fully featured and highly configurable SFTP server with optional FTP/S and WebDAV support - S3, Google Cloud Storage, Azure Blob

Jan 4, 2023
Terraform provider for the Minio object storage.

terraform-provider-minio A Terraform provider for Minio, a self-hosted object storage server that is compatible with S3. Check out the documenation on

Dec 1, 2022
A Redis-compatible server with PostgreSQL storage backend

postgredis A wild idea of having Redis-compatible server with PostgreSQL backend. Getting started As a binary: ./postgredis -addr=:6380 -db=postgres:/

Nov 8, 2021
CSI for S3 compatible SberCloud Object Storage Service

sbercloud-csi-obs CSI for S3 compatible SberCloud Object Storage Service This is a Container Storage Interface (CSI) for S3 (or S3 compatible) storage

Feb 17, 2022
Void is a zero storage cost large file sharing system.

void void is a zero storage cost large file sharing system. License Copyright © 2021 Changkun Ou. All rights reserved. Unauthorized using, copying, mo

Nov 22, 2021
This is a simple file storage server. User can upload file, delete file and list file on the server.
This is a simple file storage server.  User can upload file,  delete file and list file on the server.

Simple File Storage Server This is a simple file storage server. User can upload file, delete file and list file on the server. If you want to build a

Jan 19, 2022
Perkeep (née Camlistore) is your personal storage system for life: a way of storing, syncing, sharing, modelling and backing up content.

Perkeep is your personal storage system. It's a way to store, sync, share, import, model, and back up content. Keep your stuff for life. For more, see

Dec 26, 2022
s3git: git for Cloud Storage. Distributed Version Control for Data.
s3git: git for Cloud Storage. Distributed Version Control for Data.

s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.

Dec 27, 2022
A High Performance Object Storage released under Apache License
A High Performance Object Storage released under Apache License

MinIO Quickstart Guide MinIO is a High Performance Object Storage released under Apache License v2.0. It is API compatible with Amazon S3 cloud storag

Sep 30, 2021
The Container Storage Interface (CSI) Driver for Fortress Block Storage This driver allows you to use Fortress Block Storage with your container orchestrator

fortress-csi The Container Storage Interface (CSI) Driver for Fortress Block Storage This driver allows you to use Fortress Block Storage with your co

Jan 23, 2022