pgremapper

CLI tool for manipulating Ceph's upmap exception table.


When working with Ceph clusters, there are actions that cause backfill (CRUSH map changes) and cases where you want to cause backfill (moving data between OSDs or hosts). Trying to manage backfill via CRUSH is difficult because changes to the CRUSH map cause many ancillary data movements that can be wasteful.

Additionally, controlling the amount of in-progress backfill is difficult, and having PGs in backfill_wait state has consequences:

  • Any PG performing recovery or backfill must obtain local and remote reservations.
  • A PG in a wait state may hold some of its necessary reservations, but not all. This may, in turn, block other recoveries or backfills that could otherwise make independent progress.
  • For EC pools, the source of a backfill read is likely not the primary, and this is not considered as a part of the reservation scheme. A single OSD could have any number of backfills reading from it; no knobs outside of recovery sleep can be used to mitigate this. Pacific's mclock scheduler should theoretically improve this situation.
  • There are no reservation slots held for recoveries, meaning that a recovery could be waiting behind another backfill (or several backfills if they stack in a wait state).

The primary control knob for backfills, osd-max-backfills, sets the number of local and remote reservations available on a given OSD. Given the above, this knob is not sufficient when backfill piles up in the face of a large-scale change; one sometimes has to set it unacceptably high to achieve backfill concurrency across many OSDs.

This tool, pgremapper, is intended to aid with all of the above use cases and problems. It operates by manipulating the pg-upmap exception table available in Luminous+ to override CRUSH decisions based on a number of algorithms exposed as commands, outlined below. Many of these commands are intended to be run in a loop in order to achieve some target state.

Acknowledgments

The initial version of this tool, which became the cancel-backfill command below, was heavily inspired by techniques developed by CERN IT.

Requirements

As mentioned above, the upmap exception table was introduced in Luminous, and this is a hard requirement for pgremapper. However, there were significant improvements to the upmap code throughout the Luminous series. When working with upmaps, it's recommended that you are running Luminous v12.2.13 (the last release), Mimic v13.2.7+, Nautilus v14.2.5+, or newer, at least on the mons/mgrs.

We've used pgremapper on a variety of versions of Luminous and Nautilus.

Caveats

  • When running older versions of Luminous or Mimic, it's possible for stale upmap entries that have no effect to accumulate. pgremapper can become confused by these stale entries and fail. See the Requirements section above for recommended versions.
  • If the system is still processing osdmaps and peering, pgremapper can become confused and fail for pretty much the same reason as above (upmap entries at the mon layer may not yet be reflected in current PG state).
  • Given a recent enough Ceph version, CRUSH cannot be violated by an upmap entry. This is good, but it can make certain manipulations impossible; consider a case where a backfill is swapping EC chunks between two racks. To the best of our knowledge today, no upmap entry can be created to counteract such a backfill, as Ceph will evaluate the correctness of the upmap entry in parts, rather than as a whole. (If you have evidence to the contrary or this is actually possible in newer versions of Ceph, let us know!)

Bug Reports

If you find a situation where pgremapper isn't working right, please file a report with a clear description of how pgremapper was invoked and any of its output, what the system was doing at the time, and output from the following Ceph commands:

  • ceph osd dump -f json
  • ceph osd tree -f json
  • ceph pg dump pgs_brief -f json
  • If a specific PG is named in pgremapper error output, then ceph pg <pgid> query -f json

Building

If you have a Go environment configured, you can use go get:

go get github.com/digitalocean/pgremapper

Otherwise, clone this repository and use a golang Docker container to build:

docker run -v $(pwd):/pgremapper -w /pgremapper golang:1.16.3 go build -o pgremapper .

Usage

pgremapper makes no changes by default and has some global options:

$ ./pgremapper [--concurrency <n>] [--yes] [--verbose] <command>
  • --concurrency: For commands that can be issued in parallel, this controls the concurrency. This is set at a reasonable default that generally doesn't lead to too much concurrent peering in the cluster when manipulating the pg-upmap table.
  • --yes: Apply the changes, rather than just emitting the diff output showing which changes would be applied.
  • --verbose: Display Ceph commands being run, for debugging purposes.
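
For example (illustrative; any subcommand behaves the same way), the typical workflow is to run a command without --yes to review the proposed upmap changes, then re-run it with --yes to apply them:

# Review the changes that cancel-backfill would make (nothing is applied yet).
$ ./pgremapper cancel-backfill

# Apply them, printing the Ceph commands being issued.
$ ./pgremapper cancel-backfill --verbose --yes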

osdspec

For commands or options that take a list of OSDs, pgremapper uses the concept of an osdspec (inspired by Git's refspec) to simplify the command line. An osdspec can either be an OSD ID (e.g. 42) or a CRUSH bucket prefixed by bucket: (e.g. bucket:rack1 or bucket:host4). In the latter case, all OSDs found under that CRUSH bucket are included.
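
As an illustration (using the OSD ID and bucket name from above), the two forms can be mixed anywhere a list of osdspecs is accepted:

$ ./pgremapper cancel-backfill --include-osds 42,bucket:host4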

balance-bucket

This is essentially a small, targeted version of Ceph's own upmap balancer (though not as sophisticated - it doesn't prioritize undoing existing pg-upmap entries, for example), useful for cases where general enablement of the balancer either isn't possible or is undesirable. The given CRUSH bucket must directly contain OSDs.

$ ./pgremapper balance-bucket <bucket> [--max-backfills <n>] [--target-spread <n>]
  • <bucket>: A CRUSH bucket that directly contains OSDs.
  • --max-backfills: The total number of backfills that should be allowed to be scheduled that affect this CRUSH bucket. This takes pre-existing backfills into account.
  • --target-spread: The goal state in terms of the maximum difference in PG counts across OSDs in this bucket.

Example

Schedule 10 backfills on the host named data11, trying to achieve a maximum PG spread of 3 between the fullest and emptiest OSDs (in terms of PG counts) within that host:

$ ./pgremapper balance-bucket data11 --max-backfills 10 --target-spread 3

cancel-backfill

This command iterates the list of PGs in a backfill state, creating, modifying, or removing upmap exception table entries to point the PGs back to where they are located now (i.e. it makes the up set the same as the acting set). This essentially reverts whatever decision led to this backfill (i.e. CRUSH change, OSD reweight, or another upmap entry) and leaves the Ceph cluster with no (or very few) remapped PGs (there are cases where Ceph disallows such remapping due to violation of CRUSH rules).

Notably, pgremapper knows how to reconstruct the acting set for a degraded backfill (provided that complete copies exist for all indexes of that acting set), which can allow one to convert a degraded+backfill{ing,_wait} into degraded+recover{y,_wait}, at the cost of losing whatever backfill progress has been made so far.

$ ./pgremapper cancel-backfill [--exclude-backfilling] [--include-osds <osdspec>,...] [--exclude-osds <osdspec>,...] [--pgs-including <osdspec>,...]
  • --exclude-backfilling: Constrain cancellation to PGs that are in a backfill_wait state, ignoring those in a backfilling state.
  • --include-osds: Only cancel backfills that have one of the given OSDs as a backfill source or target.
  • --exclude-osds: The inverse of --include-osds - cancel backfills that do not contain one of the given OSDs as a backfill source or target.
  • --pgs-including: Cancel backfills for PGs that include the given OSDs in their up or acting set, whether or not the given OSDs are backfill sources or targets in those PGs.

Example - Cancel all backfill in the system as a part of an augment

This is useful during augment scenarios, if you want to control PG movement to the new nodes via the upmap balancer (a technique based on this CERN talk).

# Make sure no data movement occurs when manipulating the CRUSH map.
$ ceph osd set nobackfill
$ ceph osd set norebalance

<perform augment CRUSH changes>

$ ./pgremapper cancel-backfill --yes

$ ceph osd unset norebalance
$ ceph osd unset nobackfill

<enable the upmap balancer to begin gentle data movements>

Example - Cancel backfill that has a CRUSH bucket as a source or target, but not backfill including specified OSDs

You may want to reduce backfill load on a given host so that only a few OSDs on that host make progress. This will cancel backfill where host data04 is a source or target, but not if OSD 21 or 34 is the source or target.

$ ./pgremapper cancel-backfill --include-osds bucket:data04 --exclude-osds 21,34

Example - Cancel backfill for any PGs that include a given host

Due to a failure, we know that data10 is going to need a bunch of recovery, so let's make sure that the recovery can happen without any backfills entering a degraded state:

$ ./pgremapper cancel-backfill --pgs-including bucket:data10
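
Example - Cancel only backfills that are still waiting

If backfills that are already making progress should be left to complete, cancellation can be restricted to PGs in backfill_wait via the --exclude-backfilling flag described above:

$ ./pgremapper cancel-backfill --exclude-backfilling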

drain

Remap PGs off of the given source OSD, up to the given maximum number of scheduled backfills. No attempt is made to balance the fullness of the target OSDs; rather, the least busy target OSDs and PGs will be selected.

$ ./pgremapper drain <source OSD> --target-osds <osdspec>[,<osdspec>] [--allow-movement-across <bucket type>] [--max-backfill-reservations default_max[,osdspec:max]] [--max-source-backfills <n>]
  • <source OSD>: The OSD that will become the backfill source.
  • --target-osds: The OSD(s) that will become the backfill target(s).
  • --allow-movement-across: Constrain which type of data movements will be considered if target OSDs are given outside of the CRUSH bucket that contains the source OSD. For example, if your OSDs all live in a CRUSH bucket of type host, passing host here will allow remappings across hosts as long as the source and target host live within the same CRUSH bucket themselves. Target CRUSH buckets will not be considered for a given PG if they already contain replicas/chunks of that PG. By default, if this option isn't given, data movements are allowed only within the direct CRUSH bucket containing the source OSD.
  • --max-backfill-reservations: Consume only the given reservation maximums for backfill. You'll commonly want to set this below your osd-max-backfills setting so that any scheduled recoveries may clear without waiting for a backfill to complete. A default value is specified first, and then per-osdspec values for cases where you want to allow more backfill or have non-uniform osd-max-backfills settings.
  • --max-source-backfills: Allow the source OSD to have this maximum number of backfills scheduled. Note: this option works for EC systems, where the given OSD truly will be the backfill source; in replicated systems, the primary OSD is the source and thus source concurrency must be controlled via --max-backfill-reservations.

Example - Offload some PGs from one OSD to another

Schedule backfills to move 5 PGs from OSD 4 to OSD 21:

$ ./pgremapper drain 4 --target-osds 21 --max-source-backfills 5

Example - Move PGs off-host

Schedule backfills to move 8 PGs from OSD 15 to any combination of OSDs on host data12, ensuring we don't exceed 2 backfill reservations anywhere:

$ ./pgremapper drain 15 --target-osds bucket:data12 --allow-movement-across host --max-backfill-reservations 2 --max-source-backfills 8

export-mappings

Export all upmaps for the given OSD spec(s) in a JSON format usable by import-mappings. This is useful for capturing the state of existing mappings so that they can be restored after destroying a number of OSDs, or after any other CRUSH change that would cause upmap items to be cleaned up by the mons.

Note that the mappings exported will be just the portions of the upmap items pertaining to the selected OSDs (i.e. if a given OSD is the From or To of the mapping).

$ ./pgremapper export-mappings <osdspec> ... [--output <file>]
  • <osdspec> ...: The OSDs for which mappings will be exported.
  • --output: Write output to the given file path instead of stdout.
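
Example

Save the upmap entries pertaining to every OSD on a host before those OSDs are destroyed and recreated (the host name and output file are illustrative):

$ ./pgremapper export-mappings bucket:data01 --output data01-mappings.json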

import-mappings

Import all upmaps from the given JSON input (probably from export-mappings) to the cluster. Input is stdin unless a file path is provided.

JSON format example, remapping PG 1.1 from OSD 100 to OSD 42:

[
  {
    "pgid": "1.1",
    "mapping": {
      "from": 100,
      "to": 42,
    }
  }
]
$ ./pgremapper import-mappings [<file>]
  • <file>: Read from the given file path instead of stdin.
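
Example

Continuing the export-mappings example above (the file name is illustrative), restore the saved mappings once the OSDs have been recreated; as with other commands, add --yes to apply the changes:

$ ./pgremapper import-mappings data01-mappings.json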

remap

Modify the upmap exception table with the requested mapping. Like other subcommands, this takes into account any existing mappings for this PG, and is thus safer and more convenient to use than 'ceph osd pg-upmap-items' directly.

$ ./pgremapper remap <pg ID> <source osd ID> <target osd ID>
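
Example

Remap PG 1.1 from OSD 100 to OSD 42 (the same illustrative IDs used in the import-mappings JSON example above):

$ ./pgremapper remap 1.1 100 42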

undo-upmaps

Given a list of OSDs, remove (or modify) upmap items such that the OSDs become the source (or target if --target is specified) of backfill operations (i.e. they are currently the "To" ("From") of the upmap items) up to the backfill limits specified. Backfill is spread across target and primary OSDs in a best-effort manner.

This is useful for cases where the upmap rebalancer won't do this for us, e.g., performing a swap-bucket where we want the source OSDs to totally drain (vs. balance with the rest of the cluster). It also achieves a much higher level of concurrency than the balancer generally will.

$ ./pgremapper undo-upmaps <osdspec>[,<osdspec>] [--max-backfill-reservations default_max[,osdspec:max]] [--max-source-backfills <n>] [--target]
  • --max-backfill-reservations: Consume only the given reservation maximums for backfill. You'll commonly want to set this below your osd-max-backfills setting so that any scheduled recoveries may clear without waiting for a backfill to complete. A default value is specified first, and then per-osdspec values for cases where you want to allow more backfill or have non-uniform osd-max-backfills settings.
  • --max-source-backfills: Allow a given source OSD to have this maximum number of backfills scheduled. Note: this option works for EC systems, where the given OSD truly will be the backfill source; in replicated systems, the primary OSD is the source and thus source concurrency must be controlled via --max-backfill-reservations.
  • --target: The given list of OSDs should serve as backfill targets, rather than the default of backfill sources.

Example - Move PGs back after an OSD recreate

A common use case is reformatting an OSD - we want to move all data off of that OSD to another, recreate the first OSD in the new format, and then move the data back. There was an example above of using drain to move data from OSD 4 to OSD 21; now let's start moving it back:

$ ./pgremapper undo-upmaps 21 --max-source-backfills 5

Or:

$ ./pgremapper undo-upmaps 4 --target --max-source-backfills 5

(Note that drain could be used for this as well, since it will happily remove upmap entries as needed.)

Example - Move PGs off of a host after a swap-bucket

Let's say you swapped data01 and data04, where data04 is an empty replacement for data01. You use cancel-backfill to revert the swap, and can then start scheduling backfill in controlled batches - 2 per source OSD, not exceeding 2 backfill reservations except for data04, where 3 backfill reservations are allowed (more target concurrency):

$ ./pgremapper undo-upmaps bucket:data01 --max-backfill-reservations 2,bucket:data04:3 --max-source-backfills 2

Development

Testing

Because pgremapper is stateless and should largely make the same decisions each run (modulo some randomization that occurs in remapping commands to ensure a level of fairness), the majority of testing can be done in unit tests that simulate Ceph responses. If you're trying to accomplish something specific while a cluster is in a certain state, the best option is to put a Ceph cluster in that state and capture relevant output from it for unit tests.

Comments
  • panic runtime error index out of range [1] with length 1

    Background information: Since there are no releases, I am basing this issue on having tried to follow the build instructions. The result of the docker method and the result of the go method produced different-sized binaries, but both seem to have the same issue.

    Issue: When running pgremapper with different commands (pgremapper cancel-backfill --verbose, pgremapper export-mappings --verbose 12, pgremapper drain 12 --verbose), the binary fails with very similar output:

    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph osd dump -f json
    panic: runtime error: index out of range [1] with length 1
    
    goroutine 1 [running]:
    main.computeBackfillSrcsTgts(0xc00277b10) (export-mappings) - or (0xc0025dc58) (cancel-backfill), (0xc00277c28) (drain)
    < trace into different modules >
    

    I know very little about Go, so I don't know where to start troubleshooting the issue.

  • panic: pg 18.48: conflicting mapping(s) found when trying to map from 72 to 527

    Hi! What I've done:

    • drained the OSD and waited until all PGs were active+clean
    • set the norebalance and nobackfill flags
    • stopped this OSD
    • did ceph osd purge for this OSD

    I got remapped PGs. When I tried to do a cancel-backfill I got a panic:

    ./pgremapper cancel-backfill --verbose
    '--yes' is not provided, running in report mode
    
    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph osd dump -f json
    ** executing: ceph osd dump -f json
    panic: pg 18.48: conflicting mapping(s) found when trying to map from 72 to 527
    
    goroutine 82 [running]:
    main.(*mappingState).remap(0xc0000123f0, 0xc000e5d605, 0x5, 0x48, 0x20f)
    	/pgremapper/mappingstate.go:76 +0x7a5
    main.calcPgMappingsToUndoBackfill.func3(0xc00094a000, 0xc000012300, 0xc0000126c0, 0xc000182cc0, 0xc000948000, 0xc000948010, 0xc001240040)
    	/pgremapper/main.go:703 +0x3d3
    created by main.calcPgMappingsToUndoBackfill
    	/pgremapper/main.go:646 +0x1a5
    

    All dumps are needed in the attachment dump.zip

  • support for device-class

    thank you for this very powerful tool!

    do you think to add support for device-class?

    currently balance-bucket on a host bucket is not working (imho) if the host has a mix of nvme and hdd osd's. two issues:

    1. should not move pg's to another device class
    2. should not move pg's from an almost empty osd to an almost full osd

    cheers

    ewceph1-prov01-prod:~ # ceph osd df tree name ewos2-ceph10-prod
    ID  CLASS WEIGHT   REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS TYPE NAME                      
    -52       56.63736        -  57 TiB  11 TiB  11 TiB  64 GiB   58 GiB  45 TiB 19.69 1.00   -                host ewos2-ceph10-prod 
    182   hdd  3.65140  1.00000 3.7 TiB 3.0 TiB 3.0 TiB  24 KiB   14 GiB 693 GiB 81.46 4.14  54     up             osd.182            
    191   hdd  3.65140  1.00000 3.7 TiB 2.5 TiB 2.5 TiB 510 KiB   12 GiB 1.1 TiB 68.57 3.48  43     up             osd.191            
    200   hdd  3.65140  1.00000 3.7 TiB 2.2 TiB 2.2 TiB  63 KiB   12 GiB 1.4 TiB 61.22 3.11  43     up             osd.200            
    208   hdd  3.65140  1.00000 3.7 TiB 2.0 TiB 1.9 TiB  32 KiB   10 GiB 1.7 TiB 53.77 2.73  39     up             osd.208            
    216  nvme  4.67020  1.00000 4.7 TiB 181 GiB 151 GiB  19 GiB  1.5 GiB 4.5 TiB  3.79 0.19 141     up             osd.216            
    226  nvme  4.67020  1.00000 4.7 TiB 181 GiB 146 GiB  24 GiB  1.0 GiB 4.5 TiB  3.78 0.19 136     up             osd.226            
    232  nvme  4.67020  1.00000 4.7 TiB 174 GiB 145 GiB  18 GiB  1.1 GiB 4.5 TiB  3.63 0.18 138     up             osd.232            
    241  nvme  4.67020  1.00000 4.7 TiB 162 GiB 148 GiB  23 KiB 1024 MiB 4.5 TiB  3.38 0.17 138     up             osd.241            
    250  nvme  4.67020  1.00000 4.7 TiB 162 GiB 147 GiB 1.3 GiB  979 MiB 4.5 TiB  3.38 0.17 138     up             osd.250            
    259  nvme  4.67020  1.00000 4.7 TiB 159 GiB 145 GiB 521 MiB  1.1 GiB 4.5 TiB  3.32 0.17 137     up             osd.259            
    268  nvme  4.67020  1.00000 4.7 TiB 161 GiB 147 GiB 647 MiB  1.2 GiB 4.5 TiB  3.36 0.17 136     up             osd.268            
    277  nvme  4.67020  1.00000 4.7 TiB 166 GiB 152 GiB 617 MiB  1.1 GiB 4.5 TiB  3.47 0.18 139     up             osd.277            
    286  nvme  4.67020  1.00000 4.7 TiB 164 GiB 150 GiB  51 KiB 1024 MiB 4.5 TiB  3.44 0.17 141     up             osd.286            
                          TOTAL  57 TiB  11 TiB  11 TiB  64 GiB   58 GiB  45 TiB 19.69                                                
    MIN/MAX VAR: 0.17/4.14  STDDEV: 29.68
    ewceph1-prov01-prod:~ # /root/go/bin/pgremapper balance-bucket ewos2-ceph10-prod --max-backfills 10 --target-spread 1
    The following changes would be made to the upmap exception table:
    pg 50.292: [256->221,+216->208]
    pg 50.296: [+286->208]
    pg 50.299: [+232->208]
    pg 50.29d: [+216->208]
    pg 50.29f: [+277->208]
    pg 50.2a7: [+286->191]
    pg 50.2a8: [235->217,213->249,+216->200]
    pg 50.2ab: [154->151,+286->208]
    pg 50.2b2: [213->285,+216->200]
    pg 50.2ba: [+286->191]
    Legend: +new mapping - -removed mapping - !stale mapping (will be removed) - kept mapping
    
    No changes made - use --yes to apply changes.
    
  • panic: pg 1.38f: conflicting mapping(s) found when trying to map from 66 to 19

    When trying to verify #4 I found a panic in pgremapper :D

    # ./pgremapper cancel-backfill --verbose
    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph pg dump pgs_brief -f json
    panic: pg 1.38f: conflicting mapping(s) found when trying to map from 66 to 19
    
    goroutine 86 [running]:
    main.(*mappingState).remap(0xc000012030, 0xc0004ac354, 0x5, 0x42, 0x13)
            /pgremapper/mappingstate.go:76 +0x7a5
    main.calcPgMappingsToUndoBackfill.func3(0xc00010e0c0, 0x5abe00, 0xc00011acc0, 0xc0000b4010, 0xc0000b4020, 0xc0000db5f4)
            /pgremapper/main.go:690 +0x373
    created by main.calcPgMappingsToUndoBackfill
            /pgremapper/main.go:633 +0x185
    

    All outputs are attached Archive.zip

  • drain for source osdspec idea

    What we have:

    • full upmap balanced cluster
    • need to redeploy OSD host (full host): out -> purge -> deploy -> fill
    • pgremapper drain can only drain individual OSDs, not an osdspec

    How we can achieve this currently:

    • ceph osd crush reweight-subtree 0
    • for e in $(ceph osd df tree name host1.example.com | grep osd\. | awk '{ print $1 }'); do ceph osd out osd.${e}; done
    • balance:
    ceph osd getmap -o osd.map
    osdmaptool osd.map \
      --upmap-deviation 1 \
      --upmap-max 10000 \
      --upmap upmap.sh | sh
    

    How we see the flow - one command:

    pgremapper drain bucket:host1.example.com
    

    This would issue (maybe not all steps; they could be scripted):

    • ceph osd crush reweight-subtree 0
    • ceph osd out for this subtree
    • make upmaps to other buckets that respect the CRUSH rule

    What do you think?

  • Added device-class filter for balance-bucket command

    Added a device-class filter to support balancing buckets that contain multiple device classes; should resolve #19

    Balance rack without device class: (screenshot)

    Balance rack with device class: (screenshot)

    Balance rack with another device class: (screenshot)

    Test TestDeviceClassFilter was also added.

  • Have cancel-backfill undo conflicting upmaps

    I've been using pgremapper primarily for the purpose of being able to add more storage, new disks in existing hosts and/or new hosts, without causing a huge reshuffling.

    My problem is that when I run cancel-backfill, I get a panic - conflicting mapping(s) found when trying to map from X to Y. My purpose of running cancel-backfill is to just replace all the remappings in the crush map with remappings keeping all the data where it is, so that the balancer can gradually move everything to where it should be (the CERN way :))

    As such, my workaround is to run undo-upmaps Y --max-backfill-reservations 9999 --max-source-backfills 9999 --yes, then rerun cancel-backfill, get a new panic and repeat the undo-upmaps manually for, sometimes, several hours.

    I actually have watch cancel-backfill --yes --concurrency 25 running in one window and a small script in another window letting me just type in the osd to undo-upmaps. With every undo-upmaps I get more and more misplaced pgs. When all the conflicting mappings have finally been found and undone, the watch cancel-backfill command will cancel everything and leave all the pgs where they are. Then I can run ceph osd unset norebalance&&ceph osd unset nobackfill&&ceph balancer on and watch how my cluster gently rebalances.

    Maybe I'm just totally missing something, but what I would like is for pgremapper to automatically undo the conflicting upmaps when running cancel-backfill or at least take those into consideration when generating the new mappings. It would really cut down on the time I spend manually undoing upmaps!

  • proposal: report mode on dry-run by default

    Hi,

    The ceph-volume lvm batch command, when --yes is not given, defaults to a brief --report mode. This is very useful for scripting, for example:

    #!/bin/bash
    
    CHECK=$1
    
    if [[ ${CHECK} != "--yes" ]]
      then
        echo -e "\n'--yes' is not provided, running in report mode\n"
        MODE="--report"
      else
        MODE="${CHECK}"
    fi
    
    ceph-volume lvm batch --no-auto --crush-device-class=nvme_ms ${MODE} \
      /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 \
      /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 \
      /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 \
      /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 \
      /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1
    

    When the Ceph operator wants a report, they just run the script; if the brief check/dry-run/report output looks okay, adding --yes to the script will deploy the OSD host.

    I think it would be useful for pgremapper too, instead of a y/n dialog when --yes is not given.

    Thanks

  • Update go.mod

    There is a bug in x/sys that says files are writeable when they are not.

    https://osv.dev/vulnerability/GO-2022-0493 has more info on it. I have tested it builds, but not more than that.

  • min_compat_client warning is not handled by pgremapper

    This panic:

    # ./upmap_cancel_backfill.sh --yes
    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph osd dump -f json
    ** executing: ceph pg dump pgs_brief -f json
    ** executing: ceph osd dump -f json
    ** executing: ceph osd dump -f json
    ** executing: ceph osd pg-upmap-items 3.0 39 7 6 37 66 48
    ** executing: ceph osd pg-upmap-items 3.1 6 88
    ** executing: ceph osd pg-upmap-items 3.101 89 90 123 124
    ** executing: ceph osd pg-upmap-items 3.10 130 84 93 134
    ** executing: ceph osd pg-upmap-items 3.100 35 19 4 49 143 84
    panic: failed to execute command: ceph osd pg-upmap-items 3.1 6 88: exit status 1
    
    goroutine 66 [running]:
    main.runOrDie(0xc0001e2000, 0x6, 0x8, 0x4, 0x6)
            /pgremapper/main.go:993 +0x117
    main.(*pgUpmapItem).do(0xc000458fa0)
            /pgremapper/ceph.go:259 +0x2c5
    main.(*mappingState).apply.func1(0xc000547d40, 0xc000556b50)
            /pgremapper/mappingstate.go:200 +0x3f
    created by main.(*mappingState).apply
            /pgremapper/mappingstate.go:198 +0xaa
    

    Actual issue is:

    # ceph osd pg-upmap-items 3.1 6 88
    Error EPERM: min_compat_client jewel < luminous, which is required for pg-upmap. Try 'ceph osd set-require-min-compat-client luminous' before using the new interface
    

    It would be good if the command's stderr were handled and printed instead of panicking.

  • pgremapper: Consider upmaps when reordering up sets.

    Consider the following upmap entry:

            {
                "pgid": "1.38f",
                "mappings": [
                    {
                        "from": 51,
                        "to": 3
                    },
                    {
                        "from": 19,
                        "to": 11
                    }
                ]
            },
    

    And the following PG state:

            {
                "pgid": "1.38f",
                "state": "active+remapped+backfill_wait",
                "up": [
                    3,
                    66,
                    11
                ],
                "acting": [
                    3,
                    19,
                    39
                ],
                "up_primary": 3,
                "acting_primary": 3
            },
    

    Prior to this change, pgremapper would attempt to remap 66->19, which would conflict with the existing mapping 19->11. After this change, pgremapper is wise enough to use the upmap item to realize that 19 and 11 are paired and re-order the up set accordingly for proper processing.

  • panic - multiple complete shards at index

    Trying to cancel-backfill on a single OSD:

    $ pgremapper cancel-backfill --include-osds 100
    panic: PGID: multiple complete shards at index 15
    
    

    Not sure how to address that, any suggestions?

  • cancel-backfill distinguish between source and target

    It would be nice if cancel-backfill could distinguish between source and target, e.g., for a nearly full OSD, cancel backfills for which it is a target but not a source.

  • drain host is not filling target OSDs evenly

    First: thanks for the tool. It's really useful. We use it to drain one host at a time (with pgremapper looping over its OSDs). We noticed however that the target OSDs are not evenly filled up (from a PG count view, or usage for that matter). This imbalance might get really big (i.e. more than 40 PGs). We can straighten that out mid-process by stopping new remaps and using the "balance-host" option. But it's a waste of time. It would be better to get those PGs on the right OSD the first time.

    I have taken a look at the code, and if I understand correctly, the decision about which OSD is used as the target is handled in the function "calcPgMappingsToUndoUpmaps". It does not seem to take into account how many PGs are already on the OSD. Is that correct? Or is there some heuristic that does take this into account?

    Note: we do not clear any upmaps on the target host before running pgremapper. Might this influence mapping decisions?
