The cluster fails to run after the upgrade from v1.8.0 to v1.9.0. These are the node states:
- node 0 - runs successfully
- node 1 - panics
- node 2 - panics
This is the log from one of the failing nodes:

```
time="2022-12-05 10:03:46" level=info msg="Liftbridge Version: v1.9.0"
time="2022-12-05 10:03:46" level=info msg="Server ID: cluster-liftbridge-2"
time="2022-12-05 10:03:46" level=info msg="Namespace: liftbridge-default"
time="2022-12-05 10:03:46" level=info msg="NATS Servers: [nats://nats_client:xxx@nats-client:4222]"
time="2022-12-05 10:03:46" level=info msg="Default Retention Policy: [Age: 1 week, Compact: false]"
time="2022-12-05 10:03:46" level=info msg="Default Partition Pausing: disabled"
time="2022-12-05 10:03:46" level=info msg="Starting Liftbridge server on 0.0.0.0:9292..."
time="2022-12-05 10:03:46" level=debug msg="fsm: Restoring Raft state from snapshot..."
time="2022-12-05 10:03:46" level=debug msg="Server becoming leader for partition [subject=ems.ble.omm, stream=ems.ble.omm, partition=0], epoch: 81585"
time="2022-12-05 10:03:46" level=warning msg="Received log leader epoch assignment for an epoch < latest epoch. This implies messages have arrived out of order. New: {epoch:81585, offset:6596311}, Previous: {epoch:83904, offset:6592917} for log [subject=ems.ble.omm, stream=ems.ble.omm, partition=0]"
```
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xe972fb]
goroutine 1 [running]:
github.com/liftbridge-io/liftbridge/server.(*partition).startReplicating(0xc000250d00, 0x13eb1, 0xc00021a120)
/workspace/server/partition.go:1311 +0x45b
github.com/liftbridge-io/liftbridge/server.(*partition).becomeLeader(0xc000250d00, 0x13eb1)
/workspace/server/partition.go:824 +0x245
github.com/liftbridge-io/liftbridge/server.(*partition).startLeadingOrFollowing(0xc000250d00)
/workspace/server/partition.go:749 +0x2de
github.com/liftbridge-io/liftbridge/server.(*partition).SetLeader(0xc000250d00, {0xc000208318, 0x14}, 0x13eb1)
/workspace/server/partition.go:721 +0x133
```
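The trace shows the dereference happening inside `startReplicating` while the node becomes leader for the partition. Purely as an illustration of that class of failure, not a claim about Liftbridge's actual code (all type and field names below are hypothetical), a field left nil by the snapshot-restore path would produce exactly this kind of panic on the become-leader path:

```go
package main

import "fmt"

// Hypothetical stand-ins for the structures in the trace.
type epochCache struct {
	latest uint64
}

func (c *epochCache) assign(epoch uint64) {
	c.latest = epoch // dereferences the receiver; panics if c is nil
}

type partition struct {
	cache *epochCache // imagine a restore path that leaves this nil
}

func (p *partition) startReplicating(epoch uint64) {
	// The kind of call that would map to partition.go:1311 in the trace
	// if the field were never initialized after the snapshot restore.
	p.cache.assign(epoch)
}

func main() {
	p := &partition{} // cache deliberately left nil
	defer func() {
		fmt.Println("recovered:", recover()) // nil pointer dereference
	}()
	p.startReplicating(81585)
}
```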
Reverting to v1.8.0 restores the cluster to a healthy state.