High-Performance server for NATS, the cloud native messaging system.

NATS is a simple, secure and performant communications system for digital systems, services and devices. NATS is part of the Cloud Native Computing Foundation (CNCF). NATS has over 40 client language implementations, and its server can run on-premise, in the cloud, at the edge, and even on a Raspberry Pi. NATS can secure and simplify design and operation of modern distributed systems.

Documentation

Contact

  • Twitter: Follow us on Twitter!
  • Google Groups: Where you can ask questions
  • Slack: Click here to join. You can ask questions of our maintainers and of the rich and active community.

Contributing

If you are interested in contributing to NATS, read about our...

Security

Security Audit

A third-party security audit was performed by Cure53; you can see the full report here.

Reporting Security Vulnerabilities

If you've found a vulnerability or a potential vulnerability in the NATS server, please let us know at nats-security.

License

Unless otherwise noted, the NATS source files are distributed under the Apache Version 2.0 license found in the LICENSE file.

Comments
  • subscription count in subsz is wrong

    subscription count in subsz is wrong

    Since updating one of my brokers to 2.0.0 I noticed a slow increase in subscription counts. I also did a bunch of other updates, like moving to the newly renamed libraries, so in trying to find the cause I eventually concluded that the server is just counting things wrongly.

    graph

    Ignoring the annoying popup, you can see a steady increase in subscriptions.

    The data below is from the following dependency, embedded in another Go process:

    github.com/nats-io/nats-server/v2 v2.0.1-0.20190701212751-a171864ae7df
    
    $ curl -s http://localhost:6165/varz|jq .subscriptions
    29256
    

    I then tried to verify this number, and assuming there are no bugs in the script below, I think the varz counter is off by a lot. Comparing snapshots of connz over time, I see no growth reflected there, neither in connection counts nor in subscriptions:

    $ curl "http://localhost:6165/connz?limit=200000&subs=1"|./countsubs.rb
    Connections: 3659
    Subscriptions: 25477
    

    I also captured connz output over time, at 15:17, at 15:56, and at 10:07 the next day:

    $ cat connz-1562685506.json|./countsubs.rb
    Connections: 3657
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    $ cat connz-1562687791.json|./countsubs.rb
    Connections: 3658
    Subscriptions: 25463
    

    Using the script here:

    require "json"
    
    subs = JSON.parse(STDIN.read)
    puts "Connections: %d" % subs["connections"].length
    
    count = 0
    
    subs["connections"].each do |conn|
      count += subs.length if subs = conn["subscriptions_list"]
    end
    
    puts "Subscriptions: %d" % count
    
  • Performance issues with locks and sublist cache

    Performance issues with locks and sublist cache

    • [ ] Defect
    • [x] Feature Request or Change Proposal

    Feature Requests

    Use Case:

    We are using gnatsd 1.4.1 (compiled with Go 1.11.5). During benchmarking, we observed non-trivial latency (500 ms+, often seconds) at the gnatsd cluster.
    
    As there are no slow consumers (with the default 2-second threshold), yet the OS receive buffer filled up and the TCP window went to 0, it seems the gnatsd server is somehow slow in its read loop. We are trying to slow down the sender for one connection, but we believe gnatsd can also be improved. If you need more proof that the read loop is slow, we can provide some tcpdump snippets and gnatsd tracing logs.
    
    We also observed some rare parser errors when gnatsd is under heavy read load. The client uses cnats; however, we are not sure which party (cnats, the OS, or gnatsd) is at fault. Once we find out, we may open another issue to address the problem.

    [8354] 2019/04/01 12:17:11.695815 [ERR] 10.228.255.129:44588 - cid:1253 - Client parser ERROR, state=0, i=302: proto='"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"...'
    

    By the way, since gnatsd can detect slow consumers, is it possible for gnatsd to know when it itself becomes a slow consumer (slow reader)? The only idea I have come up with is to adjust the OS buffers and let the upstream sense the back pressure. If you have any suggestions, please let me know.

    Proposed Change:

    1. Improve locks: https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:enhance/processMsg_lock. A comparison of read loops between high load and low load, and a sync blocking graph, are attached as images.

    2. Ability to adjust the sublist cache size or disable it: https://github.com/nats-io/gnatsd/compare/branch_1_4_0...azrle:feature/opts-sublist_cache_size. Given our application's characteristics, it does sub/unsub very frequently and most subjects are used only once. The cache hit rate is under 0.5%, yet maintaining the sublist cache costs gnatsd. Besides the locks for the cache, reduceCacheCount is noticeable: while other functions have fewer than 50 goroutines, the number of goroutines for server.(*Sublist).reduceCacheCount can climb to nearly 18,000.
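
    Newer nats-server releases appear to expose a switch for exactly this; the option name below is recalled from memory, so treat it as an assumption and verify it against your server version:

    # Hedged sketch -- disables the sublist cache entirely, if the option exists in your version.
    disable_sublist_cache: true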

    Who Benefits From The Change(s)?

    Clients that send messages heavily to gnatsd and whose subscriptions change frequently. Under our test cases (with enough servers), the 99.9th percentile latency drops from 1500 ms to 500 ms (still slow, though).
    
    I noticed that gnatsd v2 is coming and that its implementation changes a lot, but I am afraid we may not have time to wait for it to become production-ready.

    I sincerely hope the performance can be improved for v1.4.

    Thank you in advance!

  • Consumer stopped working after errPartialCache (nats-server oom-killed)

    Consumer stopped working after errPartialCache (nats-server oom-killed)

    Defect

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [x] Included nats-server -DV output
    • [ ] Included a [Minimal, Complete, and Verifiable example] (https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    # nats-server -DV
    [92] 2021/12/06 15:16:05.235349 [INF] Starting nats-server
    [92] 2021/12/06 15:16:05.235397 [INF]   Version:  2.6.6
    [92] 2021/12/06 15:16:05.235401 [INF]   Git:      [878afad]
    [92] 2021/12/06 15:16:05.235406 [DBG]   Go build: go1.16.10
    [92] 2021/12/06 15:16:05.235416 [INF]   Name:     NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235436 [INF]   ID:       NASX72BQAFBIH4QBLZ36RADTPKSO6LCKRDEAS37XRJ7SYZ53RYYOFHHS
    [92] 2021/12/06 15:16:05.235457 [DBG] Created system account: "$SYS"
    
    Image:         nats:2.6.6-alpine
        Limits:
          cpu:     200m
          memory:  256Mi
        Requests:
          cpu:      200m
          memory:   256Mi
    

    go library:

    github.com/nats-io/nats.go v1.13.1-0.20211018182449-f2416a8b1483
    

    OS/Container environment:

    Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.4", GitCommit:"b695d79d4f967c403a96986f1750a35eb75e75f1", GitTreeState:"clean", BuildDate:"2021-11-17T15:42:41Z", GoVersion:"go1.16.10", Compiler:"gc", Platform:"linux/amd64"}
    
    CONTAINER-RUNTIME
    cri-o://1.21.4
    

    Steps or code to reproduce the issue:

    1. Start nats cluster (3 replicas) with Jetstream enabled. JS Config:
    jetstream {
      max_mem: 64Mi
      store_dir: /data
    
      max_file:10Gi
    }
    
    
    2. Start pushing messages into the stream. Stream config:
    Configuration:
    
                 Subjects: widget-request-collector
         Acknowledgements: true
                Retention: File - WorkQueue
                 Replicas: 3
           Discard Policy: Old
         Duplicate Window: 2m0s
        Allows Msg Delete: true
             Allows Purge: true
           Allows Rollups: false
         Maximum Messages: unlimited
            Maximum Bytes: 1.9 GiB
              Maximum Age: 1d0h0m0s
     Maximum Message Size: unlimited
        Maximum Consumers: unlimited
    
    
    3. Shut down one of the nats nodes for a while and rate-limit the consumer (or shut it down) so that messages collect in file storage.
    4. Wait until the storage reaches its maximum capacity (1.9 GiB).
    5. Bring the nats server back up (do not bring up the consumer).

    Expected result:

    Outdated node should become current.

    Actual result:

    The outdated node tries to become current and receives messages from the stream leader, but it hits the memory limit and is killed by the OOM killer. It restarts and is OOM-killed again, over and over.

    Cluster Information:
    
                     Name: nats
                   Leader: promo-widget-collector-event-nats-2
                  Replica: promo-widget-collector-event-nats-1, outdated, OFFLINE, seen 2m8s ago, 13,634 operations behind
                  Replica: promo-widget-collector-event-nats-0, current, seen 0.00s ago
    
    State:
    
                 Messages: 2,695,412
                    Bytes: 1.9 GiB
                 FirstSeq: 3,957,219 @ 2021-12-06T14:04:00 UTC
                  LastSeq: 6,652,630 @ 2021-12-06T15:09:36 UTC
         Active Consumers: 1
    

    Crashed pod info:

        State:          Waiting                                                                                                                                                                                                                                                                                                                                                                                                              
          Reason:       CrashLoopBackOff                                                                                                                                                                                                                                                                                                                                                                                                     
        Last State:     Terminated                                                                                                                                                                                                                                                                                                                                                                                                           
          Reason:       OOMKilled                                                                                                                                                                                                                                                                                                                                                                                                            
          Exit Code:    137                                                                                                                                                                                                                                                                                                                                                                                                                  
          Started:      Mon, 06 Dec 2021 14:30:26 +0000                                                                                                                                                                                                                                                                                                                                                                                      
          Finished:     Mon, 06 Dec 2021 14:31:08 +0000                                                                                                                                                                                                                                                                                                                                                                                      
        Ready:          False                                                                                                                                                                                                                                                                                                                                                                                                                
        Restart Count:  3 
    

    Is it possible to configure memory limits for nats-server to keep it from over-consuming memory?

  • jetstream could not pull message after nats-server restart

    jetstream could not pull message after nats-server restart

    I was testing JetStream on nats-server v2.3.2. One sender and one receiver program had been running for quite a long time.
    
    This is what my stream looks like:

    	_, err = js.AddStream(&nats.StreamConfig{
    		Name:      streamName,
    		Subjects:  []string{streamSubjects},
    		Storage:   nats.FileStorage,
    		Replicas:  3,
    		Retention: nats.WorkQueuePolicy,
    		Discard:   nats.DiscardNew,
    		MaxMsgs:   -1,
    		MaxAge:    time.Hour * 24 * 365,
    	})
    

    This is how I create the consumer:

    	if _, err := js.AddConsumer(streamName, &nats.ConsumerConfig{
    		Durable:       durableName,
    		DeliverPolicy: nats.DeliverAllPolicy,
    		AckPolicy:     nats.AckExplicitPolicy,
    		ReplayPolicy:  nats.ReplayInstantPolicy,
    		FilterSubject: subjectName,
    		AckWait:       time.Second * 30,
    		MaxDeliver:    -1,
    		MaxAckPending: 1000,
    	}); err != nil && !strings.Contains(err.Error(), "already in use") {
    		log.Println("AddConsumer fail")
    		return
    	}
    

    This is what the subscriber looks like:

    	sub, err := js.PullSubscribe("ORDERS.created", durableName, nats.Bind("ORDERS", durableName))
    	if err != nil {
    		fmt.Println(" PullSubscribe:", err)
    		return
    	}
           msgs, err := sub.Fetch(1000, nats.MaxWait(10*time.Second))
    

    When I restart my nats-server cluster nodes (to upgrade to nats-server 2.3.3), the consumer can no longer pull messages, even if I restart my consumer program. The Fetch call just returns "nats: timeout", but I am sure there are lots of messages in the work queue. Only if I delete the consumer by calling js.DeleteConsumer(streamName, durableName) and recreate it can my program resume fetching messages. In fact, every time I restart the nats-server nodes, my consumer program encounters the same problem.
    
    There is another issue: after I restart the nats-server nodes and restart my program, it sometimes reports "PullSubscribe: nats: JetStream system temporarily unavailable".
    
    I expect restarting nats-server nodes not to impact JetStream clients fetching messages. (The delete-and-recreate workaround I currently use is sketched below.)
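
    A minimal sketch of that workaround, assuming the same streamName, durableName, and subjectName as in the snippets above (illustrative only; error handling kept short):

    package workaround
    
    import (
    	"log"
    	"time"
    
    	"github.com/nats-io/nats.go"
    )
    
    // recreateConsumer drops the stuck durable consumer, recreates it with the
    // same configuration as above, and binds a fresh pull subscription to it.
    func recreateConsumer(js nats.JetStreamContext, streamName, durableName, subjectName string) (*nats.Subscription, error) {
    	if err := js.DeleteConsumer(streamName, durableName); err != nil {
    		log.Printf("DeleteConsumer: %v", err)
    	}
    	if _, err := js.AddConsumer(streamName, &nats.ConsumerConfig{
    		Durable:       durableName,
    		DeliverPolicy: nats.DeliverAllPolicy,
    		AckPolicy:     nats.AckExplicitPolicy,
    		ReplayPolicy:  nats.ReplayInstantPolicy,
    		FilterSubject: subjectName,
    		AckWait:       30 * time.Second,
    		MaxDeliver:    -1,
    		MaxAckPending: 1000,
    	}); err != nil {
    		return nil, err
    	}
    	return js.PullSubscribe(subjectName, durableName, nats.Bind(streamName, durableName))
    }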

  • Client Auth API

    Client Auth API

    NATS seems perfect for our needs; however, having auth hard-coded at service start isn't very practical when we are adding and removing users while it's running.
    
    Implementing some Go code to handle this is one option; another is to use an external service for authorization, whether HTTP basic auth or something else. Being able to set an authentication endpoint would be very handy, especially since we only allow a user to be logged in with one session.

    If this is possible now please let me know, but I couldn't find it in the docs anywhere.

    Thanks!

  • memory increase in clustered mode

    memory increase in clustered mode

    This is a follow on from https://github.com/nats-io/nats-server/issues/1065

    While looking into the above issue I noticed memory growth. We wanted to focus on one issue at a time, so with 1065 done I looked at the memory situation. The usage patterns and so forth are identical to 1065.

    12 hours

    The graph above covers 12 hours. As you know, I embed your broker into one of my apps and run a bunch of things in there, so in order to isolate the problem I did a few things:
    
    1. The same version of everything, with the same usage pattern, on a single unclustered broker does not show memory growth.
    2. With all the related features turned off in my code where I embed nats-server, I still see the growth when clustered.
    3. I made my code respond to SIGQUIT by writing memory profiles on demand, so I can interrogate a running nats-server (see the sketch below).
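
    A minimal sketch of that SIGQUIT hook, using only the standard library (the output path is illustrative):

    package memdebug
    
    import (
    	"fmt"
    	"os"
    	"os/signal"
    	"runtime/pprof"
    	"syscall"
    	"time"
    )
    
    // DumpHeapOnSIGQUIT writes a heap profile each time the process receives
    // SIGQUIT, so a running embedded nats-server can be inspected later with
    // `go tool pprof`.
    func DumpHeapOnSIGQUIT() {
    	ch := make(chan os.Signal, 1)
    	signal.Notify(ch, syscall.SIGQUIT)
    	go func() {
    		for range ch {
    			f, err := os.Create(fmt.Sprintf("/tmp/heap-%d.pprof", time.Now().Unix()))
    			if err != nil {
    				continue
    			}
    			pprof.Lookup("heap").WriteTo(f, 0)
    			f.Close()
    		}
    	}()
    }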

    The nats-server is github.com/nats-io/nats-server/v2 v2.0.3-0.20190723153225-9cf534bc5e97

    From above memory dumps when comparing 6 hours apart dumps I see:

    8am:

    (pprof) top10
    Showing nodes accounting for 161.44MB, 90.17% of 179.04MB total
    Dropped 66 nodes (cum <= 0.90MB)
    Showing top 10 nodes out of 51
          flat  flat%   sum%        cum   cum%
       73.82MB 41.23% 41.23%    73.82MB 41.23%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       29.18MB 16.30% 57.53%    29.68MB 16.58%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       19.60MB 10.95% 68.48%    19.60MB 10.95%  math/rand.NewSource
       15.08MB  8.42% 76.90%   140.30MB 78.37%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
        6.50MB  3.63% 80.53%       12MB  6.70%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.93% 83.46%    11.25MB  6.28%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.01MB  2.24% 85.70%    65.85MB 36.78%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.95% 87.65%     3.50MB  1.95%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.40% 89.05%     2.50MB  1.40%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  1.12% 90.17%        2MB  1.12%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    

    1pm

    (pprof) top10
    Showing nodes accounting for 185.64MB, 90.87% of 204.29MB total
    Dropped 69 nodes (cum <= 1.02MB)
    Showing top 10 nodes out of 46
          flat  flat%   sum%        cum   cum%
       86.33MB 42.26% 42.26%    86.33MB 42.26%  github.com/nats-io/nats-server/v2/server.(*client).queueOutbound
       30.19MB 14.78% 57.04%    30.69MB 15.02%  github.com/nats-io/nats-server/v2/server.(*Server).createClient
       25.75MB 12.60% 69.64%   165.05MB 80.79%  github.com/nats-io/nats-server/v2/server.(*client).readLoop
       19.60MB  9.59% 79.24%    19.60MB  9.59%  math/rand.NewSource
        6.50MB  3.18% 82.42%    12.55MB  6.14%  github.com/nats-io/nats-server/v2/server.(*client).processSub
        5.25MB  2.57% 84.99%    11.25MB  5.51%  github.com/nats-io/nats-server/v2/server.(*Sublist).Insert
        4.02MB  1.97% 86.95%    73.70MB 36.08%  github.com/nats-io/nats-server/v2/server.(*client).processInboundClientMsg
        3.50MB  1.71% 88.67%     3.50MB  1.71%  github.com/nats-io/nats-server/v2/server.newLevel
        2.50MB  1.22% 89.89%     2.50MB  1.22%  github.com/nats-io/nats-server/v2/server.newNode
           2MB  0.98% 90.87%        2MB  0.98%  github.com/nats-io/nats-server/v2/server.(*client).addSubToRouteTargets
    
  • Suggest repair actions for JetStream cluster consumer NO quorum issue

    Suggest repair actions for JetStream cluster consumer NO quorum issue

    Environment

    • NATS version: 2.2.6 with jetstream enabled
    • Number of nodes in the cluster: 3
    • Deploy on OKD 3.11 by nats helm chart 0.8.0

    Event description

    • Getting JetStream stream info succeeds, but getting JetStream consumer info via natscli fails.
    • [Pub] OK, [Sub] failed: the NATS sub client can't connect to the NATS cluster after 7/7 00:18.
    • The cluster had been running for more than a month, and there were no errors until 7/7. It was confirmed that there were no network or hardware problems.
    • Logs and attempted actions are attached below; please suggest other repair actions. Thanks.

    NATS server logs

    nats instance 0

    [1] 2021/07/07 00:18:44.650787 [WRN] JetStream cluster stream '$G > MY-STREAM2' has NO quorum, stalled.
    [1] 2021/07/07 00:18:44.651098 [WRN] JetStream cluster consumer '$G > MY-STREAM2 > consumer5' has NO quorum, stalled.
    [1] 2021/07/07 00:18:47.433327 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:47.930284 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:51.306199 [WRN] JetStream cluster stream '$G > MY-STREAM' has NO quorum, stalled.
    [1] 2021/07/07 00:18:51.652389 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/07 00:18:56.555042 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer2'
    [1] 2021/07/07 00:19:00.462077 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer3'
    [1] 2021/07/07 00:19:00.870001 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.024537 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/07 00:19:01.292724 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    

    nats instance 1

    [1] 2021/07/07 00:18:48.190309 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:53.343597 [INF] JetStream cluster new metadata leader
    [1] 2021/07/07 00:18:56.820943 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM2 > consumer5'
    [1] 2021/07/07 00:18:57.098682 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer1'
    [1] 2021/07/07 00:18:57.572857 [INF] JetStream cluster new stream leader for '$G > MY-STREAM2'
    [1] 2021/07/07 00:18:57.679975 [INF] JetStream cluster new stream leader for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.710121 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.909870 [WRN] Resetting stream '$G > MY-STREAM'
    [1] 2021/07/08 03:30:19.175389 [WRN] Did not receive all stream info results for "$G"
    

    nats instance 2

    [1] 2021/07/07 00:18:57.508614 [INF] JetStream cluster new consumer leader for '$G > MY-STREAM > consumer4'
    [1] 2021/07/07 00:19:00.710399 [WRN] Got stream sequence mismatch for '$G > MY-STREAM'
    [1] 2021/07/07 00:19:00.907675 [WRN] Resetting stream '$G > MY-STREAM'
    

    Tried Actions

    1. Try to execute "nats consumer cluster step-down" [Failed]
    nats consumer list MY-STREAM
    # Consumers for Stream MY-STREAM:
    
    #         consumer1
    #         consumer2
    #         consumer3
    #         consumer4
    
    nats consumer cluster step-down --trace 
    # 13:11:04 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 13:11:05 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 13:11:13 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 13:11:13 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer2
    # 13:11:16 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer2
    
    
    # 13:11:21 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer2: context deadline exceeded
    
    # nats.exe: error: context deadline exceeded, try --help
    
    2. Try to request CONSUMER STEPDOWN API directly [Failed]
    nats req '$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3' "" --trace
    
    # 05:20:43 Sending request on "$JS.API.CONSUMER.LEADER.STEPDOWN.MY-STREAM.consumer3"
    # nats: error: nats: timeout, try --help
    
    
    3. Try to restart NATS server [Still failed to get consumer]
    kubectl rollout restart statefulset nats -n mynamespace
    
    nats con info --trace
    # 05:43:02 >>> $JS.API.STREAM.NAMES
    # {"offset":0}
    
    # 05:43:02 <<< $JS.API.STREAM.NAMES
    # {"type":"io.nats.jetstream.api.v1.stream_names_response","total":2,"offset":0,"limit":1024,"streams":["MY-STREAM","MY-STREAM2"]}
    
    # ? Select a Stream MY-STREAM
    # 05:43:03 >>> $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"offset":0}
    
    # 05:43:03 <<< $JS.API.CONSUMER.NAMES.MY-STREAM
    # {"type":"io.nats.jetstream.api.v1.consumer_names_response","total":4,"offset":0,"limit":1024,"consumers":["consumer1","consumer2","consumer3","consumer4"]}
    
    # ? Select a Consumer consumer1
    # 05:43:05 >>> $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    
    
    # 05:43:05 <<< $JS.API.CONSUMER.INFO.MY-STREAM.consumer1
    # {"type":"io.nats.jetstream.api.v1.consumer_info_response","error":{"code":503,"description":"JetStream system temporarily unavailable"}}
    
    # nats: error: could not load Consumer MY-STREAM > consumer1: JetStream system temporarily unavailable
    

    The nats-0 server has a lot of JetStream WRN logs

    [1] 2021/07/08 05:40:33.345825 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.027116 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:40:34.542920 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:35.494354 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:55.586260 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:40:57.300211 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.005908 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    [1] 2021/07/08 05:40:58.324828 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer2' has NO quorum, stalled.
    [1] 2021/07/08 05:41:16.664240 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer4' has NO quorum, stalled.
    [1] 2021/07/08 05:41:17.659280 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer1' has NO quorum, stalled.
    [1] 2021/07/08 05:41:20.245055 [WRN] JetStream cluster consumer '$G > MY-STREAM > consumer3' has NO quorum, stalled.
    

    NATS stream report shows a failed status for MY-STREAM on nats-0

    nats stream report
    
    Obtaining Stream stats
    
    +--------------------------------------------------------------------------------------------------------------------+
    |                                                   Stream Report                                                    |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | Stream                      | Storage | Consumers | Messages | Bytes   | Lost | Deleted | Replicas                 |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    | MY-STREAM2 | File    | 1         | 0        | 0 B     | 0    | 0       | nats-0, nats-1, nats-2*  |
    | MY-STREAM                  | File    | 0         | 500      | 3.9 MiB | 0    | 0       | nats-0!, nats-1, nats-2* |
    +-----------------------------+---------+-----------+----------+---------+------+---------+--------------------------+
    
    4. Try to remove nats-0 peer for MY-STREAM [Failed]
    nats stream cluster peer-remove
    # ? Select a Stream MY-STREAM
    # ? Select a Peer nats-0
    # 06:16:31 Removing peer "nats-0"
    # nats: error: peer remap failed, try --help
    
  • Service crossing accounts and leaf nodes can't send message back to requester.

    Service crossing accounts and leaf nodes can't send message back to requester.

    • [X] Defect
    • [ ] Feature Request or Change Proposal

    Defects

    Make sure that these boxes are checked before submitting your issue -- thank you!

    • [X] Included nats-server -DV output
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Starting nats-server version 2.1.2
    c1          | [1372] 2020/01/10 15:17:46.476336 [DBG] Go build version go1.12.13
    c1          | [1372] 2020/01/10 15:17:46.476336 [INF] Git commit [679beda]
    c1          | [1372] 2020/01/10 15:17:46.476336 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Starting http monitor on 0.0.0.0:8222
    c1          | [1372] 2020/01/10 15:17:46.478337 [INF] Listening for leafnode connections on 0.0.0.0:7422
    c1          | [1372] 2020/01/10 15:17:46.478337 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.485338 [DBG]  ip=172.18.206.186
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Listening for client connections on 0.0.0.0:4244
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server id is ND2MSDWDWTMJEX2V7TDS2O53Q5ZEY3W3ORS6T53HOM3PR5BBP6ZSYCA6
    c1          | [1372] 2020/01/10 15:17:46.488338 [INF] Server is ready
    c1          | [1372] 2020/01/10 15:17:46.488338 [DBG] Get non local IPs for "0.0.0.0"
    c1          | [1372] 2020/01/10 15:17:46.492338 [DBG]  ip=172.18.206.186
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Starting nats-server version 2.1.2
    c2          | [1372] 2020/01/10 15:17:48.537218 [DBG] Go build version go1.12.13
    c2          | [1372] 2020/01/10 15:17:48.537218 [INF] Git commit [679beda]
    c2          | [1372] 2020/01/10 15:17:48.537218 [WRN] Plaintext passwords detected, use nkeys or bcrypt.
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Starting http monitor on 0.0.0.0:8222
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Listening for client connections on 0.0.0.0:4244
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server id is NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX
    c2          | [1372] 2020/01/10 15:17:48.539218 [INF] Server is ready
    c2          | [1372] 2020/01/10 15:17:48.539218 [DBG] Get non local IPs for "0.0.0.0"
    c2          | [1372] 2020/01/10 15:17:48.545215 [DBG]  ip=172.18.194.70
    c2          | [1372] 2020/01/10 15:17:48.556228 [DBG] Trying to connect as leafnode to remote server on "c1:7422" (172.18.206.186:7422)
    c1          | [1372] 2020/01/10 15:17:48.560110 [DBG] 172.18.194.70:49157 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Remote leafnode connect msg sent
    c2          | [1372] 2020/01/10 15:17:48.560661 [DBG] 172.18.206.186:7422 - lid:1 - Leafnode connection created
    c2          | [1372] 2020/01/10 15:17:48.560661 [INF] Connected leafnode to "c1"
    c1          | [1372] 2020/01/10 15:17:48.561188 [TRC] 172.18.194.70:49157 - lid:1 - <<- [CONNECT {"tls_required":false,"name":"NCIHCZWAIQUH3OK624BMEV62WEEX6IEBKUFXAPRFRCE3GVEWRRNC5WBX"}]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ test.service.1]
    c1          | [1372] 2020/01/10 15:17:48.562131 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c2          | [1372] 2020/01/10 15:17:48.561759 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ test.service.1]
    c2          | [1372] 2020/01/10 15:17:48.562839 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ lds.qtioyTeG9dZPgE8uYM7rsy]
    c1          | [1372] 2020/01/10 15:17:49.489505 [DBG] 10.35.68.24:62849 - cid:2 - Client connection created
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Responder","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491212 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [SUB test.service.1 NATS-RPLY-22 1]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.491563 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.636028 [DBG] 172.18.206.186:7422 - lid:1 - LeafNode Ping Timer
    c2          | [1372] 2020/01/10 15:17:49.636282 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PING]
    c1          | [1372] 2020/01/10 15:17:49.636909 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:17:49.637613 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [DBG] 172.18.194.70:49157 - lid:1 - LeafNode Ping Timer
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - ->> [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - <<- [PING]
    c2          | [1372] 2020/01/10 15:17:49.717524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:17:49.732680 [TRC] 172.18.194.70:49157 - lid:1 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:17:51.714580 [DBG] 10.35.68.24:62849 - cid:2 - Client Ping Timer
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:17:51.714580 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.301474 [DBG] 10.35.68.24:62850 - cid:3 - Client connection created
    c1          | [1372] 2020/01/10 15:18:00.302611 [TRC] 10.35.68.24:62850 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"a","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [SUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.*  1]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- [PUB test.service.1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62850 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [PING]
    c1          | [1372] 2020/01/10 15:18:00.302866 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 3]
    c1          | [1372] 2020/01/10 15:18:00.303903 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PONG]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 13]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:00.304384 [TRC] 10.35.68.24:62850 - cid:3 - ->> [MSG _INBOX.W7P0kJjrbQVrbmzAqqk6V1.9cn7513D 1 13]
    c1          | [1372] 2020/01/10 15:18:00.305527 [DBG] 10.35.68.24:62850 - cid:3 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:00.307546 [TRC] 10.35.68.24:62850 - cid:3 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.175280 [DBG] 10.35.68.24:62865 - cid:4 - Client connection created
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"b","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PING]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - ->> [PONG]
    c1          | [1372] 2020/01/10 15:18:03.176364 [TRC] 10.35.68.24:62865 - cid:4 - <<- [SUB _INBOX.4ynIPqChOQMSroNEZqndLx.*  1]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- [PUB test.service.1 _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 3]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62865 - cid:4 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:03.177312 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.5bq99K 3]
    c2          | [1372] 2020/01/10 15:18:03.177521 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS+ _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.5bq99K 13]
    c1          | [1372] 2020/01/10 15:18:03.178465 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:03.178530 [TRC] 10.35.68.24:62865 - cid:4 - ->> [MSG _INBOX.4ynIPqChOQMSroNEZqndLx.HhaycK1D 1 13]
    c1          | [1372] 2020/01/10 15:18:03.179615 [DBG] 10.35.68.24:62865 - cid:4 - Client connection closed
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 10.35.68.24:62865 - cid:4 - <-> [DELSUB 1]
    c1          | [1372] 2020/01/10 15:18:03.180602 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.179385 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LS- _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:03.181119 [TRC] 172.18.206.186:7422 - lid:1 - <-> [DELSUB _INBOX.4ynIPqChOQMSroNEZqndLx.*]
    c2          | [1372] 2020/01/10 15:18:05.761225 [DBG] 10.35.68.24:62866 - cid:2 - Client connection created
    c2          | [1372] 2020/01/10 15:18:05.762291 [TRC] 10.35.68.24:62866 - cid:2 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"c","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [SUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*  1]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- [PUB test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 10.35.68.24:62866 - cid:2 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:05.762524 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763695 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS+ _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.x9JGBo 3]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.x9JGBo 13]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c1          | [1372] 2020/01/10 15:18:05.763912 [TRC] 172.18.194.70:49157 - lid:1 - ->> [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- [LMSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 13]
    c2          | [1372] 2020/01/10 15:18:05.764649 [TRC] 172.18.206.186:7422 - lid:1 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:05.765070 [TRC] 10.35.68.24:62866 - cid:2 - ->> [MSG _INBOX.TfzSpQyvrMigTw0TP7cMHt.05sUrsio 1 13]
    c2          | [1372] 2020/01/10 15:18:05.766173 [DBG] 10.35.68.24:62866 - cid:2 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 10.35.68.24:62866 - cid:2 - <-> [DELSUB 1]
    c2          | [1372] 2020/01/10 15:18:05.766411 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LS- _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c1          | [1372] 2020/01/10 15:18:05.766060 [TRC] 172.18.194.70:49157 - lid:1 - <-> [DELSUB _INBOX.TfzSpQyvrMigTw0TP7cMHt.*]
    c2          | [1372] 2020/01/10 15:18:07.378670 [DBG] 10.35.68.24:62867 - cid:3 - Client connection created
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [CONNECT {"verbose":false,"pedantic":false,"user":"d","pass":"[REDACTED]","tls_required":false,"name":"NATS Sample Requestor","lang":"go","version":"1.9.1","protocol":1,"echo":true}]
    c2          | [1372] 2020/01/10 15:18:07.378670 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PING]
    c2          | [1372] 2020/01/10 15:18:07.379670 [TRC] 10.35.68.24:62867 - cid:3 - ->> [PONG]
    c2          | [1372] 2020/01/10 15:18:07.379746 [TRC] 10.35.68.24:62867 - cid:3 - <<- [SUB _INBOX.89dvNgB1mAb4aZo4PLaWJz.*  1]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- [PUB test.service.1 _INBOX.89dvNgB1mAb4aZo4PLaWJz.WyXS3UnR 3]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 10.35.68.24:62867 - cid:3 - <<- MSG_PAYLOAD: ["foo"]
    c2          | [1372] 2020/01/10 15:18:07.380243 [TRC] 172.18.206.186:7422 - lid:1 - ->> [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- [LMSG test.service.1 _R_.BBa91r.hQOCWj 3]
    c1          | [1372] 2020/01/10 15:18:07.380535 [TRC] 172.18.194.70:49157 - lid:1 - <<- MSG_PAYLOAD: ["foo"]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - ->> [MSG test.service.1 1 _R_.ie4QZJ.tVfoKl 3]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- [PUB _R_.ie4QZJ.tVfoKl 13]
    c1          | [1372] 2020/01/10 15:18:07.380747 [TRC] 10.35.68.24:62849 - cid:2 - <<- MSG_PAYLOAD: ["response text"]
    c2          | [1372] 2020/01/10 15:18:09.386622 [DBG] 10.35.68.24:62867 - cid:3 - Client connection closed
    c2          | [1372] 2020/01/10 15:18:09.386891 [TRC] 10.35.68.24:62867 - cid:3 - <-> [DELSUB 1]
    Gracefully stopping... (press Ctrl+C again to force)
    Stopping c2   ... done
    Stopping c1   ... done
    
    • [x] Included a [Minimal, Complete, and Verifiable example] (https://stackoverflow.com/help/mcve)

    Versions of nats-server and affected client libraries used:

    See logs. The go examples are as of commit f66f9c02346dc33296576bf0ef4bd48520bf88c9.

    OS/Container environment:

    Windows nanoserver

    Steps or code to reproduce the issue:

    docker-compose.yml

    version: "3.2"
    
    services:
     cluster1: 
       image: nats:2.1.2-nanoserver
       container_name: c1
       command: -c C:\\mount\\c1 -DV
       ports: 
         - 80:8222
         - 4244:4244
       expose:
         - "7422"
       volumes:
         - .\:C:\mount\
       networks:
         - cluster1
       restart: always
     cluster2: 
       depends_on: 
         - cluster1
       image: nats:2.1.2-nanoserver
       container_name: c2
       command: -c C:\\mount\\c2 -DV
       ports: 
         - 81:8222
         - 4245:4244
       expose:
         - "7422"
       volumes:
         - .\:C:\mount\
       networks:
         - cluster1
       restart: always
     
    networks:
     cluster1:
    

    cluster 1 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      A: {
        users:[{
          user: a
          password: a
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      B: {
        users:[{
           user: b
            password: b
        }]
        imports: [
          {service: {account: A, subject: test.service.1}}
        ]
      }
    }
    
    leafnodes {
      port: 7422
      authorization {
        account: B
      }
    }
    

    cluster 2 config:

    port: 4244
    monitor_port: 8222
    accounts: {
      C: {
        users:[{
          user: c
          password: c
        }]
        exports: [
          {service: test.service.>}
        ]
      },
      D: {
        users:[{
           user: d
           password: d
        }]
        imports: [
          {service: {account: C, subject: test.service.1}}
        ]
      }
    }
    leafnodes {
      remotes: [
        {
          urls: [
            nats-leaf://c1:7422
          ]
          account: C
        }
      ]
    }
    

    Starting a nats-rply: start "cluster1 Account A service" nats-rply -s nats://a:a@localhost:4244 test.service.1 "response text"

    Sending request to account D: nats-req -s nats://d:d@localhost:4245 test.service.1 foo

    Expected result:

    The request is sent from account D on cluster 2 to the service listening at test.service.1 on account A on cluster 1, and the requester gets "response text" back.

    Actual result:

    The service listening at test.service.1 gets the request of 'foo', but no message is returned to the requester. Instead: "nats: timeout for request"
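
    For reference, a minimal Go sketch of the responder/requester pair driven by the nats-rply and nats-req commands above (credentials and ports follow the compose setup; this only reproduces the expected flow, with error handling kept short):

    package main
    
    import (
    	"fmt"
    	"time"
    
    	"github.com/nats-io/nats.go"
    )
    
    func main() {
    	// Responder: cluster 1, account A (user a), answering test.service.1.
    	ncA, err := nats.Connect("nats://a:a@localhost:4244")
    	if err != nil {
    		panic(err)
    	}
    	defer ncA.Close()
    	ncA.Subscribe("test.service.1", func(m *nats.Msg) {
    		m.Respond([]byte("response text"))
    	})
    	ncA.Flush()
    
    	// Requester: cluster 2, account D (user d), reaching the service through
    	// the leafnode and account import chain.
    	ncD, err := nats.Connect("nats://d:d@localhost:4245")
    	if err != nil {
    		panic(err)
    	}
    	defer ncD.Close()
    	resp, err := ncD.Request("test.service.1", []byte("foo"), 2*time.Second)
    	if err != nil {
    		fmt.Println("request error:", err) // currently fails with: nats: timeout
    		return
    	}
    	fmt.Println("reply:", string(resp.Data))
    }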

  • Support WebSocket Connectivity

    Support WebSocket Connectivity

    Hi,

    At @gretaio, we need our signaling server to talk to web browsers, and to do so we set up a small proxy that gateways WebSocket to TCP so browsers can talk to NATS.
    
    I saw on the todo list that you plan on adding a WebSocket strategy, and that's something we would greatly appreciate, as it would basically halve the number of connections we need to keep open :+1:.

    So, would you be open to a PR regarding this?
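
    Native WebSocket support did eventually land in nats-server 2.2+. From memory, it is enabled with a configuration block roughly like the one below; verify the exact option names against your server version.

    websocket {
      # Hedged example; port and TLS settings are illustrative.
      port: 8080
      no_tls: true
    }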

  • logging system, syslog and abstraction improvements

    logging system, syslog and abstraction improvements

    This is a WIP: a little roadmap and some questions I have.
    
    • [x] Create the server.Logger interface and add a server.SetLogger method
    • [x] Modify all the existing calls to the new format
    • [x] Network syslog
    • [x] gnatsd options and wiring for network syslog
    • [x] Fix the race condition on server.SetLogger()
    • [x] Tests
    • [x] Transform error messages into real errors

    Questions:

    Related to: https://github.com/apcera/gnatsd/issues/7
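
    A rough sketch of plugging a custom logger into an embedded server via the interface this issue introduced (the method set and SetLogger signature are written from memory, so double-check them against the current server package):

    package main
    
    import (
    	"log"
    	"time"
    
    	"github.com/nats-io/nats-server/v2/server"
    )
    
    // stdLogger adapts the standard library logger to the server's Logger
    // interface. Method set written from memory; verify before relying on it.
    type stdLogger struct{}
    
    func (stdLogger) Noticef(format string, v ...interface{}) { log.Printf("[INF] "+format, v...) }
    func (stdLogger) Warnf(format string, v ...interface{})   { log.Printf("[WRN] "+format, v...) }
    func (stdLogger) Errorf(format string, v ...interface{})  { log.Printf("[ERR] "+format, v...) }
    func (stdLogger) Fatalf(format string, v ...interface{})  { log.Fatalf("[FTL] "+format, v...) }
    func (stdLogger) Debugf(format string, v ...interface{})  { log.Printf("[DBG] "+format, v...) }
    func (stdLogger) Tracef(format string, v ...interface{})  { log.Printf("[TRC] "+format, v...) }
    
    func main() {
    	s, err := server.NewServer(&server.Options{Port: 4222})
    	if err != nil {
    		log.Fatal(err)
    	}
    	s.SetLogger(stdLogger{}, false, false) // logger, debug flag, trace flag
    	go s.Start()
    	if !s.ReadyForConnections(5 * time.Second) {
    		log.Fatal("nats-server did not become ready")
    	}
    }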

  • Consumers stops receiving messages

    Consumers stops receiving messages

    Defect

    Versions of nats-server and affected client libraries used:

    Nats server version

    [83] 2021/09/04 18:51:12.239432 [INF] Starting nats-server
    [83] 2021/09/04 18:51:12.239488 [INF]   Version:  2.4.0
    [83] 2021/09/04 18:51:12.239494 [INF]   Git:      [219a7c98]
    [83] 2021/09/04 18:51:12.239496 [DBG]   Go build: go1.16.7
    [83] 2021/09/04 18:51:12.239517 [INF]   Name:     NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239533 [INF]   ID:       NBVE7O7DMRAZ63STC7Z644KHF5HJ6QQUGLZVGDIKEG32CFL2J6O2456M
    [83] 2021/09/04 18:51:12.239605 [DBG] Created system account: "$SYS"
    

    Go client version: v1.12.0

    OS/Container environment:

    GKE Kubernetes. Running nats js HA cluster. Deployed via nats helm chart.

    Steps or code to reproduce the issue:

    Stream configuration:

    apiVersion: jetstream.nats.io/v1beta1
    kind: Stream
    metadata:
      name: agent
    spec:
      name: agent
      subjects: ["data.*"]
      storage: file
      maxAge: 1h
      replicas: 3
      retention: interest
    

    There are two consumers of this stream. Each runs as a queue subscriber in two services, each with 2 pod replicas. Note that I don't care if a message is not processed; this is why ack none is set.

    
    // 2 pods for service A.
    js.QueueSubscribe(
    	"data.received",
    	"service1_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    
    // 2 pods for service B.
    s.js.QueueSubscribe(
    	"data.received",
    	"service2_queue",
    	func(msg *nats.Msg) {},
    	nats.DeliverNew(),
    	nats.AckNone(),
    )
    

    Expected result:

    Consumer receives messages.

    Actual result:

    Stream stats after a few days:

    agent                  │ File    │ 3         │ 28,258   │ 18 MiB  │ 0    │ 84      │ nats-js-0, nats-js-1*, nats-js-2
    

    Consumers stats:

    service1_queue │ Push │ None       │ 0.00s    │ 0           │ 0           │ 0           │ 60,756    │ nats-js-0, nats-js-1*, nats-js-2
    service2_queue │ Push │ None       │ 0.00s    │ 0           │ 0           │ 8,193 / 28% │ 60,843    │ nats-js-0, nats-js-1*, nats-js-2
    
    1. None of the nats-server pods' logs contain errors indicating any problem.
    2. The unprocessed message count for the second consumer stays the same and does not decrease.
    3. The only fix that helped was changing the second consumer's Raft leader with nats consumer cluster step-down, but after some time the problem comes back.
    4. There are active connections to the server (checked with nats server report connections).

    /cc @kozlovic @derekcollison

  • NATS for RPC

    NATS for RPC

    Hello! I use NATS as the backbone for RPC in my microservice backend and it works great (low latency, good stability, loosely coupled services, unlike grpc for example). However, I was reinventing the wheel to pair NATS and RPC concepts (you can find my solution in my busrpc-spec repo, sorry for the ad). Great work has been done on JetStream; however, NATS is originally beloved for its low latency and effectiveness and was chosen by many for fast message distribution. Do you have any plans to add features that provide RPC capabilities to the project?

  • NATS storage problem

    NATS storage problem

    Hi, I am using the interest retention policy of NATS in Kubernetes with 3 replicas, and I have the following problem: what the stream reports is not what is actually stored on disk; the disk holds much more. I do not know if it is because deleted messages are still stored, and the problem I have now is that the storage is filling up very quickly without my knowing why. (image)
    
    (image)
    
    This is the storage of one of my disks; all 3 replicas are the same: (image)
    
    NATS is deployed with Helm, chart version 0.18.1, nats-server version 2.9.3. Each replica has its own PV with 10 GB of storage. To allocate each volume we use Longhorn, so the directory where it is stored is /var/lib/longhorn.

  • Allow stream meta to be synched on creation to disk

    Allow stream meta to be synched on creation to disk

    Feature Request

    As we gather more embedded use cases, we are seeing device resets leave metadata files and block files with 0 bytes.

    Use Case:

    NATS server embedded on devices.

    Proposed Change:

    Allow server to be configured to sync meta data to disk on stream creation.
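
    Illustratively, the change boils down to fsync-ing the stream's metadata file right after it is written at creation time, instead of relying on the OS to flush it later (a sketch under that assumption, not the server's actual file-store code):

    package filestore
    
    import "os"
    
    // writeMetaSynced writes stream metadata and forces it to disk immediately,
    // so a device reset right after stream creation cannot leave a zero-byte
    // metadata file behind.
    func writeMetaSynced(path string, meta []byte) error {
    	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0640)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    	if _, err := f.Write(meta); err != nil {
    		return err
    	}
    	return f.Sync() // flush file contents to stable storage
    }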

    Who Benefits From The Change(s)?

    Embedded use cases.

  • Allow to set max startup time on windows service

    Allow to set max startup time on windows service

    Feature Request

    Use Case:

    The windows service reports a failure to start if it's not ready to accept connections within 10 seconds.

    This value has been fixed and hardcoded since before JetStream existed. On (some) Windows systems this leads to service startup failures, as processing the store directory may be hindered by heavy load or slowed down by increased access times, typically under the influence of security software.

    see the relevant line of code

    Proposed Change:

    Check for an environment variable that allows setting this delay to whatever the current use case requires (see the sketch below).
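
    A sketch of what that could look like; the variable name NATS_STARTUP_DELAY is purely illustrative, not an existing option:

    package winsvc
    
    import (
    	"os"
    	"time"
    )
    
    // startupDeadline returns how long the Windows service wrapper should wait
    // for the server to become ready, defaulting to the current hardcoded 10s
    // but overridable through a (hypothetical) environment variable.
    func startupDeadline() time.Duration {
    	const defaultDeadline = 10 * time.Second
    	if v := os.Getenv("NATS_STARTUP_DELAY"); v != "" {
    		if d, err := time.ParseDuration(v); err == nil && d > 0 {
    			return d
    		}
    	}
    	return defaultDeadline
    }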

    Who Benefits From The Change(s)?

    Windows users, especially when JetStream and security software are competing for processing resources.

    Alternative Approaches

    Currently, setting the NATS service to delayed start limits the occurrences of startup failures. Once the streams and consumers are big enough to cause startup failures, several manual starts usually restore a startup time under 10 seconds.
    
    Another way to work around the service start failure is to whitelist the server storage directory in the security software. IT departments are usually not happy with this.

  • Setup TLS, manually specify ServerName

    Setup TLS, manually specify ServerName

    Feature Request

    Being able to set up a TLS connection where the NATS route can be an IP address instead of the domain name that was used to sign the certificate.

    Use Case:

    Being able to make a TLS connection on a network where we do not have control over the DNS, but do have the IP.

    Proposed Change:

    Add a configuration parameter to the tls configuration to specify the tlsConfig.ServerName instead of deriving it from the route address.
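
    In Go terms, the request boils down to setting crypto/tls's ServerName explicitly when dialing a route by IP, rather than deriving it from the dialed address (a sketch of the idea only, not the server's actual route-dialing code):

    package routes
    
    import "crypto/tls"
    
    // dialRoute dials a route by address (which may be a bare IP) while verifying
    // the peer certificate against an explicitly configured name.
    func dialRoute(addr, certServerName string) (*tls.Conn, error) {
    	cfg := &tls.Config{
    		// e.g. "route.example.com", even though addr is "10.0.0.5:6222".
    		ServerName: certServerName,
    	}
    	return tls.Dial("tcp", addr, cfg)
    }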

    Who Benefits From The Change(s)?

    Besides making it possible to point the server configuration at an IP, this could also help people debug their TLS connections.
