We've been experiencing errors with lib/pq that eventually results in a state where no database connections in the pool are free. Max connections as set by SetMaxOpenConns
is not reached.
The errors normally start with db.Prepare
:
Could not prepare statement: pq: unexpected describe rows response: '3'
We also see the unexpected response 'C'. These develop into multiple occurrences of
sql: statement expects 0 inputs; got 4
and similar errors on statements that have previously been prepared with no errors.
To reiterate, we don't query any prepared statements that returned errors from Prepare
. These errors are returned from queries on successfully prepared statements.
Then, we see occurrences of failures of Begin
:
Could not start a transaction: pq: unknown response for simple query: '1'
Could not start a transaction: unexpected command tag
Could not start a transaction: pq: unexpected transaction status idle in transaction
The only error in postgres around this time is:
FATAL: invalid frontend message type 90
Which happened around 20s after the initial errors were seen in our application log.
pprof
indicates that all goroutines querying the database were netpolling:
# 0x423886 netpollblock+0xa6 /usr/local/src/go/src/pkg/runtime/netpoll.goc:280
# 0x4231ea net.runtime_pollWait+0x6a /usr/local/src/go/src/pkg/runtime/netpoll.goc:116
# 0x695534 net.(*pollDesc).Wait+0x34 /usr/local/src/go/src/pkg/net/fd_poll_runtime.go:81
# 0x695590 net.(*pollDesc).WaitRead+0x30 /usr/local/src/go/src/pkg/net/fd_poll_runtime.go:86
# 0x696910 net.(*netFD).Read+0x2a0 /usr/local/src/go/src/pkg/net/fd_unix.go:204
# 0x6a5825 net.(*conn).Read+0xc5 /usr/local/src/go/src/pkg/net/net.go:122
# 0x48a1a0 bufio.(*Reader).fill+0x110 /usr/local/src/go/src/pkg/bufio/bufio.go:91
# 0x48a5a4 bufio.(*Reader).Read+0x1a4 /usr/local/src/go/src/pkg/bufio/bufio.go:159
# 0x46f566 io.ReadAtLeast+0xf6 /usr/local/src/go/src/pkg/io/io.go:288
# 0x46f6d1 io.ReadFull+0x71 /usr/local/src/go/src/pkg/io/io.go:306
# 0x4f5f9b github.com/lib/pq.(*conn).recvMessage+0x10b /home/matt/dev/go/src/github.com/lib/pq/conn.go:637
# 0x4f6357 github.com/lib/pq.(*conn).recv1+0x27 /home/matt/dev/go/src/github.com/lib/pq/conn.go:690
# 0x4f5242 github.com/lib/pq.(*conn).prepareToSimpleStmt+0x822 /home/matt/dev/go/src/github.com/lib/pq/conn.go:508
# 0x4f4997 github.com/lib/pq.(*conn).prepareTo+0x87 /home/matt/dev/go/src/github.com/lib/pq/conn.go:486
# 0x4f5670 github.com/lib/pq.(*conn).Prepare+0x120 /home/matt/dev/go/src/github.com/lib/pq/conn.go:539
# 0x4e5199 database/sql.(*driverConn).prepareLocked+0x49 /usr/local/src/go/src/pkg/database/sql/sql.go:250
# 0x4e7ce1 database/sql.(*DB).prepare+0xb1 /usr/local/src/go/src/pkg/database/sql/sql.go:828
# 0x4e7b9c database/sql.(*DB).Prepare+0x5c /usr/local/src/go/src/pkg/database/sql/sql.go:808
In a separate instance of unresponsiveness, all the goroutines were waiting on mutexes, e.g.:
# 0x4241d0 sync.runtime_Semacquire+0x30 /usr/local/src/go/src/pkg/runtime/sema.goc:199
# 0x47f136 sync.(*Mutex).Lock+0xd6 /usr/local/src/go/src/pkg/sync/mutex.go:66
# 0x4e6dd2 database/sql.(*DB).conn+0x42 /usr/local/src/go/src/pkg/database/sql/sql.go:616
# 0x4e7c82 database/sql.(*DB).prepare+0x32 /usr/local/src/go/src/pkg/database/sql/sql.go:823
# 0x4e7bbc database/sql.(*DB).Prepare+0x5c /usr/local/src/go/src/pkg/database/sql/sql.go:808
The logged errors weren't present in that case, so these could be two totally different problems.
We're running Ubuntu 12.04 and pg 9.2 with go 1.2.1, and we're not assuming a trouble-free network by any means.
I appreciate this is somewhat vague and we don't have a reproducible test case (yet), but any pointers for further investigation would be appreciated. Let me know if further background info would be helpful.