I have a Docker cluster running 10 web services (of the same kind). They all use MongoDB, among other things, for data persistence.
This is the code that gets called from main() when the service is booting up:
// Init establishes a connection with MongoDB instance.
func Init(mongoURL string) *mgo.Session {
	mongo, err := mgo.Dial(mongoURL)
	misc.PanicIf(err)

	// make sure we are strongly consistent
	mongo.SetMode(mgo.Strong, true)

	// update global state
	db = mongo
	Entries = db.DB("").C("entries")
	Channels = db.DB("").C("channels")
	Settings = db.DB("").C("settings")
	Metadata = db.DB("").C("metadata")

	// only use this on first load, to confirm the settings are there;
	// every refresh should be done via UpdateGlobalSettings (thread-safe)
	GlobalSettings = &GlobalSettingsStruct{}
	GlobalSettings.Init()

	return mongo
}
So basically the API and the workers just use global variables such as Entries, Settings, etc.
After running for a while, the service stopped working properly. Every Mongo action, such as Entries.Find(...), returns an err of: Closed Explicitly.
What does that mean? Should I be refreshing the MongoDB connection periodically, or should I close and re-establish the connection on each request?
The app is performance-oriented, so even with the Mongo connection down everything stays up and running, since it all operates on in-memory or cluster cache. I don't want to do something stupid that delays processing.
First of all, try enabling debug mode in mgo to get more info about what's happening.
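A minimal sketch of turning that on (assuming the gopkg.in/mgo.v2 import path; SetDebug and SetLogger are package-level functions in mgo, and the standard *log.Logger satisfies its logger interface):

package main

import (
	"log"
	"os"

	mgo "gopkg.in/mgo.v2"
)

func main() {
	// mgo is silent by default; give it a logger and enable debug
	// output to see session and socket activity.
	mgo.SetLogger(log.New(os.Stderr, "mgo: ", log.LstdFlags))
	mgo.SetDebug(true)

	sess, err := mgo.Dial("localhost")
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// ... run the normal workload and watch the log around the
	// moment the "Closed Explicitly" errors start.
}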
I suppose the server is dropping the connection after some inactivity time. In any case, the usual approach is to do a single mgo Dial and then Copy the session at the beginning of every request handler (using a middleware).
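A minimal sketch of that pattern (the withSession wrapper and the handler below are my illustration, not code from the question):

package main

import (
	"net/http"

	mgo "gopkg.in/mgo.v2"
)

// withSession copies the master session for each request and closes
// the copy when the handler returns, so every request checks out a
// healthy socket from the pool instead of sharing one global session.
func withSession(master *mgo.Session, h func(*mgo.Session, http.ResponseWriter, *http.Request)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		s := master.Copy()
		defer s.Close()
		h(s, w, r)
	}
}

func main() {
	master, err := mgo.Dial("localhost")
	if err != nil {
		panic(err)
	}
	defer master.Close()

	http.HandleFunc("/entries", withSession(master, func(s *mgo.Session, w http.ResponseWriter, r *http.Request) {
		// use s.DB("").C("entries") here rather than a global collection
		w.WriteHeader(http.StatusOK)
	}))
	http.ListenAndServe(":8080", nil)
}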
Pietro
That is OK, but use Session.Copy to create a new session instance instead of using the returned session directly; there is a connection pool inside the Go MongoDB driver package.
When sess.SetPoolLimit(50) is not used, many errors occur once mgo is under stress, e.g. 10,000 concurrent operations. When I limit the pool, the errors go away.
I've created test-case source code for this problem below, so you can easily reproduce it on your own machine. I'd like to hear any suggestions about this behaviour.
package main

import (
	"fmt"
	"time"

	// you can use the original go-mgo/mgo here as well
	mgo "github.com/globalsign/mgo"
	"github.com/globalsign/mgo/bson"
)

// TODO: put some records into the db first:
//
//	use testapi
//	db.competitions.insert([
//	    {game_id: 1, game_name: "foo"},
//	    {game_id: 2, game_name: "bar"},
//	    {game_id: 3, game_name: "jazz"}
//	])

// NOTE: you might want to increase this depending on your machine power
// mine is:
//	MacBook (Retina, 12-inch, Early 2015)
//	1,2 GHz Intel Core M
//	8 GB 1600 MHz DDR3
const ops = 10000

type m bson.M

func main() {
	sess, err := mgo.DialWithTimeout("localhost", time.Second)
	if err != nil {
		panic(err)
	}
	defer sess.Close()

	// NOTE: without this setting there are many errors,
	// see the end of the file.
	// Setting the pool limit prevents most of the timeouts.
	// sess.SetPoolLimit(50)
	// sess.SetSocketTimeout(60 * time.Second)

	sess.SetMode(mgo.Monotonic, true)
	time.Sleep(time.Second)

	done := make(chan bool, ops)
	for i := 0; i < ops; i++ {
		go func() {
			defer func() { done <- true }()

			result, err := fetchSomething(sess)
			if err != nil {
				fmt.Printf("ERR: %s\n", err)
			}
			fmt.Printf("RESULT: %+v\n", result)
		}()
	}
	for i := 0; i < ops; i++ {
		<-done
	}
}

func fetchSomething(sess *mgo.Session) ([]m, error) {
	// copy the session per goroutine; the copy checks out a socket
	// from the pool and Close returns it
	s := sess.Copy()
	defer s.Close()

	result := []m{}
	group := m{
		"$group": m{
			"_id": m{
				"id":   "$game_id",
				"name": "$game_name",
			},
		},
	}
	project := m{
		"$project": m{
			"_id":  "$_id.id",
			"name": "$_id.name",
		},
	}
	sort := m{
		"$sort": m{
			"_id": 1,
		},
	}

	err := col(s, "competitions").Pipe([]m{group, project, sort}).All(&result)
	if err != nil {
		return nil, err
	}
	return result, nil
}

// col is a helper for selecting a collection
func col(sess *mgo.Session, name string) *mgo.Collection {
	return sess.DB("testapi").C(name)
}
/*
ERRORS WITHOUT sess.SetPoolLimit(50)
$ go run socket_error.go
RESULT: [map[name:foo _id:1] map[_id:2 name:bar] map[_id:3 name:jazz]]
ERR: read tcp 127.0.0.1:52918->127.0.0.1:27017: read: connection reset by peer
ERR: write tcp 127.0.0.1:52084->127.0.0.1:27017: write: broken pipe
RESULT: []
RESULT: []
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
RESULT: []
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
RESULT: []
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
RESULT: []
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
RESULT: []
ERR: write tcp 127.0.0.1:53627->127.0.0.1:27017: write: broken pipe
RESULT: []
RESULT: []
ERR: read tcp 127.0.0.1:53627->127.0.0.1:27017: read: connection reset by peer
RESULT: []
ERR: resetNonce: write tcp 127.0.0.1:53625->127.0.0.1:27017: write: broken pipe
RESULT: []
RESULT: [map[name:foo _id:1] map[_id:2 name:bar] map[_id:3 name:jazz]]
ERR: resetNonce: write tcp 127.0.0.1:54591->127.0.0.1:27017: write: broken pipe
RESULT: []
...
...
*/