I am writing some Go web services (the server itself is also in Go, using http.ListenAndServe). I have a map of structs (roughly 100 KB of data) that I would like to keep in memory and share across HTTP requests.
What is the best way to achieve this in Go? In your experience, is it better to use global package variables or a caching system (like memcache/groupcache)?
Don't indulge in premature optimization. Define a Go package API that encapsulates the data; then you can change the implementation at any time. For example, a quick sketch:
package data

import "fmt"

type Key struct {
	// ...
}

type Data struct {
	// ...
}

var dataMap = make(map[Key]Data)

// GetData returns a pointer to a copy of the cached value, so callers
// cannot mutate the map's contents through it.
func GetData(key Key) (*Data, error) {
	d, ok := dataMap[key]
	if !ok {
		return nil, fmt.Errorf("data: no entry for key %v", key)
	}
	return &d, nil
}
The simple solution that still solves the problem is generally the right approach. Caching data locally at the server is both simple and efficient, and can be done in many circumstances even if a richer system is being used.
As an example, last month I put a quick hack live (http://ubuntu-edge.info) that took a fair bit of load as news of the crowdfunding campaign spread, and the Go process stayed at very low load most of the time on a small EC2 machine. The data was trivial and was simply cached in memory, refreshed once per minute from a database that an external process kept updated.
I generally look at memcache (and now groupcache) when doing it myself would mean replicating more of their functionality. For example, when there's too much data to be kept in memory, or when losing a server might affect the performance of the overall system because the cached data is all gone at once, or because the performance is being impacted by contention.
As a side note, groupcache is actually a library, so you can embed it in your server instead of running it as a separate external system.
In addition to the answers you've already received, consider making use of receiver-curried method values and http.HandlerFunc.
If your data is loaded before the process starts, you could go with something like this:
type Common struct {
	Data map[string]*Data
}

func NewCommon() (*Common, error) {
	c := &Common{Data: make(map[string]*Data)}
	// load data into c.Data
	return c, nil
}

func (c *Common) Root(w http.ResponseWriter, r *http.Request) {
	// handler
}

func (c *Common) Page(w http.ResponseWriter, r *http.Request) {
	// handler
}

func main() {
	common, err := NewCommon()
	if err != nil { ... }
	http.HandleFunc("/", common.Root)
	http.HandleFunc("/page", common.Page)
	http.ListenAndServe(...)
}
This works nicely if all of the Common data is read-only. If the Common data is read/write, then you'll want something more like:
type Common struct {
	lock sync.RWMutex
	data map[string]Data // Data should probably not have any reference fields
}

func (c *Common) Get(key string) (*Data, bool) {
	c.lock.RLock()
	defer c.lock.RUnlock()
	d, ok := c.data[key]
	return &d, ok // pointer to a copy, so callers can't mutate the map entry
}

func (c *Common) Set(key string, d *Data) {
	c.lock.Lock()
	defer c.lock.Unlock()
	c.data[key] = *d
}
The rest is basically the same, except that instead of accessing the data through the receiver's fields directly, you'd access it through the getters and setters. In a web server where most of the data is being read, an RWMutex is usually the right choice, since reads can then proceed concurrently with one another. Another advantage of the second approach is that the data is encapsulated, so you can later add transparent writes to (and/or reads from) memcache, groupcache, or something similar if your application grows to need it.
One thing that I really like about defining my handlers as methods on an object is that it makes them much easier to unit test: you can easily write a table-driven test with the inputs you want and the output you expect, without having to muck around with global variables.