If maps are reference types, why does json.Unmarshal require a pointer to a map?

I was working with json.Unmarshal and came across the following quirk. When running the below code, I get the error json: Unmarshal(non-pointer map[string]string)

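package main

import (
    "encoding/json"
    "fmt"
    "log"
)

func main() {
    m := make(map[string]string)
    data := `{"foo": "bar"}`
    // m is passed by value here, which triggers the error above.
    err := json.Unmarshal([]byte(data), m)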
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println(m)
}

Playground

Looking at the documentation for json.Unmarshal, there is seemingly no indication that a pointer is required. The closest I can find is the following line

Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v.

The lines describing the protocol Unmarshal follows for maps are similarly unclear, as they make no reference to pointers.

To unmarshal a JSON object into a map, Unmarshal first establishes a map to use. If the map is nil, Unmarshal allocates a new map. Otherwise Unmarshal reuses the existing map, keeping existing entries. Unmarshal then stores key-value pairs from the JSON object into the map. The map's key type must either be a string, an integer, or implement encoding.TextUnmarshaler.

Why must I pass a pointer to json.Unmarshal, especially if maps are already reference types? I know that if I pass a map to a function, and add data to the map, the underlying data of the map will be changed (see the following playground example), which means that it shouldn't matter if I pass a pointer to a map. Can someone clear this up?

As stated in the documentation:

Unmarshal uses the inverse of the encodings that Marshal uses, allocating maps, slices, and pointers as necessary, with ...

Unmarshal may allocate the variable (map, slice, etc.). If we pass a map instead of a pointer to a map, the newly allocated map won't be visible to the caller. The following example (Go Playground) demonstrates this:

package main

import (
    "fmt"
)

func mapFunc(m map[string]interface{}) {
    m = make(map[string]interface{})
    m["abc"] = "123"
}

func mapPtrFunc(mp *map[string]interface{}) {
    m := make(map[string]interface{})
    m["abc"] = "123"

    *mp = m
}

func main() {
    var m1, m2 map[string]interface{}
    mapFunc(m1)
    mapPtrFunc(&m2)

    fmt.Printf("%+v, %+v\n", m1, m2)
}

The output is:

map[], map[abc:123]

If a function or method may allocate a variable when necessary, and the newly allocated variable needs to be visible to the caller, there are two options: (a) return the variable from the function, or (b) assign it through a function/method argument. Since everything in Go is passed by value, in case (b) the argument must be a pointer. The following diagram illustrates what happens in the above example:

Illustration of variable allocation

  1. At first, both maps m1 and m2 are nil.
  2. Calling mapFunc copies the value of m1 to m, so m is also a nil map.
  3. If the map had already been allocated in (1), then in (2) the address of the underlying map data structure that m1 refers to (not the address of m1) would be copied to m. In that case m1 and m refer to the same map data structure, so modifying map items through one is visible through the other.
  4. In mapFunc, a new map is allocated and assigned to m. There is no way to assign it to m1.

In the pointer case:

  1. When calling mapPtrFunc, the address of m2 is copied to mp.
  2. In mapPtrFunc, a new map is allocated and assigned to *mp (not mp). Since mp is a pointer to m2, assigning the new map to *mp changes the value of m2. Note that the value of mp, i.e. the address of m2, is unchanged.
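The already-allocated case from step 3 of the first list is not covered by the example above, so here is a minimal sketch of it. The function names addEntry and replaceMap are made up for illustration. It shows that writes through a copied map value reach the caller, while rebinding the parameter does not:

```go
package main

import "fmt"

// addEntry writes into the map data structure shared with the caller,
// so the caller sees the new key.
func addEntry(m map[string]string) {
	m["added"] = "yes"
}

// replaceMap rebinds only its local copy of the map value;
// the caller's variable still refers to the original map.
func replaceMap(m map[string]string) {
	m = map[string]string{"replaced": "yes"}
}

func main() {
	m := map[string]string{"orig": "yes"}
	addEntry(m)
	replaceMap(m)

	_, replaced := m["replaced"]
	fmt.Println(len(m), m["added"], replaced) // prints: 2 yes false
}
```

This is why passing a plain map is enough when a function only adds or removes entries, but not when it may need to allocate or replace the map itself.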

The other key part of the documentation is this:

To unmarshal JSON into a pointer, Unmarshal first handles the case of the JSON being the JSON literal null. In that case, Unmarshal sets the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the value pointed at by the pointer. If the pointer is nil, Unmarshal allocates a new value for it to point to.

If Unmarshal accepted a map, it would have to leave the map in the same state whether the JSON were null or {}. But by using pointers, there's now a difference between the pointer being set to nil and it pointing to an empty map.

Note that in order for Unmarshal to be able to "set the pointer to nil", you actually need to pass in a pointer to your map pointer:

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

func main() {
    var m *map[string]string
    data := `{}`
    err := json.Unmarshal([]byte(data), &m)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(m)

    data = `null`
    err = json.Unmarshal([]byte(data), &m)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(m)

    data = `{"foo": "bar"}`
    err = json.Unmarshal([]byte(data), &m)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(m)
}

This outputs:

&map[]
<nil>
&map[foo:bar]

Your viewpoint is no different than saying "a slice is nothing but a pointer". Slices (and maps) use pointers to make them lightweight, yes, but there are still more things that make them work. A slice contains info about its length and capacity for example.

As for why this happens, from a code perspective, the last line of json.Unmarshal calls d.unmarshal(), which executes the code in lines 176-179 of decode.go. It basically says "if the value isn't a pointer, or is nil, return an InvalidUnmarshalError."
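As a rough illustration of that guard (not the actual source of encoding/json), the check can be approximated with the reflect package. The helper isValidTarget is a hypothetical name; the only real API assumed here is json.InvalidUnmarshalError, which is the error type Unmarshal returns in this situation:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"reflect"
)

// isValidTarget approximates the guard described above:
// the target must be a pointer, and that pointer must not be nil.
func isValidTarget(v interface{}) bool {
	rv := reflect.ValueOf(v)
	return rv.Kind() == reflect.Ptr && !rv.IsNil()
}

func main() {
	m := make(map[string]string)
	fmt.Println(isValidTarget(m), isValidTarget(&m)) // prints: false true

	// Passing the map by value makes the real Unmarshal fail the same way.
	err := json.Unmarshal([]byte(`{"foo":"bar"}`), m)
	var invalid *json.InvalidUnmarshalError
	fmt.Println(errors.As(err, &invalid), err)
}
```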

The docs could probably be clearer about things, but consider a couple of things:

  1. How would the JSON null value be assigned to the map as nil if you don't pass a pointer to the map? If you require the ability to modify the map itself (rather than the items in the map), then it makes sense to pass a pointer to the item that needs to be modified. In this case, it's the map.
  2. Alternatively, suppose you passed a nil map to json.Unmarshal. The code that json.Unmarshal uses would eventually call the equivalent of make(map[string]string) and unmarshal values into the new map as necessary. However, you would still have a nil map in your function, because your map pointed to nothing. There's no way to fix this other than to pass a pointer to the map.
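Both points can be seen with the unmodified encoding/json package once a pointer is passed. This is a small sketch; decodeInto is a hypothetical helper introduced only to keep the example short:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeInto is a hypothetical helper that unmarshals data
// through a pointer to the caller's map variable.
func decodeInto(data string, m *map[string]string) error {
	return json.Unmarshal([]byte(data), m)
}

func main() {
	var m map[string]string // nil map, nothing allocated yet

	if err := decodeInto(`{"foo":"bar"}`, &m); err != nil {
		fmt.Println(err)
		return
	}
	// Unmarshal allocated a map and stored it in the caller's variable.
	fmt.Println(m == nil, m)

	if err := decodeInto(`null`, &m); err != nil {
		fmt.Println(err)
		return
	}
	// JSON null reset the caller's variable itself to a nil map.
	fmt.Println(m == nil, m)
}
```

The first line printed reports a non-nil map containing foo, and the second reports a nil map. Neither change to m would be possible if only the map value were passed.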

However, let's say there was no need to pass the address of your map because "it's already a pointer", and you've already initialized the map, so it's not nil. What happens then? Well, if I bypass the test in the lines I linked earlier by changing line 176 to read if rv.Kind() != reflect.Map && rv.Kind() != reflect.Ptr || rv.IsNil() {, then this can happen:

`{"foo":"bar"}`: false map[foo:bar]
`{}`: false map[]
`null`: panic: reflect: reflect.Value.Set using unaddressable value [recovered]
    panic: interface conversion: string is not error: missing method Error

goroutine 1 [running]:
json.(*decodeState).unmarshal.func1(0xc420039e70)
    /home/kit/jstest/src/json/decode.go:172 +0x99
panic(0x4b0a00, 0xc42000e410)
    /usr/lib/go/src/runtime/panic.go:489 +0x2cf
reflect.flag.mustBeAssignable(0x15)
    /usr/lib/go/src/reflect/value.go:228 +0xf9
reflect.Value.Set(0x4b8b00, 0xc420012300, 0x15, 0x4b8b00, 0x0, 0x15)
    /usr/lib/go/src/reflect/value.go:1345 +0x2f
json.(*decodeState).literalStore(0xc420084360, 0xc42000e3f8, 0x4, 0x8, 0x4b8b00, 0xc420012300, 0x15, 0xc420000100)
    /home/kit/jstest/src/json/decode.go:883 +0x2797
json.(*decodeState).literal(0xc420084360, 0x4b8b00, 0xc420012300, 0x15)
    /home/kit/jstest/src/json/decode.go:799 +0xdf
json.(*decodeState).value(0xc420084360, 0x4b8b00, 0xc420012300, 0x15)
    /home/kit/jstest/src/json/decode.go:405 +0x32e
json.(*decodeState).unmarshal(0xc420084360, 0x4b8b00, 0xc420012300, 0x0, 0x0)
    /home/kit/jstest/src/json/decode.go:184 +0x224
json.Unmarshal(0xc42000e3f8, 0x4, 0x8, 0x4b8b00, 0xc420012300, 0x8, 0x0)
    /home/kit/jstest/src/json/decode.go:104 +0x148
main.main()
    /home/kit/jstest/src/jstest/main.go:16 +0x1af

Code leading to that output:

package main

// Note "json" is the local copy of the "encoding/json" source that I modified.
import (
    "fmt"
    "json"
)

func main() {
    for _, data := range []string{
        `{"foo":"bar"}`,
        `{}`,
        `null`,
    } {
        m := make(map[string]string)
        fmt.Printf("%#q: ", data)
        if err := json.Unmarshal([]byte(data), m); err != nil {
            fmt.Println(err)
        } else {
            fmt.Println(m == nil, m)
        }
    }
}

The key is this bit here:

reflect.Value.Set using unaddressable value

Because you passed a copy of the map, it's unaddressable (i.e. it has a temporary address or even no address from the low-level machine perspective). I know of one way around this (x := new(Type) followed by *x = value, except using the reflect package), but it doesn't actually solve the problem; you're creating a local pointer that can't be returned to the caller and using it instead of your original storage location!
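The addressability difference can be checked directly with reflect, without touching the json source. In this sketch the function names canSetDirect and canSetViaPointer are invented for illustration; CanSet, Elem, Set, and Zero are standard reflect calls:

```go
package main

import (
	"fmt"
	"reflect"
)

// canSetDirect reports whether a map passed by value can be replaced via reflect.
func canSetDirect(m map[string]string) bool {
	return reflect.ValueOf(m).CanSet()
}

// canSetViaPointer reports whether the map behind a pointer can be replaced.
func canSetViaPointer(m *map[string]string) bool {
	return reflect.ValueOf(m).Elem().CanSet()
}

func main() {
	m := make(map[string]string)
	fmt.Println(canSetDirect(m), canSetViaPointer(&m)) // prints: false true

	// Through the pointer, Set can replace the caller's map, here with a nil map.
	// Calling Set on reflect.ValueOf(m) instead would panic as in the trace above.
	reflect.ValueOf(&m).Elem().Set(reflect.Zero(reflect.TypeOf(m)))
	fmt.Println(m == nil) // prints: true
}
```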

So now try a pointer:

        if err := json.Unmarshal([]byte(data), &m); err != nil {
            fmt.Println(err)
        } else {
            fmt.Println(m == nil, m)
        }

Output:

`{"foo":"bar"}`: false map[foo:bar]
`{}`: false map[]
`null`: true map[]

Now it works. Bottom line: use pointers if the object itself may be modified (and the docs say it might be, e.g. if null is used where an object or array (map or slice) is expected).