Given the sign capability from Go NaCl library (https://github.com/golang/crypto/tree/master/nacl/sign), how to sign a file, especially, a very large file as big as more than 1GB? Most of the internet search results are all about signing a slice or small array of bytes.
I can think of 2 ways:
For signing very large files (multiple gigabytes and up), the problem of using a standard signing function is often runtime and fragility. For very large files (or just slow disks) it could perhaps take hours or more just to serially read the full file from start to end.
In such cases, you want a way to process the file in parallel. One of the common ways to do this which is suitable for cryptographic signatures is Merkle tree hashes. They allow you to split the large file into smaller chunks, hash them in parallel (producing "leaf hashes"), and then further hash those hashes in a tree structure to produce a root hash which represents the full file.
Once you have calculated this Merkle tree root hash, you can sign this root hash. It then becomes possible to use the signed Merkle tree root hash to verify all of the file chunks in parallel, as well as verifying their order (based on the positions of the leaf hashes in the tree structure).
The problem with NaCl is that you need to put the whole message into RAM, as per godoc:
Messages should be small because: 1. The whole message needs to be held in memory to be processed. 2. Using large messages pressures implementations on small machines to process plaintext without verifying the signature. This is very dangerous, and this API discourages it, but a protocol that uses excessive message sizes might present some implementations with no other choice. 3. Performance may be improved by working with messages that fit into data caches. Thus large amounts of data should be chunked so that each message is small.
However, there are various other methods. Most of them basically do what you described in the first way. You basically copy the file contents into an io.Writer
which takes the contents and calculates the hash sum - this is most efficient.
The code below is pretty hacked, but you should get the picture. I achieved an average throughput of 315MB/s with it.
package main
import (
"crypto/ecdsa"
"crypto/elliptic"
"crypto/rand"
"crypto/sha256"
"flag"
"fmt"
"io"
"log"
"math/big"
"os"
"time"
)
var filename = flag.String("file", "", "file to sign")
func main() {
flag.Parse()
if *filename == "" {
log.Fatal("file can not be empty")
}
f, err := os.Open(*filename)
if err != nil {
log.Fatalf("Error opening '%s': %s", *filename, err)
}
defer f.Close()
start := time.Now()
sum, n, err := hash(f)
duration := time.Now().Sub(start)
log.Printf("Hashed %s (%d bytes)in %s to %x", *filename, n, duration, sum)
log.Printf("Average: %.2f MB/s", (float64(n)/1000000)/duration.Seconds())
r, s, err := sign(sum)
if err != nil {
log.Fatalf("Error creatig signature: %s", err)
}
log.Printf("Signature: (0x%x,0x%x)
", r, s)
}
func sign(sum []byte) (*big.Int, *big.Int, error) {
priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
if err != nil {
log.Printf("Error creating private key: %s", err)
}
return ecdsa.Sign(rand.Reader, priv, sum[:])
}
func hash(f *os.File) ([]byte, int64, error) {
var (
hash []byte
n int64
err error
)
h := sha256.New()
// This is where the magic happens.
// We use the efficient io.Copy to feed the contents
// of the file into the hash function.
if n, err = io.Copy(h, f); err != nil {
return nil, n, fmt.Errorf("Error creating hash: %s", err)
}
hash = h.Sum(nil)
return hash, n, nil
}