I need to walk a filesystem in Golang and need to be able to resume scanning after restarting.
In C, I can do this by saving the position with telldir() and then calling seekdir() when resuming.
Golang only offers a filepath.Walk() function, but that provides no way to start walking the filesystem from a specific path or point.
This makes it inefficient for large filesystems.
Any way around it?
The signature for filepath.Walk is:
func Walk(root string, walkFn WalkFunc) error
The documentation states that it starts at the directory specified by root, and the signature of the callback function (walkFn) is:
type WalkFunc func(path string, info os.FileInfo, err error) error
So you can start your scan at any given directory and walk the filesystem with that as the root. You cannot start part-way through a directory, but you can selectively prune the tree that you are walking.
There's also a "magic" return value, filepath.SkipDir, which skips either walking a directory (if returned when the callback is invoked on a directory) or the remaining files in the directory (if returned when the callback is invoked on a file).
This MAY be enough to get the behaviour that you want, but it is a little hard to tell from your question. You cannot suspend a filepath.Walk invocation and resume it later on. However, if you're mainly concerned with the callback taking time to complete, you may be able to work around that limitation by spawning goroutines from within your walkFn callback.
You may write your own state machine:
1- Walk all the way through using filepath.Walk() and buffer the result, then work through this buffer while keeping track of your position (fast).
2- Save the current path string when you pause, then on restart scan from the root again until you reach the saved path (slow).
3- Use channels, like this working sample (try it on the Go Playground):
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

var dirs = make(chan string, 10)
var wg sync.WaitGroup

func main() {
	wg.Add(1)
	go GetDirectories(`../`, `*`)
	fmt.Println()
	fmt.Println(<-dirs)
	fmt.Println(<-dirs)
	fmt.Println()
	time.Sleep(1 * time.Second) // pause: the walker blocks once the channel buffer is full
	for dir := range dirs {
		fmt.Println(dir)
	}
	wg.Wait()
	fmt.Println(`Done.`)
}

// GetDirectories sends the paths of the subdirectories of root whose names
// match the search pattern to the dirs channel, then closes the channel.
func GetDirectories(root, pattern string) {
	defer wg.Done()
	defer close(dirs)
	err := filepath.Walk(root, func(path string, fi os.FileInfo, err error) error {
		if err != nil {
			// fi may be nil here; skip entries that could not be read.
			return nil
		}
		if !fi.IsDir() {
			return nil
		}
		matched, err := filepath.Match(pattern, fi.Name())
		if err != nil {
			return err
		}
		if !matched {
			return nil
		}
		dirs <- path // blocks while the buffer is full, pausing the walk
		return nil
	})
	if err != nil {
		fmt.Println(err)
	}
}