I am working on a CLI in Go that scrapes a webpage to collect the href attributes of all the links on the page into a slice. I want to store this slice in memory for some time so that the scraper is not being called on every execution of the CLI command. Ideally, the scraper would only be called after the cache expires or the user provides some sort of --update
flag.
I came across the library go-cache and other similar libraries, but from what I could tell they only work for something that is continuously running, like a server.
I thought about writing the links to a file, but then how would I expire the results after a specific duration? Would it make sense to create a small server in the background that shuts down after a while in order to use a library like go-cache
? Any help is appreciated.
There are two main approaches in these scenarios:
Create a daemon, service or background application that acts as your data repository. You can run it as an HTTP server / RPC server depending on your requirements. Your CLI application then interacts with this daemon as required;
Implement a persistence mechanism that will allow data to be written and read across multiple CLI application executions. You may use normal text files, databases or even an implementation of golang's encoding/gob to write and read your slice (a map would probably be better) to and from a binary file.
You can timestamp entries and simply remove them after their ttl expires by explicitly deleting them, or by simply not rewriting them during subsequent executions, according to the strategy / approach selected above.
The scope and number of examples for such an open ended question is too myriad to post in a single answer and will most likely require multiple specific questions.
Use a database and store as much detail as you can (fetched_at, host, path, title, meta_desc, anchors etc). You'll be able to query over the data later and it will be useful to have it in a structured format. If you don't want to deal with a db dependency you could embed something like boltdb (pure go) or sqlite (cgo).