I am working on a project where some strings are hashed. To make sure I always get the same result, I would like to normalize them before I hash them. ... and there is the Unicode norm package for that. So far so good.
I do not want to store the normalized form; I already have the data stored in its raw form, which I assume is how the customer wants it. If, years later, I am asked to calculate the hash of the same string, I want to get the same result. But if the standard improves, or a bug gets fixed, using the latest version of the library may produce a different result. I do not care if the previous result was not perfect - I just want the same one.
My question is: what might be a good way to enforce consistency, without writing my own implementation?
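For illustration, this is roughly the shape of what I am doing (a minimal sketch; NFC and SHA-256 are just the forms I picked for the example):

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"golang.org/x/text/unicode/norm"
)

func hashNormalized(s string) [32]byte {
	// Normalize to NFC first, so equivalent Unicode sequences
	// hash to the same value.
	return sha256.Sum256([]byte(norm.NFC.String(s)))
}

func main() {
	fmt.Printf("%x\n", hashNormalized("café"))
}
```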
Can't you just store the hash for the texts? That way you don't even have to normalize them and recalculate the hash when you need it.
If you can't or don't want to: you don't have to use the latest version of the normalization package. You can pin the current latest version (or any commit you choose) and put it in a vendor folder, so you can keep using this "fixed" version years later too. For vendoring details, see Package version management in Go 1.5.
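For example, with the vendor folder approach your project tree could look something like this (`myproject` is just a placeholder name):

```
myproject/
    main.go
    vendor/
        golang.org/
            x/
                text/
                    unicode/
                        norm/    <- pinned copy of the norm package
```

The compiler then resolves the import `"golang.org/x/text/unicode/norm"` to the vendored copy, so the normalization behavior stays frozen no matter what happens upstream. (In Go 1.5 you need to set `GO15VENDOREXPERIMENT=1`; from Go 1.6 vendoring is enabled by default.)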
Just think about it: if the norm package is changed (e.g. a bug is fixed) and produces a different output (normalized text), how else would you be able to get the "old" output without having the code that produces it? Only by having the old version of the norm package.
On the other hand, you said you need to calculate the hash to verify that the data was not tampered with. If so, I don't see the purpose of normalization. You can just hash the text without normalization: if the original text is changed (e.g. it was "edited" but not really modified, just re-saved in normalized form), that can be treated as "tampered with".
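A small sketch to illustrate the difference (the two literals encode the "same" text, once in composed and once in decomposed form):

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"golang.org/x/text/unicode/norm"
)

func main() {
	composed := "caf\u00e9"    // "café" with é as one code point (U+00E9)
	decomposed := "cafe\u0301" // "café" as e + combining acute accent (U+0301)

	// After NFC normalization the two strings compare equal...
	fmt.Println(norm.NFC.String(composed) == norm.NFC.String(decomposed)) // true

	// ...but hashing the raw bytes tells them apart, which is exactly
	// what you want for tamper detection.
	fmt.Printf("%x\n", sha256.Sum256([]byte(composed)))
	fmt.Printf("%x\n", sha256.Sum256([]byte(decomposed)))
}
```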