I'm currently playing with some ideas wrt CRF-ish work and I have an idea that I need help with.
I've got a bunch of function objects (think something expensive like neural nets). They are applied onto a linear buffer (think an array of float
s or byte
s) but at varying intervals. So they look like that (think of Start and End as "apply Object to buf[Start:End]
":
| Object | Start | End |
|--------|-------|-----|
| A | 0 | 4 |
| B | 4 | 10 |
| C | 13 | 15 |
[4:10]
to [4:12]
.[4:10]
to [3:12]
, A would have to be applied to the range [0:3]
and B would have to be applied to the range [3:12]
The tasks can be summarized as follows:
Is there any good data structure that I'm missing out on that would make these tasks easier?
Am I missing out on something blindingly obvious?
A friend suggested I look up incremental compilation methods because it's similar. An analogy used would be that Roslyn would parse/reparse fragments of text in a ranged fashion. That would be quite similar to my problem - just replace linear buffer of floats with linear buffer of tokens.
The problem is I couldn't find any solid useful information about how Roslyn does it.
This solution isn't particularly memory-efficient, but if I understand you correctly, it should allow for a relatively simple implementation of the functionality you want.
Keep an array or slice funcs
of all your function objects, so that they each have a canonical integer index, and can be looked up by that index.
Keep a slice of ints s
that is always the same size as your buffer of floats; it maps a particular index in your buffer to a "function index" in the slice of functions. You can use -1 to represent a number that is not part of any interval.
Keep a slice of (int, int) pairs intervals
such that intervals[i]
contains the start-end indices for the function stored at funcs[i]
.
I believe this enables you to implement your desired functionality without too much hassle. For example, to query by index i
, look up s[i]
, then return funcs[s[i]]
and intervals[s[i]]
. When changes occur to the buffer, change s
as well, cross-referencing between s
and the intervals
slice to figure out if neighboring intervals are affected. I'm happy to explain this part in more detail, but I don't totally understand the requirements for interval updates. (When you do an interval insert, does it correspond to an insert in the underlying buffer? Or are you just changing which buffer elements are associated with which functions? In which case, does an insert cause a deletion at the beginning of the next interval? Most schemes should work, but it changes the procedure.)