I'm maintaining a small program that goes through documents in a Neo4j database and dumps a JSON-encoded object to a document database. In Neo4j (for performance reasons, I imagine) there's no real data, just IDs.
Imagine something like this:
    posts:
      post:
        id: 1
        tags: 1, 2
        author: 2
        similar: 1, 2, 3
I have no idea why it was done like this, but this is what I have to deal with. The program then uses the IDs to fetch the information for each field and assembles a proper structure. Instead of author being just an int, it's an Author object, with name, email, and so on.
This worked well until the similar feature was added. Similar consists of IDs referencing other posts. Since my loop is what builds the actual post objects, how can I reference them efficiently? The only thing I could think of was keeping a cache of the posts I have already "converted" and, if a referenced ID is not in the cache, moving the current post to the end of the list. Eventually, they will all be processed.
The approach you're proposing won't work if there are cycles of similar relationships, which there probably are.
For example, you've shown a post 1 that is similar to a post 2. Let's say you come across post 1 first. It refers to post 2, which isn't in the cache yet, so you push post 1 back onto the end of the queue. Now you get to post 2. It refers to post 1, which isn't in the cache yet, so you push post 2 back onto the end of the queue. This goes on forever.
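To make the failure concrete, here is a rough simulation of the cache-and-requeue idea. The function name requeueUntilDone and the maxSteps cap are inventions of this sketch (the cap exists only so the demo terminates); the real program would simply spin.

```go
package main

import "fmt"

// requeueUntilDone simulates the cache-and-requeue idea: a post counts as
// "converted" only once every post it lists as similar is already in the
// cache; otherwise it goes back to the end of the queue.
// maxSteps is a safety cap added for this demo so that it always terminates.
// It returns the number of posts that were converted.
func requeueUntilDone(similar map[int][]int, queue []int, maxSteps int) int {
	cache := make(map[int]bool)
	for step := 0; step < maxSteps && len(queue) > 0; step++ {
		// Take the post at the front of the queue.
		id := queue[0]
		queue = queue[1:]

		// Check whether all of its similar posts are already converted.
		ready := true
		for _, s := range similar[id] {
			if !cache[s] {
				ready = false
				break
			}
		}

		if ready {
			cache[id] = true
		} else {
			queue = append(queue, id) // not ready: try again later
		}
	}
	return len(cache)
}

func main() {
	// Post 1 is similar to post 2, and post 2 is similar to post 1.
	similar := map[int][]int{1: {2}, 2: {1}}
	fmt.Println(requeueUntilDone(similar, []int{1, 2}, 1000)) // prints 0
}
```

With the cycle in place, no post is ever converted, no matter how large the step limit is. Without a cycle, the same loop does finish.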
You can solve this problem by building the post objects in two passes. During the first pass, you make Post objects and fill them with all the information except for the similar references, and you build up a map[int]*Post that maps ID numbers to posts. On the second pass, for each post, you iterate over the similar ID numbers, look up each one in the map, and use the resulting *Post values to fill a []*Post slice of similar posts.
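A minimal sketch of the two passes might look like the following. The RawPost, Author, and Post types, the BuildPosts function, and the lookupAuthor callback are all hypothetical names standing in for whatever your program already has. Tags are left out for brevity, and similar IDs that match no post are silently skipped, which is a choice of this sketch rather than a requirement.

```go
package main

import "fmt"

// RawPost mirrors what the graph database holds: nothing but IDs.
// (Hypothetical type; the fields follow the example in the question.)
type RawPost struct {
	ID      int
	Author  int
	Similar []int
}

// Author and Post are hypothetical fully resolved types.
type Author struct {
	ID   int
	Name string
}

type Post struct {
	ID      int
	Author  *Author
	Similar []*Post
}

// BuildPosts resolves raw posts in two passes. lookupAuthor stands in for
// whatever call fetches an author by ID in the real program.
func BuildPosts(raw []RawPost, lookupAuthor func(id int) *Author) map[int]*Post {
	// Pass 1: create every Post, leaving Similar empty, and index it by ID.
	posts := make(map[int]*Post, len(raw))
	for _, r := range raw {
		posts[r.ID] = &Post{ID: r.ID, Author: lookupAuthor(r.Author)}
	}

	// Pass 2: every Post now exists, so each similar ID can be resolved.
	for _, r := range raw {
		p := posts[r.ID]
		for _, id := range r.Similar {
			if s, ok := posts[id]; ok {
				p.Similar = append(p.Similar, s)
			}
			// IDs with no matching post are skipped in this sketch.
		}
	}
	return posts
}

func main() {
	// Two posts that are similar to each other: a cycle.
	raw := []RawPost{
		{ID: 1, Author: 2, Similar: []int{2}},
		{ID: 2, Author: 2, Similar: []int{1}},
	}
	posts := BuildPosts(raw, func(id int) *Author {
		return &Author{ID: id, Name: "placeholder"}
	})
	fmt.Println(posts[1].Similar[0].ID, posts[2].Similar[0].ID) // prints "2 1"
}
```

Because the slices hold pointers, mutually similar posts end up pointing at each other. That is fine in memory, but encoding/json's Marshal does not handle cyclic structures, so when dumping to JSON you would probably want to write the similar posts as IDs or as shallow summaries instead of nesting the full objects.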