Modeling relationships in Neo4j when they are not known up front

I currently have some code that looks through various datasets and models the electronic relationships between them, e.g., a shared JSESSIONID.

I would like to model each user's interactions with an application where they have to submit a unique identifier, e.g., an email address.

While processing the application's logs, I see emailA@host.com use the application with JSESSIONID asdfghjkl. I then see emailB@host.com also use the application with JSESSIONID asdfghjkl. Finally, I see emailB@host.com use JSESSIONID qwertyuiop.

In my Go code, it's easy for me to process the logs, write out both emailA@host.com and emailB@host.com as nodes, and then write the JSESSIONID relationship between them.

MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (a)-[:asdfghjkl]-(b)

However, I don't know the best way to do this at scale (the application logs are 1 TB in size). The limitation is memory: I can't find all email addresses that use asdfghjkl as a session ID without processing all the data, and memory constraints keep me from holding them all in order to write out the relationships between them.

What I would really like to do is write out something like the following, but this obviously fails:

MERGE (a:EMAIL {label:userA@host.com}) (a)-[:asdfghjkl]

Then later: MERGE (b:EMAIL {label:userB@host.com}) (b)-[:asdfghjkl]

Can I create these relationships with a query after the fact?

It sounds like you should model each JSESSIONID as a node rather than as a relationship. That lets you link one JSESSIONID to any number of email addresses, and you can add a unique constraint on the id property for fast lookups.

MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (jsid:JSESSIONID {id:'asdfghjkl'})
MERGE (a)-[:jsid]->(jsid)
MERGE (b)-[:jsid]->(jsid)

Your queries to find all :EMAIL nodes using a specific JSESSIONID should be quite fast:

MATCH (email:EMAIL)-[:jsid]->(jsid:JSESSIONID {id:'asdfghjkl'})
RETURN email
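
Since the question mentions Go and a memory limit, here is a rough sketch of how the node-based model could be fed from a log stream. Because every log line only needs its own email and session ID, nothing has to be remembered across lines. Everything named here is an assumption for illustration: LogRecord, batchParams, streamBatches and the run callback are hypothetical, and run stands in for whatever driver call you use to execute a parameterized query. The batch size is arbitrary.

```go
package main

import "fmt"

// LogRecord is a hypothetical parsed log line: one email seen with one session ID.
type LogRecord struct {
	Email     string
	SessionID string
}

// mergeBatchQuery links each email to its session node. It is parameterized,
// so the same query text is reused for every batch.
const mergeBatchQuery = `
UNWIND $rows AS row
MERGE (e:EMAIL {label: row.email})
MERGE (s:JSESSIONID {id: row.sessionId})
MERGE (e)-[:jsid]->(s)`

// RunFunc is a stand-in for a real driver call that executes a query.
type RunFunc func(query string, params map[string]interface{}) error

// batchParams converts a slice of records into the $rows parameter.
func batchParams(records []LogRecord) map[string]interface{} {
	rows := make([]map[string]interface{}, 0, len(records))
	for _, r := range records {
		rows = append(rows, map[string]interface{}{
			"email":     r.Email,
			"sessionId": r.SessionID,
		})
	}
	return map[string]interface{}{"rows": rows}
}

// streamBatches pulls records from next until it reports false and calls run
// once per batch of at most size records. Only one batch is held in memory.
func streamBatches(next func() (LogRecord, bool), size int, run RunFunc) error {
	if size < 1 {
		size = 1
	}
	batch := make([]LogRecord, 0, size)
	for {
		rec, ok := next()
		if !ok {
			break
		}
		batch = append(batch, rec)
		if len(batch) == size {
			if err := run(mergeBatchQuery, batchParams(batch)); err != nil {
				return err
			}
			batch = batch[:0] // safe to reuse: batchParams copied the values
		}
	}
	if len(batch) > 0 {
		return run(mergeBatchQuery, batchParams(batch))
	}
	return nil
}

func main() {
	// The three log lines from the question.
	logs := []LogRecord{
		{Email: "emailA@host.com", SessionID: "asdfghjkl"},
		{Email: "emailB@host.com", SessionID: "asdfghjkl"},
		{Email: "emailB@host.com", SessionID: "qwertyuiop"},
	}
	i := 0
	next := func() (LogRecord, bool) {
		if i >= len(logs) {
			return LogRecord{}, false
		}
		i++
		return logs[i-1], true
	}
	// Instead of talking to a database, just report what would be sent.
	run := func(query string, params map[string]interface{}) error {
		rows := params["rows"].([]map[string]interface{})
		fmt.Printf("would send %d rows\n", len(rows))
		return nil
	}
	if err := streamBatches(next, 2, run); err != nil {
		fmt.Println("error:", err)
	}
}
```

With the three sample lines and a batch size of 2, run is called twice, once with two rows and once with one. Because MERGE is idempotent on the matched pattern, seeing the same email or session again in a later batch reuses the existing node rather than creating a duplicate. This assumes lookups on label and id are backed by constraints or indexes, otherwise each MERGE gets slower as the graph grows.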