How can I find all hidden relationships among tags extracted from multiple documents?
My dataset is a set of documents, and from each document I extract a group of tags. My goal is to find relations among tags that come from different documents:
Doc_id tags
1 a, b, c
2 c, k, m
3 m, n, p
The hidden relations found should look like this:
a -> k using c
b -> m using c
a -> n using c, m (a->c->m->n)
and so on.
What you're describing with your tags is a graph, whose nodes (vertices) would be your tags and whose edges would be your relations: two tags are linked by an edge when they appear in the same document. You can read the following Wikipedia page if you want:
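To make the graph idea concrete, here is a minimal Python sketch that builds an adjacency map from the three example documents. It assumes two tags are directly related when they share a document; the function name `build_tag_graph` and the `documents` dictionary layout are made up for this illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_tag_graph(documents):
    """Return {tag: set of tags that share at least one document with it}.

    `documents` maps a document id to its list of tags (hypothetical layout).
    """
    graph = defaultdict(set)
    for tags in documents.values():
        unique_tags = set(tags)
        for tag in unique_tags:
            graph[tag]  # touch the key so a lone tag still becomes a node
        # Every pair of tags in the same document gets an undirected edge.
        for first, second in combinations(unique_tags, 2):
            graph[first].add(second)
            graph[second].add(first)
    return dict(graph)


documents = {
    1: ["a", "b", "c"],
    2: ["c", "k", "m"],
    3: ["m", "n", "p"],
}
graph = build_tag_graph(documents)
print(sorted(graph["c"]))  # ['a', 'b', 'k', 'm']
```

Here `c` is the tag that bridges documents 1 and 2, which is why `a` and `k` end up related without being direct neighbors.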
https://en.wikipedia.org/wiki/Graph_theory
You can also read this one, more focused on the implementation of a graph:
https://en.wikipedia.org/wiki/Graph_(abstract_data_type)
What you want to know is whether two tags (in a graph, two nodes) are related, but not directly: they are in the same connected component, but they are not neighbors.
You can code it yourself, or look for a good implementation that already works. This one for example: https://github.com/clue/graph (don't know if it's really good, I just searched on GitHub and took the first result).
Implementing the graph yourself can be good training. I think OOP is the best approach for this.
What you need now is to know whether there is a path between two tags. In your example, every tag has a path to every other one, but that's not always the case.
Searching for paths in a graph is not a trivial problem, and the implementation can be hard. There are a lot of ways of doing it.
A naive solution is to keep an array of already visited nodes. You start from a node, mark it as visited, and add all its neighbors to an array of reachable nodes. Then you do the same thing with each of those neighbors. When you encounter an already visited node, you skip it (otherwise you'll get an infinite loop). When the algorithm stops, you have an array of all the tags reachable from your start tag.
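The traversal described above can be sketched in Python as follows. This is only one possible version: it visits nodes breadth-first (the description does not prescribe an order) and, as an addition, remembers the path taken to each tag so the "using" part of the expected output can be printed. The name `find_paths_from` is hypothetical, and the graph is written out by hand from the three example documents.

```python
from collections import deque


def find_paths_from(graph, start):
    """Return {reachable tag: list of tags on a path from start to it}."""
    paths = {start: [start]}  # also serves as the "already visited" record
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # sorted() only makes the visiting order deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in paths:  # skip visited nodes to avoid looping
                paths[neighbor] = paths[node] + [neighbor]
                queue.append(neighbor)
    return paths


# Adjacency map for the example: tags are linked when they share a document.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "k", "m"},
    "k": {"c", "m"},
    "m": {"c", "k", "n", "p"},
    "n": {"m", "p"},
    "p": {"m", "n"},
}

paths = find_paths_from(graph, "a")
for tag, path in sorted(paths.items()):
    if len(path) > 2:  # longer than start + neighbor means an indirect relation
        print(f"{path[0]} -> {tag} using {', '.join(path[1:-1])}")
```

Starting from `a`, this prints the indirect relations to `k`, `m`, `n` and `p`, including `a -> n using c, m` from the question. A tag that never shows up in `paths` is not reachable from the start tag at all.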
Hope that helps. I will wait for your answer to see if you need something else. Note that this algorithm is probably already implemented in the clue/graph repository that I linked.