I currently have the problem that I have a dataset of about 1000 entries.
Each entry has two relevant features:
weight
(float)origin
(string / another entity)I have to sort those entries into groups of max. four entries. Groups can contain less entries, though.
Now, the way those entries are being sorted into the groups depend on their features in the following way:
weight
within a group can be 10%.origin
as possible in each group. Having one duplicate is not so bad, but having three or more entries with the same origin
should be avoided.Within the dataset weight
has a range of roughly 20.0 to 120.0. There are about 50 different possible values for origin
.
I have to implement this in php, but answering with a php implementation is not necessary. The algorithm alone would be enough.
I have tried sorting all values for their weight
and then simply split them every fourth entry. But the groups I then get are hard to rearrange with regard to the origin
value. I think I could somehow get this done through a nasty implementation, but I hope there is a very elegant algorithm that can do just that.
Thanks in advance!
Here is a greedy that might give good results:
Sort entried by weight
groups = []
used = array of length len(entries) initialized in false
For i = 0 to len(entries):
if (used[i] == false):
group = [entries[i]]
j = i + 1
while(j < len(entries) and delta(group[0], entries[j]) < 10 and len(group) < 4):
if used[j] == false and entries[j].origin != all the origins in group:
group.add(entries[j])
used[j] = true
j = j + 1
if (len(group) < 4):
//decide if you prefer a small group or a bigger group with repeated origins
groups.add(group)