I have been trying to figure out a way to detect series of files. For instance:
If a given directory has the following files:
I would like to condense the listing to something like
How should I go about detecting the groups?
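Here's one way you can solve this, which is more efficient than a brute-force method.
Load the names into an array keyed by file name, where each value is the key with its digits removed (preg_replace('/\d/', '', $key)). You will have something like $arr1 = [Birthday001 => Birthday, Birthday002 => Birthday ...]
Counting how many times each value occurs then gives $arr2 = [Birthday => 2, ...]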
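A minimal sketch of the two arrays described above. It assumes the extensions have already been removed and uses array_count_values for the counting step, which the answer does not name explicitly. The function name countSeries and the sample names are hypothetical.

```php
<?php
// Hypothetical helper: returns [series name => number of files in the series].
function countSeries(array $names): array
{
    // $arr1: original name => name with every digit removed.
    $arr1 = [];
    foreach ($names as $name) {
        $arr1[$name] = preg_replace('/\d/', '', $name);
    }

    // $arr2: stripped name => how many original names map to it.
    return array_count_values($arr1);
}

// Sample input (assumed): base names without extensions.
$arr2 = countSeries(['Birthday001', 'Birthday002', 'Birthday003', 'Picknic1', 'Picknic2', 'Afternoon']);
print_r($arr2);
```

Any entry in $arr2 with a count above 1 is a series. Note that this removes digits anywhere in the name, not only at the end.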
Simply build a histogram whose keys are modified by a regex:
<?php
# input
$filenames = array("Birthday001.jpg", "Birthday002.jpg", "Birthday003.jpg", "Picknic1.jpg", "Picknic2.jpg", "Afternoon.jpg");

# create histogram, keyed by the file name without trailing digits and extension
$histogram = array();
foreach ($filenames as $filename) {
    # \d* rather than \d+, so unnumbered names like Afternoon.jpg also lose their extension
    $name = preg_replace('/\d*\.[^.]*$/', '', $filename);
    if (isset($histogram[$name])) {
        $histogram[$name]++;
    } else {
        $histogram[$name] = 1;
    }
}

# output
foreach ($histogram as $name => $count) {
    if ($count == 1) {
        echo "$name ($count picture)\n";
    } else {
        echo "$name ($count pictures)\n";
    }
}
?>
Generate an array of words like "my" (developing this array will be very important; "my" is the only one in the example you gave) and strip these out of all the file names. Strip out all numbers and punctuation; extensions should be long gone at this point. Once this is done, put all of the unique results into an array. You can then use this as a fairly reliable source of keywords to search for any stragglers that the other processing didn't catch.
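One way this keyword approach could look in PHP, as a rough sketch rather than a finished implementation. The names STOP_WORDS, extractKeywords and matchKeyword are hypothetical. The sketch also assumes file names use ASCII letters only and that filler words are separated from the rest of the name by digits or punctuation.

```php
<?php
// Assumed stop-word list; "my" is the only word the answer mentions.
const STOP_WORDS = ['my'];

// Build the list of unique keywords from a set of file names.
function extractKeywords(array $filenames, array $stopWords = STOP_WORDS): array
{
    $keywords = [];
    foreach ($filenames as $filename) {
        // Drop the extension and normalise case.
        $base = strtolower(pathinfo($filename, PATHINFO_FILENAME));
        // Split on anything that is not a letter (digits, punctuation).
        $words = preg_split('/[^a-z]+/', $base, -1, PREG_SPLIT_NO_EMPTY);
        // Remove the filler words.
        $words = array_diff($words, $stopWords);
        if ($words) {
            $keywords[] = implode(' ', $words);
        }
    }
    return array_values(array_unique($keywords));
}

// Find the first keyword contained in a straggler's name, or null if none matches.
function matchKeyword(string $filename, array $keywords): ?string
{
    foreach ($keywords as $keyword) {
        if (stripos($filename, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}
```

For instance, extractKeywords(['My_Birthday001.jpg', 'Birthday002.jpg']) reduces both names to the single keyword birthday, and matchKeyword('old-birthday-scan.png', ['birthday']) then assigns the straggler to that group.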