I'm trying to convert an old HTML site to a new CMS. To get the correct menu hierarchy (with varying depth), I want to read all the files with PHP and extract/parse the menu (nested unordered lists) into an associative array.
root.html
<ul id="menu">
    <li class="active">Start</li>
    <ul>
        <li><a href="file1.html">Sub1</a></li>
        <li><a href="file2.html">Sub2</a></li>
    </ul>
</ul>
file1.html
<ul id="menu">
    <li><a href="root.html">Start</a></li>
    <ul>
        <li class="active">Sub1</li>
        <ul>
            <li><a href="file3.html">SubSub1</a></li>
            <li><a href="file4.html">SubSub2</a></li>
            <li><a href="file5.html">SubSub3</a></li>
            <li><a href="file6.html">SubSub4</a></li>
        </ul>
    </ul>
</ul>
file3.html
<ul id="menu">
    <li><a href="root.html">Start</a></li>
    <ul>
        <li><a href="file1.html">Sub1</a></li>
        <ul>
            <li class="active">SubSub1</li>
            <ul>
                <li><a href="file7.html">SubSubSub1</a></li>
                <li><a href="file8.html">SubSubSub2</a></li>
                <li><a href="file9.html">SubSubSub3</a></li>
            </ul>
        </ul>
    </ul>
</ul>
file4.html
<ul id="menu">
    <li><a href="root.html">Start</a></li>
    <ul>
        <li><a href="file1.html">Sub1</a></li>
        <ul>
            <li><a href="file3.html">SubSub1</a></li>
            <li class="active">SubSub2</li>
            <li><a href="file5.html">SubSub3</a></li>
            <li><a href="file6.html">SubSub4</a></li>
        </ul>
    </ul>
</ul>
I would like to loop through all files, extract the ul with id="menu", and create an array like this (or similar) while keeping the hierarchy and file information:
Array
    [file] => root.html
    [child] => Array
        [Sub1] => Array
            [file] => file1.html
            [child] => Array
                [SubSub1] => Array
                    [file] => file3.html
                    [child] => Array
                        [SubSubSub1] => Array
                            [file] => file7.html
                        [SubSubSub2] => Array
                            [file] => file8.html
                        [SubSubSub3] => Array
                            [file] => file9.html
                [SubSub2] => Array
                    [file] => file4.html
                [SubSub3] => Array
                    [file] => file5.html
                [SubSub4] => Array
                    [file] => file6.html
        [Sub2] => Array
            [file] => file2.html
With the help of the PHP Simple HTML DOM Parser library, I successfully read the file and extracted the menu:
$html = file_get_html($file);
foreach ($html->find("ul[id=menu]") as $ul) {
..
}
To parse only the active section of the menu (leaving out the links that go one or more levels up), I used
$ul->find("ul",-1)
which finds the last ul inside the outer ul. This works great for a single file.
But I'm having trouble looping through all the files/menus while keeping the parent/child information, because each menu has a different depth.
Thanks for all suggestions, tips and help!
Edit: OK, this was not so easy after all :)
By the way, this library is really an excellent tool. Kudos to the guys who wrote it.
Here is one possible solution:
class menu_parse {
    static $missing = array();        // list of missing files
    static private $files = array();  // list of source files to process

    // initiate menu parsing
    static function start ($file)
    {
        // start with root file
        self::$files[$file] = 1;

        // parse all source files; files discovered while parsing are
        // appended to self::$files and picked up by this same loop
        for ($res=array(); current(self::$files); next(self::$files))
        {
            // get next file name
            $file = key(self::$files);

            // parse the file
            if (!file_exists ($file))
            {
                self::$missing[$file] = 1;
                continue;
            }
            $html = file_get_html ($file);

            // get menu root (if any)
            $root = $html->find("ul[id=menu]",0);
            if ($root) self::menu ($root, $res);
        }

        // turn the set of missing files into a plain list
        self::$missing = array_keys (self::$missing);

        // that's all folks
        return $res;
    }

    // parse a menu at a given level
    static private function menu ($menu, &$res)
    {
        $name = null; // name of the last <li> seen at this level
        foreach ($menu->children as $elem)
        {
            switch ($elem->tag)
            {
            case "li" : // name and possibly source file of a menu
                // grab menu name
                $name = trim ($elem->plaintext);

                // create the item explicitly; assigning a property to an
                // undefined value is an error in recent PHP versions
                if (!isset ($res[$name])) $res[$name] = new stdClass;

                // see if we can find a link to the menu file
                $link = $elem->children(0);
                if ($link && $link->tag == 'a')
                {
                    // found the link
                    $file = $link->href;
                    $res[$name]->file = $file;

                    // add the source file to the processing list
                    self::$files[$file] = 1;
                }
                break;

            case "ul" : // go down one level to grab items of the current menu
                if ($name === null) break; // sub-list without a preceding item
                if (!isset ($res[$name]->childs)) $res[$name]->childs = array();
                self::menu ($elem, $res[$name]->childs);
            }
        }
    }
}
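The loop in start() works because new entries are appended to the list while PHP's internal array pointer is still walking it, so each file is visited once even though the pages link back to each other. Here is a minimal sketch of that worklist pattern on its own, assuming a made-up $links table in place of the parsed HTML files (the variable names are illustrative only):

```php
<?php
// Made-up link table standing in for the hrefs found in each file
$links = array(
    'root.html'  => array('file1.html', 'file2.html'),
    'file1.html' => array('root.html', 'file3.html'),
    'file3.html' => array('root.html', 'file1.html'),
);

// The queue is keyed by file name, so re-adding a known file changes nothing
$queue   = array('root.html' => 1);
$visited = array();

for (reset($queue); key($queue) !== null; next($queue)) {
    $file      = key($queue);
    $visited[] = $file;

    // files without an entry in the table simply have no outgoing links
    $targets = isset($links[$file]) ? $links[$file] : array();
    foreach ($targets as $target) {
        $queue[$target] = 1; // new keys are appended, existing keys stay put
    }
}

echo implode(', ', $visited), "\n";
```

Keying the queue by file name is what prevents the upward links (back to root.html and so on) from causing an endless loop.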
Usage:
// assumes the library file simple_html_dom.php is in the include path
require_once "simple_html_dom.php";

// The result will be an array of menus indexed by item names.
//
// Each menu will be an object with 2 members
// - file   -> source file of the menu
// - childs -> array of menu subitems
//
$res = menu_parse::start ("root.html");

// menu_parse::$missing will contain the names of all missing files
echo "Result : <pre>";
print_r ($res);
echo "</pre><br>missing files:<pre>";
print_r (menu_parse::$missing);
echo "</pre>";
Output of your test case:
Array
(
    [Start] => stdClass Object
        (
            [childs] => Array
                (
                    [Sub1] => stdClass Object
                        (
                            [file] => file1.html
                            [childs] => Array
                                (
                                    [SubSub1] => stdClass Object
                                        (
                                            [file] => file3.html
                                            [childs] => Array
                                                (
                                                    [SubSubSub1] => stdClass Object
                                                        (
                                                            [file] => file7.html
                                                        )
                                                    [SubSubSub2] => stdClass Object
                                                        (
                                                            [file] => file8.html
                                                        )
                                                    [SubSubSub3] => stdClass Object
                                                        (
                                                            [file] => file9.html
                                                        )
                                                )
                                        )
                                    [SubSub2] => stdClass Object
                                        (
                                            [file] => file3.html
                                        )
                                    [SubSub3] => stdClass Object
                                        (
                                            [file] => file5.html
                                        )
                                    [SubSub4] => stdClass Object
                                        (
                                            [file] => file6.html
                                        )
                                )
                        )
                    [Sub2] => stdClass Object
                        (
                            [file] => file2.html
                        )
                )
            [file] => root.html
        )
)
missing files: Array
(
    [0] => file2.html
    [1] => file5.html
    [2] => file6.html
    [3] => file7.html
    [4] => file8.html
    [5] => file9.html
)
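If plain nested arrays with [file] and [child] keys are preferred, as in the question, the object tree can be converted afterwards. The following is only a sketch: menu_to_array is a hypothetical helper that is not part of the class above, and the sample tree is built by hand so the snippet runs without the HTML parser.

```php
<?php
// Hypothetical helper: turns the stdClass tree (members "file" and "childs")
// into nested arrays with the keys "file" and "child".
function menu_to_array(array $items)
{
    $out = array();
    foreach ($items as $name => $item) {
        $node = array();
        if (isset($item->file)) {
            $node['file'] = $item->file;
        }
        if (!empty($item->childs)) {
            $node['child'] = menu_to_array($item->childs);
        }
        $out[$name] = $node;
    }
    return $out;
}

// Hand-built sample shaped like the parser's result
$sub1 = new stdClass;
$sub1->file = 'file1.html';
$sub2 = new stdClass;
$sub2->file = 'file2.html';
$start = new stdClass;
$start->childs = array('Sub1' => $sub1, 'Sub2' => $sub2);
$start->file = 'root.html';

$tree = menu_to_array(array('Start' => $start));
print_r($tree);
```

Unlike the array in the question, this keeps the top-level item name (Start) as a key; take `$tree['Start']` if the root name is not needed.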
You could modify the code to have the (sub)menus as an array with numeric indexes and names as properties (so that two items with the same name would not overwrite each other), but that would complicate the structure of the result.
Should such name duplication occur, the best solution would be to rename one of the items, IMHO.
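For illustration, here is a rough sketch of the numerically indexed variant, where the name becomes a property of each item. The add_item helper and the duplicated "Downloads" entries are made up for this example and do not come from the test files.

```php
<?php
// Hypothetical helper: appends an item to a numerically indexed list.
// The name is stored as a property, so duplicate names can coexist.
function add_item(array &$list, $name, $file = null)
{
    $item = new stdClass;
    $item->name   = $name;
    $item->file   = $file;
    $item->childs = array();
    $list[] = $item;
    return $item;
}

$menu  = array();
$start = add_item($menu, 'Start', 'root.html');
add_item($start->childs, 'Downloads', 'file1.html');
add_item($start->childs, 'Downloads', 'file2.html'); // same name, no overwrite

echo count($start->childs), "\n";
```

The price is the one mentioned above: finding an item by name now means scanning the list instead of using a direct array key.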
The code handles only one menu (the first ul with id="menu") per file. It could be modified to handle more than one, but that does not make much sense IMHO (it would mean a duplicated root menu ID, which would likely cause trouble for any JavaScript trying to process the menu in the first place).
This is more like a directory tree with upward links. file1 on level 1 points to file3 on level 2, which points back to file1 on level 1; that is what causes the "different depth". Consider setting up a dedicated menu object that points upwards and downwards, and keeping lists of those objects instead of arrays of arrays of strings. A starting point for such a hierarchy in PHP could be a class like this:
class menuItem {
    protected $leftSibling = null;
    protected $rightSibling = null;
    protected $parents = array();
    protected $childs = array();
    protected $properties = array();

    // set a property such as the menu name or file name
    function setProp($name, $val) {
        $this->properties[$name] = $val;
    }

    // get a property if set, false otherwise
    function getProp($name) {
        if ( isset($this->properties[$name]) )
            return $this->properties[$name];
        return false;
    }

    // collect all siblings to the left, nearest first
    // (getLeftSibling() is assumed to be among the methods elided below)
    function getLeftSiblingsAsArray() {
        $sibling = $this->getLeftSibling();
        $siblings = array();
        while ( $sibling != null ) {
            $siblings[] = $sibling;
            $sibling = $sibling->getLeftSibling();
        }
        return $siblings;
    }

    function addChild($item) {
        $this->childs[] = $item;
    }

    // attach $item to the left of the leftmost sibling in the chain,
    // starting from this item so an empty chain is handled too
    function addLeftSibling($item) {
        $sibling = $this;
        while ( $sibling->leftSibling != null )
            $sibling = $sibling->leftSibling;
        $sibling->addFinalLeft($item);
    }

    function addFinalLeft($item) {
        $this->leftSibling = $item;
    }
....
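Independent of the class above, the idea of linking items both upwards and downwards can be sketched in a few lines. MenuNode and all of its members are hypothetical names chosen for this example; the point is only that once every item knows its parent, the depth of any item falls out of the structure and no longer depends on which file it was read from.

```php
<?php
// Hypothetical minimal node with one upward link and a list of downward links
class MenuNode
{
    public $name;
    public $file;
    public $parent = null;
    public $children = array();

    function __construct($name, $file)
    {
        $this->name = $name;
        $this->file = $file;
    }

    // link the child downwards and the parent upwards in one step
    function addChild(MenuNode $child)
    {
        $child->parent = $this;
        $this->children[] = $child;
        return $child;
    }

    // names from the root down to this node
    function path()
    {
        $names = array();
        for ($node = $this; $node !== null; $node = $node->parent) {
            array_unshift($names, $node->name);
        }
        return $names;
    }
}

$root    = new MenuNode('Start', 'root.html');
$sub1    = $root->addChild(new MenuNode('Sub1', 'file1.html'));
$subsub1 = $sub1->addChild(new MenuNode('SubSub1', 'file3.html'));

echo implode(' > ', $subsub1->path()), "\n"; // Start > Sub1 > SubSub1
```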