I am trying to check an entered URL against the regular expressions in PHPIDS's XML rule list (downloadable from their site, http://phpids.org/).
Now, I know very little about regex, and I've looked around but haven't found much information on it that I found helpful.
What I'd like to do is something like this (pseudocode): if (URL matches regex) die();
This is my latest attempt, of many:
<?php
$file="default_filter.xml";
$load = simplexml_load_file($file);
$regex = $load->filter->rule;
$url = explode(" ","http://localhost/test.php");
$url2 = "http://localhost/test.php";
if(in_array($regex,$url))
{
echo "bad url";
}
if(preg_match($regex,$url2))
{
echo "bad url";
}
// The above gives me: Warning: preg_match() [function.preg-match]: Unknown modifier '|' in C:\wamp\www\test.php on line 12
// I don't understand regex yet, so I don't know why the above is a problem...
?>
If I can get this working, I'll loop through all the rules, but for now I'm just trying a single regex,
and I can't figure out how to get it working.
This is the regex which is being pulled from the XML file:
<rule><![CDATA[(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")]]></rule>
Although I don't understand any of it...
Thanks in advance to anyone who can assist me.
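About the only thing I can say is that the delimiters are missing. preg_match() expects the pattern to be wrapped in delimiters. Because your pattern starts with "(", PHP treats that as a bracket-style opening delimiter and the matching ")" as the closing one, so the "|" right after it is read as a pattern modifier, hence "Unknown modifier '|'".
Given that $regex contains only the regex and not the <![CDATA[ portion, and that "#" does not appear anywhere in this rule,
this should work: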
if(preg_match('#'.$regex.'#',$url2))
Give that a shot.
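To see both the failure and the fix side by side, here is a minimal sketch. It hard-codes the rule text instead of loading default_filter.xml, and uses @ only to silence the expected warning on the broken call. If you take the rule from SimpleXML, casting it with (string) makes it explicit that you are passing a plain string rather than a SimpleXMLElement.

```php
<?php
// The rule text exactly as it appears inside the CDATA section.
$rule = '(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")';
$url  = 'http://localhost/test.php';

// Without delimiters, PHP takes "(" as the opening delimiter and its matching
// ")" as the closing one, so "|" is read as a modifier: the pattern fails to
// compile and preg_match() returns false (the warning is silenced with @).
$broken = @preg_match($rule, $url);

// Wrapped in "#" delimiters, the whole rule is compiled as one pattern.
$fixed = preg_match('#' . $rule . '#', $url);

var_dump($broken); // bool(false)
var_dump($fixed);  // int(0): the URL contains none of the flagged sequences
```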
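I can't really see this part ever working: if(in_array($regex,$url)){echo "bad url";}
in_array() only checks whether a value is an exact element of the array; it never interprets $regex as a pattern, so it cannot tell you whether the URL matches. I'm not really sure what you are trying to achieve with that condition.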
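A small sketch of the difference between the two calls. The subject string '<img src="x"/>' is just an illustrative input picked for this example, not something from the question:

```php
<?php
$pattern = '(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")';
$parts   = explode(' ', 'http://localhost/test.php'); // one element: the whole URL

// in_array() asks "is this exact value an element of the array?".
// The pattern is never treated as a regex, so this is false unless an
// element is literally the same text as the rule.
$viaInArray = in_array($pattern, $parts, true);

// preg_match() is what actually applies the pattern to a subject string.
$viaRegex = preg_match('#' . $pattern . '#', '<img src="x"/>');

var_dump($viaInArray); // bool(false)
var_dump($viaRegex);   // int(1)
```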
As for extracting the regex pattern from your XML rule, I can give you some guidance via the following test code:
<?php
$inputs=array(
    "empty"=>'',
    "doublequote-greater"=>'">"', // first regex condition match
    "dollar-slash-greater"=>'$/>', // second regex condition match
    "greater-doublequote"=>'>"', // third regex condition match
    "text"=>'<a>'
);
$rule='<rule><![CDATA[(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")]]></rule>';
// Grab the text between "<![CDATA[" and "]]" (\K drops the prefix from the match),
// then wrap it in "/" delimiters; the rule already escapes its "/" as "\/".
$regex=(preg_match("/<rule><!\[CDATA\[\K.*?(?=\]\])/",$rule,$match)?"/$match[0]/":FALSE);
if($regex){
    foreach($inputs as $k=>$v){
        if(preg_match($regex,$v)){
            echo "Bad ($k): $v\n";
        }else{
            echo "Good ($k): $v\n";
        }
    }
}else{
    echo "Failed to extract regex pattern from XML rule: $rule";
}
Output:
Good (empty):
Bad (doublequote-greater): ">"
Bad (dollar-slash-greater): $/>
Bad (greater-doublequote): >"
Good (text): <a>
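Once a single rule works, the loop mentioned in the question could look roughly like this. It is a sketch under assumptions: the <filters>/<filter>/<rule> layout is inferred from $load->filter->rule in the question, the second rule is a hypothetical placeholder (not a real PHPIDS rule), matchesAnyRule() is a hypothetical helper name, and "/" is used as the delimiter on the assumption that every rule escapes its slashes as \/ the way the rule above does.

```php
<?php
// Inline stand-in for default_filter.xml. The layout is assumed, and the
// second rule is a hypothetical placeholder, not taken from PHPIDS.
$xml = <<<'XML'
<filters>
  <filter>
    <rule><![CDATA[(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")]]></rule>
  </filter>
  <filter>
    <rule><![CDATA[(?:<script)]]></rule>
  </filter>
</filters>
XML;

// Hypothetical helper: true if any <rule> matches $input.
function matchesAnyRule(SimpleXMLElement $filters, string $input): bool
{
    foreach ($filters->filter as $filter) {
        // Cast the SimpleXMLElement to a plain string and add delimiters.
        $pattern = '/' . (string) $filter->rule . '/';
        if (preg_match($pattern, $input) === 1) {
            return true;
        }
    }
    return false;
}

// LIBXML_NOCDATA merges the CDATA sections into ordinary text nodes.
$filters = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
// With the real file: simplexml_load_file('default_filter.xml', 'SimpleXMLElement', LIBXML_NOCDATA);

$url = 'http://localhost/test.php';
if (matchesAnyRule($filters, $url)) {
    die('bad url');
}
echo "ok\n";
```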
For a breakdown of your regex pattern, and to keep learning and experimenting, I recommend https://regex101.com/
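If you'd like the explanation inside the code itself, the x (extended) modifier lets you spread the same pattern over several lines with comments, since unescaped whitespace and #-comments are then ignored. The comments below are a hand-written reading of the rule, so treat them as a starting point and check them against regex101:

```php
<?php
// The same PHPIDS rule, rewritten with the x modifier and "~" delimiters.
$explained = <<<'REGEX'
~
    (?: " [^"]* [^-]? > )    # a quote, any non-quote chars, an optional char other than "-", then >
  | (?: [^\w\s] \s* \/> )    # a char that is neither word nor whitespace, optional whitespace, then />
  | (?: >" )                 # > immediately followed by a quote
~x
REGEX;

$original = '/(?:"[^"]*[^-]?>)|(?:[^\w\s]\s*\/>)|(?:>")/';

// Both patterns should agree on every input.
foreach (['', '">"', '$/>', '>"', '<a>'] as $input) {
    echo var_export($input, true), ' original=', preg_match($original, $input),
        ' explained=', preg_match($explained, $input), "\n";
}
```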