I have been asked to extract the quantity and the ingredient name from each line of an ingredient list. This works for plain digits, but I am stuck on lines that contain fraction characters such as ¼ and ½. They are not HTML-encoded in the input, and my regex fails on them.
Here is the list of ingredients:
$subject = array("1 teaspoon salt",
"¼ teaspoon black pepper",
"1 cup all-purpose flour",
"1 ½ - 2 cups shredded Parmesan cheese");
preg_replace ('/(([0-9][\s+]*[\-]*[0-9]*[\s+]*)(teaspoon|tablespoons|cup|cups)*)([a-z0-9\s]+)/','Quantity: $1 Name: $4',$food)
Quantity: 1 teaspoon Name: salt
¼ teaspoon black pepper (failed)
Quantity: 1 cup Name: all-purpose flour
Quantity: 1 Name: ½ - Quantity: 2 cup Name: s shredded Parmesan cheese (failed)
Incorporate the Unicode code points of the fraction characters into your regular expression. At first blush, it looks like the appropriate code points are [\x{00BC}-\x{00BE}\x{2153}-\x{215E}], which covers ¼ through ¾ and ⅓ through ⅞.
You'll also have to add the u modifier to the pattern, so that preg treats both the pattern and the subject as UTF-8.
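To make that concrete, here is a minimal sketch of one way to combine the fraction character class with the u modifier. It is not the only possible pattern. The names `FRACTIONS` and `parseIngredient` are made up for illustration, the unit list is the one from the question with an optional plural `s`, and the sketch assumes both the source file and the input strings are UTF-8.

```php
<?php
// Vulgar fractions: U+00BC-U+00BE (¼ ½ ¾) and U+2153-U+215E (⅓ through ⅞).
const FRACTIONS = '\x{00BC}-\x{00BE}\x{2153}-\x{215E}';

// Hypothetical helper: splits one ingredient line into quantity, unit and name.
// Returns null when the line does not start with a digit or a fraction.
function parseIngredient(string $line): ?array
{
    $num = '[0-9' . FRACTIONS . ']';       // a digit or a fraction character
    $mid = '[0-9' . FRACTIONS . '\s\-]';   // what may appear inside a quantity such as "1 ½ - 2"

    // Quantity starts and ends on a digit/fraction; the unit is optional.
    // The trailing "u" makes PCRE read the pattern and the subject as UTF-8.
    $pattern = '/^\s*(' . $num . '(?:' . $mid . '*' . $num . ')?)\s+'
             . '(?:(teaspoons?|tablespoons?|cups?)\s+)?(.+)$/u';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // $m[2] is an empty string when no unit was present.
    return ['quantity' => $m[1], 'unit' => $m[2], 'name' => $m[3]];
}

$subject = array("1 teaspoon salt",
                 "¼ teaspoon black pepper",
                 "1 cup all-purpose flour",
                 "1 ½ - 2 cups shredded Parmesan cheese");

foreach ($subject as $food) {
    $parts = parseIngredient($food);
    if ($parts !== null) {
        echo "Quantity: {$parts['quantity']} {$parts['unit']} Name: {$parts['name']}\n";
    }
}
```

Anchoring the quantity so that it must end on a digit or fraction keeps the range "1 ½ - 2" together instead of splitting it at the first space, which is what tripped up the last line in the question.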