At the moment I have this:
<?php
$stran = file_get_contents("http://meteo.arso.gov.si/uploads/probase/www/fproduct/text/sl/fcast_si_text.html");
$stran = str_replace("<h2>","\n",$stran);
$stran = str_replace("</h2>","\n",$stran);
$stran = str_replace("<h1>","\n",$stran);
$stran = str_replace("</h1>","\n",$stran);
$stran = strip_tags($stran);
echo $stran;
?>
This gives me some empty lines at the top. I also want to delete all text after "Vir: Državna meteorološka služba RS (meteo.si - ARSO)", including the empty lines before this string.
I've tried some regular expressions, but they all delete all the text. How do I do it?
This can be done using regex.
// Convert h1/h2 opening/closing tags to new line, ignore case
$stran = preg_replace('/<\/?h[12]>/i', "\n", $stran);
$stran = strip_tags($stran);
// Remove all leading whitespace
$stran = preg_replace('/^\s+/', '', $stran);
// Remove everything after "Vir: ..."
$stran = preg_replace('/(?<=Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s', '', $stran);
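To try the steps above without fetching the live page, here is a minimal sketch that wraps them in a function and runs it on a made-up HTML snippet. The function name `cleanForecast` and the sample markup are my own assumptions, not the real ARSO page, and `preg_quote` is used here only so the marker text does not have to be escaped by hand.

```php
<?php
// Hypothetical helper (name is an assumption): applies the regex steps in order.
function cleanForecast(string $html): string
{
    // Text after which everything should be dropped
    $marker = 'Vir: Državna meteorološka služba RS (meteo.si - ARSO)';

    // Convert h1/h2 opening/closing tags to new lines, ignore case
    $text = preg_replace('/<\/?h[12]>/i', "\n", $html);
    $text = strip_tags($text);

    // Remove all leading whitespace (this also removes the empty lines at the top)
    $text = preg_replace('/^\s+/', '', $text);

    // Remove everything after the marker; preg_quote escapes ( ) . - for us
    $pattern = '/(?<=' . preg_quote($marker, '/') . ').*/s';
    return preg_replace($pattern, '', $text);
}

// Made-up sample input, only for illustration
$sample = "<html><body>\n<h1>Napoved</h1>\n<p>Danes bo jasno.</p>\n"
        . "<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>\n"
        . "<p>footer text</p></body></html>";

$clean = cleanForecast($sample);
echo $clean, "\n";
```

The marker line itself is kept because the lookbehind only asserts that it precedes the match; the `s` modifier lets `.` match newlines, so everything up to the end of the string is removed.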
Generally speaking, I would recommend actually parsing the HTML to extract the information. Have a look at DOMDocument: http://php.net/manual/en/class.domdocument.php
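As a rough sketch of the parsing approach, something like the following could work. I am assuming the forecast text lives in `h1`, `h2` and `p` elements and that the page is UTF-8; I have not checked the real page structure, so adjust the XPath query accordingly. The function name `extractForecastText` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper (name is an assumption): collects heading and paragraph
// text in document order and stops after the line starting with "Vir:".
function extractForecastText(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely perfectly valid; collect parser warnings
    // internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    // The XML encoding prefix hints that the input is UTF-8 (assumed encoding)
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $lines = [];
    // Assumed structure: the text of interest is in h1, h2 and p elements
    foreach ($xpath->query('//h1 | //h2 | //p') as $node) {
        $text = trim($node->textContent);
        if ($text === '') {
            continue; // skip empty elements, so no blank lines appear
        }
        $lines[] = $text;
        if (strpos($text, 'Vir:') === 0) {
            break; // the source line is the last one we keep
        }
    }
    return $lines;
}

// Made-up sample input, only for illustration
$sample = '<html><body><h1>Napoved za Slovenijo</h1><h2>Danes</h2>'
        . '<p>Pretežno jasno.</p>'
        . '<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>'
        . '<p>Footer</p></body></html>';

$lines = extractForecastText($sample);
echo implode("\n", $lines), "\n";
```

This avoids regular expressions entirely, and empty lines never appear because only non-empty element text is collected.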