I have this code:
function toDataUri( $html )
{
    // convert CSS url() references to data URIs
    $html = preg_replace_callback(
        '#(url\([\'"]?)([^\'")]+)([\'"]?\))#',
        array( $this, 'create_data_uri' ),  // a plain string callback can't reach a private method
        $html
    );
    return $html;
}
// callback function
// callback function
private function create_data_uri( $matches )
{
    // extract the file extension: lowercased, with any ?query=string stripped
    $filetype = explode( '.', $matches[2] );
    $filetype = trim( strtolower( end( $filetype ) ) );
    $filetype = preg_replace( '#\?.*#', '', $filetype );

    $data = file_get_contents( $matches[2] );  // was get_file_contents(), which is undefined
    if ( ! $data ) {
        return $matches[0];  // leave the original url() untouched on failure
    }

    // compile and return a data: URI with the encoded image data
    return $matches[1] . "data:image/$filetype;base64," . base64_encode( $data ) . $matches[3];
}
It searches an HTML file for URLs in the format url(path)
and replaces them with base64 data URIs.
The problem is that even when the input HTML is only a few kilobytes, say 10 KB, it takes ages to return the final response. Is there any optimization for this case, or another approach that, given HTML, finds url(path)
matches and converts them to data URIs?
The expression is cheap already: it starts with a fixed string and doesn't need backtracking.
PCRE does have an S
modifier that enables some regex optimisation, but it only matters for patterns without a fixed prefix, so it won't help here.
It shouldn't be slow — 10KB isn't much for a simple regex like this. Perhaps the bottleneck is somewhere else?
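One way to check is to time the regex pass on its own, separately from whatever the callback does. A minimal sketch, assuming a synthetic 10+ KB input (the sample string and sizes here are illustrative, not from the original code):

```php
<?php
// Time only the pattern-matching step, with no callback work at all.
// If this is fast, the regex is not the bottleneck; the file reads
// inside the callback (e.g. file_get_contents on remote URLs) likely are.
$html = str_repeat( 'body { background: url(logo.png); } ', 500 );  // roughly 18 KB

$start   = microtime( true );
$count   = preg_match_all( '#(url\([\'"]?)([^\'")]+)([\'"]?\))#', $html, $matches );
$elapsed = microtime( true ) - $start;

printf( "regex matched %d urls in %.4f s\n", $count, $elapsed );
```

On typical hardware this completes in well under a second, which suggests looking at the per-match file fetches instead.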
If there's a url(
in the parsed file and no )
before the end of the file, then it'll scan a bit more. [^\"\'\)]{0,1000}
would limit that. But it's a minor optimisation that only makes a difference when you have pathological syntax errors in the file.

BTW, you don't need ()
around the whole expression. The 0th match always captures the entire string.
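Putting that together, a sketch of the pattern with a bounded quantifier (the 1000-character cap is an arbitrary choice; the two quote-capturing groups are kept because the callback reuses them):

```php
<?php
// Bounded quantifier: a stray "url(" with no closing ")" makes the engine
// scan at most 1000 characters instead of the rest of the file.
$pattern = '#(url\([\'"]?)([^\'")]{1,1000})([\'"]?\))#';

$css = 'div { background: url("img/bg.png"); }';
preg_match( $pattern, $css, $m );

// $m[0] is the entire match, so no extra wrapping group is needed;
// $m[2] holds just the path.
echo $m[2], "\n";  // img/bg.png
```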