My environment is Windows Server 2016 and IIS 10. In my PHP script I'm trying to run Google Chrome in headless mode to get the HTML of an external web page:
<?php
$chromeApp = "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe";
$command = "\"$chromeApp\" --headless --disable-gpu --dump-dom $urladdress > page.html";
exec($command);
?>
That code works if I run
>C:\php script.php
from the command line. It also works if I run the actual command directly:
>"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --headless --disable-gpu --dump-dom https://google.com > page.html
But if I run that script from a browser, it creates an empty page.html file and hangs until it times out. However, if I restart IIS while the script is running, page.html gets filled with the expected data.
What could be a problem here?
You have 4 processes in play here.
CMD.exe takes the output of Chrome.exe and redirects it to your file. The data may be written only when Chrome.exe finishes, or partially and intermittently while it runs. When I run code similar to yours, my Chrome.exe never finishes: I can see Chrome.exe still running in Task Manager, consuming 25% CPU (100% of one of my cores).
My guess is that restarting IIS somehow forces the in-progress commands to flush their output. In most of my tests, page.html already contained data before I ran IISReset, though not all of it. (Windows Explorer showed 0 KB, but opening the file showed data in it nonetheless.)
As for things to try, try adding --no-sandbox as an argument, since the sandbox may be interfering when the process runs in a non-interactive session.
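A minimal sketch of what the command could look like with --no-sandbox added, assuming the same Chrome path as in the question. The helper name build_chrome_command() is hypothetical; the flags are only the ones already mentioned in this thread, and every argument is quoted with escapeshellarg().

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds the headless Chrome command line from the
// flags discussed in this thread, quoting the path and the URL.
function build_chrome_command(string $chromeApp, string $url, bool $noSandbox = true): string
{
    $args = ['--headless', '--disable-gpu'];
    if ($noSandbox) {
        // The sandbox may misbehave when Chrome runs in IIS's non-interactive session.
        $args[] = '--no-sandbox';
    }
    $args[] = '--dump-dom';
    $args[] = escapeshellarg($url);
    return escapeshellarg($chromeApp) . ' ' . implode(' ', $args);
}

$command = build_chrome_command(
    "C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe",
    "https://google.com"
);
echo $command, PHP_EOL;
```

Passing false as the third argument gives the original command, which makes it easy to compare both variants from the same script.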
This is not an answer, but it is too much to put in a comment. exec() doesn't really give much feedback.
First, don't do this:
$command = "\"$chromeApp\" ";
Different shells don't agree on how arguments should be quoted, so use the escapeshellarg() function instead. Also, don't do this:
--dump-dom $urladdress > page.html
The URL in $urladdress may need to be escaped (and if attackers can control $urladdress, this is actually an arbitrary code execution vulnerability). Do this instead:
$command = escapeshellarg($chromeApp)." --headless --disable-gpu --dump-dom ".escapeshellarg($urladdress)." > page.html";
(If the output filename may contain special characters too, run it through escapeshellarg() as well.)
Then replace exec() with proc_open() and tell me what you get from running this:
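To see why the quoting matters, here is a small sketch with a made-up hostile value for $urladdress. Without escaping, the & would act as a command separator for cmd.exe; with escapeshellarg() the whole value stays a single argument. The URL and the calc.exe payload are purely illustrative, and nothing is actually executed.

```php
<?php
declare(strict_types=1);

// Illustrative attacker-controlled input (hypothetical value).
$urladdress = "https://example.com/ & calc.exe";

// Unsafe: the value is pasted into the command line as-is,
// so the shell would treat "& calc.exe" as a second command.
$unsafe = "chrome.exe --headless --dump-dom " . $urladdress . " > page.html";

// Safer: escapeshellarg() wraps each value in quotes (double quotes on
// Windows, single quotes elsewhere), so it reaches Chrome as one argument.
$safe = "chrome.exe --headless --dump-dom " . escapeshellarg($urladdress)
      . " > " . escapeshellarg("page.html");

echo $unsafe, PHP_EOL;
echo $safe, PHP_EOL;
```

Note that on Windows escapeshellarg() also replaces %, ! and " with spaces, so it is not a general-purpose encoder for arbitrary data.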
<?php
declare(strict_types=1);
$urladdress="http://google.com";
$chromeApp = _cygwinify_filepath("C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe");
$command = escapeshellarg($chromeApp)." --headless --disable-gpu --dump-dom ".escapeshellarg($urladdress);
$descriptorspec = array(
    0 => array("pipe", "rb"), // by default stdin is inherited, we don't want that so we create a stdin pipe just so we can fclose() it.
    1 => array("pipe", "wb"), // stdout
    2 => array("pipe", "wb"), // stderr
);
$proc=proc_open($command,$descriptorspec,$pipes);
if(!$proc){
    throw new \RuntimeException("failed to create process! \"{$command}\"");
}
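$stdout="";
$stderr="";
// appends whatever Chrome wrote to stdout/stderr; with blocking pipes,
// stream_get_contents() waits for EOF on stdout before it reads stderr
$fetch=function()use(&$stdout,&$stderr,&$pipes){
    $tmp=stream_get_contents($pipes[1]);
    if(is_string($tmp) && strlen($tmp) > 0){
        $stdout.=$tmp;
    }
    $tmp=stream_get_contents($pipes[2]);
    if(is_string($tmp) && strlen($tmp) > 0){
        $stderr.=$tmp;
    }
};
fclose($pipes[0]); // close stdin so the child gets EOF if it tries to read
$status=array();
// keep collecting output for as long as the process is running
while(($status=proc_get_status($proc))['running']){
    $fetch();
}
$fetch(); // pick up anything written right before the process exited
fclose($pipes[1]);
fclose($pipes[2]);
proc_close($proc);
var_dump($stdout,$stderr,$status);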
// converts a Cygwin path to an absolute Windows path when PHP runs under Cygwin
// (not used in this snippet)
function _uncygwinify_filepath(string $path) : string
{
    static $is_cygwin_cache = null;
    if ($is_cygwin_cache === null) {
        $is_cygwin_cache = (false !== stripos(PHP_OS, "cygwin"));
    }
    if ($is_cygwin_cache) {
        return trim(shell_exec("cygpath -aw " . escapeshellarg($path)));
    } else {
        return $path;
    }
}
// converts a Windows path to an absolute Cygwin path when PHP runs under Cygwin;
// otherwise returns the path unchanged
function _cygwinify_filepath(string $path) : string
{
    static $is_cygwin_cache = null;
    if ($is_cygwin_cache === null) {
        $is_cygwin_cache = (false !== stripos(PHP_OS, "cygwin"));
    }
    if ($is_cygwin_cache) {
        return trim(shell_exec("cygpath -a " . escapeshellarg($path)));
        //return "/cygdrive/" . strtr($path, array(':' => '', '\\' => '/'));
    } else {
        return $path;
    }
}
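The loop above reads stdout and stderr through blocking pipes, and stream_get_contents() waits for EOF on each one in turn, so a Chrome process that never exits keeps the script waiting until IIS times it out. One possible alternative, sketched under my own assumptions rather than taken from this thread: redirect both streams to temporary files, poll proc_get_status(), and call proc_terminate() after a deadline. The function name run_with_timeout() and the 100 ms poll interval are hypothetical choices.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $command with stdout/stderr redirected to
 * temporary files and terminates it after $timeoutSeconds.
 *
 * @return array{stdout: string, stderr: string, exitcode: int, timed_out: bool}
 */
function run_with_timeout(string $command, float $timeoutSeconds): array
{
    $outFile = tempnam(sys_get_temp_dir(), 'out');
    $errFile = tempnam(sys_get_temp_dir(), 'err');
    if ($outFile === false || $errFile === false) {
        throw new RuntimeException('could not create temporary files');
    }
    $descriptorspec = [
        0 => ['pipe', 'r'],           // stdin: a pipe we close immediately
        1 => ['file', $outFile, 'w'], // stdout goes straight to a file
        2 => ['file', $errFile, 'w'], // stderr too, so no pipe buffer can fill up
    ];
    // bypass_shell is only honoured on Windows: the command is started
    // directly instead of through cmd.exe.
    $proc = proc_open($command, $descriptorspec, $pipes, null, null, ['bypass_shell' => true]);
    if (!is_resource($proc)) {
        throw new RuntimeException("failed to create process: {$command}");
    }
    fclose($pipes[0]);

    $deadline = microtime(true) + $timeoutSeconds;
    $timedOut = false;
    $exitcode = -1; // stays -1 if the process had to be killed
    while (true) {
        $status = proc_get_status($proc);
        if (!$status['running']) {
            // exitcode is only reliable on the first call that sees the exit
            $exitcode = $status['exitcode'];
            break;
        }
        if (microtime(true) >= $deadline) {
            $timedOut = true;
            proc_terminate($proc); // ends only the process proc_open() started
            break;
        }
        usleep(100000); // poll every 100 ms instead of spinning
    }
    proc_close($proc);

    $result = [
        'stdout' => (string) file_get_contents($outFile),
        'stderr' => (string) file_get_contents($errFile),
        'exitcode' => $exitcode,
        'timed_out' => $timedOut,
    ];
    // may fail on Windows if a surviving child process still holds a file open
    @unlink($outFile);
    @unlink($errFile);
    return $result;
}
```

With $command built as in the answer above (without the "> page.html" part), something like var_dump(run_with_timeout($command, 30.0)); would at least return whatever Chrome wrote before it was stopped. On Windows, proc_terminate() only ends the process it started, so Chrome's own child processes may still be left running.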
Edit: I originally wrote use(&$stdout,$stderr,&$pipes) instead of use(&$stdout,&$stderr,&$pipes). Sorry, it's fixed now; re-run the script if you ran it before this fix.