I'm building a web app that retrieves dynamically generated content through Puppeteer. I have set up two (Apache + PHP) Docker containers: one for the p5.js project that generates an SVG based on a (large, 2 MB) JSON file, and one PHP container that retrieves that SVG. The containers sit behind Nginx (Nginx for routing, Apache for quicker PHP handling). I'm using the cheapest CentOS server available on DigitalOcean, so upgrading would definitely help.
I don't want the JavaScript in the p5.js project exposed to the public, so I thought a Node.js solution would be best in this scenario.
The PHP page does a `shell_exec("node pup.js")`. It runs in roughly 1-3 seconds, which is perfect.
The problem is that when I test a multi-user scenario and open 5 tabs running this PHP page, the load time climbs to 10+ seconds, which is killing for my app.
So the question is: how do I set up this architecture (PHP calling a node command) for a multi-user environment?
===
I've tried several frameworks (x-ray, nightmare, jsdom, cheerio, axios, zombie, phantom) trying to replace Puppeteer. Some of them returned nothing, some just didn't work out for me. I think I really need a headless browser solution to be able to execute the p5.js. Puppeteer eventually gets the job done, only not in a multi-user environment (I suspect due to my current PHP shell_exec Puppeteer architecture).
Maybe my shell_exec workflow was the bottleneck, so I ended up building a simple node example.js that waits 5 seconds before finishing (not using Puppeteer), and I ran it from several tabs simultaneously. It works like a charm: all tabs load in about 5-6 seconds.
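For reference, a minimal sketch of what that test script looks like (the original isn't shown above, so this is just the assumed shape: a fixed delay, no Puppeteer involved):

```js
// example.js — assumed shape of the 5-second test script:
// do nothing for 5 seconds, then print and exit.
setTimeout(() => {
  console.log('done after 5 seconds');
}, 5000);
```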
I've also tried pm2 to test whether my node command was the bottleneck. I did some testing on the command line with no notable results, and I couldn't get PHP to run a pm2 command, so I dropped that test.
I've tried setting PuPHPeteer up, but couldn't get it to run.
At some point I thought it had something to do with launching multiple Puppeteer browsers, but I've read that this should be no problem.
The PHP looks like:
```php
<?php
// Run the Puppeteer script and capture stderr along with stdout
$puppeteer_command = "node /var/www/pup.js 2>&1";
$result = shell_exec($puppeteer_command);
echo $result;
?>
```
My puppeteer code:
```js
const puppeteer = require('puppeteer');

const url = 'http://the-other-dockercontainer/';
const time = Date.now();

const scrape = async () => {
  // A fresh browser is launched for every request — this is the expensive part
  const browser = await puppeteer.launch({
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto(url);

  // Wait until the p5.js sketch has rendered its SVG
  await page.waitForSelector('svg', { timeout: 5000 });
  const svgImage = await page.$('svg');
  await svgImage.screenshot({
    path: `${time}.png`,
    omitBackground: true,
  });

  await browser.close();
  return time;
};

scrape().then((value) => {
  console.log(value); // Success!
});
```
I was thinking about building the entire app in Node.js if that's the best solution, but I've put so many hours into this PHP infrastructure that I'm at the point where I'd really like some advice :)
Since I have full control over the target and destination site, one brainfart would be to have Node serve a server that accepts a JSON file and returns the SVG based on a local p5.js site, but I don't know (yet) if this would be any different.
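To make that brainfart concrete, here is a minimal sketch of what such a service could look like. This is an assumption, not tested code: the endpoint, the port, and the idea of reusing one long-lived browser across requests (instead of launching one per shell_exec call) are all mine; the p5js container still does the actual rendering.

```js
// render-server.js — hypothetical sketch: one long-lived browser,
// one cheap new page (tab) per request, PNG returned over HTTP.
const http = require('http');
const puppeteer = require('puppeteer');

(async () => {
  // Launch the browser once at startup and reuse it for every request;
  // launching is the expensive part, new pages are cheap.
  const browser = await puppeteer.launch({ args: ['--no-sandbox'] });

  http.createServer(async (req, res) => {
    try {
      const page = await browser.newPage();
      await page.goto('http://the-other-dockercontainer/');
      await page.waitForSelector('svg', { timeout: 5000 });
      const svg = await page.$('svg');
      // screenshot() without a path resolves to a Buffer
      const png = await svg.screenshot({ omitBackground: true });
      await page.close();
      res.writeHead(200, { 'Content-Type': 'image/png' });
      res.end(png);
    } catch (err) {
      res.writeHead(500);
      res.end(String(err));
    }
  }).listen(3000); // port is an assumption
})();
```

PHP would then fetch the image over HTTP (file_get_contents or cURL) instead of shelling out, so no node process or browser launch happens per request.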
UPDATE
So, thanks to some comments, I've tried a new approach: not using p5.js, but native Processing code (Java). I exported the Processing code as a Linux 64-bit application and created this little Node.js example:
```js
const exec = require('child_process').exec;

const cmd = '/var/www/application.linux64/minimal';
exec(cmd, processing);

// Callback for the command line process
function processing(error, stdout, stderr) {
  // I could do some error checking here
  console.log(stdout);
}
```
When I call this node example.js from a shell_exec in PHP, the first call takes about 2 seconds. But when I hit a lot of refreshes, the time again builds up by several seconds per request. So, clearly, my understanding of multithreading is not that good, or am I missing something crucial in my testing?
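One way to narrow it down (a sketch of my own, not something I've run): spawn several copies of the Processing binary concurrently from a single Node process, with PHP and Apache out of the picture entirely. If the times still build up here, the binary itself (or the droplet's single CPU) is the bottleneck, rather than the shell_exec chain:

```js
// bench.js — hypothetical concurrency test for the Processing binary.
const { exec } = require('child_process');

const cmd = '/var/www/application.linux64/minimal';

for (let i = 0; i < 5; i++) {
  const start = Date.now();
  exec(cmd, (error) => {
    // Each run reports its own wall-clock time
    console.log(`run ${i}: ${Date.now() - start} ms ${error ? error.message : 'ok'}`);
  });
}
```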