I use Cloud Vision to annotate documents with DOCUMENT_TEXT_DETECTION, and I only use the words data.
The annotate command returns a lot of information for each letter/symbol (languages, vertices, breaks, text, confidence, ...), which adds up to a lot of memory usage. Running annotate on a 4-page document¹ returns over 100MB of data, which is past my PHP memory limit and crashes the script. Getting only the words data would probably be about 5 times smaller.
To be clear: I load the VisionClient, set up the image, and run the annotate() command, which returns a 100MB variable directly. The crash happens at that point, before I get a chance to do any cleanup.
$vision = new VisionClient([/* key & id here */]);
$image = $vision->image(file_get_contents($imagepath), ['DOCUMENT_TEXT_DETECTION']);
$annotation = $vision->annotate($image); // Crash at that point trying to allocate too much memory.
Is there a way to avoid requesting the entirety of the data? The documentation on annotate seems to indicate that it's possible to annotate only part of the picture, but not to leave out the symbols data.
At a more fundamental level, am I doing something wrong here regarding memory management in general?
Thanks
Edit: Just realized I also need to store the data in a file, which I do using serialize()... which doubles the memory usage when run, even if I do $annotation = serialize($annotation) to avoid having two variables. So I'd actually need 200MB per user.
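To illustrate why it doubles even with the in-place reassignment: serialize() has to build the entire output string while the original value is still held by $annotation, so peak usage is roughly twice the data size. The file path below is just a placeholder:

// Peak memory doubles here: serialize() materializes the full ~100MB string
// before the old value of $annotation can be released.
$annotation = serialize($annotation);
file_put_contents('/path/to/annotation.ser', $annotation); // placeholder path
unset($annotation);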
¹ Though this is related to the amount of text rather than the number of pages.
Dino,
When dealing with large images, I would highly recommend uploading your image to Cloud Storage and then running the annotation request against the image in a bucket. This way you'll be able to take advantage of the resumable or streaming protocols available in the Storage library to upload your object more reliably and with less memory consumption. Here's a quick snippet of what this could look like using the resumable uploader:
use Google\Cloud\Core\Exception\GoogleException;
use Google\Cloud\Storage\StorageClient;
use Google\Cloud\Vision\VisionClient;
$storage = new StorageClient();
$bucket = $storage->bucket('my-bucket');
$imageName = 'my-image.png';
$uploader = $bucket->getResumableUploader(
    fopen('/path/to/local/image.png', 'r'),
    [
        'name' => $imageName,
        'chunkSize' => 262144 // Read the upload in small chunks, freeing up memory.
    ]
);
try {
    $uploader->upload();
} catch (GoogleException $ex) {
    // If the upload is interrupted, resume from where it left off.
    $resumeUri = $uploader->getResumeUri();
    $uploader->resume($resumeUri);
}
$vision = new VisionClient();
$image = $vision->image($bucket->object($imageName), [
    'DOCUMENT_TEXT_DETECTION'
]);
$annotation = $vision->annotate($image);
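Once the annotation comes back, you can also keep the footprint down by extracting only the word strings and letting the full response go out of scope before you persist anything. A minimal sketch, assuming the legacy VisionClient surface (Annotation::text() returning the textAnnotations entities, Entity::description() returning each string; the output path is a placeholder):

// Keep only the words, then drop the large annotation object.
$words = [];
foreach ($annotation->text() ?: [] as $i => $entity) {
    if ($i === 0) {
        continue; // Entry 0 is the full detected text; the rest are individual words.
    }
    $words[] = $entity->description();
}
unset($annotation);

// Serializing the reduced array is cheap compared to the full response.
file_put_contents('/path/to/words.ser', serialize($words));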