I'm working on some code that reads through a file and returns it broken up into 'line' segments of a fixed number of bytes. I've run into a problem: at the 8192-byte mark in a file larger than 8192 bytes, stream_get_meta_data() returns unread_bytes of 0 even though there is more to read. A subsequent read picks up the rest just fine, but the output formatting is thrown off.
I've reduced my code to a minimal example:
if(count($argv) > 1 && is_file($argv[count($argv)-1])) {
    if( ! $fh = fopen($argv[count($argv)-1], 'r') ) {
        die('Could not open file.');
    }
} else if( $fh = STDIN ) {
} else { die('No file and could not open stdin'); }
$linesize = 24; // typical line size
$bufsize = 4096; // default buffer size
$buffer = '';
$offset = 0;
$output = '';
while($buffer .= fread($fh, $bufsize)) {
    $s = strlen($buffer);
    for( $i=0; $i<$s; $i+=$linesize ) {
        // if we're not yet at the end of the file
        // if there's not enough left in the buffer for a full line
        // reset the buffer to the remainder of the buffer and break to outer loop
        printf("off:%d s:%d i:%d b:%d\n", $offset, $s, $i, ( $s-$i < $linesize ));
        if( $s-$i < $linesize ) {
            $meta = stream_get_meta_data($fh);
            printf("break? unread_bytes:%d eof:%d\n", $meta['unread_bytes'], $meta['eof']);
            if( $meta['unread_bytes'] !== 0 ) {
                $buffer = substr($buffer, $i);
                echo "broke\n";
                continue 2;
            }
        }
        // echo substr($buffer, $i, $linesize)."\n";
        $offset += $linesize;
    }
    $buffer = '';
}
fclose($fh);
Output:
off:0 s:4096 i:0 b:0
off:24 s:4096 i:24 b:0
off:48 s:4096 i:48 b:0
[snip]
off:4032 s:4096 i:4032 b:0
off:4056 s:4096 i:4056 b:0
off:4080 s:4096 i:4080 b:1
break? unread_bytes:4096 eof:0
broke
off:4080 s:4112 i:0 b:0
off:4104 s:4112 i:24 b:0
off:4128 s:4112 i:48 b:0
[snip]
off:8136 s:4112 i:4056 b:0
off:8160 s:4112 i:4080 b:0
off:8184 s:4112 i:4104 b:1
break? unread_bytes:0 eof:0
off:8208 s:303 i:0 b:0
off:8232 s:303 i:24 b:0
off:8256 s:303 i:48 b:0
off:8280 s:303 i:72 b:0
off:8304 s:303 i:96 b:0
off:8328 s:303 i:120 b:0
off:8352 s:303 i:144 b:0
off:8376 s:303 i:168 b:0
off:8400 s:303 i:192 b:0
off:8424 s:303 i:216 b:0
off:8448 s:303 i:240 b:0
off:8472 s:303 i:264 b:0
off:8496 s:303 i:288 b:1
break? unread_bytes:0 eof:0
[actual end of file]
You can see at the middle break the stream metadata indicates that there are 0 bytes left unread, but after that it breaks back to the outer loop and reads again. What's the deal?
Also, I'm not using feof($fh) or $meta['eof'] because it doesn't seem to get set when reading from stdin, which you can also see in the example.
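The documentation for unread_bytes states:
Note: You shouldn't use this value in a script.
The value is the number of bytes currently held in PHP's own internal stream buffer, not the number of bytes left in the file. It reads 0 whenever that buffer has been drained, which in your output happens after 8192 bytes (PHP's default chunk size), even though the file still has data.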
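To see why the value is misleading, here is a minimal sketch that logs unread_bytes and eof after each fread(). The helper name probeReads() and the throwaway temp file are made up for illustration; the exact numbers depend on PHP's internal buffering and may differ between versions and stream types.

```php
<?php
// Hypothetical helper (not from the original post): reads $path in
// $readSize-byte chunks and records the stream metadata after each read.
function probeReads(string $path, int $readSize): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Could not open $path");
    }
    $log = [];
    while (!feof($fh)) {
        $chunk = fread($fh, $readSize);
        if ($chunk === false) {
            break; // read error
        }
        $meta = stream_get_meta_data($fh);
        $log[] = [
            'read'         => strlen($chunk),
            'unread_bytes' => $meta['unread_bytes'],
            'eof'          => $meta['eof'],
        ];
    }
    fclose($fh);
    return $log;
}

// Demo on a throwaway 8495-byte file (size chosen only to exceed 8192).
$path = tempnam(sys_get_temp_dir(), 'probe');
file_put_contents($path, str_repeat('x', 8495));
foreach (probeReads($path, 4096) as $row) {
    printf("read:%d unread_bytes:%d eof:%d\n", $row['read'], $row['unread_bytes'], $row['eof']);
}
unlink($path);
```

An unread_bytes of 0 in this log only means the internal buffer is empty at that moment; the next fread() refills it from the file.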
That said, it seems you're making it a lot more complicated than strictly necessary:
while (!feof($fh)) {
    $line = fread($fh, $linesize);
    if (strlen($line) < $linesize) {
        break;
    }
    // do what you want
}
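If the input can come from a pipe or stdin, fread() may return fewer bytes than requested before the end of the stream, and the loop above would also drop a trailing partial line. The following is a sketch of one way to handle both cases, assuming a blocking stream; readSegments() is a hypothetical name, not something from the question or from PHP itself.

```php
<?php
// Hypothetical helper: yields $linesize-byte segments from a blocking
// stream, carrying leftover bytes over between reads. The final segment
// may be shorter than $linesize.
function readSegments($fh, int $linesize, int $bufsize = 4096): Generator
{
    $buffer = '';
    while (!feof($fh)) {
        $chunk = fread($fh, $bufsize);
        if ($chunk === false) {
            break; // read error
        }
        $buffer .= $chunk;
        // Number of bytes in the buffer that form complete segments.
        $full = intdiv(strlen($buffer), $linesize) * $linesize;
        for ($i = 0; $i < $full; $i += $linesize) {
            yield substr($buffer, $i, $linesize);
        }
        // Keep the remainder; the cast guards against substr() returning
        // false on older PHP versions when nothing is left.
        $buffer = (string) substr($buffer, $full);
    }
    if ($buffer !== '') {
        yield $buffer; // trailing partial segment
    }
}

// Demo on an in-memory stream holding 60 bytes.
$demo = fopen('php://memory', 'w+');
fwrite($demo, str_repeat('x', 60));
rewind($demo);
foreach (readSegments($demo, 24) as $segment) {
    echo strlen($segment), "\n"; // prints 24, 24, 12 on separate lines
}
fclose($demo);
```

Passing STDIN or a handle from fopen() instead of the in-memory stream should work the same way. The loop relies only on feof() and the return value of fread(), never on unread_bytes.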