How can I determine whether a string was compressed with gzcompress (apart from comparing the sizes of the string before and after calling gzuncompress), or would that be the proper way of doing it?
A string and a compressed string are both simply sequences of bytes. You cannot really distinguish one sequence of bytes from another sequence of bytes. You should know whether a blob of bytes represents a compressed format or not from accompanying metadata.
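One way to carry such metadata is to store a marker next to the payload when it is written. The following is only a minimal sketch, assuming you control both the writing and the reading side; the flag values and the pack_blob / unpack_blob names are made up for illustration and are not part of any PHP API.

```php
<?php
// Hypothetical one-byte markers; any values work as long as writer and reader agree.
const FLAG_PLAIN = "\x00";
const FLAG_ZLIB  = "\x01";

// Prefix the payload with a flag that records whether it was compressed.
function pack_blob(string $data, bool $compress): string
{
    return $compress
        ? FLAG_ZLIB . gzcompress($data)
        : FLAG_PLAIN . $data;
}

// Read the flag back and only call gzuncompress() when it says so.
function unpack_blob(string $blob): string
{
    $flag = $blob[0] ?? '';
    $body = (string) substr($blob, 1);

    if ($flag === FLAG_PLAIN) {
        return $body;
    }
    if ($flag === FLAG_ZLIB) {
        $plain = @gzuncompress($body);
        if ($plain === false) {
            throw new RuntimeException('Blob is flagged as compressed but cannot be decompressed');
        }
        return $plain;
    }
    throw new UnexpectedValueException('Unknown blob flag');
}

$stored = pack_blob('some text', true);
echo unpack_blob($stored), "\n";
```

With a marker like this, the reader never has to guess, which is the point the answer makes about metadata.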
If you really need to guess programmatically, you have several things you can try:
Look for bytes below 0x20. Those bytes aren't typically used in regular text. There's no real guarantee that they occur in a compressed string, though.
Use mb_check_encoding to see whether a string is valid in the encoding you suspect it to be in. If it isn't, it's probably compressed (or you checked for the wrong encoding). The caveat is that virtually any byte sequence is valid in virtually every single-byte encoding, so this will only work for multi-byte encodings.
PRE:
I guess, if you send a request, you can immediately look into $http_response_header to see whether one of the items in the array is a variation of Content-Encoding: gzip. But this is LAME! There is a far better method.
Here is HOW TO...
According to the GZIP RFC (RFC 1952):
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
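ID1 and ID2 identify the content as GZIP, and CM states the compression method. The value 8 means deflate, which is customarily used by GZIP with all web servers. (This is not the same thing as PHP's ZLIB_ENCODING_DEFLATE constant, which selects the zlib container format in functions such as zlib_encode.)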
Oh, and they have fixed values:
ID1 = "\x1f"
ID2 = "\x8b"
CM = "\x08" (or just 8...)
So a byte-wise prefix check is enough:
$is_gzip = 0 === strncmp($mystery_string, "\x1f" . "\x8b" . "\x08", 3);
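The fixed part of the header can also be read field by field. This is a minimal sketch, assuming PHP's unpack() and the 10-byte layout shown in the diagram; parse_gzip_header is a hypothetical helper name, not something defined by the RFC or by the gist.

```php
<?php
// Read the fixed 10-byte GZIP header (RFC 1952) and return its fields,
// or null when the bytes do not start with a GZIP/deflate header.
function parse_gzip_header(string $bytes): ?array
{
    if (strlen($bytes) < 10) {
        return null; // too short to hold the fixed header
    }
    // C = unsigned byte, V = unsigned 32-bit little-endian (MTIME).
    $h = unpack('Cid1/Cid2/Ccm/Cflg/Vmtime/Cxfl/Cos', $bytes);

    if ($h['id1'] !== 0x1f || $h['id2'] !== 0x8b || $h['cm'] !== 8) {
        return null; // magic bytes or compression method do not match
    }
    return $h;
}

var_dump(parse_gzip_header(gzencode('hello world')) !== null);
var_dump(parse_gzip_header('hello world'));
```

Matching magic bytes are still only a strong hint: arbitrary binary data can start with the same three bytes, so a failed decode should be handled as well.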
<?php
/** @link https://gist.github.com/eladkarako/d8f3addf4e3be92bae96#file-checking_gzip_like_a_boss-php */
date_default_timezone_set("Asia/Jerusalem");
while (ob_get_level() > 0) ob_end_flush();
mb_language("uni");
@mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'en_US.UTF-8');
header('Time-Zone: Asia/Jerusalem');
header('Charset: UTF-8');
// Content-Encoding names a compression coding (gzip, deflate, ...), not a charset,
// so 'Content-Encoding: UTF-8' is not sent; Content-Type already declares the charset.
header('Content-Type: text/plain; charset=UTF-8');
header('Access-Control-Allow-Origin: *');

function get($url, $cookie = '') {
    $html = @file_get_contents($url, false, stream_context_create([
        'http' => [
            'method' => "GET",
            // HTTP header lines are separated by CRLF.
            'header' => implode("\r\n", [''
                , 'Pragma: no-cache'
                , 'Cache-Control: no-cache'
                , 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2310.0 Safari/537.36'
                , 'DNT: 1'
                , 'Accept-Language: en-US,en;q=0.8'
                , 'Accept: text/plain'
                , 'X-Forwarded-For: ' . implode(', ', array_unique(array_filter(array_map(function ($item) { return filter_input(INPUT_SERVER, $item, FILTER_SANITIZE_SPECIAL_CHARS); }, ['HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR', 'HTTP_CLIENT_IP', 'SERVER_ADDR', 'REMOTE_ADDR']), function ($item) { return null !== $item; })))
                , 'Referer: http://eladkarako.com'
                , 'Connection: close'
                , 'Cookie: ' . $cookie
                , 'Accept-Encoding: gzip'
            ])
        ]]));

    if (false === $html) {
        return false; // the request failed
    }

    // Byte-wise check of the GZIP magic bytes (ID1, ID2) and the CM byte.
    $is_gzip = 0 === strncmp($html, "\x1f" . "\x8b" . "\x08", 3);

    // The second argument of zlib_decode() is a maximum output length, not an
    // encoding constant, so it is omitted; the format is detected automatically.
    return $is_gzip ? zlib_decode($html) : $html;
}

$html = get('http://www.pogdesign.co.uk/cat/');
echo $html;
UTF-8 (since we don't really know if the web server will return GZIP content).
Accept-Encoding: gzip tells the web server that it may output GZIP content.
ZLIB methods.
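One caveat for the original question: the magic bytes "\x1f" "\x8b" identify GZIP streams (RFC 1952), which is what gzencode and web servers produce. gzcompress emits the zlib format (RFC 1950) instead, whose two-byte header has no fixed magic number, only a compression-method nibble and a checksum. The sketch below checks that header and then tries to decompress; looks_like_zlib and try_gzuncompress are hypothetical helper names, and the header test alone can give false positives on ordinary text.

```php
<?php
// Header test for a zlib stream (RFC 1950), the format gzcompress() writes.
function looks_like_zlib(string $bytes): bool
{
    if (strlen($bytes) < 2) {
        return false;
    }
    $cmf = ord($bytes[0]);
    $flg = ord($bytes[1]);

    return ($cmf & 0x0f) === 8                  // CM = 8 (deflate)
        && ($cmf >> 4) <= 7                     // CINFO: window size up to 32 KiB
        && (($cmf << 8) | $flg) % 31 === 0;     // FCHECK: CMF*256 + FLG is a multiple of 31
}

// Returns the decompressed string, or false when the input is not a valid zlib stream.
function try_gzuncompress(string $bytes)
{
    if (!looks_like_zlib($bytes)) {
        return false;
    }
    // The header matched; the decode attempt (with its checksum) settles it.
    return @gzuncompress($bytes);
}

var_dump(try_gzuncompress(gzcompress('hello')));
var_dump(try_gzuncompress('hello'));
```

Compared with decompressing every string and comparing sizes, this rejects most plain strings after looking at two bytes and only pays for a decode attempt when the header is plausible.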