I have a txt file that I open for reading with fopen. I then try to echo the rows on the screen using
<xmp>... contents ... </xmp>
One of the rows reads something like:
"aut\xf3k\xf6lcs\xf6nz\xe9s budapest kauci\xf3 n\xe9lk\xfcl"
Can someone tell me how to properly decode this?
#!/usr/bin/php -q
<?php
$read_handle = fopen("somefile.txt", "r");
$write_handle = fopen("write.csv", "w");
if ($read_handle) {
    while (($buffer = fgets($read_handle, 4096)) !== false) {
        // Some modifications to the buffer here, converting it to CSV format
        @fwrite($write_handle, $buffer."\n");
    }
    // fgets() returns false both at end of file and on error
    if (!feof($read_handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    @fclose($read_handle);
    @fclose($write_handle);
}
?>
This script runs on the command line, and when I then "tail" the resulting CSV, it shows the escaped text above. When I import the CSV into MySQL, I get the same result. The same happens when opening the CSV in OpenOffice.
The txt file is an export from Google BigQuery, using the following command
bq -q --format=pretty query "SELECT QUERY HERE" > somefile.txt
You may wonder why I don't make the BigQuery command line tool output a CSV file directly. That is because doing so triggers a bug in the tool that also has to do with this encoding...
This sounds like a bug in the BigQuery CLI. By default, strings are UTF-8 on the way in and UTF-8 on the way out. However, it appears that there is a printing problem when combining Unicode and non-Unicode strings in the client...
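Until the client is fixed, one possible workaround is to undo the escaping in the PHP script itself. The following is only a sketch, and it rests on an assumption: that somefile.txt contains the literal characters backslash, x, f, 3 (and possibly \uNNNN for characters above U+00FF), with each escape denoting a Unicode code point. The function name decode_escapes is made up for illustration, and mb_chr needs the mbstring extension (PHP 7.2 or later).

```php
<?php
// Hypothetical helper: turn literal \xNN and \uNNNN escapes into UTF-8.
// Assumes each escape is a Unicode code point, not a raw byte of some
// other encoding. Does not handle an escaped backslash such as "\\xf3".
function decode_escapes(string $line): string
{
    return preg_replace_callback(
        '/\\\\x([0-9a-fA-F]{2})|\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            // Group 1 is filled for \xNN, group 2 for \uNNNN.
            $hex = $m[1] !== '' ? $m[1] : $m[2];
            return mb_chr(hexdec($hex), 'UTF-8');
        },
        $line
    );
}
```

In the script from the question, this would be called on each line before writing it out, for example $buffer = decode_escapes($buffer);.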
To double check that this is a client problem, you can pass the flag "--apilog=" and inspect the message request/response for the query. If the response is correct but the result printed by the client is wrong, then this is definitely a client issue.
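Independently of the API log, it can help to check locally what the exported file really contains: literal escape text such as \xf3, or raw bytes that are not valid UTF-8. Below is a minimal sketch of such a check. The function name classify_line and its return values are invented for this example, and mb_check_encoding requires the mbstring extension.

```php
<?php
// Hypothetical helper: classify one line of the exported file.
function classify_line(string $line): string
{
    // The line contains the text "\xf3" or "\u0151" rather than the character.
    if (preg_match('/\\\\x[0-9a-fA-F]{2}|\\\\u[0-9a-fA-F]{4}/', $line)) {
        return 'literal-escapes';
    }
    // The line contains raw bytes in some non-UTF-8 encoding.
    if (!mb_check_encoding($line, 'UTF-8')) {
        return 'invalid-utf8';
    }
    return 'ok';
}
```

If the lines come back as literal-escapes, the escaping happened before the file was written, which is consistent with a problem in the client's printing rather than in PHP or MySQL.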
If you have some sample data that you can upload and query to repro this problem, please open an issue at http://code.google.com/p/google-bigquery-tools/issues/list so we can make sure to fix your specific issue.
Thank you!
Note that with the BigQuery command line tool, you can create a new table from a query, then export that table to CSV.
# Run Query:
bq query --destination_table=mydataset.baby_table "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
# Extract data to CSV:
bq extract mydataset.baby_table gs://mybucket/baby_table.csv