I have a script written by a friend that gets all the contents from a directory of .txt files and uploads them into a database alongside some other information.
That is, each row looks like: filename | contents
Each file's contents (simple text) are stored in a corresponding database entry. This has worked well so far, but the contents of a new batch of text files simply aren't being read. The filenames are read fine and imported into the database without trouble; only the actual contents are missing. Old .txt files that I imported previously still import perfectly.
Examples of files are here: Working / Not-Working
Long story short: does anyone know why the contents of some .txt files can be read and not others? Could it be an encoding issue? (They're from the same person and look identical.) I'm losing my mind.
Thanks!
$dir = 'text';
//createxml(10);exit;
$time_start = microtime(true);
$files = scandir($dir);
natsort($files);
foreach ($files as $v) {
if ($v != "." && $v != ".." && $v != "thumbs" && $v != ".DS_Store") {
//get work done
$text = file_get_contents($dir.'/'.$v);
//get volume, page, county
$ta = explode('.',$v);
$ma = explode('_',$ta[0]);
$last = count($ma)-1;
$volume = '';
$year = '1999';
for ($i = 0; $i < $last; ++$i)
{
$volume .= $ma[$i].'_';
}
$volume = $mysqli->real_escape_string(rtrim($volume,'_'));
$pagenr = $mysqli->real_escape_string($ma[$last]);
$ntext = $mysqli->real_escape_string(getmtext($text));
$pdf = 'http://griffiths.****.ie/gv4/thoms/'.$volume.'/'.$volume.'_pg'.str_pad($pagenr, 4, "0", STR_PAD_LEFT).'.pdf';
$thumb = 'http://griffiths.****.ie/gv4/thoms/'.$volume.'/thumbs2/'.$volume.'_'.str_pad($pagenr, 4, "0", STR_PAD_LEFT).'.jpg';
//create sql
$echo[$volume] .= "('','$year','$pagenr','$volume','$ntext','$pdf','$thumb'),";
$excl[$volume]=true;
}
}
// check if there is volume already in DB
foreach ($excl as $k => $v) {
$volumes .= "'$k',";
}
$volumes = rtrim($volumes,',');
$excls ='';
if ($result = $mysqli->query("SELECT DISTINCT volume FROM thoms_copy2 WHERE volume in ($volumes)")) {
//found volumes already in DB
while ($r = $result->fetch_array(MYSQLI_NUM)) {
//we only need the new volumes, so we will ignore the rest
unset($echo[$r[0]]);
}
$result->close();
}
//create mysql string
foreach ($echo as $k => $v) {
$echot .= $v.',';
}
$echot = rtrim($echot,',');
if ($echot) {// if i have something to insert
//insert into DB
$sql = "INSERT INTO `thoms_copy2` (`id`,`year`,`main_page`, `volume`, `texty`, `pdf`, `thumb`) VALUES $echot";
if ($result = $mysqli->query($sql)) {
echo "Done.";
//create the XML file
createxml($mysqli->affected_rows);
} else {
printf("Error message: %s\n", $mysqli->error);
echo "<br><br>$sql";
}
} else { echo "Done. Nothing new."; }
$time_end = microtime(true);
$time = $time_end - $time_start;
echo "<br>$time";
//functions ===============================================================
function getmtext($str) {
$text = '';
$words = str_word_count($str, 1);
foreach ($words as $word) {
if ($word[0] >= 'A' && $word[0] <= 'Z')
if (strlen($word)>1)
$text .= $word.' ';
}
return $text;
}
No, file_get_contents is equivalent to a combination of fopen + fread + fclose, so it returns the file's raw bytes. A wrong charset does not change the fact that your file consists of bytes, and those bytes are what file_get_contents returns. Since you're not the script's author, it's difficult to say where the problem is, but you should make sure that your files are accessible to your script (for example, that they have the correct permissions).
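One way to check both points (that the files are accessible, and that bytes really come back) is a small diagnostic like the sketch below. The helper name `inspect_text_file` and its report fields are made up for illustration and are not part of the original script; only the `text` directory name is taken from the question.

```php
<?php
// Hypothetical helper (not part of the original script): reports whether a
// file can be read and what raw bytes file_get_contents() returns for it.
function inspect_text_file(string $path): array
{
    $report = [
        'readable' => is_readable($path),
        'bytes'    => null,
        'head_hex' => null,
        'has_nul'  => null,
    ];
    if (!$report['readable']) {
        return $report; // likely a permissions or path problem
    }

    $raw = file_get_contents($path);
    if ($raw === false) {
        return $report; // read failed even though the file looked readable
    }

    $report['bytes']    = strlen($raw);                 // byte count, not characters
    $report['head_hex'] = bin2hex(substr($raw, 0, 4));  // first bytes, e.g. a BOM
    $report['has_nul']  = strpos($raw, "\0") !== false; // true if any NUL byte is present
    return $report;
}

// Usage: 'text' is the directory name used in the question's script.
$dir = 'text';
if (is_dir($dir)) {
    foreach (scandir($dir) as $name) {
        if (is_file("$dir/$name")) {
            echo $name, ' ', json_encode(inspect_text_file("$dir/$name")), PHP_EOL;
        }
    }
}
```

If a "not working" file reports `readable: true` and a non-zero byte count, then the read itself is fine and the contents are being lost later in the script. Comparing `head_hex` and `has_nul` between a working and a non-working file may also show whether the two really are encoded the same way; NUL bytes in a plain-text file often point to UTF-16 or UTF-32, though that is something to verify rather than a diagnosis.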