I want to read some text files in a folder line by line. For example, one .txt file contains:
Fast and Effective Text Mining Using Linear-time Document Clustering
Bjornar Larsen WORD2 Chinatsu Aone
SRA International AK, Inc.
4300 Fair Lakes Cow-l Fairfax, VA 22033
{bjornar-larsen, WORD1
I want to remove every line that contains none of the words word, word2, word3 and does not end with a dot (.).
So, from the example, the result will be:
Bjornar Larsen WORD2 Chinatsu Aone
SRA International, Inc.
{bjornar-larsen, WORD1
I am confused about how to remove a line. Is that possible? Or can we replace the line with a space?
Here's the code:
$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $handle = fopen($files, "r") or die ('can not open file');
    $ori_content= file_get_contents($files);
    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = $str[strlen($buffer)-1];//read the last character
        if (true !== $pos1 OR true !== $pos2 OR true !==$pos3 && $last != '.'){
            //how to remove
        }
    }
}
Please help me, thank you so much :)
You're using a !== true comparison to test the return value of stripos. !== true means "is not identical to the boolean value true". The return value of stripos is an integer offset, unless the word doesn't exist, in which case it's false. It is never true, so each of those comparisons always succeeds and your condition is always true.
Try updating it to use === false instead. Also, you're using OR between the checks. Your example shows that a line only needs to contain one of the words, so if you're checking that "none of them were found", you'll need to use && for everything:
if (($pos1 === false) && ($pos2 === false) && ($pos3 === false) && ($last != '.'))
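To make the point about the return value concrete, here is a small sketch of what stripos gives back and how the different comparisons behave. The sample strings are made up for illustration and are not taken from the files in the question.

```php
<?php
// Illustrative strings only (assumptions, not data from the question).
$found    = stripos('WORD1 at the start', 'word1'); // int 0: match at offset 0
$notFound = stripos('no match here', 'word1');      // bool false: no match

var_dump($found === false);    // bool(false): the word is present
var_dump($notFound === false); // bool(true): the word is absent
var_dump($found == false);     // bool(true): loose comparison mistakes offset 0 for "not found"
var_dump(true !== $found);     // bool(true): stripos never returns true, so this always passes
```

The third line is the reason for the strict === operator: a match at the very start of the line has offset 0, which a loose comparison treats as false.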
Regarding "how to remove the line": you'll need to build a list of all the lines you want to keep. This means we'll actually want to flip the condition above to use !== false with || between everything (because we want to keep all lines that match any rule).
Try something like this:
$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files);
    if ($ori_content === false) {
        die('can not open file');
    }
    $linesToKeep = array(); // list of all lines that match our rules
    foreach (preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer) {
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // last character of the line
        if (($pos1 !== false) || ($pos2 !== false) || ($pos3 !== false) || ($last == '.')) {
            $linesToKeep[] = $buffer; // save this line
        }
    }
    // process list of lines for this file;
    // file_put_contents($files, join("\n", $linesToKeep)); // write back to file
    // $lines = join("\n", $linesToKeep); // convert to string to manipulate
}
Now you'll have every line that matches your ruleset in the $linesToKeep array. You can convert this back to a string with $lines = join("\n", $linesToKeep); or iterate through it and process it however you'd like.
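If you want to try the keep-rule without touching any files, here is a minimal sketch that applies it to a string. The helper name filterLines and the word list word1/word2/word3 are assumptions made for this example; the sample text is the one from the question.

```php
<?php
// Hypothetical helper (not from the question): keeps every line that contains
// any of $words (case-insensitive) or that ends with a dot.
function filterLines(string $text, array $words): string
{
    $kept = [];
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $hasWord = false;
        foreach ($words as $word) {
            if (stripos($line, $word) !== false) {
                $hasWord = true;
                break;
            }
        }
        // rtrim() ignores trailing whitespace before checking the last character.
        if ($hasWord || substr(rtrim($line), -1) === '.') {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

$sample = "Fast and Effective Text Mining Using Linear-time Document Clustering\n"
        . "Bjornar Larsen WORD2 Chinatsu Aone\n"
        . "SRA International AK, Inc.\n"
        . "4300 Fair Lakes Cow-l Fairfax, VA 22033\n"
        . "{bjornar-larsen, WORD1";

// Assumed word list; substitute your own $word1, $word2, $word3.
echo filterLines($sample, ['word1', 'word2', 'word3']), "\n";
```

With this input the title line and the address line are dropped, and the three remaining lines are printed. Inside the glob() loop you could pass the result of file_get_contents() to the same function.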
You'll need to create a secondary buffer.
$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files);
    if ($ori_content === false) {
        die('can not open file');
    }
    /* Create our second buffer */
    $buffer2 = "";
    foreach (preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer) {
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // read the last character
        /* Keep the line if any of the three words or a trailing period is found */
        if ($pos1 !== false || $pos2 !== false || $pos3 !== false || $last == '.') {
            $buffer2 .= $buffer . PHP_EOL;
        }
    }
    echo $buffer2; // filtered contents of this file
}
Nice approach so far. You can also collect the lines you want to keep in an array and then write them back to the file:
$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $ori_content = file_get_contents($files);
    if ($ori_content === false) {
        die('can not open file');
    }
    # Declare an array to store the lines we keep.
    $fileContents = array();
    foreach (preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer) {
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = substr($buffer, -1); // read the last character
        if (($pos1 !== false) || ($pos2 !== false) || ($pos3 !== false) || ($last == '.')) {
            $fileContents[] = $buffer;
        }
    }
    # Write the kept lines back; $files holds the path, $file is only the array index.
    file_put_contents($files, implode(PHP_EOL, $fileContents));
}
I would just use explode:
$ori_content = file_get_contents($files);
$lines = explode("\n", $ori_content);
$newParagraph = '';
foreach ($lines as $line)
{
    $line = rtrim($line, "\r"); // drop a Windows carriage return, if any
    // stripos is case-insensitive, so WORD2 in the sample matches 'word2'
    if (stripos($line, 'word') !== false OR stripos($line, 'word2') !== false OR stripos($line, 'word3') !== false OR substr($line, -1) == '.')
    {
        $newParagraph .= $line . "\n"; // append instead of overwriting
    }
}
echo $newParagraph;
Much simpler than what you were trying to do.
Try
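$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    // FILE_IGNORE_NEW_LINES strips the line endings, so substr($line, -1)
    // is the last visible character rather than "\n"
    $lines = file($files, FILE_IGNORE_NEW_LINES);
    foreach ($lines as $key => $line) {
        if (!preg_match('/(word|word2|word3)/i', $line) && substr($line, -1) != '.') {
            unset($lines[$key]);
        }
    }
    $ori_content = implode("\n", $lines);
}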
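As a possible shortening of the loop above, both rules can be folded into one pattern and handed to preg_grep, which returns only the array entries that match. This is a sketch under the assumption that the lines have already been read without their line endings (for example with file($files, FILE_IGNORE_NEW_LINES)); the array below just hard-codes the sample from the question.

```php
<?php
// Sample lines from the question, assumed to be already split and stripped
// of their line endings.
$lines = [
    'Fast and Effective Text Mining Using Linear-time Document Clustering',
    'Bjornar Larsen WORD2 Chinatsu Aone',
    'SRA International AK, Inc.',
    '4300 Fair Lakes Cow-l Fairfax, VA 22033',
    '{bjornar-larsen, WORD1',
];

// Keep a line if it contains one of the words (case-insensitive)
// or if its last character is a dot.
$kept = preg_grep('/word|word2|word3|\.$/i', $lines);

echo implode("\n", $kept), "\n";
```

preg_grep preserves the original array keys, so wrap the result in array_values() if you need the kept lines re-indexed from 0.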