Server's PHP files got hacked. Is there a way to target a specific string + tags?

A friend of mine (who doesn't like strong passwords ...) got his server hacked and I'm trying to help him.

Basically, a lot of PHP files on the server have had lots of stuff added on line 1 between two PHP tags.

Something like this:

1. <?php lots-Of-Stuff-I-Can-Target-Easily-With-Grep ?><?php
2. Line 2 and beyond, the file is unchanged.

So my question is: is there a way, via grep or something else, to target just the first pair of PHP tags with everything in between and delete it? In the example above, that means everything on line 1 except the second opening PHP tag.

I know I could target line 1 and just replace it with an opening PHP tag, but to avoid any problems, I would much prefer to target and delete exactly the thing I want out.

There are way too many files to do it by hand. Any ideas?


I fully agree with Ben Hillier's comment (especially the part about the backup, as I have only a little knowledge of PHP). However...

The following sed command can be used:

sed 's/^<?php .* ?>\(<?php.*\)$/\1/'

Alternatively, it can be done with awk (gensub is a GNU awk extension, so this needs gawk):

awk '/<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'

or with the additional condition NR==1 to ensure that the test/replacement is done on the first line only:

awk 'NR==1 && /<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'

One of these commands (cleverly combined with find) should do the job.
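As a rough sketch of what "combined with find" could look like: the function below (clean_injected is a hypothetical name, not something from the question) runs the sed command from above in place on every *.php file under a directory. It assumes GNU sed, keeps a .bak copy of each file, and adds the address 1 so that only the first line of each file is ever touched.

```shell
#!/bin/sh
# Hypothetical helper: clean_injected DIR
# Strips the injected "<?php ... ?>" prefix from line 1 of every *.php file
# below DIR. Assumes GNU sed; -i.bak edits in place and leaves FILE.bak behind.
clean_injected() {
  find "$1" -type f -name '*.php' \
    -exec sed -i.bak '1s/^<?php .* ?>\(<?php.*\)$/\1/' {} +
}
```

It would be called as clean_injected /path/to/site, ideally on a copy of the site first. Files whose first line does not match the pattern keep their content, although a .bak copy is still written for them.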

Please note the different escaping required in sed vs. awk: in the sed pattern the ? is literal as written, whereas in awk it has to be escaped as \?.

And, as Ben Hillier already recommended: don't forget to make a backup first.
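Before changing anything, it may also help to see which files would be affected. The sketch below (list_injected is a hypothetical name) only reads files: it prints the path of every *.php file whose first line starts with the injected pattern, using the same expression as the sed command.

```shell
#!/bin/sh
# Hypothetical helper: list_injected DIR
# Prints every *.php file below DIR whose first line looks like
# "<?php ... ?><?php". Read-only: nothing is modified.
list_injected() {
  find "$1" -type f -name '*.php' -exec sh -c '
    for f do
      # Test only line 1 of each file against the injected pattern.
      if head -n 1 "$f" | grep -q "^<?php .* ?><?php"; then
        printf "%s\n" "$f"
      fi
    done' sh {} +
}
```

Running list_injected /path/to/site before and after the cleanup gives a quick check: the second run should print nothing.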

Demonstration:

$ echo '<?php lots-Of-Stuff-I-Can-Target-Easily-With-Grep ?><?php' \
> | sed 's/^<?php .* ?>\(<?php.*\)$/\1/'
<?php

$ echo '<?php' | sed 's/^<?php .* ?>\(<?php.*\)$/\1/'
<?php

$ echo '<?php lots-Of-Stuff-I-Can-Target-Easily-With-Grep ?><?php' | awk '/<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php

$ echo '<?php' | awk '/<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php

$ echo '<?php lots-Of-Stuff-I-Can-Target-Easily-With-Grep ?><?php' | awk 'NR==1 && /<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php

$ echo '<?php' | awk 'NR==1 && /<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php

$ cat >test.txt <<EOF
> <?php lots-Of-Stuff-I-Can-Target-Easily-With-Grep ?><?php
> // contents
> // contents
> // contents
> ?>
> EOF

$ cat test.txt | sed 's/^<?php .* ?>\(<?php.*\)$/\1/'
<?php
// contents
// contents
// contents
?>

$ cat test.txt | awk '/<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php
// contents
// contents
// contents
?>

$ cat test.txt | awk 'NR==1 && /<\?php .* \?><\?php/ { $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0) }{ print }'
<?php
// contents
// contents
// contents
?>

$

Last but not least, the awk script formatted for human readers:

# catch 1st line with a duplicated "<?php"
NR==1 && /<\?php .* \?><\?php/ {
  # replace the line by everything including and after 2nd "<?php"
  $0 = gensub(/^<\?php.*\?>(<\?php.*)$/, "\\1", 1, $0)
}
{ print } # print any line
