Common gotchas for Perl? [closed]

The question on Hidden features of Perl yielded at least one response that could be regarded as either a feature or a mis-feature. It seemed logical to follow up with this question: what are common non-obvious mistakes in Perl? Things that seem like they ought to work, but don't.

I won't give guidelines as to how to structure answers, or what's "too easy" to be considered a gotcha, since that's what the voting is for.

Table of Answers

Syntax

Semantics/Language Features

Debugging

Best Practices

Meta-Answers

See Also: ASP.NET - Common gotchas

Reposted from: https://stackoverflow.com/questions/166653/common-gotchas-for-perl

The fact that single quotes can be used to replace :: in identifiers.

Consider:

use strict;
print "$foo";        #-- Won't compile under use strict
print "$foo's fun!"; #-- Compiles just fine, refers to $foo::s

Leading to the following problem:

use strict;
my $name = "John";
print "$name's name is '$name'";
# prints:
#  name is 'John'

The recommended way to avoid this is to use braces around your variable name:

print "${name}'s name is '$name'";
# John's name is 'John'

And also to use warnings, since it will tell you about the use of the undefined variable $name::s.

The most common gotcha is to start your files with anything other than

use strict;
use diagnostics;

pjf adds: Please be warned that diagnostics has a significant impact on performance. It slows program start-up, as it needs to load perldiag.pod, and until bleadperl as of a few weeks ago, it also slows and bloats regexps because it uses $&. Using warnings and running splain on the results is recommended.

Confusing references and actual objects:

$a = [1,2,3,4];
print $a[0];

(It should be one of $a->[0] (best), $$a[0], @{$a}[0] or @$a[0])

Assigning arrays to scalars makes no sense to me. For example:

$foo = ( 'a', 'b', 'c' );

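Assigns 'c' to $foo and throws the rest of the list away (the right-hand side is a list, not an array, so in scalar context the comma operator returns its last element). This one is weirder: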

@foo = ( 'a', 'b', 'c' );
$foo = @foo;

This looks like it should do the same thing as the first example, but instead it sets $foo to the length of @foo, so $foo == 3.
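As a minimal sketch of how to say which of these you actually mean, the snippet below spells out the three common intentions. The variable names are only illustrative.

```perl
use strict;
use warnings;

my @foo = ( 'a', 'b', 'c' );

my $count   = @foo;        # array in scalar context: number of elements
my ($first) = @foo;        # list assignment (note the parentheses): first element
my $last    = $foo[-1];    # explicit index: last element

print "count=$count first=$first last=$last\n";
```

The parentheses around $first are what turn the assignment into a list assignment; without them you get the element count again.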

You can't localize exported variables unless you export the entire typeglob.

Using the /o modifier with a regex pattern stored in a variable.

m/$pattern/o

Specifying /o is a promise that $pattern won't change. Perl is smart enough to recognize whether or not it changed and recompile the regex conditionally, so there's no good reason to use /o anymore. Alternately, you can use qr// (e.g. if you're obsessed with avoiding the check).
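A minimal sketch of the qr// alternative, assuming a hypothetical pattern string and a small list of input lines. The pattern is compiled once where qr// appears and the compiled object is reused for every match.

```perl
use strict;
use warnings;

# Hypothetical pattern and input, for illustration only.
my $pattern = 'TODO\s*[:-]';
my $re      = qr/$pattern/;    # compiled once, here

my @lines   = ( "TODO: fix this", "nothing to do", "TODO - and this" );
my @matches = grep { $_ =~ $re } @lines;

print scalar(@matches), " matching lines\n";
```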

my $x = <>;
do { 
    next if $x !~ /TODO\s*[:-]/;
    ...
} while ( $x );

do is not a loop, so you cannot use next inside it. It's an instruction to perform a block, with while attached as a statement modifier. It's the same kind of construct as

$inc++ while <>;

even though it looks like the do-while loop of the C family of languages.
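One way to get the intended behaviour is to use a real while loop, where next is legal. This is only a sketch: the @input array is a hypothetical stand-in for reading from <>, so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical input standing in for <>, so the sketch is self-contained.
my @input = ( "TODO: write tests\n", "plain line\n", "TODO - ship it\n" );
my @todo;

while ( defined( my $x = shift @input ) ) {
    next if $x !~ /TODO\s*[:-]/;    # legal here: while is a real loop
    push @todo, $x;
}

print @todo;
```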

You can print to a lexical filehandle: good.

print $out "hello, world\n";

You then realise it might be nice to have a hash of filehandles:

my %out;
open $out{ok},   '>', 'ok.txt' or die "Could not open ok.txt for output: $!";
open $out{fail}, '>', 'fail.txt' or die "Could not open fail.txt for output: $!";

So far, so good. Now try to use them, and print to one or the other according to a condition:

my $where = (frobnitz() == 10) ? 'ok' : 'fail';

print $out{$where} "it worked!\n"; # it didn't: compile time error

You have to wrap the hash dereference in a pair of curlies:

print {$out{$where}} "it worked!\n"; # now it did

This is completely non-intuitive behaviour. If you hadn't heard about this or read it in the documentation, I doubt you could figure it out on your own.

This gotcha is fixed in Perl 5.10 - if you're lucky enough to be working somewhere that isn't allergic to upgrading things >:-(

I speak of The Variable That's Validly Zero. You know, the one that causes unexpected results in clauses like:

unless ($x) { ... }
$x ||= do { ... };

Perl 5.10 has the //= or defined-or operator.

This is particularly insidious when the valid zero is caused by some edge-condition that wasn't considered in testing before your code went to production...
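To make the difference concrete, here is a small sketch comparing ||= and //= on a value that is validly zero. The %config hash and its keys are made up for the example, and //= requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical settings: 0 is a legitimate value for retries.
my %config = ( retries => 0 );

my $with_or = $config{retries};
$with_or ||= 3;     # 0 is false, so the valid zero is overwritten

my $with_dor = $config{retries};
$with_dor //= 3;    # 0 is defined, so it is kept (needs Perl 5.10+)

my $missing = $config{timeout};
$missing //= 30;    # undef, so the default applies

print "||= gives $with_or, //= gives $with_dor, missing gives $missing\n";
```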

Perl's DWIMmer struggles with << (here-document) notation when using print with lexical filehandles:

# here-doc
print $fh <<EOT;
foo
EOT

# here-doc, no interpolation
print $fh <<'EOT';
foo
EOT

# bitshift, syntax error
# Bareword "EOT" not allowed while "strict subs" in use
print $fh<<EOT;
foo
EOT

# bitshift, fatal error
# Argument "EOT" isn't numeric...
# Can't locate object method "foo" via package "EOT"...
print $fh<<'EOT';
foo
EOT

The solution is to either be careful to include whitespace between the filehandle and the << or to disambiguate the filehandle by wrapping it in {} braces:

print {$fh}<<EOT;
foo
EOT

If you're foolish enough to do so, Perl will allow you to declare multiple variables with the same name:

my ($x, @x, %x);

Because Perl uses sigils to identify context rather than variable type, this almost guarantees confusion when later code uses the variables, particularly if $x is a reference:

$x[0]
$x{key}
$x->[0]
$x->{key}
@x[0,1]
@x{'foo', 'bar'}
@$x[0,1]
@$x{'foo', 'bar'}
...

I did this once:

my $object = new Some::Random::Class->new;

Took me ages to find the error. Indirect method syntax is eeevil.

Most of Perl's looping operators (foreach, map, grep) automatically localize $_ but while(<FH>) doesn't. This can lead to strange action-at-a-distance.
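The following sketch shows one way the action-at-a-distance can appear. The count_lines helper and its in-memory filehandle are hypothetical; the point is only that the while reads into the global $_, which the caller's for loop has aliased to an array element.

```perl
use strict;
use warnings;

# Hypothetical helper that reads from an in-memory filehandle.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $n = 0;
    $n++ while <$fh>;    # each line is assigned to the global $_
    return $n;
}

my @items = ( 'a', 'b' );
for (@items) {           # $_ is aliased to each element in turn
    count_lines("x\ny\n");
}

# At end of input the read stores undef in $_, which is still an alias
# for the current element, so the elements have been wiped out.
print join( ', ', map { defined $_ ? $_ : 'undef' } @items ), "\n";
```

Reading into a lexical instead, as in while ( my $line = <$fh> ), avoids touching $_ at all.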

The perltrap manpage lists many traps for the unwary organized by type.

Constants can be redefined. A simple way to accidentally redefine a constant is to define a constant as a reference.

 use constant FOO => { bar => 1 };
 ...
 my $hash = FOO;
 ...
 $hash->{bar} = 2;

Now FOO is {bar => 2};

If you are using mod_perl (at least in 1.3) the new FOO value will persist until the module is refreshed.

Misspelling variable names... I once spent an entire afternoon troubleshooting code that wasn't behaving correctly only to find a typo in a variable name, which (without use strict) is not an error in Perl, but rather the declaration of a new variable.

This is a meta-answer. A lot of nasty gotchas are caught by Perl::Critic, which you can install and run from the command line with the perlcritic command, or (if you're happy to send your code across the Internet, and not be able to customise your options) via the Perl::Critic website.

Perl::Critic also provides references to Damian Conway's Perl Best Practices book, including page numbers. So if you're too lazy to read the whole book, Perl::Critic can still tell you the bits you should be reading.

What values would you expect @_ to contain in the following scenario?

sub foo { } 

# empty subroutine called in parameters

bar( foo(), "The second parameter." ) ;

I would expect to receive in bar:

undef, "The second parameter." 

But @_ contains only the second parameter, at least when testing with perl 5.8.8.
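The reason is that an empty subroutine returns an empty list in list context, and an empty list simply vanishes when the argument list is flattened. The sketch below shows this, along with one possible way to keep the slot by forcing scalar context. Here bar is a hypothetical stand-in that just reports how many arguments it received.

```perl
use strict;
use warnings;

sub foo { }                     # returns an empty list in list context
sub bar { return scalar @_ }    # hypothetical: reports how many arguments arrived

# The empty list from foo() disappears during flattening.
my $flattened = bar( foo(), "The second parameter." );

# In scalar context the empty sub yields undef, which occupies a slot.
my $forced = bar( scalar( foo() ), "The second parameter." );

print "flattened: $flattened argument(s), forced: $forced argument(s)\n";
```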

"my" declarations should use parentheses around lists of variables

use strict;
my $a = 1;
mysub();
print "a is $a\n";

sub mysub {
    my $b, $a;   # Gotcha!
    $a = 2;
}

It prints a is 2 because the my declaration only applied to $b (the mention of $a on that line simply didn't do anything). Note that this happens without warning even when "use strict" is in effect.

Adding "use warnings" (or the -w flag) improves things greatly with Perl saying Parentheses missing around "my" list. This shows, as many have already, why both the strict and warnings pragmas are always a good idea.

Use of uninitialized value in concatenation...

This one drives me crazy. You have a print that includes a number of variables, like:

print "$label: $field1, $field2, $field3\n";

And one of the variables is undef. You consider this a bug in your program -- that's why you were using the "warnings" pragma. Perhaps your database schema allowed NULL in a field you didn't expect, or you forgot to initialize a variable, etc. But all the warning message tells you is that an uninitialized value was encountered during a concatenation (.) operation. If only it told you the name of the variable that was uninitialized!

Since Perl doesn't want to print the variable name in the error message for some reason, you end up tracking it down by setting a breakpoint (to look at which variable is undef), or adding code to check for the condition. Very annoying when it only happens one time out of thousands in a CGI script and you can't recreate it easily.

Graeme Perrow's answer was good, but it gets even better!

Given a typical function that returns a nice list in list context, you might well ask: What will it return in scalar context? (By "typical," I mean the common case in which the documentation doesn't say, and we assume it doesn't use any wantarray funny business. Maybe it's a function you wrote yourself.)

sub f { return ('a', 'b', 'c'); }
sub g { my @x = ('a', 'b', 'c'); return @x; }

my $x = f();           # $x is now 'c'
my $y = g();           # $y is now 3

The context a function is called in is propagated to return statements in that function.

I guess the caller was wrong to want a simple rule of thumb to enable efficient reasoning about code behaviour. You're right, Perl, it's better for the caller's character to grovel through the called function's source code each time.

Comparing strings using == and != instead of eq and ne. For instance:

$x = "abc";
if ($x == "abc") {
    # do something
}

Instead of:

$x = "abc";
if ($x eq "abc") {
    # do something
}
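What makes this mistake nasty is that == converts both strings to numbers first, and a non-numeric string converts to 0. The sketch below compares two different strings both ways. The numeric warning is switched off only so the demonstration prints cleanly; in real code that warning is exactly what reveals the bug.

```perl
use strict;
use warnings;

my $x = "abc";

my ( $numeric, $stringy );
{
    no warnings 'numeric';    # silence "isn't numeric" for this demo only
    $numeric = ( $x == "xyz" ) ? "equal" : "different";    # compares 0 == 0
}
$stringy = ( $x eq "xyz" ) ? "equal" : "different";

print "== says $numeric, eq says $stringy\n";
```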

Forgetting to prepend the directory path to the results of readdir before doing tests on those results. Here's an example:

#!/usr/bin/env perl
use strict;
use warnings;

opendir my $dh, '/path/to/directory/of/interest'
  or die "Can't open '/path/to/directory/of/interest' for reading: [$!]";

my @files = readdir $dh; # Bad in many cases; see below
# my @files = map { "/path/to/directory/of/interest/$_" } readdir $dh;

closedir $dh or die "Can't close /path/to/directory/of/interest: [$!]";

for my $item (@files) {
  print "File: $item\n" if -f $item;
  # Nothing happens. No files? That's odd...
}

# Scratching head...let's see...
use Data::Dumper;
print Dumper @files;
# Whoops, there it is...

This gotcha is mentioned in the documentation for readdir, but I think it's still a pretty common mistake.

Unary minus with "foo" creates "-foo":

perl -le 'print -"foo" eq "-foo" ? "true" : "false"'

This only works if the first character matches /[_a-zA-Z]/. If the first character is a "-" then it changes the first character to a "+", and if the first character is a "+" then it changes the first character to a "-". If the first character matches /[^-+_a-zA-Z]/ then it attempts to convert the string to a number and negates the result.

perl -le '
    print -"foo";
    print -"-foo";
    print -"+foo";
    print -"\x{e9}"; #e acute is not in the accepted range
    print -"5foo";   #same thing for 5
'

The code above prints

-foo
+foo
-foo
-0
-5

This feature mostly exists to allow people to say things like

my %options = (
    -depth  => 5,
    -width  => 2,
    -height => 3,
);

Modifying the array you're looping on in a for(each) as in:

my @array = qw/a b c d e f g h/;

for ( @array ) {
    my $val = shift @array;
    print $val, "\n";
}

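Perl gets confused and the loop doesn't do what you would expect; perlsyn explicitly warns against adding or removing elements of an array inside the body of a foreach over that array.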
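If the goal is to consume the array one element at a time, one option is to loop on whether the array is empty instead of iterating over it. A minimal sketch, with @seen added only to record the order of processing:

```perl
use strict;
use warnings;

my @array = qw/a b c d e f g h/;
my @seen;    # records the order in which elements are processed

# Loop while the array still has elements, rather than iterating over it.
while (@array) {
    my $val = shift @array;
    push @seen, $val;
}

print "@seen\n";
```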

Treating a scalar as an integer:

$n = 1729;
$s = 0;
$i = 0;

while ($n) {
    $s += $n % 10;
    $n/=10;
    $i ++
}

print "Sum : $s\n";
print "Number of iterations : $i\n"

Sum : 19

Number of iterations : 327

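Ideally it should have had only four iterations, but a scalar is not an int: $n /= 10 performs floating-point division, so $n keeps shrinking through fractions for hundreds of iterations before it finally reaches zero.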
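A sketch of the same digit-sum loop with the division truncated explicitly using int, which gives the four iterations one would expect:

```perl
use strict;
use warnings;

my $n = 1729;
my ( $s, $i ) = ( 0, 0 );

while ($n) {
    $s += $n % 10;
    $n = int( $n / 10 );    # truncate, so $n reaches 0 after the last digit
    $i++;
}

print "Sum : $s\n";
print "Number of iterations : $i\n";
```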

How about the fact that

@array = split( / /, $string );

doesn't give the same result as

@array = split( ' ', $string );

if $string has leading spaces?

That might take some people by surprise.
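A small sketch of the difference, using a made-up string with two leading spaces. The regex form keeps the leading empty fields, while the special ' ' form skips leading whitespace and splits on runs of whitespace.

```perl
use strict;
use warnings;

my $string = "  a b";    # hypothetical input with two leading spaces

my @by_regex  = split( / /, $string );    # ('', '', 'a', 'b')
my @by_string = split( ' ', $string );    # ('a', 'b')

print scalar(@by_regex), " fields vs ", scalar(@by_string), " fields\n";
```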

The hash "constructor" is nothing more than a list, and the => fat comma is nothing more than syntactic sugar. You can get bitten by this when confusing the [] arrayref syntax with the () list syntax:

my %var = (
    ("bar", "baz"),
    fred => "barney",
    foo => (42, 95, 22)
);

# result
{ 'bar' => 'baz',
  '95' => 22,
  'foo' => 42,
  'fred' => 'barney' };

# wanted
{ 'foo' => [ 42, 95, 22 ] }
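For comparison, here is a sketch of the same hash written with square brackets, so that foo maps to a single array reference instead of spilling extra key/value pairs into the list:

```perl
use strict;
use warnings;

my %var = (
    bar  => "baz",
    fred => "barney",
    foo  => [ 42, 95, 22 ],    # one value: a reference to an array
);

print scalar( keys %var ), " keys; foo holds ",
    scalar( @{ $var{foo} } ), " items\n";
```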