What are some really useful but esoteric language features in Perl that you've actually been able to employ to do useful work?
Guidelines:
(These are all from Corion's answer)
Operators:
++ and unary - operators work on strings
m// operator
Quoting constructs:
Syntax and Names:
Modules, Pragmas, and command-line options:
overload::constant
Variables:
Loops and flow control:
Regular expressions:
Other features:
DATA block
eof function
dbmopen function
Other tricks, and meta-answers:
See Also:
Reposted from: https://stackoverflow.com/questions/161872/hidden-features-of-perl
There are many non-obvious features in Perl.
For example, did you know that there can be a space after a sigil?
$ perl -wle 'my $x = 3; print $ x'
3
Or that you can give subs numeric names if you use symbolic references?
$ perl -lwe '*4 = sub { print "yes" }; 4->()'
yes
There's also the "bool" quasi-operator, which returns 1 for true expressions and the empty string for false ones:
$ perl -wle 'print !!4'
1
$ perl -wle 'print !!"0 but true"'
1
$ perl -wle 'print !!0'
(empty line)
Other interesting stuff: with use overload you can overload string literals and numbers (and, for example, make them BigInts or whatever).
Many of these things are actually documented somewhere, or follow logically from the documented features, but nonetheless some are not very well known.
Update: Another nice one. The q{...} quoting constructs are mentioned below, but did you know that you can use letters as delimiters?
$ perl -Mstrict -wle 'print q bJet another perl hacker.b'
Jet another perl hacker.
Likewise you can write regular expressions:
m xabcx
# same as m/abc/
Autovivification. AFAIK no other language has it.
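A minimal sketch of what autovivification looks like in practice. The %visits hash and its keys are made up for illustration; the point is only that the intermediate hash and array references come into existence on first use.

```perl
use strict;
use warnings;

my %visits;    # starts out completely empty

# Neither $visits{alice} nor the nested array exists yet; Perl creates
# the intermediate hash and the array reference on first use.
push @{ $visits{alice}{pages} }, '/home';
$visits{alice}{count}++;

print "pages: @{ $visits{alice}{pages} }\n";    # pages: /home
print "count: $visits{alice}{count}\n";         # count: 1

# Caveat: merely looking deep into a structure also autovivifies
# the intermediate level.
my $seen = exists $visits{bob}{count};
print "bob now exists\n" if exists $visits{bob};    # bob now exists
```

The last two lines show the usual gotcha: testing a nested key creates the outer entry as a side effect.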
while(/\G(\b\w*\b)/g) {
print "$1\n";
}
the \G anchor. It's hot.
Let's start easy with the Spaceship Operator.
$a = 5 <=> 7; # $a is set to -1
$a = 7 <=> 5; # $a is set to 1
$a = 6 <=> 6; # $a is set to 0
The m// operator has some obscure special cases:
If you use ? as the delimiter, it only matches once unless you call reset.
If you use ' as the delimiter, the pattern is not interpolated.
A bit obscure is the tilde-tilde "operator", which forces scalar context.
print ~~ localtime;
is the same as
print scalar localtime;
and different from
print localtime;
Taint checking. With taint checking enabled, perl will die (or warn, with -t) if you try to pass tainted data (roughly speaking, data from outside the program) to an unsafe function (opening a file, running an external command, etc.). It is very helpful when writing setuid scripts or CGIs or anything where the script has greater privileges than the person feeding it data.
Magic goto. goto &sub does an optimized tail call.
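As a rough illustration of the tail-call idea, here is a sketch using a made-up sum_to function. Because goto &sub replaces the current call frame instead of stacking a new one, the call depth stays flat and no "Deep recursion" warning appears.

```perl
use strict;
use warnings;

# Hypothetical example: sum the integers 1..$n with an accumulator.
sub sum_to {
    my ($n, $acc) = @_;
    $acc //= 0;
    return $acc if $n <= 0;

    # Set up the arguments for the next step, then replace the
    # current call frame rather than calling sum_to recursively.
    @_ = ($n - 1, $acc + $n);
    goto &sum_to;
}

print sum_to(100_000), "\n";    # 5000050000
```

Note that the callee receives whatever is in @_ at the time of the goto, which is why the sketch assigns to @_ first.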
The debugger.
use strict and use warnings. These can save you from a bunch of typos.
One of my favourite features in Perl is using the boolean || operator to select between a set of choices.
$x = $a || $b;
# $x = $a, if $a is true.
# $x = $b, otherwise
This means one can write:
$x = $a || $b || $c || 0;
to take the first true value from $a, $b, and $c, or a default of 0 otherwise.
In Perl 5.10, there's also the // operator, which returns the left hand side if it's defined, and the right hand side otherwise. The following selects the first defined value from $a, $b, $c, or 0 otherwise:
$x = $a // $b // $c // 0;
These can also be used with their short-hand forms, which are very useful for providing defaults:
$x ||= 0; # If $x was false, it now has a value of 0.
$x //= 0; # If $x was undefined, it now has a value of zero.
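The difference between || and // matters most when a legitimate value happens to be false. A small sketch, assuming a hypothetical %opt settings hash:

```perl
use strict;
use warnings;

my %opt = (retries => 0, name => '');    # hypothetical settings

my $retries_or  = $opt{retries} || 3;    # 3: the legitimate 0 is lost
my $retries_dor = $opt{retries} // 3;    # 0: defined, so it is kept
my $timeout     = $opt{timeout} // 30;   # 30: key is missing, so undef

print "$retries_or $retries_dor $timeout\n";    # 3 0 30
```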
Cheerio,
Paul
It's simple to quote almost any kind of strange string in Perl.
my $url = q{http://my.url.com/any/arbitrary/path/in/the/url.html};
In fact, the various quoting mechanisms in Perl are quite interesting. The Perl regex-like quoting mechanisms allow you to quote anything, specifying the delimiters. You can use almost any special character like #, /, or open/close characters like (), [], or {}. Examples:
my $var = q#some string where the pound is the final escape.#;
my $var2 = q{A more pleasant way of escaping.};
my $var3 = q(Others prefer parens as the quote mechanism.);
Quoting mechanisms:
q : literal quote; only character that needs to be escaped is the end character.
qq : an interpreted quote; processes variables and escape characters. Great for strings that you need to quote:
my $var4 = qq{This "$mechanism" is broken. Please inform "$user" at "$email" about it.};
qx : Works like qq, but then executes the result as a system command, non-interactively, and returns all the text written to standard output. (Redirection, if supported by the OS, is honored as well.) The same thing can be done with backquotes (the ` character).
my $output = qx{type "$path"}; # get just the output
my $moreout = qx{type "$path" 2>&1}; # get stuff on stderr too
qr : Interprets like qq, but then compiles it as a regular expression. Works with the various options on the regex as well. You can now pass the regex around as a variable:
sub MyRegexCheck {
my ($string, $regex) = @_;
if ($string)
{
return ($string =~ $regex);
}
return; # returns 'null' or 'empty' in every context
}
my $regex = qr{http://\w+\.com/(\w+/)+};
my @results = MyRegexCheck(q{http://myurl.com/subpath1/subpath2/}, $regex);
qw : A very, very useful quote operator. Turns a quoted set of whitespace separated words into a list. Great for filling in data in a unit test.
my @allowed = qw(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z { });
my @badwords = qw(WORD1 word2 word3 word4);
my @numbers = qw(one two three four 5 six seven); # works with numbers too
my @list = ('string with space', qw(eight nine), "a $var"); # works in other lists
my $arrayref = [ qw(and it works in arrays too) ];
It's great to use them whenever it makes things clearer. For qx, qq, and q, I most often use {} as the delimiters. The most common habit among people using qw is the () delimiters, but sometimes you also see qw//.
The "for" statement can be used the same way "with" is used in Pascal:
for ($item)
{
s/&nbsp;/ /g;
s/<.*?>/ /g;
$_ = join(" ", split(" ", $_));
}
You can apply a sequence of s/// operations, etc. to the same variable without having to repeat the variable name.
The flip-flop operator is useful for skipping the first iteration when looping through the records (usually lines) returned by a file handle, without using a flag variable:
while(<$fh>)
{
next if 1..1; # skip first record
...
}
Run perldoc perlop and search for "flip-flop" for more information and examples.
Not really hidden, but many everyday Perl programmers don't know about CPAN. This especially applies to people who aren't full-time programmers or don't program in Perl full time.
The operators ++ and unary - don't only work on numbers, but also on strings.
$_ = "a";
print -$_;
prints -a
print ++$_;
prints b
$_ = 'z';
print ++$_;
prints aa
This is a meta-answer, but the Perl Tips archives contain all sorts of interesting tricks that can be done with Perl. The archive of previous tips is on-line for browsing, and can be subscribed to via mailing list or atom feed.
Some of my favourite tips include building executables with PAR, using autodie to throw exceptions automatically, and the use of the switch and smart-match constructs in Perl 5.10.
Disclosure: I'm one of the authors and maintainers of Perl Tips, so I obviously think very highly of them. ;)
The null filehandle diamond operator <> has its place in building command line tools. It acts like <FH> to read from a handle, except that it magically selects whichever is found first: command line filenames or STDIN. Taken from perlop:
while (<>) {
... # code for each line
}
rename("$_.part", $_) for "data.txt";
renames data.txt.part to data.txt without having to repeat myself.
As Perl has almost all "esoteric" parts from the other lists, I'll tell you the one thing that Perl can't:
The one thing Perl can't do is have bare arbitrary URLs in your code, because the // operator is used for regular expressions.
Just in case it wasn't obvious to you what features Perl offers, here's a selective list of the maybe not totally obvious entries:
Portability and Standardness - There are likely more computers with Perl than with a C compiler
A file/path manipulation class - File::Find works on even more operating systems than .Net does
Quotes for whitespace delimited lists and strings - Perl allows you to choose almost arbitrary quotes for your list and string delimiters
Aliasable namespaces - Perl has these through glob assignments:
*My::Namespace:: = \%Your::Namespace::;
Static initializers - Perl can run code in almost every phase of compilation and object instantiation, from BEGIN (code parse) to CHECK (after code parse) to import (at module import) to new (object instantiation) to DESTROY (object destruction) to END (program exit)
Functions are First Class citizens - just like in Perl
Block scope and closure - Perl has both
Calling methods and accessors indirectly through a variable - Perl does that too:
my $method = 'foo';
my $obj = My::Class->new();
$obj->$method( 'baz' ); # calls $obj->foo( 'baz' )
Defining methods through code - Perl allows that too:
*foo = sub { print "Hello world" };
Pervasive online documentation - Perl documentation is online and likely on your system too
Magic methods that get called whenever you call a "nonexisting" function - Perl implements that in the AUTOLOAD function
Symbolic references - you are well advised to stay away from these. They will eat your children. But of course, Perl allows you to offer your children to blood-thirsty demons.
One line value swapping - Perl allows list assignment
Ability to replace even core functions with your own functionality
use subs 'unlink';
sub unlink { print 'No.' }
or
BEGIN{
*CORE::GLOBAL::unlink = sub {print 'no'}
};
unlink($_) for @ARGV
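The AUTOLOAD entry in the list above has no example, so here is a minimal sketch. The Greeter package and its behavior are invented for illustration; the only built-in parts are the AUTOLOAD sub and the $AUTOLOAD package variable, which holds the fully qualified name of the method that was not found.

```perl
use strict;
use warnings;

package Greeter;    # hypothetical class, for illustration only

our $AUTOLOAD;      # set by Perl to the fully qualified missing name

sub new { return bless {}, shift }

sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package prefix
    return if $name eq 'DESTROY';      # don't intercept object destruction
    return "no method '$name', called with: @args";
}

package main;

my $g = Greeter->new;
print $g->hello('world'), "\n";    # no method 'hello', called with: world
```

The early return for DESTROY matters because Perl also routes the implicit destructor call through AUTOLOAD when no DESTROY method is defined.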
Add support for compressed files via magic ARGV:
s{
^ # make sure to get whole filename
(
[^'] + # at least one non-quote
\. # extension dot
(?: # now either suffix
gz
| Z
)
)
\z # through the end
}{gzcat '$1' |}xs for @ARGV;
(quotes around $_ necessary to handle filenames with shell metacharacters in)
Now the <> feature will decompress any @ARGV files that end with ".gz" or ".Z":
while (<>) {
print;
}
The quoteword operator is one of my favourite things. Compare:
my @list = ('abc', 'def', 'ghi', 'jkl');
and
my @list = qw(abc def ghi jkl);
Much less noise, easier on the eye. Another really nice thing about Perl, that one really misses when writing SQL, is that a trailing comma is legal:
print 1, 2, 3, ;
That looks odd, but not if you indent the code another way:
print
results_of_foo(),
results_of_xyzzy(),
results_of_quux(),
;
Adding an additional argument to the function call does not require you to fiddle around with commas on previous or trailing lines. The single line change has no impact on its surrounding lines.
This makes it very pleasant to work with variadic functions. This is perhaps one of the most under-rated features of Perl.
The ability to parse data directly pasted into a DATA block. No need to save to a test file to be opened in the program or similar. For example:
my @lines = <DATA>;
for (@lines) {
print if /bad/;
}
__DATA__
some good data
some bad data
more good data
more good data
I'd say the ability to expand the language, creating pseudo block operations is one.
You declare the prototype for a sub indicating that it takes a code reference first:
sub do_stuff_with_a_hash (&\%) {
my ( $block_of_code, $hash_ref ) = @_;
while ( my ( $k, $v ) = each %$hash_ref ) {
$block_of_code->( $k, $v );
}
}
You can then call it in the body like so
use Data::Dumper;
do_stuff_with_a_hash {
local $Data::Dumper::Terse = 1;
my ( $k, $v ) = @_;
say qq(Hey, the key is "$k"!);
say sprintf qq(Hey, the value is "%s"!), Dumper( $v );
} %stuff_for
;
(Data::Dumper::Dumper is another semi-hidden gem.) Notice how you don't need the sub keyword in front of the block, or the comma before the hash. It ends up looking a lot like: map { } @list
Also, there are source filters, where Perl passes you the code so you can manipulate it. Both this and the block operations are pretty much don't-try-this-at-home type of things.
I have done some neat things with source filters, for example like creating a very simple language to check the time, allowing short Perl one-liners for some decision making:
perl -MLib::DB -MLib::TL -e 'run_expensive_database_delete() if $hour_of_day < AM_7';
Lib::TL would just scan for both the "variables" and the constants, create them and substitute them as needed.
Again, source filters can be messy, but are powerful. But they can mess debuggers up something terrible--and even warnings can be printed with the wrong line numbers. I stopped using Damian's Switch because the debugger would lose all ability to tell me where I really was. But I've found that you can minimize the damage by modifying small sections of code, keeping them on the same line.
It's often enough done, but it's not all that obvious. Here's a die handler that piggy backs on the old one.
my $old_die_handler = $SIG{__DIE__};
$SIG{__DIE__}
= sub { say q(Hey! I'm DYIN' over here!); goto &$old_die_handler; }
;
That means whenever some other module in the code wants to die, they gotta come to you (unless someone else does a destructive overwrite on $SIG{__DIE__}). And you can be notified that somebody thinks something is an error.
Of course, for enough things you can just use an END { } block, if all you want to do is clean up.
overload::constant
You can inspect literals of a certain type in packages that include your module. For example, if you use this in your import sub:
overload::constant
integer => sub {
my $lit = shift;
return $lit > 2_000_000_000 ? Math::BigInt->new( $lit ) : $lit
};
it will mean that every integer greater than 2 billion in the calling packages will get changed to a Math::BigInt object. (See overload::constant).
While we're at it: Perl allows you to break up large numbers into groups of three digits and still get a parsable integer out of it. Note 2_000_000_000 above for 2 billion.
Binary "x" is the repetition operator:
print '-' x 80; # print row of dashes
It also works with lists:
print for (1, 4, 9) x 3; # print 149149149
My vote would go for the (?{}) and (??{}) groups in Perl's regular expressions. The first executes Perl code, ignoring the return value, the second executes code, using the return value as a regular expression.
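A short sketch of both constructs. The variable names are made up, and the balanced-parentheses pattern is a simplified take on the well-known recursive-pattern idiom rather than anything from the answer itself.

```perl
use strict;
use warnings;

# (?{ ... }) runs code while the match proceeds; its value is not
# used as part of the pattern.
my $hits = 0;
if ('aaaa' =~ /\A(?:a(?{ $hits++ }))*\z/) {
    print "code block ran $hits times\n";    # code block ran 4 times
}

# (??{ ... }) runs code and uses the result as a sub-pattern, which
# allows a pattern to refer to itself: here, balanced parentheses.
my $balanced;
$balanced = qr/\((?:[^()]+|(??{ $balanced }))*\)/;

for my $text ('(a(b)c)', '(a(b)c') {
    my $verdict = $text =~ /\A$balanced\z/ ? 'balanced' : 'unbalanced';
    print "$text is $verdict\n";
}
```

Keep in mind that code inside (?{ }) can run more often than expected when the regex engine backtracks, so counters like $hits are only reliable for patterns as simple as this one.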
Based on the way the "-n" and "-p" switches are implemented in Perl 5, you can write a seemingly incorrect program including }{:
ls |perl -lne 'print $_; }{ print "$. Files"'
which is converted internally to this code:
LINE: while (defined($_ = <ARGV>)) {
print $_; }{ print "$. Files";
}
Special code blocks such as BEGIN, CHECK and END. They come from Awk, but work differently in Perl, because it is not record-based.
The BEGIN block can be used to specify some code for the parsing phase; it is also executed when you do the syntax-and-variable-check perl -c. For example, to load in configuration variables:
BEGIN {
eval {
require 'config.local.pl';
};
if ($@) {
require 'config.default.pl';
}
}
map - not only because it makes one's code more expressive, but because it gave me an impulse to read a little bit more about this "functional programming".
tie, the variable tying interface.
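A minimal sketch of tying a scalar. The UpperCase class is hypothetical; TIESCALAR, FETCH and STORE are the method names that tie looks for on a scalar.

```perl
use strict;
use warnings;

package UpperCase;    # hypothetical class, for illustration only

# Called by tie(); returns the object that backs the variable.
sub TIESCALAR {
    my ($class) = @_;
    my $value;
    return bless \$value, $class;
}

# Called on every read of the tied variable.
sub FETCH {
    my ($self) = @_;
    return $$self;
}

# Called on every assignment to the tied variable.
sub STORE {
    my ($self, $new) = @_;
    $$self = uc $new;
}

package main;

tie my $shout, 'UpperCase';
$shout = 'hello';
print "$shout\n";    # HELLO
```

From the caller's side $shout looks like an ordinary scalar; every assignment and read is routed through the class.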
The continue clause on loops. It will be executed at the bottom of every loop, even those which are next'ed.
while( <> ){
print "top of loop\n";
chomp;
next if /next/i;
last if /last/i;
print "bottom of loop\n";
}continue{
print "continue\n";
}
The "desperation mode" of Perl's loop control constructs which causes them to look up the stack to find a matching label allows some curious behaviors which Test::More takes advantage of, for better or worse.
SKIP: {
skip() if $something;
print "Never printed";
}
sub skip {
no warnings "exiting";
last SKIP;
}
There's the little known .pmc file. "use Foo" will look for Foo.pmc in @INC before Foo.pm. This was intended to allow compiled bytecode to be loaded first, but Module::Compile takes advantage of this to cache source filtered modules for faster load times and easier debugging.
The ability to turn warnings into errors.
use warnings;
local $SIG{__WARN__} = sub { die @_ };
my $num = "two";
my $sum = 1 + $num; # the "isn't numeric" warning is now fatal
print "Never reached";
That's what I can think of off the top of my head that hasn't been mentioned.
The goatse operator*:
$_ = "foo bar";
my $count =()= /[aeiou]/g; #3
or
sub foo {
return @_;
}
$count =()= foo(qw/a b c d/); #4
It works because list assignment in scalar context yields the number of elements in the list being assigned.
* Note, not really an operator