Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

operators - Why do I have to use a * in front of a Perl bareword filehandle?

While trying to do this:

 my $obj = new JavaScript::Minifier;
 $obj->minify(*STDIN, *STDOUT);
 # modified the line above to
 $obj->minify(*IP_HANDLE,*OP_HANDLE)

The above works if IP_HANDLE and OP_HANDLE are filehandles, but I still can't figure out what the * actually does when applied to a filehandle or any other datatype.

Thanks,


1 Reply

In the bad old days before Perl 5.6, which introduced lexical filehandles more than a decade ago now, passing file and directory handles was awkward. The code in your question is written in this old-fashioned style.

The technical name for *STDIN, for example, is a typeglob, explained in the “Typeglobs and Filehandles” section of perldata. You may encounter manipulation of typeglobs for various purposes in legacy code. Note that you may grab typeglobs of global variables only, never lexicals.

Passing handles was a common purpose for dealing directly with typeglobs, but there were other uses as well. See below for details.

  • Passing filehandles to subs
  • Syntactic ambiguity: string or filehandle?
  • Aliases via typeglob assignment
  • Localizing handles by localizing typeglobs
  • Peeking under the hood: *foo{THING} syntax
  • Tying it all together: DWIM!

Passing filehandles to subs

The perldata documentation explains:

Typeglobs and Filehandles

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this is seldom needed.

[...]

Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to use a typeglob to save away a filehandle, do it this way:

$fh = *STDOUT;

or perhaps as a real reference, like this:

$fh = \*STDOUT;

See perlsub for examples of using these as indirect filehandles in functions.

The referenced section of perlsub is below.

Passing Symbol Table Entries (typeglobs)

WARNING: The mechanism described in this section was originally the only way to simulate pass-by-reference in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is generally easier to work with. See below.

Sometimes you don’t want to pass the value of an array to a subroutine but rather the name of it, so that the subroutine can modify the global copy of it rather than working with a local copy. In Perl you can refer to all objects of a particular name by prefixing the name with a star: *foo. This is often known as a “typeglob,” because the star on the front can be thought of as a wildcard match for all the funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents all the objects of that name, including any filehandle, format, or subroutine. When assigned to, it causes the name mentioned to refer to whatever * value was assigned to it. [...]

Note that a typeglob can be taken on global variables only, not lexicals. Heed the warning above. Prefer to avoid this obscure technique.
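
As a minimal sketch of what the perlsub passage describes, the following contrasts the old typeglob style with a real reference. The names @scores, double_via_glob, and double_via_ref are made up for illustration and do not come from the documentation.

```perl
use strict;
use warnings;

our @scores = (1, 2, 3);    # must be a package variable; a `my` array has no typeglob

# Old style: the caller passes *scores and the sub aliases its own glob to it.
sub double_via_glob {
    local (*numbers) = @_;  # *numbers now stands for everything named "scores"
    our @numbers;           # declare the package array so strict accepts it
    $_ *= 2 for @numbers;   # modifies the caller's @scores in place
}

# Modern style: pass a real reference instead.
sub double_via_ref {
    my ($numbers) = @_;
    $_ *= 2 for @$numbers;
}

double_via_glob(*scores);
double_via_ref(\@scores);
print "@scores\n";          # prints "4 8 12"
```

Both subs change the caller's array, but only the reference version works with lexical (my) arrays.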

Syntactic ambiguity: string or filehandle?

Without the * sigil, a bareword is just a string.

Simple strings sometimes suffice, however. For example, the print operator allows

$ perl -le 'print { "STDOUT" } "Hiya!"'
Hiya!

$ perl -le '$h="STDOUT"; print $h "Hiya!"'
Hiya!

$ perl -le 'print "STDOUT" +123'
123

These fail with strict 'refs' enabled. The manual explains:

FILEHANDLE may be a scalar variable name, in which case the variable contains the name of or a reference to the filehandle, thus introducing one level of indirection.
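
For comparison, here is a small sketch of forms that should stay clean under strict 'refs', because they hand print a typeglob, a glob reference, or an IO reference rather than a name stored in a string. The variable $h is borrowed from the one-liner above.

```perl
use strict;
use warnings;

my $h = \*STDOUT;                 # reference to the typeglob, not its name
print $h "Hiya!\n";

print { *STDOUT } "Hiya!\n";      # block that yields the typeglob itself
print { *STDOUT{IO} } "Hiya!\n";  # block that yields just the IO slot
```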

In your example, consider the syntactic ambiguity. Without the * sigil, you could mean strings

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', 'OP_HANDLE');

or maybe a sub call

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
sub OP_HANDLE {
    1;
}
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', OP_HANDLE());

or, of course, a filehandle. Note in the examples above how the bareword JavaScript::Minifier also compiles as a simple string.

Enable the strict pragma and it all goes out the window anyway:

$ perl -Mstrict prog.pl
Bareword "IP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.
Bareword "OP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.

Aliases via typeglob assignment

One trick with typeglobs that’s handy for Stack Overflow posts is

*ARGV = *DATA;

(I could be more precise with *ARGV = *DATA{IO}, but that’s a little fussy.)

This allows the diamond operator <> to read from the DATA filehandle, as in

#! /usr/bin/perl

*ARGV = *DATA;   # for demo only; remove in production

while (<>) { print }

__DATA__
Hello
there

This way, the program and its input can be in a single file, and the code is a closer match to how it will look in production: just delete the typeglob assignment.
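
To make the difference between the coarse and the fussy form concrete, here is a minimal sketch using made-up package variables (@list, $list, and the name alias are illustrative only). Assigning a reference to a typeglob replaces just the matching slot, while assigning a whole typeglob replaces every slot.

```perl
use strict;
use warnings;

our @list = (1, 2, 3);
our $list = "a scalar that happens to share the name";
our (@alias, $alias);

*alias = \@list;     # reference on the right: only the ARRAY slot is aliased
push @alias, 4;
print "@list\n";                                       # prints "1 2 3 4"
print defined $alias ? "aliased" : "untouched", "\n";  # prints "untouched"

*alias = *list;      # whole glob on the right: every slot is aliased
print "$alias\n";    # prints the string stored in $list
```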

Localizing handles by localizing typeglobs

As noted in perlsub

Temporary Values via local()

WARNING: In general, you should be using my instead of local, because it’s faster and safer. Exceptions to this include the global punctuation variables, global filehandles and formats, and direct manipulation of the Perl symbol table itself. local is mostly used when the current value of a variable must be visible to called subroutines. [...]

you can use typeglobs to localize filehandles:

$ cat prog.pl
#! /usr/bin/perl

sub foo {
  local(*STDOUT);
  open STDOUT, ">", "/dev/null" or die "$0: open: $!";
  print "You can't see me!\n";
}

print "Hello\n";
foo;
print "Good bye.\n";

$ ./prog.pl
Hello
Good bye.

“When to Still Use local()” in perlsub has another example.

2. You need to create a local file or directory handle or a local function.

A function that needs a filehandle of its own must use local() on a complete typeglob. This can be used to create new symbol table entries:

sub ioqueue {
    local (*READER, *WRITER); # not my!
    pipe (READER, WRITER) or die "pipe: $!";
    return (*READER, *WRITER);
}
($head, $tail) = ioqueue();

To emphasize, this style is old-fashioned. Prefer to avoid global filehandles in new code, but being able to understand the technique in existing code is useful.
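
For contrast, a sketch of how the ioqueue example might look with lexical filehandles. This is an adaptation, not code from perlsub. Because lexical handles are ordinary scalars, neither local nor typeglobs are needed.

```perl
use strict;
use warnings;

# pipe autovivifies the two lexical handles, which are then returned like any scalar.
sub ioqueue {
    pipe(my $reader, my $writer) or die "pipe: $!";
    return ($reader, $writer);
}

my ($head, $tail) = ioqueue();

print {$tail} "through the pipe\n";
close $tail or die "close: $!";   # flush and signal end-of-file to the reader

my $line = <$head>;
print $line;                      # prints "through the pipe"
```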

Peeking under the hood: *foo{THING} syntax

You can get at the different parts of a typeglob, as perlref explains:

A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax. *foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which holds everything known as foo).

$scalarref = *foo{SCALAR};
$arrayref = *ARGV{ARRAY};
$hashref = *ENV{HASH};
$coderef = *handler{CODE};
$ioref = *STDIN{IO};
$globref = *foo{GLOB};
$formatref = *foo{FORMAT};

All of these are self-explanatory except for *foo{IO}. It returns the IO handle, used for file handles (open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}, though it is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn of its use.

*foo{THING} returns undef if that particular THING hasn’t been used yet, except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn’t been used yet. This might change in a future release.

*foo{IO} is an alternative to the *HANDLE mechanism given in [“Typeglobs and Filehandles” in perldata] for passing filehandles into or out of subroutines, or storing into larger data structures. Its disadvantage is that it won’t create a new filehandle for you. Its advantage is that you have less risk of clobbering more than you want to with a typeglob assignment. (It still conflates file and directory handles, though.) However, if you assign the incoming value to a scalar instead of a typeglob as we do in the examples below, there’s no risk of that happening.

splutter(*STDOUT); # pass the whole glob
splutter(*STDOUT{IO}); # pass both file and dir handles

sub splutter {
  my $fh = shift;
  print $fh "her um well a hmmm\n";
}

$rec = get_rec(*STDIN); # pass the whole glob
$rec = get_rec(*STDIN{IO}); # pass both file and dir handles

sub get_rec {
  my $fh = shift;
  return scalar <$fh>;
}

Tying it all together: DWIM!

Context is key with Perl. In your example, although the syntax may be ambiguous, the intent is not: even if the parameters are strings, those strings are clearly intended to name filehandles.

So consider all the cases minify may need to handle:

  • bareword
  • bare typeglob
  • reference to typeglob
  • filehandle in a scalar

For example:

#! /usr/bin/perl

use warnings;
use strict;

*IP_HANDLE = *DATA;
open OP_HANDLE, ">&STDOUT" or die "$0: dup STDOUT: $!";
open my $fh, ">&STDOUT" or die "$0: dup STDOUT: $!";
my $offset = tell DATA;

use JavaScript::Minifier;
my $obj = JavaScript::Minifier->new;
$obj->minify(*IP_HANDLE, "OP_HANDLE");

seek DATA, $offset, 0 or die "$0: seek: $!";
$obj->minify(*IP_HANDLE, $fh);

__DATA__
Ahoy there
matey!
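
One way a sub could cope with all of those argument styles is to normalize them up front. The sketch below is only an illustration: to_handle is a hypothetical helper, not part of JavaScript::Minifier, and it leans on qualify_to_ref from the core Symbol module to resolve bare names and bare typeglobs.

```perl
use strict;
use warnings;
use Symbol qw(qualify_to_ref);

# Hypothetical helper: turn any of the argument styles listed above
# into something that can be used as a filehandle.
sub to_handle {
    my ($arg, $pkg) = @_;
    return $arg if ref $arg;              # \*FH, *FH{IO}, or a lexical handle
    $pkg = caller unless defined $pkg;    # package in which to resolve a bare name
    return qualify_to_ref($arg, $pkg);    # "FH" or *FH becomes a glob reference
}

my ($named, $lexical) = ('', '');
open OUT, '>', \$named or die "$0: open: $!";       # in-memory files, for demo only
open my $fh, '>', \$lexical or die "$0: open: $!";

print { to_handle($_) } "ok\n" for ('OUT', *OUT, \*OUT, $fh);

close OUT;
close $fh;
print $named, $lexical;   # three lines via OUT, one via $fh
```

Resolving names relative to the caller's package matters because a string such as "OUT" says nothing about which package the handle lives in.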

As a library author, being accommodating can be useful. To illustrate, the following stub of JavaScript::Minifier understands both old-fashioned and modern ways of passing filehandles.

package JavaScript::Minifier;

use warnings;
use strict;

sub new { bless {} => shift }

sub minify {
  my($self,$in,$out) = @_;

  for ($in, $o
