On Perl



Friday, June 17, 2005

Stages of a Perl Programmer

There is an excellent article by Nathan Torkington on Perl, or rather on the stages in the life of a Perl programmer, from novice to wizard. As it says, 'You won't come away a wizard, but you'll know what you need to do in order to become one'. A must-read for all Perl mongers.

Thursday, June 16, 2005

sprintf hack

Often, you need to prefix 0 to maintain uniform column size.
We generally do it like this using sprintf:
for (0..99) {
    my $output = sprintf("%02d", $_);
    print "$output\n";
}

An alternative for this two-digit case would be
for (0..99) {
    $_ = "0$_" if $_ < 10;
    print "$_\n";
}

Here, $_ is changed only 10 times, and we avoid calling sprintf 100 times as in the earlier case.
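
To check that the two approaches really produce the same strings, here is a minimal sketch. The names pad_sprintf and pad_concat are made up for this illustration, and the concatenation trick is assumed to be used only for non-negative integers padded to two digits.

```perl
use strict;
use warnings;

# Pad with sprintf: one function call per number.
sub pad_sprintf {
    return map { sprintf("%02d", $_) } @_;
}

# Pad by string concatenation: only values below 10 get a leading zero.
sub pad_concat {
    return map { $_ < 10 ? "0$_" : "$_" } @_;
}

my @with_sprintf = pad_sprintf(0 .. 99);
my @with_concat  = pad_concat(0 .. 99);

# Compare the two result lists as joined strings.
my $same = "@with_sprintf" eq "@with_concat";
print $same ? "Both methods agree\n" : "Methods differ\n";
```

For wider columns or negative numbers, sprintf is still the simpler choice.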

Trouble with system() in Perl

You created a script to generate some files and these files have to be processed by another (perl or shell) script called from within your first perl script using
system("./secondScript.sh argProcessALLFiles /some/dir");

However, something strange is happening. You find that the files are created properly, but when they have to be uploaded, it happens only one at a time, and the script waits for a Ctrl-C before moving on to the next one. Is there anything you could do in the Perl script so that it does not wait for the Ctrl-C?

Apparently there is nothing you can do about it in the Perl script. The problem lies in the underlying code of the secondScript.sh shell script. Also, while system() is running, Perl ignores SIGINT and SIGQUIT in the calling process, so a Control-C typed at the terminal reaches the invoked script but does not by itself terminate your Perl program. If you want your program to stop on those signals, you have to inspect the return value of system() and exit yourself.
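
The value returned by system() is a wait status that tells you whether the child exited normally or was ended by a signal. A minimal sketch of decoding it follows; describe_status is a name made up for this example, and the sh -c 'exit 3' command is only a stand-in for ./secondScript.sh.

```perl
use strict;
use warnings;

# Decode the wait status returned by system() (same layout as $?).
sub describe_status {
    my ($status) = @_;
    return "failed to start" if $status == -1;
    # The low seven bits hold the signal number, if any.
    return sprintf("died from signal %d", $status & 127) if $status & 127;
    # The high byte holds the exit code.
    return sprintf("exited with code %d", $status >> 8);
}

# Stand-in for the real script: a shell that exits with code 3.
my $status = system('sh', '-c', 'exit 3');
print describe_status($status), "\n";
```

If the decoded status shows the child died from SIGINT, the calling script can choose to exit as well.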

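A piped open is an alternative to system(). It lets you write to the command's standard input, including a Control-C character, as follows. Note that this only puts the byte 0x03 on the pipe; it is not a SIGINT unless the command interprets it that way. To deliver a real signal, use kill with the process ID that open returns.

open my $handle, "| @commands" or die "Can't fork. $!";
print $handle "\x3"; # Hex 03 is Control-C; no comma after the filehandle
close $handle or warn "Couldn't close pipe to @commands. $!";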
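
If the goal is to interrupt the command rather than feed it a character, one option is to signal the child directly. The sketch below assumes a Unix-like system; sleep 5 is a hypothetical stand-in for the real long-running command.

```perl
use strict;
use warnings;

# Make sure the child starts with the default SIGINT action,
# even if this script was launched with SIGINT ignored.
$SIG{INT} = 'DEFAULT';

# A piped open returns the child's process ID.
my $pid = open(my $handle, '|-', 'sleep', '5')
    or die "Can't fork: $!";

kill 'INT', $pid;        # deliver a real SIGINT to the child only
close $handle;           # waits for the child and sets $?

my $signal = $? & 127;   # low seven bits: signal that ended the child
print "child $pid ended by signal $signal\n";
```

Because kill targets only the child's process ID, the calling script keeps running.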

Wednesday, June 15, 2005

Installing Perl modules on your system

In order to avoid re-inventing the wheel, Perl has modules. CPAN is a repository for all modules in Perl. As you program in Perl, you will eventually find yourself in a position where you will need to install a module that did not come installed with your Perl installation. CPAN also lists a page on how to install modules.

Apart from what's given there, there are some more methods that should work on both Unix and Windows:

1. Use PPM
Perl Package Manager (PPM) is a Perl script that is used to install modules on your system. PPM automatically downloads any other dependent modules (or at least prompts you). Here is how to use it.
C:\Perl\bin>perl ppm.pl
PPM interactive shell (2.1.5) - type 'help' for available commands.
PPM> install module-name

You can also use PPM to search for modules like this:
PPM> search MP3
Packages available from http://ppm.ActiveState.com/cgibin/PPM/ppmserver.pl?urn:/
Bundle-MP3 [1.00] A bundle to install all MP3-related modules
MP3-Daemon [0.63] a daemon that possesses mpg123
MP3-ID3v1Tag [1.11] Edit ID3v1 Tags from an Audio MPEG Layer 3.
MP3-Info [1.02] Manipulate / fetch info from MP3 audio files

2. Use CPAN Shell
To invoke the CPAN shell, use this:
perl -MCPAN -e shell

To install and search for modules:
cpan>i /mp3/ /* search for all modules with /mp3/ in the name*/
cpan>i MP3::Info /* show information about this module*/
cpan>install "MP3::Info" /*install the module*/

To see the cpan shell's configurations do:
cpan>o conf

3. Universal Method
Decompress and Unpack your module (like CPAN says). Use Winzip/gzip/tar etc.

Go to the downloaded module's folder and run the following steps in order, looking out for errors and warnings:
perl Makefile.PL
make
make test
make install

On Windows, you may use nmake or dmake instead of make.

Finally, a Test
After you have installed your module, you may want to test whether it has been installed properly or not. Try these options:
c:\> perldoc MP3::Info /*Should show you the pod page for MP3::Info module*/

Try compiling or running a perl script that has only one line:
use MP3::Info; # stored in test.pl

perl -c test.pl
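
The same check can be done from inside a script, which is handy when several modules need testing at once. This is a minimal sketch; module_installed is a name invented for this example, and File::Spec is included only because it ships with Perl and should always be found.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm, the form require expects.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('MP3::Info', 'File::Spec') {
    printf "%-12s %s\n", $module,
        module_installed($module) ? 'installed' : 'missing';
}
```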

Tuesday, June 14, 2005

print Quiz

We always take printf, System.out.println, print, etc. to be the simplest parts of any programming language. Here is a Perl teaser on print:

What will the following piece of code print?
print ( 2 + 2 ) * 5 ;

Since I have told you it is a teaser, you already know the answer is not going to be 20, i.e. 4*5. So what will it be? Will this piece of code give an error? No.

At this point, you probably want to read the perldoc on print or simply run this code.

It prints 4. The part corresponding to (2+2) only. What happens to the rest? It's all there in perldoc which says:
Also be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print--interpose a `+' or put parentheses around all the arguments.

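So the arguments to print are (2+2) only. The rest, i.e. * 5, is not part of the print call: Perl multiplies the return value of print by 5 and throws the result away. With warnings enabled, Perl complains about both the parenthesis and the useless multiplication.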
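
Here is a small sketch that shows the parse and the two fixes suggested by perldoc. The capture helper is made up for this example; it just collects whatever a block prints so the results can be compared.

```perl
use strict;
use warnings;

# Run a code block and return everything it printed to the default handle.
sub capture {
    my ($code) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Can't open in-memory handle: $!";
    my $old = select($fh);
    $code->();
    select($old);
    close $fh;
    return $buffer;
}

my $result;
my $quiz = capture(sub {
    no warnings 'syntax';          # silence "print (...) interpreted as function"
    $result = print (2 + 2) * 5;   # parsed as (print(2 + 2)) * 5
});
my $parens = capture(sub { print((2 + 2) * 5) });   # parentheses around everything
my $plus   = capture(sub { print +(2 + 2) * 5 });   # interposed unary plus

print "quiz printed '$quiz'; the whole expression evaluated to $result\n";
print "fixed forms printed '$parens' and '$plus'\n";
```

The quiz line prints only the sum, while the value of the whole expression is print's return value times 5.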

Monday, June 13, 2005

Perl is...

Perl...
o is a cross-platform programming language
o is open source software
o is Y2K compliant
o is listed in the Oxford English Dictionary (OED)
o has features of C, awk, sed, sh, and BASIC
o is equipped with DBI modules for interfacing with third-party databases such as Oracle and MySQL
o supports procedural and object-oriented programming
o supports Unicode
o works with HTML, XML, and other markup languages
o even interfaces with external C/C++ libraries using XS or SWIG

Uncultured Perl

Larry Wall's talk on Perl 0-5.

Interesting read.

Written by Larry Wall
Friday, 15 October 1999

They say one should never start with an apology, so here goes: I'm sorry, but this essay will be all over the map. But then, given that I too am all over the map (and by extension, so is Perl), this kind of wandering is entirely inevitable. So I'll do my best to keep this story roughly organized in a chronological order because the history of Perl is suggestive of various principles that you may see in operation elsewhere.

Or maybe not...


Sex makes babies.

Like the typical human, Perl was conceived in secret, and existed for roughly nine months before anyone in the world ever saw it. Its womb was a secret project for the National Security Agency known as the "Blacker" project, which has long since closed down. The goal of that sexy project was not to produce Perl. However, Perl may well have been the most useful thing to come from Blacker. Sex can fool you that way.

Not only was Perl born of a sexy project, but its early design came from heavy cross-pollination.

Genetically speaking, of course, Perl has many more ancestors than that simple diagram above indicates. (I've left out theology and biology, for instance.) I am a synthesist at heart, and can't help throwing a little bit of everything into the pot all at once. Just in the realm of computer science, Perl draws inspiration from many languages. One could draw a similar diagram just for that.
[Diagram: More Parents]

That also is an oversimplification, since Perl has drawn from many other languages over the years. (Lisp and Ada, for instance.) But the four languages in the diagram influenced Perl the most at the beginning of its development. Well, those four, and BASIC-PLUS...

At this point, I'm talking about Perl, version 0. Only a few people in my office ever used it. In fact, the early history of Perl recorded in O'Reilly's Camel Book (Programming Perl) was written by my officemate of the time, Daniel Faigin.

He, along with my brother-in-law, Mark Biggar, was most influential in the early design of Perl. They were also the only users at the time. Mark talked me out of using bc as a backend expression processor, and into using normal, built-in floating point operations, since they were just being standardized by the IEEE (Institute of Electrical and Electronics Engineers). Relying on that standard was one of the better decisions I ever made. Earlier scripting languages such as REXX didn't have that option, and as a result they tend to run slower.

As with a human fetus, the last several months of Perl's gestation did not produce any large differences in the appearance of Perl. Rather, Perl simply put on some weight as it prepared for the real world. I recognized that acceptance of Perl hinged on the ability of people to leverage what they already knew, so I didn't release Perl until I'd written two translators: s2p (sed to Perl) and a2p (awk to Perl).

I made one major, incompatible change to Perl just before it was born. From the start, one of my overriding design principles was to "optimize for the common case." I didn't coin this phrase, of course. I learned it from people like Dennis Ritchie, who realized that computers tend to assign more values than they compare. This is why Dennis made = represent assignment and == represent comparison in his C programming language.

I'd made many such tradeoffs in designing Perl, but I realized that I'd violated the principle in Perl's regular expression syntax. It used grep's notion of backslashing ordinary characters to produce metacharacters, rather than egrep's notion of backslashing metacharacters to produce ordinary characters.

It turns out that you use the metacharacters much more frequently than you do the literal characters, so it made sense to change Perl so that /(.*)/ defined a substring that could be referenced later, while /\(.*\)/ matched a sequence inside literal parentheses.

This change broke all of Dan Faigin's scripts. I think he's forgiven me by now, even if he won't admit it.
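
(A note from the blog, not part of the essay: the difference between capturing parentheses and literal parentheses described above can be seen in a few lines. This is only a sketch; the sample string is made up.)

```perl
use strict;
use warnings;

my $text = 'call foo(bar, baz) now';

# Unescaped parentheses group and capture: this grabs the whole string.
my ($whole) = $text =~ /(.*)/;

# Backslashed parentheses match literal ( and ) characters in the text.
# An extra unescaped pair around them captures what was matched.
my ($literal) = $text =~ /(\(.*\))/;

# Literal parentheses outside, capturing parentheses inside.
my ($inside) = $text =~ /\((.*)\)/;

print "whole:   $whole\n";
print "literal: $literal\n";
print "inside:  $inside\n";
```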

As you might expect, another decision at this time was whether to release Perl as free software. This was essentially a no brainer. I'd already done most of my thinking about this topic earlier when I released the rn and patch programs. Occasionally people ask me whether I was surprised that Perl caught on, and then they're surprised when I say "no." But you have to remember that I already had several open source projects under my belt, and had a pretty good notion that anything I liked, other people might also like.

But it wasn't as easy deciding to do an open source project back then. These days, you can actually go to your company and make a case for releasing the source code. Back then, you had to believe in it enough to take some risks.

I knew that I didn't dare ask the company lawyers for permission, because they'd have thought about it for something like six months, and then told me "no." This is despite the fact that they wouldn't be interested in peddling it themselves. In the old days, a lot of free software was released under the principle that it's much easier to ask forgiveness than to seek permission. I'm glad things have changed -- at least to the extent that the counterculture is acknowledged these days, even if it's not quite accepted. Yet.

But Perl was actually much more countercultural than you might think. It was intended to subvert the Unix philosophy. More specifically, it was intended to subvert that part of Unix philosophy that said that every tool should do only one thing and do that one thing well.

The problem with that philosophy is that many of the tools available under Unix did not, in fact, do things very well. They had arbitrary limits. They were slow. They were non-portable. They were difficult to integrate via the shell because they had different ideas of data formats. They worked okay as long as you did what was expected, but if you wanted to do something slightly different, you had to write your own tool from scratch.

So that's what I did. Perl is just another tool in the Unix toolbox. Perl does one thing, and it does it well: it gets out of your face.

I released Perl to the comp.sources.misc newsgroup in 1987, and that was the end of Perl version 0.

But Perl was also intended to subvert some attitudes in computer science. More on that later.


The Baby
[Photo] A Class Act: Larry Wall's boss at O'Reilly & Associates, Tim O'Reilly.

Perl was born at an early age. I mean something specific by that.

Humans do not come from the womb with the ability to drive a stick shift. Neither did Perl. This was intentional. I've always been smart enough to realize how stupid I am, and one of the things I'm stupid about is predicting how my programs will develop over time. So Perl was equipped to learn, and have a long childhood.

We value the maturing process in our own species, but for some reason we don't like it as much in computer programs. In the absence of a handy Zeus, we like to think that computer programs should spring fully formed from our own foreheads. We want to present the world with a fait accompli. In modern terms, we want to build a cathedral.

Now let me just say that I think cathedrals have gotten a bum rap lately. Open Source advocate Eric Raymond has likened the commercial software development model to a cathedral, while he compares free software development to a bazaar.

Eric's heart is in the right place, but I think his metaphors are a little off. Most cathedrals were built in plain view with lots of volunteer labor. And most of the bazaars I've seen have produced little of lasting architectural value. Eric should have written about artists who insist on having an unveiling when their sculpture or painting is finished. Somehow I can't imagine anyone pulling a shroud off of a cathedral and saying, "Voila!"

An open source programmer is more like a public mural painter. An artist may have some idea of what he's about, but anyone walking by can make suggestions, or even help paint if the artist is agreeable. If not, he can always go paint a better mural on the other side of the freeway if he wants.

Anyway, when Perl version 1 arrived in the world, it knew it was the new arrival, not the world. It recognized that there are depths of wisdom in the world that have not yet been plumbed by computer science, let alone by Perl. So Perl played the role of a baby, and grew. Various nurturing individuals adopted Perl, and helped to raise it, because Perl was cute, and they became irrationally attached to it.

Besides, cute is cool. See penguins.


The Toddler

Theory is good, in moderation.

One of the ways Perl subverts computer science is by adopting theoretical axes without grinding them. The theory of regular expressions is highly developed, but typical users just want to get their work done. Perl 2 had a more powerful regular expression engine under the hood, but the idea was to make things easier for the user, not harder.

So, to effect this, I introduced many redundancies in regular expression syntax that have come to be a large part of what nowadays are known as Perl 5 regular expressions, even when they're being called from Java or Python. Yes, in Perl \d means the same thing as [0-9]. So what? I can say "digits" in English to mean 0 through 9, without saying "0 through 9" every time. Why not in Perl too?
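
(A note from the blog, not part of the essay: a quick sketch comparing the two spellings on ASCII input. The sub names are invented for this example. On Unicode strings in newer Perls, \d can also match digits from other scripts unless the /a modifier is used, so the equivalence is assumed here for plain ASCII text only.)

```perl
use strict;
use warnings;

# True if the string consists only of digits, using the \d shorthand.
sub is_digits_short { return $_[0] =~ /\A\d+\z/ ? 1 : 0 }

# The same test spelled out with an explicit character class.
sub is_digits_class { return $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }

for my $sample ('2005', '06-17', 'abc', '4x4', '') {
    printf "%-8s \\d: %d  [0-9]: %d\n",
        "'$sample'", is_digits_short($sample), is_digits_class($sample);
}
```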

Moderation is good, in theory.

If you want to have a thriving open source project, then you must build a culture around it; and to build a culture, you must encourage cultural identity. That is, you must encourage a certain amount of immoderation. Call it "religion" if you will, though I think religion has gotten a bum rap too. After all, immoderation on behalf of a good cause is how saints are made.

Hence, we were immoderately evangelical about Perl. In particular, because we wanted to build a new culture, we had to pull people in from many different cultures, which oddly enough meant we had to avoid being classified as a culture ourselves for a little while.

Consequently, I refused to create a newsgroup for Perl for a long time because I wanted people to talk about how to solve their shell problems in Perl, and I didn't want Perl to become ghettoized right off the bat.

So when people would ask in the shell newsgroups how to do something, we could give them the "how to do it in Perl" answer without enduring chants of, "Take it to the Perl newsgroup!" In this way, we subverted Usenet.

(Later on, and for similar cross-cultural reasons, I started scanning my entire Usenet feed for Perl references -- I basically invented my own form of Kibology (http://www.kibo.com/) about the same time as Kibo came up with his religion of the Internet, but he got to name it. Drat. Perhaps it's just as well. "Wallogy" just doesn't quite have the right ring to it, does it?)

Anyway, I did some sneaky things to make sure Perl developed a healthy culture. While we took ourselves very seriously in some ways, we also tried to laugh at ourselves occasionally. Perl not only stands for the Practical Extraction and Report Language, but it also stands for the Pathologically Eclectic Rubbish Lister.

Anyone who can't laugh at himself is not taking life seriously enough.


The Child
[Photo] Striking A Subversive Pose: Larry Wall.

I remember that when I was a child, I thought there was nothing I couldn't do if I tried hard enough.

Perl 3 was when Perl started trying harder. The early design goal of optimizing for the common case was okay as far as it went, but it didn't say anything about what to do for the uncommon case. One common solution is to say you're not going to do anything about it, and in fact, that's what I once said about binary data, based on the precedent of many Unix text processing tools that could not handle binary data. When Perl 2 was out, I said to myself, "Perl is just a text processing language. If I make Perl handle binary data, who knows where it will stop?"

Well, Perl now handles binary data, and who knows where it will stop? That's the rub -- people don't want Perl to quit just because the going gets tough. So the earlier design goal evolved into our current goal: "Easy things should be easy, and hard things should be possible." Sure, Perl is mostly a text processing language, but every time we add the ability to do something else, we make a way to solve gobs of problems that are 90 percent text processing and 10 percent something else.

Another way Perl 3 tried harder was by adopting the GNU General Public License, known as the GPL. (The copyright statement on the first two versions was woefully inadequate.) More on the GPL later...

I've been talking about the advances that marked major versions, but you should realize that most of the development of Perl was not at the version boundaries. Development was continuous between major versions, and the typical patch file would contain bug fixes mixed together with enhancements.

It's a sign of the robustness of the design of Perl that people didn't rebel at using a development version for all their work. In large part, this was because Perl has always had an extensive set of regression tests, and users could be confident that, even if they installed a new development version, at least the functionality tested by the regression tests was guaranteed not to break. And that was good enough for many, because most Perl scripts don't use the fancy stuff. I take it as a compliment that many people still use Perl 4, because that means I put a lot of the right stuff into Perl early.


The Preteen

It's an odd thing, but nothing much changed between Perl 3 and Perl 4. Or, I should say, between the end of Perl 3 and the beginning of Perl 4. Just as it's hard to say when a child becomes a preteen, the transition from Perl 3 to Perl 4 was rather arbitrary. What really happened was that we wrote the first edition of the Camel Book, and then changed the version number on Perl so that we could say that it documented version 4.

But there is actually a qualitative difference between how a child thinks and how a preteen thinks. There comes a time at about the age of ten when suddenly a child doesn't just think about things, but starts thinking about thinking about things. So I guess you could say that producing the Camel Book was our version of thinking about thinking about Perl. Certainly it helped a lot of people understand more about how I think about Perl. Or at least, about camels.

People keep asking me, "Why a camel?" And I have a standard litany of answers. (I'm allowed to give more than one answer, of course, because of the Perl Slogan: "There's More Than One Way To Do It!") But the fact is, there's no single left-brained answer. If I'd wanted a left-brained animal on my book, it would have been an oyster. But I wanted a right-brained animal, and the camel was it for various reasons:

A camel is ugly but useful; it may stink, and it may spit, but it'll get you where you're going.

A camel is self sufficient in a dry place. You don't need to take along food and water for your camel. You don't need a toolkit to keep your camel going. You don't need tracks or roadways, or pipes to other processes.

A camel is a horse designed by a committee. Or at least it looks like one. But appearances can be deceiving, and a camel is well adapted to its ecological niche. So is Perl. Despite the fact that it is designed by a committee.

Camels have vague Biblical associations with caravans and treasure, such as Pearls of Great Price, not to mention Pearls of Wisdom, and not to mention Pearls before Swine. Or the Pearly Gates.

Finally, no animal has a better attitude than a camel. Presuming you're looking for an animal with an attitude.

Frankly, I wanted a camel because it was countercultural. Camels aren't modern, therefore they have to be either premodern or post-modern. Maybe even both. Fortunately our postmodern culture is based on the happy contradiction that any decent culture must be countercultural. If you can't deconstruct your own program, you aren't really with the program.

Another way to look at the Camel Book is that it was a kind of bar mitzvah for Perl. Perl wasn't a real grown up language before it had a book. I remember being shocked the first time I was told that half the desks on Wall Street had a Perl book on them. I shouldn't have been. The book is what legitimized Perl in the eyes of many.

Another thing that helped legitimize Perl was the addition of the Artistic License to stand beside the GPL. Perl 3 used only the GPL, but I found that this didn't do quite what I wanted. I wanted Perl to be used, and the GPL was preventing people from using Perl. Not that I dislike the GPL myself -- it provides a set of assurances that many hackers find comforting. But business people needed a different set of assurances, and so I wrote the Artistic License to reassure them.

The really brilliant part was that I didn't require people to state which license they were distributing under, so nobody had to publicly commit to one or the other. In sociological terms, nobody had to lose face, or cause anyone else to lose face. Most everyone chose to read whichever license they preferred, and to ignore the other. That's how Perl used psychology to subvert the license wars which, as you may or may not be aware, are still going on. Ho hum.

Yet another thing that helped legitimize Perl was that there was a long period of stability for Perl 4, patch level 36. The primary cause of this was that I abandoned Perl 4 to work on Perl 5.


The Adolescent

Shortly after a preteen starts to think about thinking, the raging hormones enter the scene, causing a total brain meltdown and, hopefully, eventual reconstruction.

For Perl, the meltdown happened because I decided to follow the rule: "Plan to throw away your prototype, because you will anyway." Perl 5 was nearly a total reorganization. I have in times past claimed that it was a total rewrite, but that's a bit of a stretch, since I did, in fact, evolve Perl 4's runtime system into Perl 5's. (Though if you compared them, you'd see almost nothing in common.) The compiler, though, was a total rewrite.

You only get one shot at raising your kids. Similarly, I figured that this was my first and last chance to raise Perl right. So I thought of all the buzzwords I wanted Perl to be compliant with, and heroically set out to make it so. By and large, I think I succeeded. As with the typical adolescent, Perl 5 is significantly, um, sexier than Perl 4.

Among other things, Perl became more readable and more extensible. It also became more unmanageable, not because it was a bad design, but simply because it was growing up, and had too many interests of its own. With Perl 5, I began to realize something I was learning with my kids: I had to let them learn to make their own way in the world. Not that I don't still give them all a lot of advice. But the process of letting go is gradual; if you let go all of a sudden, they tend to land hard.

One manifestation of this was that I had to learn to delegate most of the work of Perl development and documentation to other people. We've developed a system of "pumpkin holders" who are responsible for the various aspects of Perl development and maintenance.

But that's not good enough. I also realized that, while the open source community is good at some things, it's lousy at other things. If Perl was to be useful to as many people as possible, I had to learn to delegate some things to the business community as well. In particular, we really needed to have a commercially packaged version of Perl for the Windows folks, because many of them were (and still are) clueless about open source. It's almost like we're doing Windows users a favor by charging them money for something they could get for free, because they get confused otherwise. (This is Linux Magazine, so I can get away with saying this, right?)

But beyond that, I was looking around for someone with some business sense to cooperate with, when lo and behold I found out that Tim O'Reilly (as in O'Reilly & Associates, my publisher) was having the same ideas about establishing a more symbiotic relationship with the open source community.

Tim is a class act. He's also a bit of a rarity: a brilliant (but not greedy) entrepreneur. His slogan is "Interesting work for interesting people." We hit it off right away, and Tim offered to pay me to take care of Perl, because anything that is good for Perl is good for O'Reilly. And from my perspective, lots of what O'Reilly does happens to be good for Perl.

But it goes beyond even that.

Tim and I both felt that there was something larger than Perl afoot. Free software has been around in various forms as long as there has been software, but something new was beginning to happen, something countercultural to the counterculture.

The various open source projects were starting to realize that, hey, we aren't just a bunch of separate projects, but we have a lot in common. We don't have a bunch of separate open source movements here. We have a single Open Source movement -- albeit one with lots of diversity of opinion as to how best to move the bandwagon forward to where more people can hop on.

In short, our counterculture was beginning to count.

When Tim hired me on three years ago, that was very much on our minds. We were preaching it a year before Netscape released the Mozilla browser code under an open source license.

Of course, both Linux folks and Perl folks recognize that Linux and Perl were here long before the bandwagon, and will (hopefully) be here long after the bandwagon has broken an axle (or more likely, has been hijacked). But in the meantime, both Linux and Perl have been growing up, and they are now mature enough to understand the utility of a bandwagon in inciting a revolution.

Frankly, I'm glad that Perl isn't competing against Linux. Quite the contrary -- now that both Perl and Linux are growing up, Perl is going out with Linux. I suppose that, as the father of Perl, this should make me nervous, but it doesn't. They have my blessing. May they prosper. May they have many happy years together. May they be fruitful and multiply. And dominate the world.