Update: This page seems to draw a fair bit of traffic from Google. In addition to the body of this post, there are some Java samples in the comments. Have a look there too.
For one problem last week I had two tricks to figure out: how to concatenate PDF forms and how to fill in some PDF form fields. With Acrobat people can create PDF forms which you can complete with Reader. In our case these are multi-page tax forms. The IRS defined the forms -- they're not under our control. iText was the tool of choice, but I didn't know the API. The concat_pdf tool put the forms together well enough, but it trashed the data in the forms.
I used Jython to experiment with the API and diagnose the problem. It turns out that the names of the form fields on several of the forms were the same. It was a simple problem of name collisions. Jython was entirely great for diagnosing the problem. I could interrogate the forms before and after concatenation to find out their field names and values. I tried and tried and failed and failed to get Jython and iText to change the names of those fields. I spent entirely too much time in trial and error (and error) failing to bend the iText API to my task. Attempts to create subclasses or delegates around the API met with various limits -- crucial methods that were protected or whatever. There's a separate story here about recognizing when you're on the wrong path or using the wrong tools. I find myself down that dead end more often than I'd like to admit. But this is a different story, so I won't go there now.
At some point I remembered Rob's story about a colleague who spent a long time implementing the PDF spec to generate correct PDFs that nevertheless wouldn't work with Acrobat. It seems the spec and the implementations differ. (When has that ever happened?) The point of the story was that they eventually threw out the carefully crafted tool and used perl string replacement on existing files created with Acrobat. So I turned my attention to seeing if I could find a useful pattern in the field names that would yield to perl's regular expression prowess.
Jython again came in handy for extracting all the field names. All the time walking down dead ends had left me well enough acquainted with PDF internals to see the boundaries of the pattern. Emacs had been in the background of all of these tasks, but it came front and center as I tested my theory about the name collisions and about the pattern. Sure enough, once I ensured that all the fields were uniquely named, the concatenation worked quite smoothly. Quickly enough I had a perl solution to renaming the fields that was really fast.
PDFs are pretty on the display and printing side of things, but pretty ugly on the inside. Paul ended up throwing out my solution too. He found things in the beta versions of iText that allow PDF forms to be "flattened". Then the form field names aren't an issue and the files are smaller too. So all I have to show for my work is a little unwanted knowledge about PDF internals and a story for my blog about technical-pot-luck problem solving. That said, I'll include a little code here in case the string replacement trick for enforcing unique field names helps someone else avoid a few dead ends.
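my $ax = 'aa';    # per-file prefix; ++ steps it aa, ab, ac ...
foreach my $file (@pdf_files) {
    # assumes each $file holds the contents of one PDF, not its name
    $file =~ s{\(([cf]\d-[a-z0-9]+)\)}{($ax-$1)}g;
    $ax++;
    # save the files to disk
}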
The key part is that field names are delimited with parentheses. In my case the field names themselves were fairly predictable. They looked like this: (f1-04) or in some cases (c4-alpha). I don't think you can just count on finding parentheses -- PDFs are more complex than that. (The $ax = 'aa'; $ax++ thing is a fun perl trick. Perl will increment the string alphanumerically thusly: aa, ab, ac ...)
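For readers who don't speak perl, here is a rough Python 3 sketch of the same renaming idea. It is not the code I ran. The names FIELD_NAME, prefixes and rename_fields are made up for illustration, and it assumes each PDF has already been read into a string (decoding the bytes as latin-1 is one way to do that without losing anything).

```python
import re
from itertools import count, product
from string import ascii_lowercase

# Assumed field-name shape, copied from the perl pattern: (f1-04), (c4-alpha).
FIELD_NAME = re.compile(r"\(([cf]\d-[a-z0-9]+)\)")


def prefixes():
    """Yield aa, ab, ..., az, ba, ..., zz, aaa, ... much like perl's string ++."""
    for width in count(2):
        for letters in product(ascii_lowercase, repeat=width):
            yield "".join(letters)


def rename_fields(documents):
    """Return copies of the PDF texts with a distinct prefix per document."""
    renamed = []
    for prefix, text in zip(prefixes(), documents):
        # Bind the current prefix as a default argument so each lambda keeps its own.
        renamed.append(
            FIELD_NAME.sub(lambda m, p=prefix: "(%s-%s)" % (p, m.group(1)), text)
        )
    return renamed
```

Calling rename_fields(["/T (f1-04)", "/T (f1-04)"]) gives the first document's field the aa- prefix and the second document's field the ab- prefix, so the two names no longer collide.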
iText and Jython make it easy to get the field names from a PDF (assuming you're not in control of those field names). Here's how:
% env CLASSPATH=./iText.jar jython
>>> from com.lowagie.text.pdf import PdfReader
>>> reader = PdfReader('path/to/your.pdf')
>>> [f.name for f in reader.acroForm.fields]
Then you can analyze the results and figure out your own replacement pattern.
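One way to do that analysis, sketched in plain Python 3: collect the list of names from each file (for example by pasting the output of the session above) and look for names that show up in more than one file. The function name colliding_names and the file names in the usage note are hypothetical.

```python
from collections import defaultdict


def colliding_names(names_by_file):
    """Map each field name used by more than one file to the files that use it."""
    seen = defaultdict(set)
    for filename, names in names_by_file.items():
        for name in names:
            seen[name].add(filename)
    return {name: sorted(files) for name, files in seen.items() if len(files) > 1}
```

Given {"form-a.pdf": ["f1-01", "f1-02"], "form-b.pdf": ["f1-01", "c4-alpha"]}, only f1-01 is reported, along with both file names. Those are the fields a concatenation would trample.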
The complaint: "Clicking on PDF links in Safari doesn't work."
The problem is that Apple's built-in viewer, Preview, doesn't display the contents of Acrobat Forms. In our (bivio's) case, the PDF forms in question are supplied by the IRS and filled in by our software. In Preview, you can see the empty form but none of the filled-in data. You have to use Adobe Reader. Moreover, if you want Safari to open the files in Adobe Reader by default instead of Preview, you must set the default application for PDFs in the Finder.
These instructions were tested with Mac OS X 10.3.2. Other versions of OS X may differ slightly, but the general idea should still apply.
The short story is: 'Get Info' for any PDF file, change the 'Open With' setting from Preview to Adobe Reader, and then 'Change All...' to make all PDF files open with Reader.
Once you've done that, Safari will honor that setting and any PDF files downloaded will be opened with Reader instead of Preview.
Here is the play-by-play in case you need more detailed instructions:
1. Switch to the Finder (click the smiley icon in the Dock).
2. Locate any file whose name ends with .pdf. [1]
3. Select the file (click ONCE on it to highlight it).
4. In the File menu, select 'Get Info' (you can also use the keyboard shortcut Command-I). [2]
5. Reveal the 'Open with:' section of the window. (The Info window is divided into several sections which can be revealed or hidden using the triangle widget on the left hand side of the window. Use the triangle to make sure the 'Open with:' section is shown.) [3]
6. Change the 'Open with:' pop-up menu from Preview to Adobe Reader.
7. Click the 'Change All...' button, and click the 'Continue' button when you are prompted to confirm the change.
I personally prefer to leave Preview as my default because it is much faster and the PDF files I usually read are not forms. I keep a copy of Adobe Reader around for the few times when I do need to deal with forms.
[1, 2, 3] It is hard to know where to draw the line in providing step-by-step written instructions. In this case I'm willing to assume you know how to navigate your computer with the Finder. I'll also assume that if you're interested in keyboard shortcuts, you probably don't need a description of Apple's command key. Somehow the triangle widget in the Info window crosses the threshold and demands a little explanation.
Three posts in today's RSS feeds mentioned code generation. My first exposure to code generation was in contributions to Torque (which also made me a Turbine committer).
While talking about Spring, Chris mentioned a new code generation system he's written.
Simon prefers data driven programming to code generation.
We considered using code generators for our current major project at work, and picked up Jack Herrington's book on the subject. Reading through it, it became clear that many of the problems that code generators solve can be tackled instead using data driven programming techniques made possible by dynamic languages. Since we had already settled on Python as our implementation language the need for code generation became far less apparent, and we ended up avoiding it entirely with the exception of a command line tool for passively generating basic templates for our admin interface.
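To make that contrast concrete, here is a minimal Python 3 sketch of the data driven approach. It is my illustration, not Simon's code. The SCHEMA table and make_record function are hypothetical. In a code generation setup a table like this would be fed to a generator that writes one class per entry; here one generic function simply reads the table at runtime.

```python
# Hypothetical schema: record kind -> field name -> converter for that field.
SCHEMA = {
    "member": {"name": str, "dues": float},
    "club": {"title": str, "members": int},
}


def make_record(kind, **values):
    """Build a dict record, coercing each value with the converter the schema names."""
    fields = SCHEMA[kind]
    unknown = set(values) - set(fields)
    if unknown:
        raise KeyError("unknown fields for %s: %s" % (kind, sorted(unknown)))
    return {
        name: convert(values[name])
        for name, convert in fields.items()
        if name in values
    }
```

Adding a new kind of record means adding a row to SCHEMA, with nothing to regenerate. make_record("member", name="Ann", dues="12.50") returns a dict with dues converted to the float 12.5.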
Jon endorses dynamic languages while discussing Programs That Write Programs.
We've always known that dynamic languages are a great way to create "little languages" for specific tasks. But we don't yet fully appreciate that all programming is a continuous process of language invention. And we don't (yet) evaluate programming-language productivity on those terms. .... We are linguistic animals endowed with a protean ability to generate language. Naturally we'll want that same generative power in our programming languages.
Jon has written a perfect description of lisp. Guy Steele shared this wisdom in his inspired OOPSLA 1998 presentation, Growing a Language (140K PostScript). Paul Graham calls it bottom-up programming. I'm still slowly working my way through On Lisp. Lisp macros are code generators -- programs that write programs. Code generation is baked into the language. Although Paul says macros make lisp more powerful than other languages, he also observes "macros are harder to write than ordinary Lisp functions, and it's considered to be bad style to use them when they're not necessary."
Lisp is the mother of dynamic languages. Its secret power is in the ease with which it can be extended into custom "little languages" which match the domain of the program under development. But code generation is only a part of that power. Functions and lexical closures are also quite powerful.
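For anyone who hasn't met a lexical closure, here is a minimal sketch in Python 3 rather than lisp. The name make_counter is mine, chosen only for illustration. The inner function keeps access to the variable count after make_counter has returned, and each call to make_counter gets its own private count.

```python
def make_counter(start=0):
    """Return a function that remembers and advances its own private count."""
    count = start

    def next_value():
        nonlocal count  # rebind the enclosing variable rather than a new local
        count += 1
        return count

    return next_value
```

Two counters made this way don't interfere with each other, which is the point: the state lives in the closure and needs no class or global.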
I've been programming in perl since I joined as an apprentice to bivio Software Artisans. In 1999 perl [was chosen] for [its] Lispishness, ubiquity and reliability. Though python and ruby are more fashionable, perl is still quite capable and lispish. It's fun applying lessons from On Lisp to my daily perl hacking. It's also fun learning to grow a language.
The future is here, just not evenly distributed. Lisp programmers understand the process of continuous language invention. It's the rest of us who have yet to fully embrace that wisdom. The bad news is that it's hard work expanding your programming mind set, judging by the time it's taking me to digest On Lisp. The good news is the lisp hackers have been blazing this trail for over 40 years and there are good travel guides if you ever decide to set out in that direction.
The COMUG mailing list brought this article to my attention: Can Apple Keep the Worms Out? My reply to the thread got long enough that I thought I'd keep it here for posterity. I've elaborated on my comments from the last pair of rampaging worms: Sobig and Blaster rekindle the OS Wars.
The article presents a fairly balanced story, but I have to take issue with the conclusion.
The standard Mac gloat ... goes something like this: I didn't get this virus because I have a Mac. In fact, I never get viruses. Never have, never will.
The standard Windows community reply: The reason you don't get viruses is because so few people have Macs. In fact, hackers think Macs are so marginal they don't even bother with figuring out ways to break into them or infect them with viruses. If 95% of the world used Macs, you can bet they would catch viruses all the time.
....
Now that Apple has Unix under the hood, ... the argument that Apple is safer because of its marginal place in computing's cosmos no longer applies. With its embrace of Unix, Apple has joined a big family -- and it keeps growing, thanks to Linux and other open-source versions of Unix.
I disagree about the "no longer applies" business.
While it is true that unix is a big family -- much bigger than OS 9 and earlier -- it is also a remarkably diverse family. The Windows operating systems are all the product of a single company. When vulnerabilities are discovered within one version of Windows, they are frequently also present in other versions.
By contrast UNIX has been developed separately by many different organizations: AT&T, UC Berkeley, Free Software Foundation, Sun, HP, Novell, SCO, IBM, Silicon Graphics, NeXT, RedHat, Suse, Apple, and many others. HP-UX vulnerabilities are likely to be quite different from the vulnerabilities in IRIX, or Solaris, or Linux. Each will undoubtedly have vulnerabilities, but few vulnerabilities will affect all of that diverse family. Moreover, there's a habit among the unix culture of compiling software locally to ensure the most important tools are optimized for local needs. That adds another layer of diversity between systems. All of that diversity makes it much harder for a cracker to create a worm that will have the same impact among the UNIX family as they can among the Windows family.
Another important distinction is that UNIX grew up connected to the Internet whereas Windows didn't. The Great Worm was unleashed in November 1988. It affected 6000 Sun and VAX systems but left the rest of the unix family untouched. There are two points to emphasize. First, it illustrates the fact that it's hard to make one worm affect all of the unix family. Second, the Great Worm revealed the need for network security to unix developers fifteen years ago. For that reason there's deeper security knowledge in the unix developer community than in the Windows developer community.
But here's the more important point. It doesn't really matter if Macs would have more viruses if they were the dominant OS. Whether or not that claim is true, that world is a fantasy. The world we live in is the one where Windows is the dominant OS. And the viruses and worms in the real world breed and feed in the Windows ecosystem. That reality is unlikely to change for the foreseeable future, say the next five years or maybe ten.
There are going to continue to be many, many old Windows computers out there connected by broadband to the 'Net. Even if Microsoft is cleaning up their act and making security a priority for XP and Longhorn, there are still going to be many easy targets out there for the crackers to abuse. The crackers are predators and will prefer the easy targets as all predators do. Windows will continue to be an easier target than UNIX.
This doesn't mean we Mac geeks (or other unix geeks) can ignore worms. But we will get to enjoy our position of relative security for a long time to come.
For this collection of observations, I wish I could quickly collect some video fragments to illustrate the point instead of just writing about it, or that I could easily animate some examples. Neither of those is practical at the moment, so words alone will have to do.
Snakes are the place to start with this pattern. Snakes move their bodies in a wavelike motion which propels them forward along the ground. Alligators (or is it crocodiles -- maybe both) have legs which stick out from the sides of their torso, as opposed to the way mammals have legs underneath their torso. You can see a more subdued wave pattern along the spine of alligators when they move. I think the same is mostly true of how small lizards and geckos move, but they move quickly enough that I can't quite see it with my own eyes. Obviously the legs and feet provide some valuable traction and leverage, but the movement of the torso still exhibits the wave. The same wave is also pretty evident in the way sharks (and other fish) swim. Swimming snakes slither through water much as they do over land -- eels too.
As a kid I was fascinated that dolphins had the same sort of wave, but up and down instead of side to side. I even had a personal kind of classification of the good guys vs bad guys in the animal kingdom based on which way their wave went. Whales and dolphins and sea lions and otters all wave up and down, whereas sharks and eels and barracudas wave side to side. As I try to visualize it now, I'm not entirely convinced that sea lions and otters and walruses wave up and down -- I'll have to look for that next time I'm sitting in front of a nature show on the subject. Thinking of the shape of their hind flippers reinforces the notion, but it's been a long time since I actually looked for it.
Last year while walking our dog, Ellie, I noticed with great surprise that she's got a slight side-to-side wave in her spine when she walks or trots. I can't keep my eyes on her spine long enough to know if it's also the case when she runs, though it seems likely.
A few months ago, Sarah and I rented Winged Migration. I've been meaning to mention it in my blog ever since. I'm interested in the way birds fly and I wish I knew more about migration patterns, so I was naturally fascinated by the film. I was struck by the close-ups of flying geese. Their shoulders move up and down while their heads and tails seem to stay in place. In fact it looks somewhat like watching an accomplished athlete swim the butterfly stroke -- up and down waves.
At a park in Toronto, I once noticed that seagulls appear to row through the air. I've tried to see that in other seagulls on other occasions. I can't spot it very consistently. In that park it was quite pronounced.
Segmentation is one of the earliest evolutionary organizations to emerge. This notion is suggested by the many examples spread broadly throughout the animal kingdom. You can see it in earthworms, and in insects. In larger creatures segmentation is most apparent in the spine and rib cage. It's especially apparent in snakes and eels 'cos they're all spine. My casual observations suggest that the more segmentation in the critter, the more apparent the wave in its motion. I think these waves emerge from the segmentation in the anatomy.