March 23, 2005

C++ lessons and Rob's implicit vs. explicit coupling

I ranted about build problems earlier: one step forward, two steps back. Sometimes taking two steps back can let you see things you couldn't see when you were standing closer. Here are a few things that seem notable on my little journey into C++.

I found this outline helpful in explaining what a C compiler is doing and why I should have to care about linking at all. Without linking you don't get an executable file.

Today this notation confused me:

double *data;
...
for (i=0; i<3; i++) {
    for (j=0; j<3; j++) {
         *(data+i*3+j) = 10*i + j;
    }
}

From the context I knew it was creating a three-by-three matrix. But the syntactic mechanics were unclear. In particular:

*(data+i*3+j) = ...

I pulled one of Rob's books from the shelves at work, C++ Primer, 2nd Edition by Lippman. I flipped to the index as I almost always do with computer books. I learned some things. *foo is how you dereference a pointer. A perl analog would be dereferencing a reference: $$foo. *(expr) lets you treat the result of expr as a pointer. *(expr) = 10 puts the value 10 into that address in memory. Somewhat mind-bending, "the pointer and array notation are equivalent." [Lippman p. 46]

char buf[8] = "abcdefg";
buf[0] == 'a' && *buf == 'a';   // true: buf[0] and *buf name the same element
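
Here's a minimal sketch of that equivalence, assuming the same eight-character buffer. The function names (via_index, via_pointer, check_notations) are ones I made up for illustration; they aren't from Lippman.

```cpp
#include <cassert>
#include <cstddef>

// Read element i using array notation.
char via_index(const char *buf, std::size_t i) {
    return buf[i];
}

// Read element i using pointer notation: step i elements past buf, then dereference.
char via_pointer(const char *buf, std::size_t i) {
    return *(buf + i);
}

// Walk the whole buffer and confirm the two notations never disagree.
void check_notations() {
    const char buf[8] = "abcdefg";  // 7 letters plus the terminating '\0'
    assert(*buf == 'a');
    for (std::size_t i = 0; i < sizeof buf; ++i) {
        assert(via_index(buf, i) == via_pointer(buf, i));
    }
}
```

Calling check_notations() from main should run silently, since every assertion holds.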

So that's pointer math. *(data+i*3+j). I've heard about pointer math in other contexts and understood it conceptually. Pointer math is that thing that programmers get wrong more often than not. It's considered a virtue of java and perl and many other languages that you can't do pointer math. Mistakes in pointer math are exploited by crackers, for example buffer overflow exploits. Long story short, about everything I've heard about pointer math has been shrouded in fear and doom.

Don't go there. That's a really bad part of town. You'll probably get hurt.

Now that I've got an example to chew on, I think I understand what they mean when they say "C lets you get closer to the metal". I've always felt these gaps in my knowledge about how computers do what they do. There's something very satisfying about having those gaps gradually colored-in.

The *(data+i*3+j) thing was still bugging me. A two dimensional matrix is supposed to look like this: data[i][j]. Lippman tells me these two forms are equivalent, but the latter looks two dimensional where the former looks one dimensional. I drew a little number line, and broke it into three sets of three to convince myself that the two were equivalent.

 0  1  2  3  4  5  6  7  8
 +--+--+--+--+--+--+--+--+
 i=0      i=1      i=2
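
The number line can be checked in code too. This is only a sketch under some assumptions of mine: it uses a genuine 3x3 array (grid) next to a flat nine-element buffer (flat), because data[i][j] won't actually compile when data is a plain double *. The names grid, flat, flat_index, and layouts_match are made up for the example.

```cpp
#include <cassert>
#include <cstring>

const int N = 3;  // assumed dimension, matching the 3s in the loop

// Position on the number line for row i, column j.
int flat_index(int i, int j) {
    assert(0 <= i && i < N && 0 <= j && j < N);
    return i * N + j;
}

// Fill a true 2-D array with grid[i][j] and a flat buffer with *(flat + i*N + j),
// then compare the raw bytes to see whether both ended up laid out identically.
bool layouts_match() {
    double grid[N][N];
    double flat[N * N];
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            grid[i][j] = 10 * i + j;
            *(flat + flat_index(i, j)) = 10 * i + j;
        }
    }
    return std::memcmp(grid, flat, sizeof grid) == 0;
}
```

If layouts_match() returns true, the two-dimensional form and the one-dimensional form really are describing the same nine slots in memory.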

Then there was the question of units. It's multiplying i by 3, but three of what? Then I remembered about strong typing. The compiler knows that data points to a double. The units must be doubles.
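
A small sketch to test the units idea, assuming data points into a buffer with at least n elements. The helper names element_distance and byte_distance are hypothetical. The first measures the step in doubles, the second measures the same step in bytes by viewing both addresses as char pointers.

```cpp
#include <cassert>
#include <cstddef>

// Distance from data to data + n, counted in doubles.
std::ptrdiff_t element_distance(const double *data, int n) {
    assert(n >= 0);
    return (data + n) - data;
}

// The same distance counted in bytes.
std::ptrdiff_t byte_distance(const double *data, int n) {
    assert(n >= 0);
    const char *start = reinterpret_cast<const char *>(data);
    const char *end = reinterpret_cast<const char *>(data + n);
    return end - start;
}
```

For a nine-element buffer, element_distance(buf, 3) is 3 while byte_distance(buf, 3) is 3 * sizeof(double), so the compiler is doing the scaling for me.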

Although I was sure I had this right, I asked Rob to confirm it. I'm glad I asked. I learned a useful visualization trick, some vocabulary, and got another window into one of his pet peeves.

Rob got out a piece of graph paper and drew this:

       +------+------+------+
   =6  |      |      |      |
       +------+------+------+
   =3  |      |      |      |
       +------+------+------+
i*3=0  |      |      |      |
       +------+------+------+
         j=0      =1     =2

Then Rob started on a rant. "This is really bad. There's all this implicit coupling. You have to know if it's row-major or column-major? And there's three, three, three. And the nested for-loops." From the context I figured out what row-major and column-major mean. I thought to myself, "Of course. If you're working with these very bare abstractions you need some language to describe the common problems." Let's look at the code again.

    double *data;
    ...
    for (i=0; i<3; i++) {
        for (j=0; j<3; j++) {
             *(data+i*3+j) = 10*i + j;
        }
    }

Which index gets multiplied by three? Should it be *(data+i*3+j) or *(data+i+j*3)? One is row-major and the other is column-major. But it probably depends on how you label the axes on your graph. It also depends on how you choose to nest the for-loops. If you change any one of the threes to something else you have to just know to change the rest of them too. The for-loops, the variable names, the pointer expression, and the structure in memory are all coupled. In this little chunk of code these seem like pretty trivial complaints. But you're building this data for a reason. That means there's some other pile of code that has to have the correct order of the i's and j's and has to use threes and has to nest the loops the same way. Any code that gets passed this structure gets implicitly coupled to all the other code using this structure.
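
To see how easily the two conventions get crossed, here's a sketch with a deliberately non-square 2x3 matrix. The sizes and the names ROWS, COLS, row_major, col_major, and write_rows_read_cols are all my own assumptions for the example. One piece of code writes row-major, another reads column-major, and nothing in the language complains.

```cpp
#include <cassert>

const int ROWS = 2;  // hypothetical sizes, non-square on purpose
const int COLS = 3;

// Row-major: the row index is multiplied by the number of columns.
int row_major(int row, int col) {
    assert(0 <= row && row < ROWS && 0 <= col && col < COLS);
    return row * COLS + col;
}

// Column-major: the column index is multiplied by the number of rows.
int col_major(int row, int col) {
    assert(0 <= row && row < ROWS && 0 <= col && col < COLS);
    return row + col * ROWS;
}

// Fill the buffer with 10*row + col using row-major indexing,
// then read one cell back using column-major indexing.
double write_rows_read_cols(int row, int col) {
    double data[ROWS * COLS];
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c) {
            *(data + row_major(r, c)) = 10 * r + c;
        }
    }
    return *(data + col_major(row, col));
}
```

Asking for cell (1, 0) this way gives back 1 rather than the 10 that was stored there, which is the kind of silent mismatch the coupling complaint is about.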

Implicit vs. Explicit coupling

To paraphrase Rob's coupling rant, there is always going to be coupling in your code. Implicit coupling is always bad. Explicit coupling may or may not be bad, but at least you know it's there.

I think Rob would consider the following to be more explicit about its coupling.

int MATRIX_ROWS=3, MATRIX_COLS=3;
double *matrix;
...
for (row=0; row<MATRIX_ROWS*MATRIX_COLS; row += MATRIX_COLS) {
    for (col=0; col<MATRIX_COLS; col++) {
        *(matrix+row+col) = 10*(row/MATRIX_COLS) + col;
    }
}
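
One way to push this further (my guess, not something Rob wrote) is to put the dimensions and the index arithmetic behind a single type, so that callers never write the pointer expression themselves. Matrix, at, rows, cols, and make_example are hypothetical names, and the row-major choice is just an assumption carried over from the loops above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0.0) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // The row-major decision lives here and nowhere else.
    double &at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return cells_[row * cols_ + col];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> cells_;
};

// Fill a 3x3 matrix with 10*row + col, asking the matrix for its own size.
Matrix make_example() {
    Matrix m(3, 3);
    for (std::size_t row = 0; row < m.rows(); ++row) {
        for (std::size_t col = 0; col < m.cols(); ++col) {
            m.at(row, col) = 10.0 * static_cast<double>(row) + static_cast<double>(col);
        }
    }
    return m;
}
```

The coupling hasn't gone away, but it now sits in one place: code that receives a Matrix depends on at(), rows(), and cols() rather than on remembering which index gets multiplied by what.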

Of course, I'd also expect Rob to say something like, "Why don't you just use perl? Then you can do this to build your matrix:"

my($matrix) = [
   [qw( 0  1  2)],
   [qw(10 11 12)],
   [qw(20 21 22)],
];

And you can index into it like this:

$cell = $matrix->[$row][$col];

Isn't that a more natural way to express operations on a matrix?

Posted 09:08 PM | Comments (0)
March 13, 2005

Jython makes Drools declarations more palatable

I've been trying to play around with Drools in fits and starts for over a year now. My first exposure to a rule engine was at PlanetCAD. We were going to integrate Envoy with some code that I mostly didn't like, but it had the redeeming quality of using Jess for declaring pricing rules.

I wanted to play with Drools, but I just can't stand the XML syntax for declaring rules. It mixes angle-brackets with code blocks. You can choose a few different languages for use in the code blocks, but it still looks too much like JSP without JSTL. (Bob, aren't you the one who said "I hate programming in angle brackets"?) More importantly, I'm just tinkering with this stuff to get my head thinking in rules and Java doesn't feel like the right language for tinkering.

To that end I have written some code in Jython to enable me to play with Drools with a pleasant Python syntax. It's been kinda fun working on it in small doses after Elliott goes to bed.

Jython and Drools

I'm pretty amazed at how little code I needed to write. That could be a good sign. For your comparing pleasure, I wrote up a couple of the Drools examples using my Python syntax:

State Example: drools jython

Fibonacci Example: drools jython

The problem of translating this work to Groovy is left as an exercise for James Strachan. ;-) A project for the lazy web is to do something similar with Jess.

Update: Oh, the horror. Jython 2.1 doesn't do closures! The following code works okay in CPython 2.3, but throws an error in Jython:

import unittest

def fn_factory(x):
    def fn(y): return x + y
    return fn

class TestClosure(unittest.TestCase):
    def test_fn(self):
        fn = fn_factory(5)
        for i in range(3,10):
            self.assertEquals(fn(i), i + 5, 'i=%s expected=%s' % (i, i + 5))

if __name__ == '__main__':
    unittest.main()

There are ways to work around this, but they're all ugly and the whole point was to get away from ugly. I was so disappointed as I was debugging this and searching the web for more details. Sigh. It's pretty hard to do without closures once you get used to thinking that way. The alternatives seem so messy. A rev of Jython can't come too soon as far as I'm concerned.

Posted 12:08 PM | Comments (0)