Who, me?

Low-level language, high-level fingers

April 29th, 2008 (10:53 pm)
current mood: introspective
current song: The rattling of the keyboard.

For the past 5 months now, I've been working in Ruby and JavaScript, with some personal work in Python. But, tonight, I'm hacking away at building a garbage collector, and I'm doing it in C++, because it wouldn't make sense to do it in a language that already has GC.

I'm finding that I resent C++'s verbosity and low power—I even have to tell CppUnit all the test classes, and all the methods in each class, because C++ doesn't have introspection. But the other thing I'm finding is that I'm typing much faster, and it feels satisfying. In reality, of course, I'm taking much longer to write this stuff than I would in a high-level language; but the feeling of having my fingers rev up and bang up code is...addictive.

I need to bookmark this post, for the next time I'm trying to pick a language for a product, so that I don't mislead myself.

(At least I decided against doing it in C. I was thinking about it, on portability grounds, but this really is an experiment, to learn about GC; portability is not an issue.)

Comments

(Deleted comment)
Posted by: Who, me? (metageek)
Posted at: April 30th, 2008 02:32 pm (UTC)
Re: you could get that feeling

Not the same. I really did mean "high-level fingers": I think about what I need to write, in terms of my high-level understanding of C++, and my fingers take care of the details. Basically, my low-level knowledge of C++ is so ingrained that I don't need to think about it as much. I have a similar phenomenon with Emacs, which I've been using even longer than C++: all I perceive at the conscious level is my intent and the result; I don't think about the commands I give unless I have to do something unusual.

I suppose the lesson is that I should pick one high-level language and stick with it; in 10-15 years, I'll be able to barrel through it as well as I can with C++.

Posted by: Justin du Coeur (jducoeur)
Posted at: May 3rd, 2008 06:03 pm (UTC)

I even have to tell CppUnit all the test classes, and all the methods in each class, because C++ doesn't have introspection.

True, although at least some dialects are pretty good at self-registration. One trick we did a *lot* at Looking Glass and Buzzpad (which were both C++ shops) was to mark a harness as library-level (so that it loaded first), and then have classes self-register at load time. Sometimes got a little tricky to make sure that the linker got the dependencies right, but once we got the patterns correct things worked very smoothly, with pretty low coupling.
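A minimal sketch of the self-registration idea in portable C++, for readers who want to see the shape of it. This is not the "library-level" mechanism described above, and it is not CppUnit's API: the names `testRegistry`, `Registrar`, and the two test names are made up for illustration. Instead of relying on a load-order marker, it assumes the registry can live in a function-local static, which is constructed on first use and is therefore ready whenever a registration runs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry of test names (illustrative only).
// The function-local static is constructed the first time this function
// is called, so it exists even during static initialization of other objects.
inline std::vector<std::string>& testRegistry() {
    static std::vector<std::string> names;
    return names;
}

// Constructing a Registrar adds one name to the registry.
struct Registrar {
    explicit Registrar(const std::string& name) {
        testRegistry().push_back(name);
    }
};

// Each test file defines one static Registrar; no central list of tests exists.
static Registrar registerGcAllocTest("GcAllocTest");
static Registrar registerGcSweepTest("GcSweepTest");

// Number of tests that registered themselves during static initialization.
inline std::size_t registeredTestCount() {
    return testRegistry().size();
}
```

The caveat about the linker still applies: if a registering object lives in a static library and nothing references its object file, the linker may leave it out and the registration never runs.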

(In general, the trick with C++ always seems to be keeping the coupling down. Can be done, but it requires a lot of discipline and a good architecture.)

Posted by: Who, me? (metageek)
Posted at: May 5th, 2008 01:38 pm (UTC)
Loading tricks

mark a harness as library-level

That doesn't sound familiar; maybe it's a Windows thing?

But I think I remember a post where you went into greater detail; you called it the Ecology pattern, right? That did strike me as a really good idea.

In general, the trick with C++ always seems to be keeping the coupling down.

Yep. Even just if you want to speed up your builds.

When I started this garbage collection experiment, I wasn't expecting to go very far with it; I wrote unit tests mostly as a form of discipline, but kept them all in the same module. Now I'm starting to build it into a simple Scheme interpreter, complete with continuations; that's going to require better organization.

(My hope is to build an interpreter I can use to teach my kids. The point of doing it myself is so that, when they get old enough to get interested in the system level, they'll be able to drop down into the interpreter. Possible with, say, Python, but not easy.)

Posted by: Justin du Coeur (jducoeur)
Posted at: May 6th, 2008 01:49 am (UTC)
Re: Loading tricks

That doesn't sound familiar; maybe it's a Windows thing?

Entirely possible -- I'm pretty sure it was a pragma of some sort.

But I think I remember a post where you went into greater detail; you called it the Ecology pattern, right? That did strike me as a really good idea.

This isn't precisely the Ecology Pattern, although it can be used to help implement the Ecology Pattern.

Summarizing pretty briefly: the Ecology Pattern was originally designed for C++, as a way to deal with the dependency horrors that often arise there. The key concepts are:

-- The Singleton Pattern is *never* used, at least insofar as possible. In general, Singleton tends to be a mess in C++: besides the usual problem of making it impossible to stub classes for testing, it introduces unfortunate dependencies that tend to produce Compile Hell.
-- *Every* top-level system "singleton" (the main components of the system) has an interface, and they communicate exclusively via those interfaces. By strictly adhering to this rule, you reduce the cross-module relationships to pure virtual interfaces, which is generally a huge win from a compilation dependency POV.
-- The main module knows which objects to instantiate, but it doesn't need to know exactly what those modules do, nor what their internal dependencies are. Instead, each module, upon creation, registers itself in the Ecology, declaring what interfaces it publishes, and which interfaces it depends upon. ("Depends upon" mainly meaning that it requires that other interface for full initialization -- so for example, many things depend on the config and logging systems in order to initialize, and some depend on thread managers, high-level communication, file management and so on.)
-- The objects do very little at construction time. Instead, there are formal Init and Term phases. Once all objects are created, the Ecology system does a topological sort of the dependencies, to figure out what order to initialize in, and then calls Init on each module in turn. At shutdown time, it calls Term in the reverse of that order. This way, the top-level objects can depend on other objects for initialization, without anything needing to have a global understanding of the dependencies (which is usually hard to maintain).
-- Each top-level object stores a pointer to the Ecology, throughout the lifespan of the program.
-- Objects find each other by some kind of unique ID. In Java or C# this is a class reference; in C++ you generally have to have each object declare a unique string name or UUID. The Ecology maintains a hash of which UUIDs identify which interfaces -- anything can fetch an object from the Ecology by passing in the identifier to get the desired interface. The objects are allowed to store these pointers if they choose, so long as those pointers are not used outside the Init/Term window.
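
The Init/Term ordering described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the original code: `Module`, `Ecology`, `DemoModule`, and `runDemo` are hypothetical names, string IDs stand in for UUIDs, and the sort is a plain depth-first topological sort.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical module interface (names are illustrative).
class Module {
public:
    virtual ~Module() {}
    virtual std::string id() const = 0;                      // unique ID
    virtual std::vector<std::string> dependsOn() const = 0;  // IDs needed for init
    virtual void init() = 0;
    virtual void term() = 0;
};

class Ecology {
public:
    void add(Module* m) { modules_[m->id()] = m; }

    Module* find(const std::string& id) const {
        auto it = modules_.find(id);
        return it == modules_.end() ? nullptr : it->second;
    }

    // Depth-first topological sort: each module is initialized after its dependencies.
    void initAll() {
        std::map<std::string, int> state;  // 0 = unvisited, 1 = in progress, 2 = done
        for (auto& entry : modules_) visit(entry.first, state);
    }

    // Shut down in the reverse of the initialization order.
    void termAll() {
        for (auto it = order_.rbegin(); it != order_.rend(); ++it) (*it)->term();
        order_.clear();
    }

private:
    void visit(const std::string& id, std::map<std::string, int>& state) {
        if (state[id] == 2) return;
        if (state[id] == 1) throw std::runtime_error("dependency cycle at " + id);
        Module* m = find(id);
        if (!m) throw std::runtime_error("missing module " + id);
        state[id] = 1;
        for (const std::string& dep : m->dependsOn()) visit(dep, state);
        state[id] = 2;
        m->init();
        order_.push_back(m);
    }

    std::map<std::string, Module*> modules_;
    std::vector<Module*> order_;  // modules in the order they were initialized
};

// Demo module that records its lifecycle calls in a shared log.
class DemoModule : public Module {
public:
    DemoModule(std::string id, std::vector<std::string> deps,
               std::vector<std::string>* log)
        : id_(std::move(id)), deps_(std::move(deps)), log_(log) {}
    std::string id() const override { return id_; }
    std::vector<std::string> dependsOn() const override { return deps_; }
    void init() override { log_->push_back("init " + id_); }
    void term() override { log_->push_back("term " + id_); }
private:
    std::string id_;
    std::vector<std::string> deps_;
    std::vector<std::string>* log_;
};

// Wires up three modules and returns the recorded init/term sequence.
std::vector<std::string> runDemo() {
    std::vector<std::string> log;
    DemoModule app("app", {"config", "logging"}, &log);
    DemoModule logging("logging", {"config"}, &log);
    DemoModule config("config", {}, &log);
    Ecology eco;
    eco.add(&app);
    eco.add(&logging);
    eco.add(&config);
    eco.initAll();
    eco.termAll();
    return log;
}
```

In this sketch `runDemo()` initializes config, then logging, then app, and terminates them in the opposite order, regardless of the order in which the modules were added.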

All of this requires a bit of discipline: while I often refer to it as a pattern, it's really a full-scale program architecture. But I've found that, by adhering to this architecture strictly, I wind up with programs that are *delightfully* decoupled: flexible, easy to adjust, with few annoying cross-dependencies getting in the way. I've used this model at my past five companies, and it's always been a great benefit...

Posted by: Who, me? (metageek)
Posted at: May 6th, 2008 04:34 pm (UTC)
Re: Loading tricks

Yeah, I remember—when you posted about it, I was sufficiently impressed by the idea that I implemented most of it as an exercise, just to make it stick in my head for later.

It occurs to me, though, that there's a better solution to the unique ID problem. Every class that needs to be constructed has to declare a static member whose Init/Term methods will be called, right? Why not refer to that member more directly? You don't want class B referring directly to class A, since then you've got tighter coupling; but you can do:

// In Component.hpp:
class Component {
public:
    Component();            // registers this component with the framework
    virtual ~Component() {}
    // Null-terminated array of the components this one depends on.
    virtual Component** dependencies()=0;
};

// In AComponent.hpp:
#include "Component.hpp"
class AComponent: public Component {
public:
    static AComponent constructMe;
    virtual Component** dependencies() {return 0;}  // no dependencies
};

// In A.hpp:
// ...actual declaration of A

// In A.cpp:
#include "AComponent.hpp"
#include "A.hpp"
AComponent AComponent::constructMe;

// ...implementation of A and AComponent.

// In BComponent.hpp:
#include "AComponent.hpp"
class BComponent: public Component {
public:
    static BComponent constructMe;
    virtual Component** dependencies()
    {
        // B must be initialized after A.
        static Component* res[]={&AComponent::constructMe,0};
        return res;
    }
};

This way, the compiler and linker take care of ensuring uniqueness, but A and B don't have to know about each other. AComponent and BComponent have to know each other's declarations, but they're pretty simple.

Posted by: Justin du Coeur (jducoeur)
Posted at: May 6th, 2008 11:52 pm (UTC)
Re: Loading tricks

I'm not entirely sure what you're proposing here, so I want to be a little careful -- this is pretty different from my usual C++ coding paradigm.

Is AComponent supposed to be the interface? If so, this isn't quite right: you have the interface declaring the dependencies. But that's not what you want -- it should be the concrete implementation that declares the dependencies. A concrete class can implement any number of interfaces, but it depends on a single list of items.

This also breaks the abstraction barrier, at least a bit: to depend on, say, BComponent, you're being exposed to some of its implementation details. (Specifically, its dependency map.) That might be survivable, but it's less clean, and every abstraction break gradually costs you.

(Keep in mind that this model was originally evolved for codebases on the order of several megabytes, with dozens of libraries: abstraction breaks get *very* expensive as you scale the code up. It was written in reaction to the tendency for complex C++ projects to take an eternity to compile, because one little change in a header causes massive knock-on recompilations. So we got *very* strict about the abstraction barriers, to keep compile times fast.)

Also, I'm not quite clear what "constructMe" is supposed to indicate, but the choice of name makes me uneasy. B shouldn't know anything about the construction of A -- again, A could be implementing many separate interfaces. It looks like the interfaces as shown above aren't actually interfaces, since they have concrete implementations. When I say "interface", I mean exactly that: classes that contain *nothing* but pure virtuals. Anything less than that, and linkage becomes a lot more complex.

So while there is a top-level "component" interface, which describes how a component talks to the Ecology controller, the functional interfaces don't subclass from that. Instead, they're just conventional COM interfaces, with nothing but pure virtuals in them. (The nearest thing C++ has to a true "interface".)

The bias towards UUIDs, BTW, reflects a bias towards COM, at least in the purest sense. One thing I gradually learned, during the eight or so years I was working in C++, was that COM at least *started* as one of those things Microsoft did right. Specifically, the core IUnknown interface is, IMO, just about a necessity for good C++ programming. Most of the stuff that got layered on *top* of that was fatty junk, but we actually more or less reimplemented COM from scratch at all of the C++ shops I worked at (Looking Glass, Trenza and Buzzpad), because it was a good starting point...

Posted by: Who, me? (metageek)
Posted at: May 7th, 2008 01:17 am (UTC)
Re: Loading tricks

Is AComponent supposed to be the interface?

No, it's supposed to be the dependencies. B requires A to be initialized first, so BComponent declares a dependency on AComponent; but B doesn't have to know anything about A. The constructor for Component registers it with the framework, which calls dependencies() to get the edges of the dependency graph. That's what the static constructMe members are for: their constructors do the registration.

However, I suspect what you're talking about is better, with components making interfaces available, and requesting interfaces rather than components.

In your other comment, you wrote:

The only element that crosses the boundaries is the main program, which knows about all of these libraries, instantiates them, and sets up the Ecology for them to talk to each other.

Huh—my concern there would be that the main program shouldn't have to know that A depends on B; all it should know is that it wants to use A. As long as you link in the code for A and B, it should Just Work.

That's not incompatible with the idea of components providing interfaces, though. Component B would register as providing interfaces IB and IC, and component A would register as providing IA and requiring IC. The main program then asks the framework for an instance of IA, and the framework makes sure to initialize components B and A, in that order, before returning an IA. All this can still be done using C++ identifiers instead of UUIDs, as long as you're willing to let each interface have a single static member. It doesn't matter what the member is; it just has to have a consistent name, such as IA::interfaceID, and they all have to be the same type, such as int. Then the identifier for IA would be &IA::interfaceID.
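
A rough sketch of the address-as-identifier idea, assuming a toy `Framework` class and made-up components (`ComponentA`, `ComponentB`, `lookupDemo`); only `IA`, `IC`, and `interfaceID` come from the paragraph above. It leaves out dependency ordering and only shows the lookup keyed on `&IA::interfaceID`.

```cpp
#include <map>

// Each interface carries one static int; its address serves as the interface's ID.
struct IC {
    static int interfaceID;
    virtual ~IC() {}
    virtual int base() const = 0;
};

struct IA {
    static int interfaceID;
    virtual ~IA() {}
    virtual int answer() const = 0;
};

// In a real project each definition would live in exactly one .cpp file.
int IC::interfaceID;
int IA::interfaceID;

// Hypothetical framework: maps an interface ID (an address) to its provider.
class Framework {
public:
    template <class I> void provide(I* impl) { table_[&I::interfaceID] = impl; }

    template <class I> I* get() const {
        auto it = table_.find(&I::interfaceID);
        return it == table_.end() ? nullptr : static_cast<I*>(it->second);
    }

private:
    std::map<const int*, void*> table_;  // values are stored as I* and cast back to I*
};

// Component B provides IC.
class ComponentB : public IC {
public:
    int base() const override { return 40; }
};

// Component A provides IA and requires IC, which it looks up by ID.
class ComponentA : public IA {
public:
    explicit ComponentA(const Framework& fw) : ic_(fw.get<IC>()) {}
    int answer() const override { return ic_->base() + 2; }
private:
    IC* ic_;
};

// Registers B, then A, and asks the framework for an IA.
int lookupDemo() {
    Framework fw;
    ComponentB b;
    fw.provide<IC>(&b);
    ComponentA a(fw);
    fw.provide<IA>(&a);
    return fw.get<IA>()->answer();
}
```

The linker guarantees that the two static members have distinct addresses, which is what makes them usable as unique keys without any string names or UUIDs.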

So the usage of static and a method body inside AComponent.hpp makes me rather nervous. I confess that I'm out of practice enough with C++ that I'm not sure of the compile and link behaviour that would result, but I'd be concerned that anything that depends on AComponent can't be compiled without also compiling A, breaking the ability to compile and link libraries separately.

C++ requires that all the statics in the program be constructed before main() starts, and destructed after it finishes (modulo crashes, of course). Similarly, under Unix, if you call dlopen() to load a shared library by hand, its statics get constructed automatically before dlopen() returns. I can't speak for the Windows equivalent.

As for the separate compilation—it shouldn't be a problem. Compiling B.cpp would pull in source from AComponent.hpp, but it wouldn't actually generate any code; it'd just generate external references in the .o file. When you generate a .so (equivalent of .dll), the linker resolves all the references it can, and defers dangling references until load time; it doesn't matter if there's no .o or .so out there yet that defines those references.

Posted by: Justin du Coeur (jducoeur)
Posted at: May 7th, 2008 02:44 pm (UTC)
Re: Loading tricks

Huh—my concern there would be that the main program shouldn't have to know that A depends on B; all it should know is that it wants to use A. As long as you link in the code for A and B, it should Just Work.

I'm actually not certain we're disagreeing at all here -- I agree with that statement. You say:

The main program then asks the framework for an instance of IA, and the framework makes sure to initialize components B and A, in that order, before returning an IA.

When I say "main program", I mostly *mean* the framework. In a typical Ecology-based program, the main program really just instantiates a bunch of objects and says "go": the objects themselves do all the work.

The only real question is how the framework knows what objects to instantiate. I put that explicitly at the top level, mostly because it makes testing so easy: just create a different top level, that instantiates stubs and simulators for some of the components, and *poof* -- you have a test harness. But there are probably other ways to manage that.
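
To illustrate the testing point, here is a small sketch of a "test top level" that wires in a stub. Everything in it is hypothetical (`ILogger`, `Collector`, `StubLogger`, `runTestTopLevel`); it only assumes that components talk to each other through pure virtual interfaces, as described earlier in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical logging interface: nothing but pure virtuals.
class ILogger {
public:
    virtual ~ILogger() {}
    virtual void log(const std::string& msg) = 0;
};

// Component under test; it depends only on the interface.
class Collector {
public:
    explicit Collector(ILogger& logger) : logger_(logger) {}
    void collect() { logger_.log("collect"); }
private:
    ILogger& logger_;
};

// Stub for tests: records messages instead of writing them anywhere.
class StubLogger : public ILogger {
public:
    void log(const std::string& msg) override { messages.push_back(msg); }
    std::vector<std::string> messages;
};

// A test top level: the same wiring as production, with a stub swapped in.
std::vector<std::string> runTestTopLevel() {
    StubLogger logger;
    Collector collector(logger);
    collector.collect();
    return logger.messages;
}
```

A production top level would construct a real logger in place of `StubLogger`; `Collector` itself would not change or need recompiling.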

C++ requires that all the statics in the program be constructed before main() starts, and destructed after it finishes (modulo crashes, of course). Similarly, under Unix, if you call dlopen() to load a shared library by hand, its statics get constructed automatically before dlopen() returns. I can't speak for the Windows equivalent.

Largely the same, although this is where the "library level" pragma I referred to before becomes really useful. That pragma basically says, "I'm a distinguished library thing, so construct me *before* constructing any other statics".

In cases where you *are* using Singleton-like approaches, this can be quite useful. You declare some sort of registrar as a library-level static; then, your system singletons can automatically and safely self-register as part of their own static loading. It almost allows you to build completely self-assembling programs.

The fly in the ointment is that, if you carry this too far, the linker often doesn't even realize that the class needs to be loaded, so *some* kind of reference is needed. But I've sometimes used variants of this trick to build self-assembling data collections, where I simply have to define a datum and it self-registers itself safely at startup. It's similar to Singleton, but a tad more predictable in its behaviour and timing.

(In practice, I essentially never use statics any more, aside from defining constants, or at least only use them as a last resort: they've bitten me on the ass too many times. But they do sometimes have appeal...)

Posted by: Who, me? (metageek)
Posted at: May 7th, 2008 03:11 pm (UTC)
Re: Loading tricks

The only real question is how the framework knows what objects to instantiate. I put that explicitly at the top level, mostly because it makes testing so easy: just create a different top level,

Right. I suppose I was envisioning that being done by controlling what you link in; but that limits you to library-level or module-level granularity.

That pragma basically says, "I'm a distinguished library thing, so construct me *before* constructing any other statics".

Oh, interesting.

Posted by: Justin du Coeur (jducoeur)
Posted at: May 7th, 2008 12:13 am (UTC)
Re: Loading tricks

An additional bit of subtext, that it suddenly occurs to me isn't clear:

I tend to focus on the benefits of the Ecology pattern as they relate to C# and Java nowadays, because those are the languages I've been working in for six years now. But there are additional motivations behind it WRT C++, and one of the most critical is *linkage* separation, not just abstraction separation.

In a really big project, you typically have things broken into a whole bunch of separately-compiled libraries, that are then linked together. The reason why the Ecology pattern is so fanatically focused on pure virtual interfaces is that they can be used across libraries without introducing linker dependencies. This means that the libraries can be built completely separately, and linked together at the end, without the messiness that concrete-class dependencies introduce into the compile equation.

So basically, in a well-constructed Ecology-based program, each library is completely independent from a linkage POV. They depend on each other's headers, but can be built cleanly and completely, in any order, without any other compilation dependency. The only element that crosses the boundaries is the main program, which knows about all of these libraries, instantiates them, and sets up the Ecology for them to talk to each other.

So the usage of static and a method body inside AComponent.hpp makes me rather nervous. I confess that I'm out of practice enough with C++ that I'm not sure of the compile and link behaviour that would result, but I'd be concerned that anything that depends on AComponent can't be compiled without also compiling A, breaking the ability to compile and link libraries separately.

(And yes: this is overkill on small projects. The Ecology pattern is very much optimized for what I think of as "mid-sized" projects in general: large enough to be complicated, but small enough that talking about a single process is still interesting. Nearly everything I've worked on has fit that general level, so I use the pattern out of habit.)

Posted by: Who, me? (metageek)
Posted at: May 7th, 2008 01:18 am (UTC)
Re: Loading tricks

And yes: this is overkill on small projects.

Yup. My current C++ hack is definitely intended to stay smaller than that; I want to produce a tiny interpreter, for teaching purposes.
