Tuesday, March 13, 2007

They all say - nothing

XML is the eXtensible Markup Language, and obviously is the current kitchen sink of communication. Its purpose is to mark up text, to say 'this part is such-and-such, and that part is an order number'. But if you came here you knew that anyway.

Now, there is a strange tendency. XML is used everywhere, and most files containing XML have a peculiar property: Remove the <tags>, and they contain - nothing. There isn't anything but white space that is 'marked up'. Instead the tags and their attributes represent all the relevant data. And I wonder why they still don't do away with the <> and use a notation that is actually up to the task. After all, the XML syntax is just so it can be embedded in marked-up text. It's like using a hammer to drive in a screw, just for the lack of a standardized screwdriver.

Saturday, November 11, 2006

No keywords

Ever wondered why languages have keywords? Well, they even fight over it -- some Smalltalk proponent argued that Smalltalk having just five keywords is way better than Ruby having thirty-some. But special cases are usually only zero, one or infinity, and in this case one is out, too.

So indeed a language should at best not have any keywords at all. But the only ones that manage to do so are from the Lisp family. In the others you sometimes get a really strange error. In C it is perfectly legal to say int class;, and thus the X11 people did exactly that (in a structure declaration). Use that header from C++, and you get a rather unexpected syntax error, because a reserved keyword is not an identifier. Also the C++ community continues inventing new keywords while C resorts to strange things like long long to avoid that.

The solution is to make keywords into regular entries in the global symbol table with special values. That way any other scope could use any identifier it likes without fear of clashes with esoteric or future 'keywords'. There is one disadvantage: Parsing is affected by semantic analysis, but then that is no news, remember typedef which is explicit in C and implicit in C++ class declarations. But then, since gloop is to have macros, this is no additional disadvantage.
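The idea can be sketched in a few lines of C. This is only an illustration under my own assumptions, not gloop's actual implementation: the names kind, sym, scope, bind and classify are all made up for the example. The lexer would produce plain names, and the parser would ask the scope chain whether a name currently means something special.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; a real language would have many more. */
enum kind { K_IDENT, K_IF, K_WHILE, K_CLASS };

struct sym {
  const char *name;
  enum kind kind;       /* K_IDENT for ordinary names, else the 'keyword' */
};

struct scope {
  struct scope *outer;  /* enclosing scope, NULL for the global one */
  struct sym syms[8];   /* fixed capacity keeps the sketch short */
  size_t n;
};

/* Add a binding to a scope. */
static void bind (struct scope *s, const char *name, enum kind k) {
  assert (s->n < sizeof s->syms / sizeof s->syms[0]);
  s->syms[s->n].name = name;
  s->syms[s->n].kind = k;
  s->n++;
}

/* Resolve a name innermost scope first; unknown names are plain identifiers. */
static enum kind classify (const struct scope *s, const char *name) {
  for (; s; s = s->outer)
    for (size_t i = s->n; i-- > 0; )
      if (strcmp (s->syms[i].name, name) == 0)
        return s->syms[i].kind;
  return K_IDENT;
}

/* Usage: 'class' is special globally but an ordinary name in a local scope. */
static int class_is_shadowed (void) {
  struct scope global = { 0 };
  struct scope local = { 0 };
  local.outer = &global;
  bind (&global, "if", K_IF);
  bind (&global, "class", K_CLASS);
  bind (&local, "class", K_IDENT);
  return classify (&global, "class") == K_CLASS
      && classify (&local, "class") == K_IDENT
      && classify (&local, "if") == K_IF;
}
```

A local scope that binds class simply shadows the global 'keyword', while if still resolves to its special meaning through the outer scope.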

And to answer the question: Keywords are most easily handled directly as special tokens, and they have been invented waaay before the concept of namespace pollution.

Tuesday, October 31, 2006

The human compiler, #1

At it again... You want to get a sum over a subset of objects in a list. You don't want to iterate over the objects explicitly (as in for (ob = first_obj (list); ob; ob = next_obj (ob))), because that gets pretty annoying pretty soon, and also becomes interesting when you have to change that a bit for an explicit iterator state object.

Instead there is a function iter_list which calls a function of yours for each object in the list, completely hiding the actual process of iteration. Unfortunately, now you need to somehow pass your context into the callback function, and, being in C, iter_list accepts a separate parameter called userdata that it just passes to the callback for your use. Thus you do:

struct ctx {
  int sum;
  char *pref;
};
int cb (obj_t *obj, void *userdata) {
  struct ctx *cx = userdata;
  if (match (obj->name, cx->pref)) {
    cx->sum += obj->count;
  }
  return 0;
}
int sum_prefixed (oblist_t *list, char *prefix) {
  struct ctx C;
  C.pref = prefix;
  C.sum = 0;
  iter_list (list, cb, &C);
  return C.sum;
}
Not quite amusing, because C does not have any local functions with lexical scoping. In a better language one could write
int sum_prefixed (oblist_t *list, char *prefix) {
  int sum = 0;
  int cb (obj_t *obj) {
    if (match (obj->name, prefix)) {
      sum += obj->count;
    }
    return 0;
  }
  iter_list (list, cb);
  return sum;
}
The local function cb has access to the variables surrounding its own definition. In essence the compiler now does what we did (pattern-like) above, by hand. To even be able to do the hand-trick, iter_list needs to have the third, pass-through, parameter; in the version where local functions are available this is no longer needed.

In fact the local function above works as shown in GNU C, as a language extension.
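For reference, here is roughly what the hand-written version needs from iter_list in plain C. This is a minimal sketch under my own assumptions: the layout of obj_t and oblist_t (a singly linked list), the obj_cb typedef, the prefix semantics of match and the demo function are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: a singly linked list of named, counted objects. */
typedef struct obj {
  const char *name;
  int count;
  struct obj *next;
} obj_t;

typedef struct { obj_t *first; } oblist_t;

typedef int (*obj_cb) (obj_t *obj, void *userdata);

/* Call cb once per object; userdata is handed through untouched. */
static void iter_list (oblist_t *list, obj_cb cb, void *userdata) {
  assert (list != NULL && cb != NULL);
  for (obj_t *ob = list->first; ob; ob = ob->next)
    cb (ob, userdata);
}

/* Assumed meaning of match: does name start with pref? */
static int match (const char *name, const char *pref) {
  return strncmp (name, pref, strlen (pref)) == 0;
}

/* The context the callback needs, packed up by hand. */
struct ctx { int sum; const char *pref; };

static int cb (obj_t *obj, void *userdata) {
  struct ctx *cx = userdata;
  if (match (obj->name, cx->pref))
    cx->sum += obj->count;
  return 0;
}

static int sum_prefixed (oblist_t *list, const char *prefix) {
  struct ctx C = { 0, prefix };
  iter_list (list, cb, &C);
  return C.sum;
}

/* Usage: three objects, two of which carry the prefix "ab". */
static int demo (void) {
  obj_t c = { "xyz", 100, NULL };
  obj_t b = { "abd", 4, &c };
  obj_t a = { "abc", 3, &b };
  oblist_t list = { &a };
  return sum_prefixed (&list, "ab");  /* 3 + 4 */
}
```

The void pointer is the whole trick: iter_list knows nothing about struct ctx, it only carries the pointer from the caller to the callback.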

The next simplification would be to allow the function body directly in the place where it is needed, instead of defining a function and then using its name:

int sum_prefixed (oblist_t *list, char *prefix) {
  int sum = 0;
  iter_list (list, int (obj_t *obj) {
    if (match (obj->name, prefix)) {
      sum += obj->count;
    }
    return 0;
  });
  return sum;
}
The syntax gets a bit tricky here, but this is already very close to a form where iter_list looks like a real loop header construct. Alas, break won't work yet.

But since I have to use C in this project, I need to partially do the job of a compiler, and put down patterns again and again. Also, some languages spare me from actually declaring the type signature of the anonymous function and derive it from the type of the argument for which it is used. That is called type inference, and in Java you would want it all the time, hypo.Wouldnt w = new hypo.Wouldnt () you?

Saturday, September 30, 2006

Thou shalt google first

I just tried to see whether google would find this blog yet, only to find out that gloop has quite another meaning already. A meaning rather close to the current state of affairs in popular programming languages, however. So I stay with the blog name.

Saturday, September 16, 2006

Gloop, the universe and everything?

I don't know what this blog is going to turn out to be. But the name is a reference to the three programming languages bloop, floop, and gloop that Douglas Hofstadter introduced to illustrate a few points in computability.

bloop is deliberately designed not to be Turing-complete, and he removes one constraint to create floop, which is. That one still is unable to compute every imaginable function and thus he continues to gloop, only to confront us with the fact that there is no such thing. No programming language can be more powerful than floop.

From the theoretical point of view, that is. In practice there is a big difference in how much effort you need to express in different languages what you want the computer to do. Programming is much about abstracting away repeating chores, and it is astonishing how many of those can't be automated in the popular languages of the day.

By now I am actively suffering from the fact that there are so many things that a good language would make easier, but such languages either don't exist or aren't reasonable to use in the field where I need them.

Closures, anonymous functions and proper lexical scoping are the things I would like to play with. But a proper macro system is the one thing that would really spare me a lot of tedious work. To be able to toy with these concepts I am hacking a language that is more or less going to be an (im)proper Java: static typing with local functions and closures, and most importantly with means to extend the parser and to write macros that can use the full power of the language.

One additional aspect is to make the virtual machine fully persistent, meaning that it can simply resume execution of a program after the VM has terminated for some external reason.

So this language and other linguistical musings will be the topic of this blog. I'm probably not going to rant about aspherical lenses for glasses here.

And no, I don't have the compiler ready. At the moment I have a very basic lexer and parser. The third version, actually. I started with C++ (ugly), then Ruby (good), then Nice (not bad). The last try is partly because the Ruby version was already two months ago and partly because I have hopes that the Nice version is more easily translated into the target language itself. Next job is to figure out a VM and execution model.