Wednesday, January 23, 2008

Polyglot, not panglot or omniglot

It's interesting. One of the common reactions I've heard to my recent writings about polyglot programming is something I really don't understand. Actually, I've heard the same objection raised against other people writing about polyglot programming.

The objection is that because I propose polyglot programming - using several different programming languages for different purposes in the same system - I can use whichever language I like, and as such should not try to find better languages or say that certain languages are bad.

But that's really a confounding of the issue. Just because I can use any language in the dynamic layer doesn't mean I should. In fact, just because polyglot programming as a strategy means you will use more than one language, it is even more important to be careful and use the best languages available for the task. Which is why I'm working to improve JRuby, why I'm evaluating Scala as a replacement for Java, why I'm working on a language based on Io. It's all about using the best languages. I may be a polyglot, but I'm definitely not a panglot or omniglot.

Friday, January 18, 2008

Distributed version control and SVN

Like everyone else, I've just discovered how nice DVCS can be. Of course, I'm part of several open source projects that all happen to use Subversion. I've tried Bazaar, Mercurial, Git and SVK. Of all these, Mercurial is the one I really, really like. But here is the problem: hgsvn is really not up to scratch right now. I tried to pull some stuff from SVN at one project and it immediately errored out. git-svn on the other hand works like a charm. It seems that right now I have to use git, unless someone has a good solution for getting Mercurial really working with SVN. (Also, the fact that hgsvn is pull-only is a bit problematic.)

Advice appreciated.

Ruby antipattern: Using eval without positioning information

I have noticed that the default way of calling eval, instance_eval(String) and module_eval(String) almost never does what you want unless you supply positioning information. Oh, it will execute all right, and provided you have no bugs in your code, everything is fine. But sooner or later you or someone else will need to debug the code in that eval. In those cases it's highly annoying when the full error message is "(eval):1: redefining constant Foo". Sure, in most cases you can still get all the information. But why not just make sure that everything needed is there to trace the call?

I would recommend changing all places where you use eval, or instance_eval or module_eval with a string argument, into the version that takes a file and line number:
eval("puts 'hello world'")
# becomes
eval("puts 'hello world'", binding, __FILE__, __LINE__)

"str".instance_eval("puts self")
# becomes
"str".instance_eval("puts self", __FILE__, __LINE__)

String.module_eval("A=1")
# becomes
String.module_eval("A=1", __FILE__, __LINE__)
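A minimal sketch of what the extra arguments buy you when the evaluated code fails. The helper name first_backtrace_line and the file name generated_code.rb are made up for illustration; in real code you would pass __FILE__ and __LINE__ as shown above.

```ruby
# Hypothetical helper: evaluates a string and returns the first backtrace
# line of the error it raises. Any extra arguments (file, line) are passed
# straight through to eval.
def first_backtrace_line(code, *position)
  eval(code, binding, *position)
rescue RuntimeError => e
  e.backtrace.first
end

# No positioning information: Ruby falls back to its default location.
anonymous = first_backtrace_line("raise 'boom'")

# With positioning information: the backtrace names the file and line given.
# "generated_code.rb" and 42 are placeholder values, not a real file.
positioned = first_backtrace_line("raise 'boom'", "generated_code.rb", 42)

puts anonymous
puts positioned
```

The second line printed starts with the file name and line number that were handed to eval, which is what you want to see when debugging. The first shows whatever default location Ruby assigns, and its exact text varies between Ruby versions.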

Wednesday, January 16, 2008

Viability of Java and the stable layer

This post is basically a follow up to Language explorations. That post had several statements about what I believe, and some of them have been a bit misunderstood, so I'll try to substantiate them a bit more here. Specifically, I want to talk about the viability of the Java language and whether "the stable layer" should be a static or dynamic language.

Let's begin with the stable layer. First, notice that I didn't have any specific definitions for the different layers when I wrote the post. In fact, I think that they are inherently a bit fuzzy. That's OK. DSLs are also quite fuzzy. The main attribute I assign to the stable layer is that it shouldn't need to change much. In a smaller application, you can actually look at supporting structures as the stable layer, things like application servers or web servers. But for larger applications I find that you usually need your own stable kernel.

I choose static typing for the stable layer for several reasons. Performance is absolutely one of them: since everything will run on this base, it's an advantage if that part is as performant as possible. Secondly, I am no dynamic language, TDD purist who says that static typing is totally unnecessary. Static type checking can be very helpful, but it stops you from having a malleable language. The kernel should be stable, which means that as much checking as possible is desirable. None of these requirements explicitly preclude a dynamic language, but the specific features of dynamic languages don't really come into their own at this level, and they come at the cost of performance and static type checking.

The more contentious part of my post seems to have been my offhand comment that I don't believe Java is really good for anything. I didn't give much reason in that post, and people have reacted in different ways, from saying that "Java is really good" to "Why are you involved in JRuby if you think Java is so bad?". So let's start with why Java is a bad language. (I am only talking about the language here, not the platform). Also, that Java is bad doesn't say anything about the worse alternatives, so no comments along the lines of "if you think Java is so bad, why don't you use X instead".

In short:
  • Java is extremely verbose. This is really a few different problems in one:
    • No type inference
    • Verbose generic type signatures
    • Anonymous class implementations
  • There is no way to create new kinds of abstractions:
    • No closures
    • Anonymous classes are a really clunky way to handle "block" functionality
  • Broken generics implementation
  • Language is becoming bloated
A few notes. That Java is becoming bloated is one of those things that people have been blogging about lately. After 1.5 came out, the complexity of the language grew considerably, without adding much punch in return. The improvements from generics were mostly negated by their verbosity. Fixing all the problems above will bloat the language even more. And at the same time, programming in Java is needlessly painful right now.

So, let's look at the question of why I'm working on JRuby. Well, first, I believe the JVM is a very good place to be. It's the best platform out there, in my opinion. The libraries are great, the runtime is awesome, and it's available basically everywhere. The bytecodes of the JVM spec are good enough for most purposes. There is some tweaking that can be done (which we are looking at in JSR292), but mostly it's a very nice place. And working on JRuby is really one of the ways I've seen how bad Java as a language is. We are employing several different tricks to get away from the worst parts of it, though. Code generation is used in several places. We are generating byte codes at runtime. We are using annotations to centralize handling of JRuby methods. And we are moving parts of the implementation to Ruby. I believe that JRuby is important because it can run in the same environment as Java, but without the problems of Java.

What are the solutions to the problem with Java? There are basically two different ways. Either define a subset of Java, not necessarily totally compatible, that takes the best parts of Java syntax, does away with the problems and so on. That should be possible, but I don't know anyone who has done it. Or, you can go with an existing static language on the JVM. Here you have two choices - either one of the ports of existing extremely static languages (like ML or Haskell), or you can go with something like Scala. I haven't decided on the best solution here yet. The only thing I'm sure of is that Java in itself is a problem. I'm investigating Scala, but maybe Scala isn't the right answer either. We'll see.

Tuesday, January 15, 2008

The final, objective truth of Rails versus Grails.

So it's time to bring out the guns, boxing gloves and whatever martial arts knowledge you possess. If you want the full story, start with Graeme's post here, and drill your way down from there.

I finally lost my patience with this discussion and decided that I should tell you all the truth. The final, totally objective, totally realistic answer to the question of Rails, Grails and Everything.

No, wait. I won't!

And that's it. This discussion is becoming a bit silly. I don't expect anyone will see me recommending Grails, and I'm sure we all would be very surprised to see Graeme recommend JRuby on Rails. So what's the point?

Tuesday, January 08, 2008

Are you using ResultSetMetaData.getColumnName?

As the title says, are you using java.sql.ResultSetMetaData.getColumnName in your code? It's interesting, I have done for years, and I didn't know that it was just a bug, waiting to happen.

Until I tried MySQL's 5.1-branch of their JDBC code, I'd always assumed that getColumnName was the right one for generic SQL code. Turns out it isn't. Specifically, it isn't the right one if you're using aliasing in your code. Say you have SELECT Host AS h FROM Host. Now, until the 5.1 branch of MySQL JDBC, you would get "h" if you did getColumnName(1) on this result set's metadata. Not so anymore. Now you get "Host". So what should you use? getColumnLabel. It's on the same interface. Until tonight I'd never seen a difference between them. But now there is one - so go through all your JDBC code and make sure you're using the right one here.

Oh, that's right. MySQL 5.0.5 seems to have a bug in multibyte aliasing. So if you alias Host to be a Chinese character, for example, you will not get the same value back from getColumnName or getColumnLabel. I assume this is a bug, since the 5.1-branch seems good.

Sunday, January 06, 2008

Emacs Inferior mode for the Io language

For anyone who likes the Io language and works with Emacs, here is a small Inferior mode for Emacs and Io.
It's adapted from the Ruby Inferior mode. Download it here. To use it, just do "(require 'inf-io)" and then M-x run-io.


Friday, January 04, 2008

Scala magic

Oh, right. This is a perfect example of Scala magic that I REALLY don't understand yet. If there are any Scala wizards reading, please explain!

This line of code is from a test case written in specs:
      runtime.ev("\"hello world\"")._with(interpreter) must be_==(runtime.stringFor("hello world"))
Runtime is a Scala class and so is interpreter. I added println()'s before and after, and at different methods in this statement. Now, the baffling thing, the really strange thing (at least in my mind), is that the left hand side of the must gets evaluated THREE times. But this code only runs once. How can that happen? What does Scala do here? (And must doesn't seem to be a special method. I looked at the implementation, and it's just a regular method on a trait, mixed into the Scala Object.)

Someone please explain this! =)

Keywords in languages

It's nice to see how the number of people looking into Scala has really exploded lately, based on the rush of blog posts and discussion.

One of the things I find a bit annoying about Scala is the proliferation of keywords. Actually, this is something I really don't like in any language. A language should be as keywordless as possible. Of course, such a vision goes against ease of implementation for language implementers, so there always needs to be a balance here. Coming from languages such as Lisp and Io, it's amazing how clear a language can be with a well chosen message passing or invocation model. In fact, both of those languages have zero keywords. That makes it incredibly nice to implement whatever you want.

Actually, Java has been quite good at not adding many more keywords than it had from the beginning, so I found it a bit annoying when I tried to build a fluent interface in Scala, and found out that the word "with" is a keyword. And it's a keyword in the strictest sense, meaning you can't use it as a name even in places where the "with" keyword could never appear anyway. So there is no way to implement a method named "with" in Scala. Annoying. It's just that the English connecting words are so much more useful for method names, especially when you can use the methods in "operator" position. Then you just want to be able to use all these words.

So. If you design a language, make sure that you add every keyword extremely carefully. If you can, make sure that keywords can still be used as names where there is no ambiguity. Of course, I'm not proposing the kind of madness you can do in some languages, where a statement such as "IF IF THEN THEN" is valid, since the first IF is a keyword, the next is a variable name, etc. But be reasonable about keywords. They are sometimes necessary, but not as often as people believe.
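As an illustration of keywords staying usable where there is no ambiguity: Ruby reserves the word "and", yet accepts it as a method name after def and after a dot. The Phrase class below is a made-up example, not taken from any library.

```ruby
# Hypothetical fluent builder that uses the reserved word `and` as a
# method name. Ruby allows this because the position right after `def`
# or after a dot can only hold a method name.
class Phrase
  attr_reader :words

  def initialize(word)
    @words = [word]
  end

  # Appends a word and returns self so that calls can be chained.
  def and(word)
    @words << word
    self
  end
end

phrase = Phrase.new("polyglot").and("panglot").and("omniglot")
puts phrase.words.join(", ")
```

The word stays a keyword everywhere else, so nothing becomes ambiguous, but the fluent interface gets its natural English connecting word.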

Thursday, January 03, 2008

Scala testing with specs

So. The story about unit testing is over for now. I will use specs. Eric did an incredible job and very quickly got the JUnit support working with JUnit 4 too. Today I reintegrated it, and everything works fine.

So, if you want working Ant integration for your Scala testing, I recommend specs.

Of course, this still doesn't explain why all the other alternatives failed so miserably. Hopefully there will be a bit more testing in the community soon. Testing frameworks need competition to evolve well.

It's funny. One of the points to my original post about Scala unit testing was this quote: "Now some lovely Ruby people are looking at Scala, and the very first thing they must do (of course) is write the sacred unit tests:".

I'm not sure about you, but I would say that that's a good statement about Ruby people in general, if that's the way people view us. =)

Sweden visit

I will be coming to Sweden on Sunday, and I will stay a full 4 weeks. So if anyone feels like grabbing a beer and talking about anything remotely geeky, I'm up for it. Ping me or leave a comment on this post. I'll mostly be in Stockholm, but some time will be spent in Gothenburg too.

Wednesday, January 02, 2008

Language explorations

I blogged about looking at languages a while back. At that point I didn't know what my next language to explore would be. I got lots of excellent suggestions. In the end I decided to try OCaml, but gave that up quickly when I found out that half of the type system exists to cover up deficiencies in the other half of it. So I went back and decided to learn Scala. I haven't really had time to start with it though. Until now, that is.

So let's get back to the motivation here. Why do I want to learn another language? Aren't I happy with Ruby? Well, yes and no. But that's not really the point. You can always point to the Prags' one-language-a-year advice, but that's not it either. I mean, it's really good advice, but there is a more urgent reason for me to learn Scala.

I know many people have said this before, but it bears repeating. Not everyone shares this opinion, but I have a firm belief that the end of big languages is very close. There won't be a next big language. There might be some that are more popular than others, but development will be much more divided into using different languages in the same project, where the different languages are suited for different things. This is the whole Polyglot idea. And my take on it is this: the JVM is the best platform there is for polyglot programming, and I think we will see three language layers emerge in larger applications. Now, the languages won't necessarily be built on top of each other, but they will all run on the JVM.

The first layer is what I call the stable layer. It's not a very large part of the application in terms of functionality. But it's the part that everything else builds on top of, and is as such a very important part of it. This layer is the layer where static type safety will really help. Currently, Java is really the only choice for this layer. More about that later, though.

The second layer is the dynamic layer. This is where maybe half the application code resides. The language types here are predominantly dynamic, strongly typed languages running on the JVM, like JRuby, Rhino and Jython. This is also the layer where I have spent most of my time lately, with JRuby and so on. It's a nice and productive place to be, and obviously, with my fascination for JVM languages, I believe that it's the interplay between this layer and the stable layer that is really powerful.

The third layer is the domain layer. It should be implemented in DSLs, one or many depending on the needs of the system. In most cases it's probably enough to implement it as an internal DSL within the dynamic layer, and in those cases the second and third layer are not as easily distinguishable. But in some cases it's warranted to have an external DSL that can be interacted with. A typical example might be something like a rules engine (like Drools).

I think I realized a long time ago that Java is not a good enough language to implement applications. So I came up with the idea that a dynamic language on top of Java might be enough. But I'm starting to see that Java is not good enough for the stable layer either. In fact, I'm not sure if Java the language is good enough for anything, anymore. So that's what my language exploration is about. I have a suspicion that Scala might be a good language at the stable layer, but at this point the problem is there aren't any other potential languages for that layer. So what I'm doing is trying to investigate if Scala is good enough for that.

But I need to make one thing clear - I don't believe there will be a winner at any of these layers. In fact, I think it would be a clearly bad thing if any one language won at any layer. That means, I'm seeing a future where we have Jython and JRuby and Rhino and several other languages coexisting at the same layer. There doesn't need to be any rivalry or language wars. Similarly, I see even less point in Scala and Ruby being viewed as competing. In my point of view they aren't even on the same continent. And even if they were, I see no point in competing.

I got accused of being "religious" about languages yesterday. That was an interesting way of putting it, since I have always been incredibly motivated to see lots of languages coexisting, but coexisting on the JVM in a productive way.

Do established tools matter, or: Is Ant support important?

This post is a bit of a follow up to my rant about unit testing in Scala two days back. First let me tell you that that story actually has a happy ending. Eric Torreborre (creator of the Specs framework) immediately stepped up and helped me, and the problem seemed to be a lack of support for JUnit4 test running, which he subsequently implemented. I'm going to reintegrate specs into my test suite later today. So that makes me happy. I might still retain JtestR. Actually, it would be a bit interesting to see the differences in writing tests in Ruby and Scala.

I spent some time on #scala on FreeNode yesterday. Overall it was an interesting experience. We ended up talking a bit about the unit testing bit, and Ant integration in particular. I'll get back to that conversation later. But this sparked in me the question why I felt that it was really important to have Ant integration. Does it actually matter?

I have kind of assumed that for a tool running on the JVM, that people might need during their build process, integration with Ant and Maven is more or less a must. It doesn't really matter what I think of these tools. If I want anyone to actually use the tool in question, this needs to work. In many cases that means it's not enough to just have a Java class that can be called with the Java-task in Ant. The integration part is a quite small step, but important enough. Or at least I think it is.

I am thinking that no matter what you think of Ant, it's one of those established tools that are here to stay for a long time. I know that I reach for Ant by default when I need something built. I know there are technologically better choices out there. Being a Ruby person, I know that Rake is totally superior, and that there are Raven and Buildr, which both provide lots of support for building my project with Ruby. So why do I still reach for Ant whenever I start a new project?

I guess one of the reasons is that I'm almost always building open source projects. I want people to use my stuff, to do things with it. That means the barrier to entry needs to be as low as possible. For Java, the de facto standard build system is still Ant, so the chance of people having it installed is good. Ant is easy enough to work with. Sure, it can be painful for larger things, but for smaller projects there really is no problem with Ant.

What do you think? Does it matter if you use established tools, that might be technologically inferior? Or should you always go for the best solution?

Sometimes I consider using other tools for building my own personal projects, but I never know enough to say with certainty that I will never release them. That means I have two choices - either I use something else first and then convert it if I release it, or I just go with Ant directly.

Now, heading back to that conversation. It started with a comment about "an ant task for failing unit tests is severely overrated". Then it went rapidly downhill to "Haskell shits all over Ant for a build script for example", at which point I totally tuned out. Haskell might be extremely well suited for build scripts, but it's not an established tool that anyone can use from their tool chain. And further, I got two examples of how these build scripts would look, later in the conversation, and both of them were barely better than shell scripts for building and testing. Now there is a very good reason people aren't using shell scripts for their standard building and testing tool chain.

Or am I being totally unreasonable about this? (Note, I haven't in any way defended the technological superiority of Ant in this post, just to make it clear.)

Antlr lexing problem

I should probably post this on a mailing list instead, but for now I want to document my problem here. If anyone has any good suggestions I'd appreciate it.

I'm using Antlr to lex a language. The language is fixed and has some cumbersome features. One in particular is being really annoying and giving me some trouble to handle neatly with Antlr 3.

This problem is about sorting out Identifiers. Now, to make things really, really simple, an identifier can consist of the letter "s" and the character ":" in any order, in any quantity. An identifier can also be the three operators "=", ":=" and "::=". That is the whole language. It's really easy to handle with whitespace separation and so on. But these are the requirements that give me trouble. The first three are simple baseline examples:
  • "s" should lex into "s"
  • "s:" should lex into "s:"
  • "s::::" should lex into "s::::"
  • "s:=" should lex into "s" and ":="
  • "s::=" should lex into "s:" and ":="
  • etc.
Now, the problem is obviously that any sane way of lexing this will end up eating the last colon too. I can of course use a semantic predicate to make sure this isn't allowed when the last character is a colon and the next is "=". This helps for the 4th case, but not for the 5th.
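To pin down the intended behaviour, here is a minimal hand-written sketch in Ruby. It is not an Antlr grammar and does not solve the Antlr side of the problem. It only shows one rule that produces all five results above: try the operators first, then match an identifier greedily but let it give back colons until it is no longer directly followed by "=". The names lex, OPERATOR and IDENTIFIER are made up here, and trying operators first at the start of each token is my own assumption.

```ruby
require 'strscan'

# Operators, longest alternative first.
OPERATOR = /::=|:=|=/

# One or more "s" or ":" characters. The negative lookahead forces the
# match to backtrack until it is not directly followed by "=", which
# leaves the final colon for the ":=" operator.
IDENTIFIER = /[s:]+(?!=)/

# Splits the input into identifier and operator tokens.
def lex(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # whitespace only separates tokens
    token = scanner.scan(OPERATOR) || scanner.scan(IDENTIFIER)
    raise ArgumentError, "cannot lex input at offset #{scanner.pos}" unless token
    tokens << token
  end
  tokens
end

["s", "s:", "s::::", "s:=", "s::="].each do |source|
  puts "#{source.inspect} lexes into #{lex(source).inspect}"
end
```

The point of the sketch is that the fifth case needs backtracking by exactly one colon rather than a yes/no predicate on the final character. Whether that translates neatly into Antlr 3 is still the open question.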

Anyone care to help? =)