Sunday, August 13, 2006

Lexing Ruby.

It has become apparent that JRuby's hand-coded lexer is a liability in the long run. The code is hard to maintain and probably suboptimal with regard to speed. So two weeks ago I decided to look into using a lexer generator. I haven't had much time, but over the last few days I've started reimplementing the JRuby lexer with the help of JFlex.

The first parts have actually been really easy. I already have a simple version hooked into JRuby and working. Not much of the syntax is covered yet, but what is there works well, well enough to check performance and see the road ahead. So, I have two test cases. The first one looks like this:


class H < Hash
end

puts H.new.class
puts H[:foo => :bar].class

and the second like this:

class H < Hash
  def abc a, b, *c
    1_0.times {
      puts "Hellu"
    }
  end

  def / g, &blk
    2.times do
      puts "well"
    end
  end
end

H.new.abc 1, 2
H.new/3

The first one tests a corner case in subclassing core JRuby classes. The second one is just nonsense to exercise different parts of the syntax. Both of these parse and run correctly on JRuby with the JFlex-based lexer. It's not much, but it's something to build on.

Regarding performance, I've created a test program that uses the JRuby parser to parse a specified file 10 x 10_000 times (10 rounds of 10,000 parses each), reporting the time for each round and the total. I've run it on both the first and the second example, and both are about 8 to 10 percent faster with the new lexer. I also expect the improvement to be larger for bigger files. Right now the lexer keeps track of line number, offset and column number, which is also a performance drain. Removing that tracking gives about 2-3 percent more.
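For anyone who wants to try a similar measurement, here is a rough sketch of what a timing harness of that shape could look like. It is not the actual test program: `time_rounds` is a name made up for this example, and the `parse` lambda is only a stand-in that splits the source into crude tokens, since the real program calls into the JRuby parser.

```ruby
# Hypothetical harness (not the original test program): runs the given block
# `iterations` times per round and returns the wall-clock time of each round
# in seconds.
def time_rounds(rounds: 10, iterations: 10_000)
  Array.new(rounds) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    iterations.times { yield }
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

# Stand-in for the real parse call (assumption): a crude regex tokenizer,
# used only so that the sketch runs on its own.
source = "class H < Hash\nend\n\nputs H.new.class\n"
parse = ->(src) { src.scan(/\w+|\S/) }

times = time_rounds(rounds: 10, iterations: 10_000) { parse.call(source) }
times.each_with_index { |t, i| puts format("round %2d: %.3fs", i + 1, t) }
puts format("total:    %.3fs", times.sum)
```

Reporting each round separately, and not only the total, makes it easier to spot warm-up effects in the first rounds, which matters when the code under test runs on a JVM.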
