Getting Code Ready for Ruby 1.9

Posted 4 months ago in FasterCSV, Ruby Tutorials, and The Standard Library.

The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that it will help others get their code ready for Ruby 1.9.

Ruby's String Class Grows Up

One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.
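As a rough illustration of the three views, here is a minimal sketch that assumes a 1.9-or-later interpreter and a UTF-8 String. The sample text and variable names are made up for the example.

```ruby
# "\u00e9" is the single character e-acute, which UTF-8 stores as two bytes.
str = "caf\u00e9\nbar\n"

byte_count = str.bytesize       # counts bytes
char_count = str.length         # counts characters
line_list  = str.lines.to_a     # one String per line, newline included
first_byte = str.bytes.to_a[0]  # Integer value of the first byte
first_char = str.chars.to_a[0]  # one-character String
```

The byte count and the character count differ here because one character needs two bytes.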

This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:

lines = str.to_a

you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:

lines = str.lines.to_a

String#lines() returns an Enumerable::Enumerator by default (more on that shortly), so you need to add the to_a() call unless you are going to follow up with other iteration methods.

Now, if you need the code to run on both 1.8 and 1.9, you will need one more trick. First, if you just need to iterate over the lines you can use String#each_line() which is present in both versions. For less basic iterations, I recommend:

lines = str.send(str.respond_to?(:lines) ? :lines : :to_s).to_a

Here I just call String#lines() if it is available and a no-op String#to_s() when it's not. You can safely follow that with any Enumerable method and it will work in Ruby 1.8 and Ruby 1.9.
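If the trick shows up in more than one place, it can be wrapped in a small helper. This is only a sketch: lines_of() is a hypothetical name, not something Ruby or FasterCSV provides.

```ruby
# lines_of is a hypothetical helper name used only for this sketch.
def lines_of(str)
  # Prefer String#lines() when it exists, else fall back to the no-op to_s().
  splitter = str.respond_to?(:lines) ? :lines : :to_s
  str.send(splitter).to_a
end

# Any Enumerable method can follow, here map().
line_lengths = lines_of("one\ntwo\nthree").map { |line| line.chomp.length }
```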

Enumerable#zip() Took a Beating

If you were a fan of Enumerable#zip() under Ruby 1.8, odds are good that it's going to surprise you under Ruby 1.9.

First, the standard Enumerable::Enumerator library has been moved into the core as we already saw with String#lines(). With this move the core iteration methods have been enhanced to return an Enumerable::Enumerator, if called without a block. This is generally a nice iterator chaining feature. For example, making the fictional but oft-requested map_with_index() is now as easy as:

enum.each_with_index.map { … }
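Filled in with some made-up sample data, and assuming Ruby 1.9 or later, that chain might look like this:

```ruby
words = %w[one two three]

# each_with_index() without a block returns an enumerator, so map() sees
# both the element and its index.
labeled = words.each_with_index.map { |word, index| "#{index}: #{word}" }
```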

Enumerable#zip() may be the exception, though. It already had a meaningful return value when called without a block. The new behavior overrides that, so you will now get an Enumerable::Enumerator when you probably expected an Array. I've found that I now need to type the following to get what I usually want:

enum.zip(other_enum).to_a

It's hard to see that as an improvement, but the fact is that it gets worse. For some reason I can't justify, another change was made to Enumerable#zip(). Let's look at what happens with Enumerable objects of different sizes under Ruby 1.8:

>> short = [1, 2]
=> [1, 2]
>> long = %w[one two three four]
=> ["one", "two", "three", "four"]
>> short.zip(long)
=> [[1, "one"], [2, "two"]]
>> long.zip(short)
=> [["one", 1], ["two", 2], ["three", nil], ["four", nil]]

Note that the size of the result set is based on the size of the Enumerable that is used as the receiver for the Enumerable#zip() call. This works out well in practice, because you can always find the longer count if you need to preserve all of the data. If you want the shorter results, you can lead with the smaller set or filter out the nil objects. The choice is in your hands.

Unfortunately, Ruby 1.9 changes the rules:

>> short.zip(long).to_a
=> [[1, "one"], [2, "two"]]
>> long.zip(short).to_a
=> [["one", 1], ["two", 2]]

As you can see, the shortest Enumerable now limits the results no matter where it occurs. The problem with this change is that it discards data and you have to go out of your way to save it. This new behavior is documented though, so I assume it's intentional.

What do you do if you want the data-preserving 1.8 behavior of Enumerable#zip() in code that runs on both 1.8 and 1.9? About the best I can come up with is:

require "enumerator"
zipped = long.enum_for(:each_with_index).map { |e, i| [e, short.to_a[i]] }

Obviously, I'm open to better ideas.
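One possible alternative, offered only as a sketch, avoids the enumerator library by using the block form of each_with_index(), which exists in both versions. The name zip_keeping_receiver() is hypothetical.

```ruby
# zip_keeping_receiver is a hypothetical name chosen for this sketch.
# The result always has one pair per element of the receiver, padded with nil.
def zip_keeping_receiver(receiver, other)
  other_items = other.to_a
  pairs = []
  receiver.each_with_index do |item, index|
    pairs << [item, other_items[index]]  # nil when other runs out
  end
  pairs
end

short = [1, 2]
long  = %w[one two three four]
padded  = zip_keeping_receiver(long, short)
trimmed = zip_keeping_receiver(short, long)
```

Here padded keeps all four elements of long and trimmed has only two pairs, matching the 1.8 rules described above.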

FasterCSV is the New CSV

I found the above incompatibilities by introducing a new one. FasterCSV has replaced the standard CSV class in the standard library. By replaced, I mean that it is now called CSV. This will cause problems for code that used the old library.

The methods provided on the CSV object are similar, but the old CSV code used positional parameters whereas the new library uses a Hash argument syntax (e.g., :row_sep => "\r\n"). That's going to trip up any non-trivial usage.

The new library is feature rich and fully documented, so I don't expect anyone to have trouble getting their code working under 1.9. The problem will be writing code that works on both versions. For that, I recommend using code like the following to determine which library you are working with:

require "csv"
if CSV.const_defined? :Reader
  # use old CSV code here…
else
  # use FasterCSV style code, but with CSV class, here…
end
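To make the two branches concrete, here is a sketch that parses one semicolon-separated line. The helper name parse_semicolon_row() is hypothetical, and the positional field separator in the old-library branch is my assumption about the 1.8 csv.rb signature, so check it against your installed version.

```ruby
require "csv"

# parse_semicolon_row is a hypothetical helper name used for this sketch.
def parse_semicolon_row(line)
  if CSV.const_defined?(:Reader)
    # Old csv.rb: assumed to take the field separator as a positional argument.
    CSV.parse_line(line, ";")
  else
    # FasterCSV-style CSV: options are passed as a Hash.
    CSV.parse_line(line, :col_sep => ";")
  end
end

fields = parse_semicolon_row("a;b;c")
```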

Feel free to email me with any other CSV compatibility questions.

This is Just a Start

The above is a short list of issues I've run into a couple of times now. Please feel free to add your own observations about Ruby 1.9 compatibility in the comments below. Let's do our best to make this post a generally useful resource for all.

Sam Ruby added 19 minutes later:

Porting REXML to Ruby 1.9 overlaps slightly and covers some additional ground.

Alex Fenton added about 1 hour later:

Ruby 1.9 introduces an incompatible syntax change for conditional statements such as 'if' and 'case/when'. Previously a colon could be used as a shorthand for a 'then' statement; this is perhaps most useful with multiple 'when' statements on one line.

The following is legitimate ruby in 1.8:

case x 
when Regexp  : puts 'a regex'
when Hash    : puts 'a hash'
when Numeric : puts 'a number'
when String  : puts 'a string'
end

But not in ruby 1.9; now an explicit 'then' statement must be used:

case x
when Regexp then puts 'a regex'
...

James Edward Gray II added about 1 hour later:

Just to be clear the "then" keyword was also supported in Ruby 1.8 so using it for conditionals is fine for both versions.
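For reference, a version of the example that should parse on both 1.8 and 1.9 might look like the sketch below. The describe() method name and the else branch are additions for illustration.

```ruby
# describe is a hypothetical method name used only for this sketch.
def describe(value)
  case value
  when Regexp  then "a regex"
  when Hash    then "a hash"
  when Numeric then "a number"
  when String  then "a string"
  else              "something else"
  end
end

description = describe(42)
```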

Frederick Cheung added about 15 hours later:

A bunch of methods like instance_variables, constants etc... that used to return strings now return symbols.
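One way to write checks that do not care which type comes back is to normalize the names with to_s() first. A small sketch, using a made-up Point class:

```ruby
# Point is a made-up class for this sketch.
class Point
  def initialize
    @x = 1
    @y = 2
  end
end

# to_s() is a no-op on the Strings 1.8 returns and converts the Symbols 1.9 returns.
ivar_names = Point.new.instance_variables.map { |name| name.to_s }.sort
has_x = ivar_names.include?("@x")
```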

Francisco Laguna added about 21 hours later:

I found what Frederick said is especially important for the typical BlankSlate type of class. What was in 1.8:

class BlankSlate
  instance_methods.each { |meth| undef_method(meth) unless meth =~ /\A__/ }
  ...
end

becomes in 1.9 (for example):

class BlankSlate
  instance_methods.each { |meth| undef_method(meth) unless meth.to_s =~ /\A__/ }
  ...
end

Other than that, I noticed that Thread#critical and Thread#critical= went away, but for those of us who want to explicitly schedule stuff, Fibers are nicer anyway.

IO#getc will return a String that's one character long instead of the ASCII value of the character itself.

1.8:

STDIN.getc
a
=> 97

1.9:

irb(main):002:0> STDIN.getc
a
=> "a"

This also breaks the excellent HighLine lib. hint hint
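For code that calls getc and has to run on both versions, one option is to normalize the return value. This is a sketch only: read_char() is a hypothetical name, and StringIO stands in for STDIN so the example does not wait for input.

```ruby
require "stringio"

# read_char is a hypothetical helper name used only for this sketch.
def read_char(io)
  char = io.getc
  # 1.8 returns an Integer, 1.9 returns a String, and both return nil at EOF.
  char.is_a?(Integer) ? char.chr : char
end

first = read_char(StringIO.new("abc"))
```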

James Edward Gray II added about 21 hours later:

Yes, I do need to get HighLine working under 1.9. I'll try to get to that before too long now. Thanks for reminding me.

Sander Land added about 23 hours later:

Using each or map on a result from Enumerable#zip is also extremely slow in 1.9. I discovered this when an application took twice as long in 1.9 as in 1.8. The innermost loop had a zip_with() call (zip -> map) which caused this.

require 'benchmark'
a = Array.new(25){rand}
Benchmark.bmbm{|x|
  x.report("zip"){ 1_000_000.times { a.zip(a).map{} } }
}

Results in 1.8.6

          user     system      total        real
zip  16.170000   0.320000  16.490000 ( 16.576643)

Results in 1.9

          user     system      total        real
zip 192.360000   1.430000 193.790000 (195.467429)

Using to_a gives the same results.

And this is with the slow 1.8.6 ubuntu/enable-pthread version vs an -O3/no pthread compiled 1.9. On most code the 1.9 version is about four times as fast as the 1.8.6 version.

gga added 1 day later:

For extension writers, ruby1.9 has, incorrectly in my opinion, deprecated ruby's version.h file. This means it is not possible to know the ruby version easily and you now MUST write a Makefile of some sort to pass the proper defines or to check if your ruby supports some feature through some try-compile checks. This probably ranks as one of the worst changes in ruby 1.9. This obviously raises the question of why this was done (as there's no benefit) and what extension developers should do if some function exists in both ruby1.8 and ruby1.9 but has different functionality (as some of the cases here show).

hk added 1 day later:

Alex Fenton:

"But not in ruby 1.9; now an explicit 'then' statement must be used"

Or you could just do what everyone else does, and put what happens "then" on a new line. Then, you won't need "then", and your code is more consistent and readable.

I prefer this implementation. Allowing same-line "then" with a colon was bad style, IMO - as is the use of colons as meaningful operators in general.

Chris Gaffney added 1 day later:

If you still prefer the single character single line notation you can just substitute the colon with a semicolon.

1.9:

case sound
    when /bamf/i; puts 'Nightcrawler'
    when /boff/i; puts 'Batman'
end

Robert Dober added 2 days later:

James, I am just working on RQ#151 and I want my solution to be version agnostic. Up to now, the following was my idea:

Write the code in v1.9 and just require a file to upgrade 1.8 Ruby just enough to run your code, so that the require can go away one day. Here is a very first shot:

   class String
      unless instance_methods.include?( "to_char" ) then
        require "enumerator"
        def each_char &blk
          return enum_for(:each_byte).map{ |b| b.chr } unless blk
          enum_for(:each_byte).each do |b| blk.call b.chr end
        end 
      end
   end

of course it would be much better to wrap the whole include file into a version test, but that very test might be a tough one, the following is rather a bad example:

 begin
   "".to_a
   def to_char...
      ...
   end
 rescue
   nil
 end

Going for the ruby version constant

   if /^1\.8/ === RUBY_VERSION then
      ...
   end

is probably a sound decision after all.

What do you think?

Cheers Robert

James Edward Gray II added 3 days later:

I would probably just do:

require "jcode" unless "".respond_to? :each_char

Robert Dober added 3 days later:

James first of all sorry for the double comment above, but I guess it was better not to post another comment saying: "Sorry for being stupid", as everyone could see I have been stupid ;)

Now for the idea of saying

   require "jcode" unless "".respond_to? :each_char

This is an approach I have seen first in Javascript for Browser Quirks but after some thoughts I believe that it is a bad idea for libraries. What if a require before our require just added each_char to String? And that is not exactly far fetched an idea either.

For applications, however, it will work, unless you carelessly require third-party libraries before the code above. That, however, can be debugged easily...

For libraries there would be no way to debug or even fix it in a general manner.

One could of course argue that someone could tamper with RUBY_VERSION too, but well we still have to let people kill themselves if they insist, sigh!
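To make the two styles of guard concrete, here is a rough sketch of each. The helper ruby_older_than?() is hypothetical. It compares version parts numerically because a plain String comparison would sort "1.10" before "1.9".

```ruby
# ruby_older_than? is a hypothetical helper name used only for this sketch.
def ruby_older_than?(target, current = RUBY_VERSION)
  to_parts = lambda { |version| version.split(".").map { |part| part.to_i } }
  (to_parts.call(current) <=> to_parts.call(target)) < 0
end

# Version guard: true only on interpreters before 1.9.
needs_upgrade_file = ruby_older_than?("1.9")

# Feature guard: true whenever anything has defined String#each_char.
string_has_each_char = "".respond_to?(:each_char)
```

The version guard cannot be fooled by an earlier require that adds each_char, which is the library concern raised above.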

Robert

James Edward Gray II added 3 days later:

Well, I hope that any each_char() implementation would give me the expected one character at a time.

My main point though was that I felt safer using the each_char() method that comes with Ruby 1.8 than building my own.

Robert Dober added 4 days later:

I see we are talking about two different things.

  1. I wanted a guard against the ruby version for lots of definitions, not only String#to_char.
  2. I cannot use Ruby's String#to_char because I need the 1.9 functionality of the returned Enumerator in case it is called without a block.

Maybe the idea of writing version agnostic code was not really what you are after here, and your emphasis is on 1.9. In that case I am a little bit OT, as usual...

Robert

Daniel Luz added 4 days later:

Francisco Laguna, since symbols now respond to #=~, I don't think that change is necessary, unless I am missing something. It should be noted, though, that for some reason Ruby 1.9 now emits a warning about undef'ing object_id, so you may want to preserve it too.
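A sketch of what that might look like, assuming a 1.9-or-later interpreter where Symbol responds to =~ (the exact regular expression is just one way to spare object_id):

```ruby
class BlankSlate
  instance_methods.each do |meth|
    # Keep the double-underscore methods and object_id, and undefine the rest.
    undef_method(meth) unless meth =~ /\A(__|object_id\z)/
  end
end

# Normalize to Strings so the list reads the same on any version.
remaining = BlankSlate.instance_methods.map { |meth| meth.to_s }
```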

James Edward Gray II added 4 days later:

Robert: I guess I am still a little confused about our discussion. You've mentioned both String#to_char() and String#each_char() and your code checks for one but creates the other. String#to_char() isn't a method I'm familiar with and I can't locate any documentation on it.

Just FYI, I believe your code also has a bug in it. Checks like instance_methods.include?("some_str") don't work as expected in Ruby 1.9. Those method names are now returned as Symbol objects so include?() will fail to match the String.

You can load jcode and enumerator and use enum_for(:each_char) to get an Enumerable::Enumerator in Ruby 1.8 or 1.9. I do now understand that we were discussing many methods instead of a specific example though, so that may not help.
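Roughly, and only as a sketch, those two suggestions could be combined like this. The guards skip each require when the interpreter already provides the method.

```ruby
# Load the 1.8 helper libraries only when the methods are missing.
require "jcode"      unless "".respond_to?(:each_char)
require "enumerator" unless "".respond_to?(:enum_for)

# An enumerator over characters, usable with any Enumerable method.
chars = "abc".enum_for(:each_char).to_a

# A name check that works whether instance_methods returns Strings or Symbols.
has_each_char = String.instance_methods.map { |name| name.to_s }.include?("each_char")
```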

Robert Dober added 4 days later:

James

many apologies for so many typos; I was referring to #each_char only. Thanks for the hint about jcode and instance_methods.include?. I missed jcode's functionality.

Robert

Sander Land added 6 days later:

The "zip" problems appear to be fixed with the January 8th version. It's still ~50% slower than 1.8, but that's manageable.

James, I saw you post about this on Ruby-CORE mailing list. Is this the way to go for posting bugs? I'm asking this because I discovered what I think is a rather serious bug and posted a bug report on rubyforge about three weeks ago, but there is no reply to the report or any of my follow-ups, other than the bug getting assigned to Matz.

James Edward Gray II added 6 days later:

Yes, I was able to sway Matz and Enumerable#zip() has been "repaired."

Opinions seem to differ on whether or not to use the bug tracker on Rubyforge or the Ruby Core mailing list. I believe the core team is trying to get more into the bug tracker habit, so it's probably best to start there for most things. I find I have more success with topics that should be discussed, like the Enumerable#zip() issue, on Ruby Core though. For serious issues, I recommend putting it in the bug tracker and then drawing attention to it on Ruby Core.

Gavin Sinclair added 11 days later:

When Ruby 1.8 was on the horizon and 1.6 was the normal version for people to use, someone created a library called "shim", which allowed the use of 1.7/1.8-style features in 1.6 code.

With compatibility between 1.8 and 1.9 a key issue for many people, such a "shim" library could be very useful.
