Getting Code Ready for Ruby 1.9
The call came down from on high just before the Ruby 1.9 release: replace the standard csv.rb library with faster_csv.rb. With only hours to make the change it was a little harder than I expected. The FasterCSV code base was pretty vanilla Ruby, but it required more work than I would have guessed to get running on Ruby 1.9. Let me share a few of the tips I learned while doctoring the code in the hope that it will help others get their code ready for Ruby 1.9.
Ruby's String Class Grows Up
One of the biggest changes in Ruby 1.9 is the addition of m17n (multilingualization). This means that Ruby's Strings are now encoding aware and we must clarify in our code if we are working with bytes, characters, or lines.
This is a good change, but the odds are that most of us have lazily used the old way to our advantage in the past. If you've ever written code like:
lines = str.to_a
you have bad habits to break. I sure did. Under Ruby 1.9 that code would translate to:
lines = str.lines.to_a
String#lines() returns an Enumerable::Enumerator by default (more on that shortly), so you need to add the to_a() call unless you are going to follow up with other iteration methods.
Now, if you need the code to run on both 1.8 and 1.9, you will need one more trick. First, if you just need to iterate over the lines you can use String#each_line() which is present in both versions. For less basic iterations, I recommend:
lines = str.send(str.respond_to?(:lines) ? :lines : :to_s).to_a
Here I just call String#lines() if it is available and a no-op String#to_s() when it's not. You can safely follow that with any Enumerable method and it will work in Ruby 1.8 and Ruby 1.9.
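If the one-liner feels too dense, the same idea can be wrapped in a small helper. This is only a sketch; lines_of is a name I made up for illustration, not part of any library:

```ruby
# Hypothetical helper: returns the lines of a String as an Array
# on both Ruby 1.8 and Ruby 1.9.
def lines_of(str)
  if str.respond_to?(:lines)
    str.lines.to_a  # 1.9 style: ask for the lines explicitly
  else
    str.to_a        # 1.8 style: String#to_a splits on line endings
  end
end

lines = lines_of("one\ntwo\n")
```

The trailing to_a is harmless when String#lines() already returns an Array.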
Enumerable#zip() Took a Beating
If you were a fan of Enumerable#zip() under Ruby 1.8, odds are good that it's going to surprise you under Ruby 1.9.
First, the standard Enumerable::Enumerator library has been moved into the core as we already saw with String#lines(). With this move the core iteration methods have been enhanced to return an Enumerable::Enumerator, if called without a block. This is generally a nice iterator chaining feature. For example, making the fictional but oft-requested map_with_index() is now as easy as:
enum.each_with_index.map { … }
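To make that concrete, here is a minimal sketch of the chaining under Ruby 1.9. The labeled variable and the sample data are just for illustration:

```ruby
# each_with_index without a block returns an enumerator,
# so map receives both the element and its index.
labeled = %w[a b c].each_with_index.map { |item, i| "#{i}:#{item}" }
# labeled is ["0:a", "1:b", "2:c"]
```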
Enumerable#zip() may be the exception though. It already had a meaningful return value when called without a block. That has been overridden by the new behavior though, so you will now get an Enumerable::Enumerator when you probably expected an Array. I've found that I now need to type the following to get what I usually want:
enum.zip(other_enum).to_a
It's hard to see that as an improvement, but the fact is that it gets worse. For some reason I can't justify, another change was made to Enumerable#zip(). Let's look at what happens with Enumerable objects of different sizes under Ruby 1.8:
>> short = [1, 2]
=> [1, 2]
>> long = %w[one two three four]
=> ["one", "two", "three", "four"]
>> short.zip(long)
=> [[1, "one"], [2, "two"]]
>> long.zip(short)
=> [["one", 1], ["two", 2], ["three", nil], ["four", nil]]
Note that the size of the result set is based on the size of the Enumerable that is used as the receiver for the Enumerable#zip() call. This works out well in practice, because you can always find the longer count if you need to preserve all of the data. If you want the shorter results, you can lead with the smaller set or filter out the nil objects. The choice is in your hands.
Unfortunately, Ruby 1.9 changes the rules:
>> short.zip(long).to_a
=> [[1, "one"], [2, "two"]]
>> long.zip(short).to_a
=> [["one", 1], ["two", 2]]
As you can see, the shortest Enumerable now limits the results no matter where it occurs. The problem with this change is that it discards data and you have to go out of your way to save it. This new behavior is documented though, so I assume it's intentional.
What do you do if you want a safe 1.8 data preserving Enumerable#zip() that works on 1.8 and 1.9? About the best I can come up with is:
require "enumerator"
zipped = long.enum_for(:each_with_index).map { |e, i| [e, short.to_a[i]] }
Obviously, I'm open to better ideas.
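Wrapped up as a helper, the workaround might look like the following sketch. The name zip_keeping_receiver is hypothetical, and the conditional require assumes that enum_for() is only missing on a Ruby 1.8 that hasn't loaded the enumerator library yet:

```ruby
# Only Ruby 1.8 needs the library; 1.9 has enum_for() in the core.
require "enumerator" unless [].respond_to?(:enum_for)

# Hypothetical helper: zips like Ruby 1.8, so the receiver decides
# the size of the result and missing partners become nil.
def zip_keeping_receiver(receiver, other)
  other_items = other.to_a
  receiver.enum_for(:each_with_index).map { |item, i| [item, other_items[i]] }
end

zipped = zip_keeping_receiver(%w[one two three four], [1, 2])
# zipped is [["one", 1], ["two", 2], ["three", nil], ["four", nil]]
```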
FasterCSV is the New CSV
I found the above incompatibilities by introducing a new one. FasterCSV has replaced the standard CSV class in the standard library. By replaced, I mean that it is now called CSV. This will cause problems for code that used the old library.
The methods provided on the CSV object are similar, but the old CSV code used positional parameters whereas the new library uses a Hash argument syntax (e.g., :row_sep => "\r\n"). That's going to trip up any non-trivial usage.
The new library is feature rich and fully documented, so I don't expect anyone to have trouble getting their code working under 1.9. The problem will be writing code that works on both versions. For that, I recommend using code like the following to determine which library you are working with:
require "csv"
if CSV.const_defined? :Reader
# use old CSV code here…
else
# use FasterCSV style code, but with CSV class, here…
end
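Filled in, that check might look something like this sketch. The wrapper name parse_csv_line is made up for illustration, and the positional field separator in the old-library branch is an assumption based on my memory of the 1.8 API, so check it against your 1.8 documentation:

```ruby
require "csv"

# Hypothetical wrapper: parses one line of CSV with either library.
def parse_csv_line(line, col_sep = ",")
  if CSV.const_defined? :Reader
    # Old CSV library: positional arguments (assumed signature).
    CSV.parse_line(line, col_sep)
  else
    # FasterCSV style under the CSV name: Hash-style options.
    CSV.parse_line(line, :col_sep => col_sep)
  end
end

fields = parse_csv_line("a;b;c", ";")
```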
Feel free to email me with any other CSV compatibility questions.
This is Just a Start
The above is a short list of issues I've run into a couple of times now. Please feel free to add your own observations about Ruby 1.9 compatibility in the comments below. Let's do our best to make this post a generally useful resource for all.
Porting REXML to Ruby 1.9 overlaps slightly and covers some additional ground.
Ruby 1.9 introduces an incompatible syntax change for conditional statements such as 'if' and 'case/when'. Previously a colon could be used as a shorthand for a 'then' statement; this is perhaps most useful with multiple 'when' statements on one line.
The following is legitimate ruby in 1.8:
But not in ruby 1.9; now an explicit 'then' statement must be used:
Just to be clear the "then" keyword was also supported in Ruby 1.8 so using it for conditionals is fine for both versions.
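As an illustration (not the code from the original comment), a one-line-per-branch case statement written with "then" runs on both versions. The method name and values here are made up:

```ruby
# Works on Ruby 1.8 and 1.9: "then" instead of the old colon shorthand.
def describe(n)
  case n
  when 1 then "one"
  when 2 then "two"
  else "many"
  end
end
```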
A bunch of methods like instance_variables, constants etc... that used to return strings now return symbols.
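One way to write a check that doesn't care which type comes back is to normalize both sides to Strings. This is a sketch with a made-up helper name:

```ruby
# Hypothetical helper: true if klass lists the named instance method,
# whether instance_methods returns Strings (1.8) or Symbols (1.9).
def instance_method_listed?(klass, name)
  klass.instance_methods.map { |m| m.to_s }.include?(name.to_s)
end

instance_method_listed?(String, "upcase")
instance_method_listed?(String, :upcase)
```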
I found what Frederick said is especially important for the typical BlankSlate type of class. What was in 1.8:
becomes in 1.9 (for example):
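The code from this comment is not shown here, so the following is only a sketch of the general pattern, not the commenter's version. Converting each name with to_s before matching means the filter behaves the same whether the names are Strings or Symbols. Keeping object_id is an assumption made here because Ruby warns when it is undefined:

```ruby
# Sketch of a BlankSlate-style class that strips inherited methods.
class BlankSlate
  instance_methods.each do |name|
    # Keep the double-underscore methods and object_id; drop the rest.
    undef_method(name) unless name.to_s =~ /\A__|\Aobject_id\z/
  end
end
```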
Other than that, I noticed that Thread#critical and Thread#critical= went away, but for those of us who want to explicitly schedule stuff, Fibers are nicer anyway.
IO#getc will return a String that's one character long instead of the ASCII value of the character itself.
1.8:
1.9:
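The original examples are not shown here. As a sketch of how code could cope with both behaviors, a small wrapper can normalize the return value to a String. The name read_char is hypothetical, and StringIO is used only to keep the example self-contained:

```ruby
require "stringio"

# Hypothetical helper: always returns a one-character String
# (or nil at end of input), whatever getc itself returns.
def read_char(io)
  c = io.getc
  c.is_a?(Integer) ? c.chr : c
end

io = StringIO.new("hi")
first = read_char(io)
```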
This also breaks the excellent HighLine lib. hint hint
Yes, I do need to get HighLine working under 1.9. I'll try to get to that before too long now. Thanks for reminding me.
Using each or map on a result from Enumerable#zip is also extremely slow in 1.9. I discovered this when an application took twice as long in 1.9 as in 1.8. The innermost loop had a zip_with() call (zip -> map) which caused this.
Results in 1.8.6
Results in 1.9
Using to_a gives the same results.
And this is with the slow 1.8.6 ubuntu/enable-pthread version vs an -O3/no pthread compiled 1.9. On most code the 1.9 version is about four times as fast as the 1.8.6 version.
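The benchmark from this comment is not shown here. A comparison of this kind could be timed with a sketch like the one below; zip_with and time_zip_with are illustrative names, the sizes are arbitrary, and no timings are claimed:

```ruby
require "benchmark"

# Illustrative zip_with: zip two collections, then map over the pairs.
def zip_with(a, b)
  a.zip(b).map { |x, y| yield(x, y) }
end

# Times a number of zip_with runs and returns the elapsed seconds.
def time_zip_with(a, b, runs)
  Benchmark.realtime do
    runs.times { zip_with(a, b) { |x, y| x + y } }
  end
end

a = (1..1000).to_a
b = a.map { |n| n * 2 }
puts "zip_with: #{time_zip_with(a, b, 100)} seconds"
```

Running the same script under each interpreter gives numbers that can be compared directly.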
For extension writers, ruby1.9 has, incorrectly in my opinion, deprecated ruby's version.h file.
This means it is not possible to know the ruby version easily and you now MUST write a Makefile of some sort to pass the proper defines or to check if your ruby supports some feature through some try-compile checks. This probably ranks as one of the worst changes in ruby 1.9. This obviously begs the question why this was done (as there's no benefit) and what should extension developers do if some function exists in both ruby1.8 and ruby1.9 but has different functionality (as some of the cases show here).
Alex Fenton:
"But not in ruby 1.9; now an explicit 'then' statement must be used"
Or you could just do what everyone else does, and put what happens "then" on a new line. Then, you won't need "then", and your code is more consistent and readable.
I prefer this implementation. Allowing same-line "then" with a colon was bad style, IMO - as is the use of colons as meaningful operators in general.
If you still prefer the single character single line notation you can just substitute the colon with a semi-colon
1.9:
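The original example is not shown here. For illustration, a case statement using semicolons in place of the old colons might look like this (the method name and values are made up):

```ruby
# A semicolon ends the "when" clause, so no "then" is needed.
def size_label(n)
  case n
  when 0; "none"
  when 1; "one"
  else "many"
  end
end
```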
James I am just working on RQ#151 and I want my solution to be version agnostic, up to now the following was my idea:
Write the code in v1.9 and just require a file to upgrade 1.8 ruby just enough to run your code, so that the require can go away one day. Here is a very first shot:
class String
  unless instance_methods.include?( "to_char" ) then
    require "enumerator"
    def each_char &blk
      return enum_for(:each_byte).map{ |b| b.chr } unless blk
      enum_for(:each_byte).each do |b|
        blk.call b.chr
      end
    end
  end
end
Of course it would be much better to wrap the whole include file into a version test, but that very test might be a tough one. The following is rather a bad example:
begin
  "".to_a
  def to_char...
  ...
  end
rescue
  nil
end
Going for the ruby version constant
if /^1\.8/ === RUBY_VERSION then ... end
is probably a sound decision after all.
What do you think?
Cheers Robert
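A version test along these lines can also compare the numeric parts rather than matching a pattern. This is only a sketch, and ruby_older_than? is a hypothetical name:

```ruby
# Hypothetical helper: compares the major and minor parts of a
# version string numerically, so "1.10" sorts after "1.9".
def ruby_older_than?(major, minor, version = RUBY_VERSION)
  current = version.split(".").map { |part| part.to_i }
  (current[0, 2] <=> [major, minor]) < 0
end

if ruby_older_than?(1, 9)
  # load the 1.8 upgrade file here
end
```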
I would probably just do:
James first of all sorry for the double comment above, but I guess it was better not to post another comment saying: "Sorry for being stupid", as everyone could see I have been stupid ;)
Now for the idea of saying
This is an approach I have seen first in Javascript for Browser Quirks but after some thoughts I believe that it is a bad idea for libraries. What if a require before our require just added each_char to String? And that is not exactly a far-fetched idea either.
For applications however it will work, unless you require third-party libraries carelessly before the code above; this however can be debugged easily...
For libraries there would be no way to debug or even fix it in a general manner.
One could of course argue that someone could tamper with RUBY_VERSION too, but well we still have to let people kill themselves if they insist, sigh!
Robert
Well, I hope that any each_char() implementation would give me the expected one character at a time.
My main point though was that I felt safer using the each_char() method that comes with Ruby 1.8 than building my own.
I see we are talking about two different things.
String#to_char.
String#to_char because I need the 1.9 functionality of the returned Enumerator in case it is called without a block.
Maybe the idea to write version agnostic code was not really what you are after here, and your emphasis is on 1.9; in that case I am a little bit OT, as usual...
Robert
Francisco Laguna, since symbols now respond to #=~, I don't think that change is necessary, unless I am missing something. It should be noted, though, that for some reason Ruby 1.9 now emits a warning about undef'ing object_id, so you may want to preserve it too.
Robert: I guess I am still a little confused about our discussion. You've mentioned both String#to_char() and String#each_char() and your code checks for one but creates the other. String#to_char() isn't a method I'm familiar with and I can't locate any documentation on it.
Just FYI, I believe your code also has a bug in it. Checks like instance_methods.include?("some_str") don't work as expected in Ruby 1.9. Those method names are now returned as Symbol objects so include?() will fail to match the String.
You can load jcode and enumerator and use enum_for(:each_char) to get an Enumerable::Enumerator in Ruby 1.8 or 1.9. I do now understand that we were discussing many methods instead of a specific example though, so that may not help.
James
Many apologies for so many typos, I was referring to #each_char only. Thanks for the hint with jcode and instance_methods.include?. I missed jcode's functionality.
Robert
The "zip" problems appear to be fixed with the January 8th version. It's still ~50% slower than 1.8, but that's manageable.
James, I saw you post about this on Ruby-CORE mailing list. Is this the way to go for posting bugs? I'm asking this because I discovered what I think is a rather serious bug and posted a bug report on rubyforge about three weeks ago, but there is no reply to the report or any of my follow-ups, other than the bug getting assigned to Matz.
Yes, I was able to sway Matz and Enumerable#zip() has been "repaired."
Opinions seem to differ on whether or not to use the bug tracker on Rubyforge or the Ruby Core mailing list. I believe the core team is trying to get more into the bug tracker habit, so it's probably best to start there for most things. I find I have more success with topics that should be discussed, like the Enumerable#zip() issue, on Ruby Core though. For serious issues, I recommend putting it in the bug tracker then drawing attention to it on Ruby Core.
When Ruby 1.8 was on the horizon and 1.6 was the normal version for people to use, someone created a library called "shim", which allowed the use of 1.7/1.8-style features in 1.6 code.
With compatibility between 1.8 and 1.9 a key issue for many people, such a "shim" library could be very useful.
I thought the semicolon functioning as an alias for 'then' was great; it's easier to read IMO.
It's actually the colon, not semicolon, that used to stand in for then. It was removed because it is being used in other ways, like the new Hash syntax.
Hi James,
Thanks for sharing. Based on your code I've tried the following for backwards compatibility with ruby 1.8 where everything uses the CSV class constant.
Thanks to Michael Barton. His fix is just what I sought.