No Longer the Fastest Game in Town
Posted over 2 years ago in FasterCSV.
If your number one concern when working with CSV data in Ruby is raw speed, you might want to know that FasterCSV is no longer the fastest option. There are a couple of new contenders for Ruby CSV processing: a C extension called SimpleCSV and a pure Ruby library called LightCsv. I haven't been able to test SimpleCSV locally, because I can't get it to build on my box, but users do tell me it's faster. I have run some trivial benchmarks for LightCsv, though, and it too is pretty quick:
It's important to note that LightCsv is indeed very "light." FasterCSV has grown into a feature-rich library that provides many different ways to look at your data. In contrast, LightCsv doesn't yet allow you to set column or row separators. That makes it an option only for vanilla CSV you just need to iterate over. If that's what you have, though, and speed counts, it might be just the right choice.
For the curious, LightCsv gets its speed advantage in two ways. First, it uses StringScanner to drive the parsing. StringScanner is a C extension, though it is part of the standard library installed with Ruby. Second, and I suspect more importantly, LightCsv reads its input through a buffer, while FasterCSV works line by line. I believe this second difference accounts for most of the speed increase, since the buffered code will hit the hard drive quite a bit less for the average CSV file. It does require more memory, of course. Aside from these differences, FasterCSV and LightCsv have very similar parsers.
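To make the StringScanner idea concrete, here is a minimal sketch of a scanner-driven parser for vanilla CSV. It is not LightCsv's code: the method name `parse_vanilla_csv` and the three regular expressions are hypothetical, and the sketch assumes commas between fields, `\n` or `\r\n` between rows, and the usual doubled-quote escaping inside quoted fields. The parser repeatedly asks the scanner to match the next token at its current position, so the matching itself happens in StringScanner's C code.

```ruby
require "strscan"

# Hypothetical sketch, NOT LightCsv's implementation. Assumes vanilla CSV:
# comma separators, "\n" or "\r\n" row endings, "" as an escaped quote.
QUOTED   = /"((?:[^"]|"")*)"/   # quoted field; group 1 is the raw body
UNQUOTED = /[^,"\r\n]*/         # plain field, possibly empty
ROW_END  = /\r?\n/

def parse_vanilla_csv(text)
  scanner = StringScanner.new(text)
  rows = []
  until scanner.eos?
    row = []
    loop do
      if scanner.scan(QUOTED)
        row << scanner[1].gsub('""', '"')  # unescape doubled quotes
      else
        row << scanner.scan(UNQUOTED)      # returns "" for an empty field
      end
      break unless scanner.scan(/,/)       # no comma: this row is finished
    end
    # A row must end at a line break or at the end of the input.
    unless scanner.scan(ROW_END) || scanner.eos?
      raise ArgumentError, "malformed CSV at offset #{scanner.pos}"
    end
    rows << row
  end
  rows
end

p parse_vanilla_csv(%Q{name,quote\nJames,"He said ""hi"", then left"\n})
# → [["name", "quote"], ["James", "He said \"hi\", then left"]]
```

Every pass through the outer loop either consumes input or raises, so malformed data such as an unterminated quote cannot stall the parser.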
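The buffering difference described above can be sketched like this. `each_row_by_line` pulls one row per `IO#gets` call, while `each_row_buffered` reads fixed-size chunks with `IO#read(size)` and cuts complete rows out of an in-memory buffer. Both method names and the 64 KB default chunk size are hypothetical and not taken from FasterCSV or LightCsv, and splitting on `\n` assumes vanilla CSV with no newlines inside quoted fields. The sketch only shows the two reading styles; it doesn't measure which one is faster.

```ruby
require "stringio"

# Hypothetical sketches of two reading styles; neither is FasterCSV's or
# LightCsv's real code. Assumes no newlines inside quoted fields.

# Line by line: one IO#gets call per row.
def each_row_by_line(io)
  while (line = io.gets)
    yield line.chomp
  end
end

# Buffered: read chunk_size bytes at a time and split rows out of memory.
def each_row_buffered(io, chunk_size = 64 * 1024)
  buffer = String.new  # empty ASCII-8BIT string, matching IO#read(size) output
  while (chunk = io.read(chunk_size))
    buffer << chunk
    # Hand out every complete row currently sitting in the buffer.
    while (newline = buffer.index("\n"))
      # Assumes UTF-8 input; IO#read(size) returns binary strings.
      yield buffer.slice!(0..newline).chomp.force_encoding(Encoding::UTF_8)
    end
  end
  # Anything left over is a final row with no trailing newline.
  yield buffer.force_encoding(Encoding::UTF_8) unless buffer.empty?
end

rows = []
each_row_buffered(StringIO.new("a,b\n1,2\n3,4"), 4) { |row| rows << row }
p rows  # → ["a,b", "1,2", "3,4"]
```

The buffer is where the extra memory goes: it holds one chunk plus any partial row carried over from the previous read.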
LightCsv does not use StringIO. It uses StringScanner.
Oops. Good catch. I have corrected the article.