RE: Importing Excel files in GemStone

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

RE: Importing Excel files in GemStone

Thierry Thelliez
Update #2.

I just wanted to share the latest/greatest since my last post on this topic (and this email list is too quiet!). So in case you have to import Excel files, here is what we do.

Several months ago, I started producing CSV files with JODConverter. Everything was fine. A simple 'System performOnServer' calling JODConverter was good enough. This worked for a while, until we had to convert multisheet Excel files. JODConverter does not support that.

We tried ROO, a ruby library. ROO supports multiple sheets, all the xls, xslx, xslm files we have and formulas. But we found the hard way that they have a limited support for unicode characters (we have Asian encoded sheets) and they have issues with merged cells...

So... we were about to reconsider deploying Excel + OLE on the server when we found that JODConverter uses UNO (OpenOffice API) and that UNO could be used to access a given sheet. (Example of script in Python at http://www.linuxjournal.com/content/convert-spreadsheets-csv-files-python-and-pyuno-part-1v2).

So far so good. All our files are correctly converted. On the downside, it is slow because OpenOffice has to be restarted often to avoid a memory leak issue. It takes about 30 seconds on a dev machine to convert one sheet.

One last thing, we found that OpenOffice has a more consistent error management system than Excel. That's great but it leads to different formula results in some conditions.

Back to Smalltalk.
Cheers,
Thierry



-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Thierry Thelliez
Sent: July 22, 2010 12:00 PM
To: GemStone Smalltalk Customer Forum
Subject: [gemstone-smalltalk] RE: Importing Excel files in GemStone

Just an update on what I ended up choosing.

Again, my requirements were to load a mix of xls AND xlsx spreadsheets. There is no real-time requirements and, at maximum, a dozen files a day are to be imported. Also these files are coming from a password protected template. In other words the files to import are password protected (but not encrypted). Finally, some cells are formulas.


The different libraries I tried (different Ruby based systems and Apache POI) all had limitations. Finally the simplest way, I think, was to use OpenOffice API and save the results to CSV. To ease the task I used: http://www.artofsolving.com/opensource/jodconverter  This resulted in a one-line solution to get to the CSV.  

The CSV files are then imported/parsed with a ReadStream in GemStone.

The meta-data seem to evolve faster than anticipated. I might improve the model to be more dynamic and include meta-data versioning but that's another story.

Cheers,
Thierry Thelliez