Appendable saving in R

2011-09-19


One of the most crucial problems in high-performance computing (HPC) is that every error you make has a much greater impact than in ordinary computing. There is nothing more amusing than finding out that a simulation running for a few days broke a few minutes before the end because of an unfortunate value thrown by a random generator, a typo in the result-saving code, or just a cluster failure, and that its results are dissolving, abandoned, in RAM. Even if all goes OK, it would often be quite nice to have some insight into the results before the job is completed.

Of course there is a trivial solution to this: frequent saving of preliminary results. Unfortunately this is easier said than done, basically because R’s save cannot append a new object to an existing file. Thus I used to end up either with a few giant text files that were somewhere in between human-readable and easily parsable, or with a plethora of small RData files that required writing a nontrivial script to reintegrate them into something usable (not to mention fuse-ssh freezes and crashes).

To this end I wrote an R package intended to extend save with appending; thus rtape was born. How does it work? You simply call

R> rtapeAdd(some_object_worth_saving,"myTape.rtape")

and your first tape appears in the working directory; subsequent rtapeAdd calls will further populate the file with more objects without removing those previously stored. You can then load the whole thing with rtapeAsList:

R> rtapeAsList("myTape.rtape")

Note that there is no initialization or finalization step. Yup, there is no header, which makes the tape format simple (for instance, you can just cat several files into one without using R at all) and always ready to use; unfortunately it also rules out making a tape random-readable or mutable in any way other than append / delete all. Though this is not something you expect from a failsafe data dump, do you? OK, not counting situations when the dump has grown too big to be easily manipulated in memory; fortunately rtape can also map a function over each tape object while keeping only the current one in memory, with the help of the rtapeLapply and rtape_apply functions. Finally, there is also some convenience stuff.
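To get a feeling for why a header-less format is append-friendly and can be glued together byte-wise, here is a minimal sketch in base R only. It is not rtape's actual code or on-disk format; tape_add, tape_lapply and tape_read are hypothetical helpers made up for this illustration. It relies on the fact that serialized R objects are self-delimiting, so they can be written one after another and read back one at a time.

```r
# Hypothetical helpers for illustration; NOT part of rtape.

# Append one serialized object to the end of the file (created if missing).
tape_add <- function(path, obj) {
  con <- file(path, open = "ab")
  on.exit(close(con))
  serialize(obj, con)
  invisible(NULL)
}

# Apply fun to each stored object in turn; only the current object
# (plus the collected results) is held in memory.
tape_lapply <- function(path, fun, ...) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  eof <- new.env()  # unique marker meaning "nothing more to read"
  out <- list()
  repeat {
    # unserialize() fails at end of file (or on a truncated last record)
    obj <- tryCatch(unserialize(con), error = function(e) eof)
    if (identical(obj, eof)) break
    out[length(out) + 1L] <- list(fun(obj, ...))  # list() keeps NULL results
  }
  out
}

# Reading everything is just mapping the identity function.
tape_read <- function(path) tape_lapply(path, identity)

# Usage: dump a few preliminary results, then inspect them.
tape <- tempfile(fileext = ".tape")
for (i in 1:3) tape_add(tape, list(step = i, value = i^2))
steps <- tape_lapply(tape, function(x) x$value)

# No header, so two files can simply be concatenated.
other <- tempfile(fileext = ".tape")
tape_add(other, "extra")
file.append(tape, other)
length(tape_read(tape))
```

Because reading simply stops at the first record that cannot be decoded, a write cut short by a crash costs only the last object, which is about what one wants from a failsafe dump.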

Oh, and rtapeAdd is not (yet) thread-safe — you have been warned (-;

Filed under R tips&tricks



  • Gray says:

    Does it store the state of the random number generator?

  • says:

    @Gray Nah, but it can be emulated easily by saving/restoring .Random.seed variable.

  • MattBagg says:

    Minor point, but can you spell out HPC in this post so people like me don’t leave to look it up and end up reading a task view on CRAN instead of this post? 🙂

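As for the .Random.seed trick mentioned in the comments, a minimal base-R sketch could look like the one below. It assumes the default global generator; saved_seed and checkpoint are names made up for this illustration, and nothing here is rtape functionality.

```r
# Illustrative only: saved_seed and checkpoint are made-up names.
set.seed(2011)  # set.seed() (or any RNG use) creates .Random.seed in the global env

# Capture the generator state *before* drawing the numbers of interest.
saved_seed <- get(".Random.seed", envir = globalenv())
first_draw <- runif(3)

# A checkpoint can bundle the state with the results before being saved.
checkpoint <- list(result = first_draw, seed = saved_seed)

# Later (e.g. after a restart): put the state back and redraw.
assign(".Random.seed", checkpoint$seed, envir = globalenv())
second_draw <- runif(3)

identical(first_draw, second_draw)  # TRUE
```

Storing such a checkpoint as one object means the results and the generator state that produced them always travel together.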