R performance with data reshaping


I am trying to reshape a data frame in R, and the recommended ways of doing so turn out to be problematic. The data frame has the following structure:

  ID          DATE1       DATE2       VALTYPE  VALUE
  'abcd1233'  2009-11-12  2009-12-23  'TYPE1'  123.45
  ...

VALTYPE is a string and is a factor with only 2 values (TYPE1 and TYPE2). I need to transform it into the following data frame (a "wide" transpose) based on the common ID and dates:

  ID          DATE1       DATE2       VALUE.TYPE1  VALUE.TYPE2
  'abcd1233'  2009-11-12  2009-12-23  123.45       NA
  ...

The data frame has more than 4,500,000 rows (although about 70% of the VALUEs are NA). The machine is an Intel-based Linux workstation with 4 GB of RAM. Loading the data (from a compressed RData file) into a fresh R process makes it grow to about 250 MB, which clearly leaves a lot of room for the reshaping.

These are my experiences so far:

  • Using the vanilla reshape() method:

      tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"), timevar = "VALTYPE");

    RESULT: Error: cannot allocate vector of size 4.8 Gb

  • Using the cast() method of the reshape package:

      tbl2 <- cast(tbl, ID + DATE1 + DATE2 ~ VALTYPE);

    RESULT: The R process consumed all the RAM with no end in sight. Eventually the process had to be killed.

  • Using by() and merge():

      sp <- by(tbl[c(1, 2, 3, 5)], tbl$VALTYPE, function(x) x);
      tbl2 <- merge(sp[["TYPE1"]], sp[["TYPE2"]], by = c("ID", "DATE1", "DATE2"), all = TRUE, sort = TRUE);

    RESULT: Works fine, although it is not very elegant or foolproof (i.e. it will break if more types are added).
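A sketch of one way to drop the hard-coded TYPE1/TYPE2 from the by()/merge() attempt: build one piece per VALTYPE level and fold merge() over the pieces with Reduce(). The toy rows and the extra level TYPE3 are invented for illustration, and how this behaves in memory on 4.5 million rows is untested. At this tiny size the result is cross-checked against base reshape().

```r
# Hypothetical toy data in the long layout from the question (values invented),
# with an extra level TYPE3 to show that no level names are hard-coded.
tbl <- data.frame(
  ID      = c("abcd1233", "abcd1233", "efgh5678", "efgh5678"),
  DATE1   = as.Date(c("2009-11-12", "2009-11-12", "2009-11-13", "2009-11-13")),
  DATE2   = as.Date(c("2009-12-23", "2009-12-23", "2009-12-24", "2009-12-24")),
  VALTYPE = factor(c("TYPE1", "TYPE2", "TYPE2", "TYPE3")),
  VALUE   = c(123.45, NA, 67.89, 1.5),
  stringsAsFactors = FALSE
)
keys  <- c("ID", "DATE1", "DATE2")
types <- levels(droplevels(tbl$VALTYPE))

# One piece per VALTYPE level, with VALUE renamed to VALUE.<level>.
pieces <- lapply(types, function(lv) {
  piece <- tbl[tbl$VALTYPE == lv, c(keys, "VALUE")]
  names(piece)[names(piece) == "VALUE"] <- paste0("VALUE.", lv)
  piece
})

# Fold a full outer join over the pieces, however many levels there are.
tbl2 <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE, sort = TRUE), pieces)

# Reference result from base reshape(), which is fine at this tiny size.
ref <- reshape(tbl, direction = "wide", idvar = keys, timevar = "VALTYPE")
print(tbl2)
```

Adding a level to the data adds a column to the result without any change to the code.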

To add insult to injury, the operation in question can be achieved trivially in about 3 lines of AWK or Perl (and with hardly any RAM). So the question is: what is a better way to do this operation in R, using the recommended methods, without consuming all available RAM?

A useful trick is to combine the id variables into a single character vector and then do the reshape:

  tbl$NEWID <- with(tbl, paste(ID, DATE1, DATE2, sep = ";"))
  tbl2 <- recast(tbl, NEWID ~ VALTYPE, measure.var = "VALUE")
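recast() comes from the reshape package. The sketch below stays in base R to show the same pasted-key idea: match() turns each NEWID and each VALTYPE into a row and column position, and every VALUE is written into a preallocated matrix in one vectorised assignment. This is a swapped-in technique, not what the answer ran, and the toy data and the helper names `ids`, `types`, `wide` and `first` are hypothetical.

```r
# Hypothetical toy data in the long layout from the question (values invented).
tbl <- data.frame(
  ID      = c("abcd1233", "abcd1233", "efgh5678"),
  DATE1   = as.Date(c("2009-11-12", "2009-11-12", "2009-11-13")),
  DATE2   = as.Date(c("2009-12-23", "2009-12-23", "2009-12-24")),
  VALTYPE = factor(c("TYPE1", "TYPE2", "TYPE2")),
  VALUE   = c(123.45, NA, 67.89),
  stringsAsFactors = FALSE
)

# Step 1, as in the answer: collapse the three id columns into one character key.
tbl$NEWID <- with(tbl, paste(ID, DATE1, DATE2, sep = ";"))

# Step 2 (base R stand-in for recast): one row per key, one column per level.
ids   <- unique(tbl$NEWID)
types <- levels(droplevels(tbl$VALTYPE))
wide  <- matrix(NA_real_, nrow = length(ids), ncol = length(types),
                dimnames = list(NULL, paste0("VALUE.", types)))

# Fill all cells at once; a duplicated (key, type) pair would silently keep
# the last VALUE, so uniqueness of that pair is assumed.
wide[cbind(match(tbl$NEWID, ids), match(as.character(tbl$VALTYPE), types))] <- tbl$VALUE

# Step 3: bring back the original id columns from the first row of each key.
first <- tbl[match(ids, tbl$NEWID), c("ID", "DATE1", "DATE2")]
tbl2  <- data.frame(first, wide, check.names = FALSE)
rownames(tbl2) <- NULL
print(tbl2)
```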

On a problem of the same size, this is about 40% faster than the by()/merge() pair on my MacBook with a 2.2 GHz Intel Core 2.
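A rough harness for making this kind of comparison on another machine. Everything in it is an assumption: the synthetic data and its size, the hypothetical helpers `by_merge()` and `key_match()`, split() standing in for by(), and base-R match() standing in for recast(). Its timings therefore say nothing about the 40% figure; they depend entirely on the machine it runs on.

```r
set.seed(1)
# Hypothetical synthetic data: n_keys distinct (ID, DATE1, DATE2) keys, each with
# one TYPE1 row and one TYPE2 row, and roughly 70% of VALUEs set to NA.
n_keys <- 20000
keys <- data.frame(ID = sprintf("id%06d", seq_len(n_keys)),
                   DATE1 = as.Date("2009-11-12") + sample(0:30, n_keys, replace = TRUE),
                   stringsAsFactors = FALSE)
keys$DATE2 <- keys$DATE1 + 41
tbl <- rbind(data.frame(keys, VALTYPE = "TYPE1"), data.frame(keys, VALTYPE = "TYPE2"))
tbl$VALTYPE <- factor(tbl$VALTYPE)
tbl$VALUE <- round(runif(nrow(tbl)) * 1000, 2)
tbl$VALUE[runif(nrow(tbl)) < 0.7] <- NA

# Approach from the question, with split() standing in for by();
# columns 1, 2, 3, 5 are ID, DATE1, DATE2, VALUE.
by_merge <- function(tbl) {
  sp <- split(tbl[c(1, 2, 3, 5)], tbl$VALTYPE)
  merge(sp[["TYPE1"]], sp[["TYPE2"]], by = c("ID", "DATE1", "DATE2"),
        all = TRUE, sort = TRUE)
}

# Base-R version of the pasted-key idea (match() in place of recast()).
key_match <- function(tbl) {
  newid <- paste(tbl$ID, tbl$DATE1, tbl$DATE2, sep = ";")
  ids   <- unique(newid)
  types <- levels(droplevels(tbl$VALTYPE))
  wide  <- matrix(NA_real_, length(ids), length(types),
                  dimnames = list(NULL, paste0("VALUE.", types)))
  wide[cbind(match(newid, ids), match(as.character(tbl$VALTYPE), types))] <- tbl$VALUE
  out <- data.frame(tbl[match(ids, newid), c("ID", "DATE1", "DATE2")], wide,
                    check.names = FALSE)
  rownames(out) <- NULL
  out
}

t_merge <- system.time(res_merge <- by_merge(tbl))
t_key   <- system.time(res_key   <- key_match(tbl))
cat("by/merge elapsed:", t_merge[["elapsed"]], "s\n")
cat("key/match elapsed:", t_key[["elapsed"]], "s\n")
```

Scaling `n_keys` toward the real row count shows how each approach grows; memory use can be watched with an external tool such as `top` while it runs.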

