R - Applying an aggregate function over multiple different slices

- April 15, 2014

I have a data array that contains information about people and projects, such as:

person_id | project_id | action | time
--------------------------------------
        1 |          1 |      w |    1
        1 |          2 |      w |    2
        1 |          3 |      w |    2
        1 |          3 |      r |    3
        1 |          3 |      w |    4
        1 |          4 |      w |    4
        2 |          2 |      r |    2
        2 |          2 |      w |    3

I'd like to augment this data with a couple more fields, called "first_time" and "first_time_project", that identify the first time any action by that person was seen and the first time that person performed any action on the given project. In the end, the data should look like this:

person_id | project_id | action | time | first_time | first_time_project
------------------------------------------------------------------------
        1 |          1 |      w |    1 |          1 |                  1
        1 |          2 |      w |    2 |          1 |                  2
        1 |          3 |      w |    2 |          1 |                  2
        1 |          3 |      r |    3 |          1 |                  2
        1 |          3 |      w |    4 |          1 |                  2
        1 |          4 |      w |    4 |          1 |                  4
        2 |          2 |      r |    2 |          2 |                  2
        2 |          2 |      w |    3 |          2 |                  2

My naive way of doing this is to write a couple of loops:

for (pid in unique(data$person_id)) {
    # Earliest time this person appears anywhere
    data[data$person_id == pid, "first_time"] = min(data[data$person_id == pid, "time"])
    for (projid in unique(data[data$person_id == pid, "project_id"])) {
        # Earliest time this person appears on this project
        data[data$person_id == pid & data$project_id == projid, "first_time_project"] = min(data[data$person_id == pid & data$project_id == projid, "time"])
    }
}

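Now, it doesn't take a genius to see that this is going to be glacially slow with the doubly nested loops. However, I can't figure out a better way to handle this in R. What I'm kind of emulating is the GROUP BY option in SQL. I know a grouping function might be able to help, but I can't figure out how to apply it over multiple slices.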

Any hints on how to take this code from glacially slow to a bit faster? I'd be happy with snail speed right now.

Try ave:

transform(data,
    first_time = ave(time, person_id, FUN = min),
    first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
)
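To see this against the sample from the question, here is a minimal sketch. It assumes the table is held in a data frame named data with the four columns shown above; the name result is only illustrative. ave() returns a vector the same length as its input, with each element replaced by FUN applied to that element's group, so the new columns line up with the original rows. Any extra arguments are forwarded to interaction(), which is why drop = TRUE can be used to skip person/project combinations that never occur.

```r
# Sample data from the question, rebuilt as a data frame.
data <- data.frame(
  person_id  = c(1, 1, 1, 1, 1, 1, 2, 2),
  project_id = c(1, 2, 3, 3, 3, 4, 2, 2),
  action     = c("w", "w", "w", "r", "w", "w", "r", "w"),
  time       = c(1, 2, 2, 3, 4, 4, 2, 3)
)

# Group by person for first_time, and by person + project for
# first_time_project; min() is applied within each group.
result <- transform(
  data,
  first_time         = ave(time, person_id, FUN = min),
  first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
)

print(result)
```

Note that FUN must be passed by name and in upper case: ave() treats every other argument as a grouping variable, so a lower-case fun = min would not be used as the function.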
