R - Applying an aggregate function over multiple different slices
I have a data array that contains information about people and projects, such as:
    person_id | project_id | action | time
    --------------------------------------
            1 |          1 |      w |    1
            1 |          2 |      w |    2
            1 |          3 |      w |    2
            1 |          3 |      r |    3
            1 |          3 |      w |    4
            1 |          4 |      w |    4
            2 |          2 |      r |    2
            2 |          2 |      w |    3
I'd like to augment this data with a couple more fields called "first_time" and "first_time_project". Together they identify the first time any action by that person was seen, and the first time that person was seen performing any action on that project. In the end, the data should look like this:
    person_id | project_id | action | time | first_time | first_time_project
    ------------------------------------------------------------------------
            1 |          1 |      w |    1 |          1 |                  1
            1 |          2 |      w |    2 |          1 |                  2
            1 |          3 |      w |    2 |          1 |                  2
            1 |          3 |      r |    3 |          1 |                  2
            1 |          3 |      w |    4 |          1 |                  2
            1 |          4 |      w |    4 |          1 |                  4
            2 |          2 |      r |    2 |          2 |                  2
            2 |          2 |      w |    3 |          2 |                  2
My naive way of doing this is to write a couple of loops:

    for (pid in unique(data$person_id)) {
      # earliest time this person appears at all
      data[data$person_id == pid, "first_time"] = min(data[data$person_id == pid, "time"])
      for (projid in unique(data[data$person_id == pid, "project_id"])) {
        # earliest time this person appears on this project
        data[data$person_id == pid & data$project_id == projid, "first_time_project"] =
          min(data[data$person_id == pid & data$project_id == projid, "time"])
      }
    }
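Now, it doesn't take a genius to see that this is going to be glacially slow with the doubly nested loops. However, I can't figure out a better way to handle this in R. I'm kind of trying to emulate the GROUP BY option in SQL. I know that by() might be able to help, but I can't figure out how to do it over multiple slices.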
Any hints on how to take this code from glacially slow to a bit faster? I'd be happy with a snail's pace right now.
Try ave:
    transform(data,
      first_time         = ave(time, person_id, FUN = min),
      first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
    )
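To check the suggestion against the sample data from the question, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the question's table, and the name `result` and the `stringsAsFactors = FALSE` setting are my own choices rather than part of the original answer.

```r
# Rebuild the sample data from the question by hand.
data <- data.frame(
  person_id  = c(1, 1, 1, 1, 1, 1, 2, 2),
  project_id = c(1, 2, 3, 3, 3, 4, 2, 2),
  action     = c("w", "w", "w", "r", "w", "w", "r", "w"),
  time       = c(1, 2, 2, 3, 4, 4, 2, 3),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as its first argument, where each element
# is replaced by FUN applied to the group that element belongs to.
# Note that FUN must be spelled in upper case and passed by name; otherwise
# ave() treats it as another grouping variable.
result <- transform(
  data,
  first_time         = ave(time, person_id, FUN = min),
  first_time_project = ave(time, person_id, project_id, drop = TRUE, FUN = min)
)

print(result)
```

As far as I can tell, `drop = TRUE` is passed through to `interaction()`, which `ave()` uses to combine the grouping variables, so person/project combinations that never occur in the data are skipped instead of being handed to `min()` as empty groups.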