quackerzdb
New Member
- Joined
- Aug 7, 2013
- Messages
- 1
I have a large dataset (~5000 points in a 2D array) that I need to plot and fit a linear trendline to. There are 90 subjects, each of which is checked daily for a binary event over its lifespan, e.g.:
Day:     1 2 3 4 5 6 7 8 9 10
subj 1:  0 1 0 0 0 (dead)
subj 2:  1 1 1 0 1 0 1 0 0 0 (dead)
subj 3:  0 0 1 0 (dead)
I simply averaged the number of events per day across subjects and plotted them, but near the long-lived end of the data the variability gets really high, since few subjects survive that long. I have therefore normalized the lifespan for each subject, so that instead of an event occurring on day 10, it occurs at 0.2 of the subject's total lifespan (for a subject dying on day 50). Now that my data is in this format, I want to create an average for all subjects over the domain 0-1, representing birth to death. The trouble is that each subject's domain now has a different interval between points (e.g. 0.1 for subjects living 10 days and 0.006993 for those living 143 days). Any ideas on how to space out the cells (somewhat) automatically so I can average the columns? My array is 90 x 150, so doing it manually is not really possible. Or, alternatively, is there a way to create a continuous average rather than a discrete one? Some sort of function that goes through each infinitely small point (or even a really tiny discrete interval) of the domain (0-1) and averages the binary y-value?
I hope I described this reasonably well; it's tough even for me to keep it straight in my head. I can provide more detail if necessary or even provide a snippet of data to give you an idea of what I'm trying to do.
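To make the question concrete, here is a minimal sketch of the "discrete really tiny interval" idea: split the 0-1 domain into equal-width bins, average each subject's events within each bin, then average those per-subject values across subjects. It is written in Python with NumPy, which the post does not mention; the function name `binned_event_rate`, the choice of 10 bins, and the convention that day d of an n-day lifespan sits at d/n are all assumptions made for illustration.

```python
import numpy as np

def binned_event_rate(subjects, n_bins=10):
    """Average binary events over normalized lifespan (0 = birth, 1 = death).

    subjects: list of per-subject daily 0/1 sequences of differing lengths.
    Returns (bin centers, mean event rate per bin, subjects contributing per bin).
    """
    sums = np.zeros(n_bins)    # sum of per-subject bin means
    counts = np.zeros(n_bins)  # number of subjects with data in each bin
    for events in subjects:
        events = np.asarray(events, dtype=float)
        n_days = events.size
        days = np.arange(1, n_days + 1)
        # Day d sits at fraction d / n_days. Bin k covers (k/n_bins, (k+1)/n_bins].
        # Integer arithmetic avoids floating-point trouble at bin edges.
        idx = (days * n_bins - 1) // n_days
        bin_sum = np.bincount(idx, weights=events, minlength=n_bins)
        bin_n = np.bincount(idx, minlength=n_bins)
        has_data = bin_n > 0
        # Each subject contributes one value per bin, so long-lived
        # subjects do not outweigh short-lived ones.
        sums[has_data] += bin_sum[has_data] / bin_n[has_data]
        counts[has_data] += 1
    rate = np.full(n_bins, np.nan)  # NaN where no subject has data
    filled = counts > 0
    rate[filled] = sums[filled] / counts[filled]
    centers = (np.arange(n_bins) + 0.5) / n_bins
    return centers, rate, counts

# The three example subjects from the post.
subjects = [
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0],
]
centers, rate, counts = binned_event_rate(subjects, n_bins=10)

# Linear trendline through the bins that contain data.
valid = ~np.isnan(rate)
slope, intercept = np.polyfit(centers[valid], rate[valid], 1)
```

With fewer bins than the shortest lifespan in days, every subject contributes to every bin; with more bins, short-lived subjects leave some bins empty, and `counts` shows how many subjects back each average. Averaging per subject first is one design choice; pooling all points in a bin instead would weight long-lived subjects more heavily.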