helpneeded123
New Member
- Joined
- Mar 19, 2019
- Messages
- 4
Hi!
I'm trying to clean a dataset and only retain families with at least one parent at home. I started going through it manually, but it took 7-8 hours to do 3,000 entries and I have 30,000 entries left to do! There must be an easier way, so I figured I'd add the criteria here and maybe someone could help me.
Thanks in advance!
Katie
***Important: do this within each DUID group, as PIDs are repeated throughout the dataset
- Each family has a DUID, and each individual has a PID and a combined DUID and PID (DUPERSID)
- For each DUID group (same DUID):
- Retain “child” PID rows if (MOPID53X not equal to -1 OR DAPID53X not equal to -1) AND AGE15X is 3-18 inclusive
- If a child is younger than 3, older than 18, or has both MOPID53X and DAPID53X set to -1, remove the row
- There can be more than 1 child per family - need to retain all children (within age range) and each child may have different MOPID53X/ DAPID53X from each other
- Use the MOPID53X AND DAPID53X values (from the child rows retained above) to retain the “mother” and “father” rows connected to each “child” PID (can have one parent or two)
- If a retained mother or father row has any values in its own MOPID53X/DAPID53X columns, reset them to -1 (don’t want to connect the parents to their own parents, i.e. the children's grandparents, if present in the same household)
- Reset the PID in the child's MOPID53X or DAPID53X column to the respective parent's linked DUPERSID
- If the MOPID53X AND DAPID53X PID rows are both missing for a particular child, remove the related child row
- Delete the other rows in the DUID that don’t meet the above criteria (these correspond to other family members)
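For anyone who wants to try this outside a spreadsheet, the criteria above could be sketched in Python roughly as follows. This is only an illustration, and it makes several assumptions that are not in the post: the rows are already loaded as dictionaries (for example with `csv.DictReader`) with the ID and age columns converted to integers; a listed parent whose own row is absent from the household is reset to -1; and a person who is both a retained child and a parent is kept once, as a child row. The function name `clean_families` is hypothetical.

```python
from collections import defaultdict

NO_PARENT = -1  # code the post uses for "no mother/father in the household"
PARENT_COLS = ("MOPID53X", "DAPID53X")


def clean_families(rows, min_age=3, max_age=18):
    """Keep children aged min_age..max_age with a parent present, plus those parents."""
    # Group by household first, because PIDs repeat across DUIDs.
    groups = defaultdict(list)
    for row in rows:
        groups[row["DUID"]].append(row)

    kept = []
    for members in groups.values():
        by_pid = {m["PID"]: m for m in members}
        out = {}              # PID -> cleaned copy of a row to retain
        parent_pids = set()
        for m in members:
            if not (min_age <= m["AGE15X"] <= max_age):
                continue
            # A parent link only counts if that parent's row is in this household.
            links = {c: m[c] for c in PARENT_COLS
                     if m[c] != NO_PARENT and m[c] in by_pid}
            if not links:
                continue  # no parent row present: drop the child
            child = dict(m)
            for c in PARENT_COLS:
                # Swap the parent's PID for the parent's DUPERSID.
                # Assumption: a listed parent with no row is reset to -1.
                child[c] = by_pid[links[c]]["DUPERSID"] if c in links else NO_PARENT
            out[m["PID"]] = child
            parent_pids.update(links.values())
        for pid in parent_pids:
            if pid in out:
                continue  # assumption: child-and-parent stays a child row
            parent = dict(by_pid[pid])
            for c in PARENT_COLS:
                parent[c] = NO_PARENT  # do not link parents to grandparents
            out[pid] = parent
        # Emit retained rows in their original order within the household.
        kept.extend(out[m["PID"]] for m in members if m["PID"] in out)
    return kept
```

The input rows are copied rather than modified, so the result can be compared against the original data before anything is overwritten. It would be worth checking the output against the 3,000 entries already cleaned by hand.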