Clean data help - remove rows

helpneeded123

New Member
Joined
Mar 19, 2019
Messages
4
Hi!

I'm trying to clean a dataset and only retain families with at least parent at home. I started going through it manually but it took 7-8 hours to do 3,000 entries and I have 30,000 entries left to do! There must be an easier way so I figured I'd add the criteria here and maybe someone could help me.

Thanks in advance!
Katie



  • Each family has a DUID, and each individual has a PID and a combined DUID and PID (DUPERSID)
  • For each DUID group (same DUID);
    • Retain “child” PID rows if MOPID53X not equal -1 OR DAPID53X not equal -1 AND AGE15X = 3-18 inclusive
      • If child is younger than 3, older than 18 or has both MOPID53X/ DAPID53X set to -1, remove
      • There can be more than 1 child per family - need to retain all children (within age range) and each child may have different MOPID53X/ DAPID53X from each other
    • Use MOPID53X AND DAPID53X values (from above retained columns) to retain “mother” and “father” rows connected to each “child” PID (can have one parent or two)
      • If retained MOPID53X or DAPID53X row has any values within their individual PIDs MOPID53X/DAPID53X values, reset to -1 (don’t want to connect to their parents (the children's grandparents, if present in same household)
      • Reset PID in child's MOPID53X or DAPID53X column to respective parents linked DUPERSID
    • IF MOPID53X AND DAPID53X PID row is missing for a particular child, remove related child row
    • Delete other rows in DUID which don’t meet above criteria (and correspond to other family members)

***Important to do within each DUID group as PIDs repeated throughout dataset
 

Excel Facts

Return population for a City
If you have a list of cities in A2:A100, use Data, Geography. Then =A2.Population and copy down.
When I don't have to repeat a data reduction process or its a first attempt at a difficult process, I create helper columns until I understand better how to simplify it.

Maybe in your case you could create simple helper columns that each answer once criteria in the rules. Use True and FALSE as the result for each question. Then you can test the whole True and FALSE row to see if all are true. I don't know what columns you have your data in, so the formulas below are guesses.

Test if MOPID53X is -1
IF(C3=-1,FALSE,TRUE)

Test if Age is 3-18
=AND(D3>=3,D3<=18)

Some of those criteria need explaining to me. I'm not sure about the parent section criteria.
 
Upvote 0
Thanks Jeffrey for your help! I ended up using individual filters to find the individual rows I wanted and then manually combined them. Not the most sophisticated route but it got me there!
 
Upvote 0

Forum statistics

Threads
1,223,228
Messages
6,170,871
Members
452,363
Latest member
merico17

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top