Missing values per case

iotapsi321

New Member
Joined
Oct 6, 2017
Messages
22
Hello! I am new to excel and have learned a lot from these messages boards and hope to keep the learning momentum going.

I have a large dataset with over 25k rows and hundreds of columns. I would like to delete rows (participants) who have over 50% missing values per variables. Each column reflects an item for a variable. For most variables, the sum of 4-5 consecutive columns (items) constitutes a variable. So in the case of a 4 item variable, I would like delete rows in where there is no value for 3 or more items. Any guidance you can provide would be a huge help.
 

Excel Facts

Copy PDF to Excel
Select data in PDF. Paste to Microsoft Word. Copy from Word and paste to Excel.
You can put several examples covering all possible cases, there you indicate which are the rows that should be deleted.
Also comments, in which row the records begin and in which column the data to review begins
 
Upvote 0
Ok thanks. As an example, data begins on row 1 column 1. If a row is missing more than 2 values in columns A - D (variable 1), more than 3 values in columns E - J (variable 2), and more than 2 values in K- N (variable 3); then, I would like for that row to be deleted. Is this what you were looking for?
 
Upvote 0
There are only 3 variables?
Are they always those columns?
Should the 3 conditions be met?
 
Upvote 0
There are more than these 3, perhaps too many to list. Just wanted to provide and example. Yes the data is always in those columns and the main condition to be met is that across each of the 3 variables there should be no more than 50% of missing values.
 
Upvote 0
I'm sorry but I am not understanding.
You put three examples:

A - D (variable 1),
E - J (variable 2),
K- N (variable 3)


1. Are there more variables?

2. Is there a pattern to know how many columns are there per variable?

3. What is the condition?
a) If only one variable is met.
b) The 3 variables must be met.

Please, could you answer in that order.

It would also be great if you put 10 examples of the data you have on your sheet.

*If contains confidential information, you could replace it with generic data.
 
Upvote 0
I apologize for the lack of clarity. Below is a sample of generic data. I would like to delete participants 1,3, & 8 because they have over 50% of missing values on all 3 variables (so the condition is 50% across all 3 variables, which is why participant 6 retained and is not deleted). There is no consistent pattern for variables. Some variables have more items (columns) than others. Thanks again for all your help!

[TABLE="width: 1406"]
<colgroup><col><col span="19"></colgroup><tbody>[TR]
[TD]Id#[/TD]
[TD]QMI1[/TD]
[TD]QMI2[/TD]
[TD]QMI3[/TD]
[TD]QMI4[/TD]
[TD]QMI5[/TD]
[TD]QMI6[/TD]
[TD]IAI1[/TD]
[TD]IAI2[/TD]
[TD]IAI3[/TD]
[TD]IAI4[/TD]
[TD]IAI5[/TD]
[TD]IAI6[/TD]
[TD]IAI7[/TD]
[TD]IAI8[/TD]
[TD]CT1[/TD]
[TD]CT2[/TD]
[TD]CT3[/TD]
[TD]CT4[/TD]
[TD]CT5 [/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD]6[/TD]
[TD][/TD]
[TD]7[/TD]
[TD][/TD]
[TD]4[/TD]
[TD][/TD]
[TD]2[/TD]
[TD][/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD][/TD]
[TD]4[/TD]
[TD][/TD]
[TD]3[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]6[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]4[/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD][/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]9[/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]7[/TD]
[TD][/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]9[/TD]
[TD][/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD]3[/TD]
[TD][/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD][/TD]
[TD]6[/TD]
[TD]6[/TD]
[TD][/TD]
[TD]6[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]7[/TD]
[TD]6[/TD]
[TD]8[/TD]
[TD]3[/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]5[/TD]
[TD]4[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]4[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD]5[/TD]
[TD]6[/TD]
[TD]6[/TD]
[TD]4[/TD]
[TD]5[/TD]
[TD]3[/TD]
[TD]3[/TD]
[TD]5[/TD]
[TD]3[/TD]
[TD]2[/TD]
[TD]3[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]1[/TD]
[TD]3[/TD]
[TD]1[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]6[/TD]
[TD][/TD]
[TD][/TD]
[TD]6[/TD]
[TD][/TD]
[TD]9[/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD][/TD]
[TD]3[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]8[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD][/TD]
[TD]2[/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[TD]5[/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]7[/TD]
[TD]8[/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD]4[/TD]
[TD]2[/TD]
[TD][/TD]
[TD]2[/TD]
[TD]4[/TD]
[TD]3[/TD]
[TD]2[/TD]
[TD]1[/TD]
[TD]1[/TD]
[TD]2[/TD]
[TD]4[/TD]
[/TR]
</tbody>[/TABLE]


[TABLE="width: 500"]
<tbody>[TR]
[TD]
[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
We need to find a pattern to determine the total columns per variable, I don't need to explain why?

QMI1 QMI2 QMI3 QMI4 QMI5 QMI6 (In this case the pattern could be QMI?)
IAI1 IAI2 IAI3 IAI4 IAI5 IAI6 IAI7 IAI8 (In this case the pattern could be IAI?)
CT1 CT2 CT3 CT4 CT5 (In this case the pattern could be CT?)

Those letters could be the pattern, that is, if the macro eliminates the numerical value, the letters remain, and that way we know how many columns are for the variable QMI, IAI and CT.

It makes sense to you.
 
Upvote 0
Oh yes that does. Here is the pattern of the ACTUAL variable names:
[TABLE="width: 1320"]
<tbody>[TR]
[TD="width: 88"]RelAttend[/TD]
[TD="width: 88"]Religious[/TD]
[TD="width: 88"]Spiritual [/TD]
[TD="width: 88"]Dep[/TD]
[TD="width: 88"]Anx[/TD]
[TD="width: 88"]Combat[/TD]
[TD="width: 88"]Hmechng[/TD]
[TD="width: 88"]DA[/TD]
[TD="width: 88"]Dplyexp[/TD]
[TD="width: 88"]Stress[/TD]
[TD="width: 88"]Relation[/TD]
[TD="width: 88"]DplyRel[/TD]
[TD="width: 88"]Discrim[/TD]
[TD="width: 88"]Unit[/TD]
[TD="width: 88"]StressExp[/TD]
[/TR]
</tbody>[/TABLE]
 
Upvote 0
Do you have that list of variables in some other sheet, in some column as a list?

Or You can create that list on some sheet, say "variable" sheet in Column A starting in cell A2.
With that I start the macro.
 
Upvote 0

Forum statistics

Threads
1,223,244
Messages
6,170,976
Members
452,372
Latest member
Natalie18

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top