Cleaning Data - HELP!

sawdeeka

New Member
Joined
Aug 21, 2019
Messages
6
Hi,
I need urgent help to develop a formula for the following:

I have an extremely large data set, approximately 600,000 rows. There are a number of variables that need to be considered but basically I need to delete positive and negative numbers (hours paid) per employee (employee id), per date (date of leave). I am pretty good with excel but this one is hurting! Appreciate any suggestions!


[TABLE="class: grid, width: 500"]
<tbody>[TR]
[TD]Employee ID[/TD]
[TD]Leave Type[/TD]
[TD]Date of Leave[/TD]
[TD]Hours Paid[/TD]
[/TR]
[TR]
[TD]123[/TD]
[TD]Sick Leave[/TD]
[TD]01/01/2000[/TD]
[TD]7.5[/TD]
[/TR]
[TR]
[TD]123[/TD]
[TD]Sick Leave[/TD]
[TD]01/01/2000[/TD]
[TD]-7.5[/TD]
[/TR]
[TR]
[TD]123[/TD]
[TD]Rec Leave[/TD]
[TD]02/02/2001[/TD]
[TD]10.00[/TD]
[/TR]
[TR]
[TD]456[/TD]
[TD]Rec Leave[/TD]
[TD]13/02/2001[/TD]
[TD]8.00[/TD]
[/TR]
[TR]
[TD]456[/TD]
[TD]Sick Leave[/TD]
[TD]13/02/2001[/TD]
[TD]-8.00[/TD]
[/TR]
[TR]
[TD]789[/TD]
[TD]Family Leave[/TD]
[TD]01/01/2000[/TD]
[TD]8.00[/TD]
[/TR]
[TR]
[TD]789[/TD]
[TD]Family Leave[/TD]
[TD]02/01/2000[/TD]
[TD]8.00[/TD]
[/TR]
[TR]
[TD]789[/TD]
[TD]Family Leave[/TD]
[TD]03/01/2000[/TD]
[TD]8.00[/TD]
[/TR]
[TR]
[TD]789[/TD]
[TD]Family Leave [/TD]
[TD]04/01/2000[/TD]
[TD]8.00[/TD]
[/TR]
</tbody>[/TABLE]
 

Excel Facts

Show numbers in thousands?
Use a custom number format of #,##0,K. Each comma after the final 0 will divide the displayed number by another thousand
you can apply a filter and delete the result
you can also use a vba macro
what would be the rules to delete or not ? You're going to delete the entire row or the cell value?
backup before doing any test
 
Upvote 0
you can apply a filter and delete the result
you can also use a vba macro
what would be the rules to delete or not ? You're going to delete the entire row or the cell value?
backup before doing any test

Hi, if I was to apply a filter wouldn't I need to do that per employee, per date? I would be looking to delete the entire row.
 
Upvote 0
Sorry I may not have been clear.

To confirm I would be looking to delete rows that relate to positive and negative entries (hours paid) for individual employees (employee id), on matching dates (date of leave).

Sorry I am not sure I am articulating correctly
 
Upvote 0
Are the positive and negative entries (that needs to be deleted) always adjacent to each other?
 
Upvote 0
There seems to be more questions than answers !!
Can you post the above data set AFTER the lines are deleted /
and
Is the data sorted ?
 
Upvote 0
There seems to be more questions than answers !!
Can you post the above data set AFTER the lines are deleted /
and
Is the data sorted ?

Sorry not too sure I follow, post the data set after the lines are deleted?

I have custom sort the data by Employee ID (smallest to largest) and Date of Leave (oldest to newest)
 
Upvote 0
Ok, assuming the positive and negative entries (that needs to be deleted) always adjacent to each other, try this on a small sample first:
I use col E as a temporary helper column, you may change that to suit
Code:
[FONT=lucida console][COLOR=Royalblue]Sub[/COLOR] a1107635a()
[I][COLOR=seagreen]'https://www.mrexcel.com/forum/excel-questions/1107635-cleaning-data-help.html[/COLOR][/I]
[COLOR=Royalblue]Dim[/COLOR] i [COLOR=Royalblue]As[/COLOR] [COLOR=Royalblue]Long[/COLOR], n [COLOR=Royalblue]As[/COLOR] [COLOR=Royalblue]Long[/COLOR]
[COLOR=Royalblue]Dim[/COLOR] va, vb
n = Range([COLOR=brown]"A"[/COLOR] & Rows.count).[COLOR=Royalblue]End[/COLOR](xlUp).Row
va = Range([COLOR=brown]"A1:C"[/COLOR] & n)
vb = Range([COLOR=brown]"D1:D"[/COLOR] & n)

[COLOR=Royalblue]For[/COLOR] i = [COLOR=crimson]2[/COLOR] [COLOR=Royalblue]To[/COLOR] n
[COLOR=Royalblue]If[/COLOR] vb(i, [COLOR=crimson]1[/COLOR]) < [COLOR=crimson]0[/COLOR] [COLOR=Royalblue]Then[/COLOR]
    [COLOR=Royalblue]If[/COLOR] vb(i - [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = -vb(i, [COLOR=crimson]1[/COLOR]) [COLOR=Royalblue]And[/COLOR] va(i - [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = va(i, [COLOR=crimson]1[/COLOR]) [COLOR=Royalblue]And[/COLOR] va(i - [COLOR=crimson]1[/COLOR], [COLOR=crimson]3[/COLOR]) = va(i, [COLOR=crimson]3[/COLOR]) [COLOR=Royalblue]Then[/COLOR]
         vb(i - [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = [COLOR=brown]"True"[/COLOR]: vb(i, [COLOR=crimson]1[/COLOR]) = [COLOR=brown]"True"[/COLOR]
    [COLOR=Royalblue]ElseIf[/COLOR] vb(i + [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = -vb(i, [COLOR=crimson]1[/COLOR]) [COLOR=Royalblue]And[/COLOR] va(i + [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = va(i, [COLOR=crimson]1[/COLOR]) [COLOR=Royalblue]And[/COLOR] va(i + [COLOR=crimson]1[/COLOR], [COLOR=crimson]3[/COLOR]) = va(i, [COLOR=crimson]3[/COLOR]) [COLOR=Royalblue]Then[/COLOR]
         vb(i + [COLOR=crimson]1[/COLOR], [COLOR=crimson]1[/COLOR]) = [COLOR=brown]"True"[/COLOR]: vb(i, [COLOR=crimson]1[/COLOR]) = [COLOR=brown]"True"[/COLOR]
    [COLOR=Royalblue]End[/COLOR] [COLOR=Royalblue]If[/COLOR]
[COLOR=Royalblue]End[/COLOR] [COLOR=Royalblue]If[/COLOR]
[COLOR=Royalblue]Next[/COLOR]

[I][COLOR=seagreen]'Col E as helper column[/COLOR][/I]
Range([COLOR=brown]"E:E"[/COLOR]).ClearContents
[COLOR=Royalblue]With[/COLOR] Range([COLOR=brown]"E1:E"[/COLOR] & n)
.Value = vb
[COLOR=Royalblue]On[/COLOR] [COLOR=Royalblue]Error[/COLOR] [COLOR=Royalblue]Resume[/COLOR] [COLOR=Royalblue]Next[/COLOR]
.SpecialCells(xlConstants, xlLogical).EntireRow.Delete
[COLOR=Royalblue]On[/COLOR] [COLOR=Royalblue]Error[/COLOR] [COLOR=Royalblue]GoTo[/COLOR] [COLOR=crimson]0[/COLOR]
[COLOR=Royalblue]End[/COLOR] [COLOR=Royalblue]With[/COLOR]
Range([COLOR=brown]"E:E"[/COLOR]).ClearContents

[COLOR=Royalblue]End[/COLOR] [COLOR=Royalblue]Sub[/COLOR][/FONT]
 
Upvote 0
Simple question really....post the sample dataset, in the first post AFTER your lines had been deleted.
That way we see a before and after result...

But I see @Akuini has given you a solution...:biggrin:
 
Last edited:
Upvote 0

Forum statistics

Threads
1,223,911
Messages
6,175,324
Members
452,635
Latest member
laura12345

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top