I have 500 000 data records. The fields of interest are amount, issue date and age. Some records have identical values in all three fields, and I want to remove some of these duplicate records.
For example, if I have 10 records with the same amount, issue date and age, I need to remove 20% of them (i.e. 2 records) so that only 8 are left. This 20% is the scaling factor.
The code should therefore find each group of records that are identical in these three fields, count the records in that group, and remove 20% of them.
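One possible approach, sketched below under several assumptions that the question does not settle: each record is a dict with the hypothetical keys `amount`, `issue_date` and `age`; when 20% of a group is not a whole number it is rounded down (so a group of 3 loses nothing); and the records to drop are chosen at random. The function name `thin_duplicates` is also just illustrative. It groups record indices by the three fields in a single pass, removes `size * percent // 100` records from each group, and returns the survivors in their original order.

```python
import random
from collections import defaultdict


def thin_duplicates(records, percent=20, key_fields=("amount", "issue_date", "age"), seed=None):
    """Drop `percent`% of the records in every group sharing the same key fields.

    Assumptions: records are dicts, the count removed per group is rounded
    down, and which records are removed is chosen at random.
    """
    rng = random.Random(seed)

    # Group record positions by their (amount, issue_date, age) values.
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[field] for field in key_fields)
        groups[key].append(index)

    # Integer arithmetic avoids floating-point rounding surprises.
    to_remove = set()
    for indices in groups.values():
        n_remove = len(indices) * percent // 100
        to_remove.update(rng.sample(indices, n_remove))

    # Keep the remaining records in their original order.
    return [record for i, record in enumerate(records) if i not in to_remove]


records = [{"amount": 100, "issue_date": "2020-01-01", "age": 30} for _ in range(10)]
records.append({"amount": 250, "issue_date": "2021-06-15", "age": 45})
kept = thin_duplicates(records, percent=20, seed=1)
print(len(kept))  # 9: the group of 10 keeps 8, the unique record is kept
```

Because it makes one pass over the data plus one pass per group, this should be comfortably fast for 500 000 records. If a different rounding rule is needed (e.g. rounding to the nearest whole record), only the `n_remove` line has to change.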
Because of the size of the data I cannot do this manually, so I would really appreciate your help.
Thank you
Evans