Hello,
I am trying to do some analysis on a customer's sales history to identify what types of product they are buying.
I have the list of products each customer buys. I have a list of the items and the categories of those products, and my aim is to step through each customer's sales history, identify the main categories they buy from, then move to the next customer.
I have it working, but I suspect that I am doing it in a very inefficient way, currently, it takes 1 minute to analyze each item they have bought, some customers will by in excess of 200 products, and there are in the region of 2000 customers. Assuming no crashes that would be in the region of 9 months to run the report
The process I am using is:
identify a unique list of all potential categories (600+)
make these column headings on the sales history sheet
set the area under those column headings as a range (down to the last item)
run the below code to identify if that product is listed with that category.
I then sum these columns and do a left to right sort so the categories that are most bought from are to the left.
Seeing how long this took, i thought I could make this better by just using a formula (all that filtering!), so worked out a sumproduct formula to count how many items matched the item code and the category (would only be one if it did match and 0 if not), then looped through each cell in the range entering the formula and then replacing it with the value of the formula. It took 50 minutes to run on 25 items, so doubled the time taken ....
My list of categories are laid out as per below, there may be multiple lines for each item if that item has multiple categories.
https://drive.google.com/open?id=0B66AzgJiYSXsYTlWWmZXdWdyNjA
this table is 144000 rows (and steadily growing each day as more items are added and categorised)
the customer's sales history is laid out as below
https://drive.google.com/file/d/0B66AzgJiYSXsZS1zNHBnSkRPdms/view?usp=sharing
row 1 goes on for 600+ column headings, the list of titles averages out at around 180 to each customer but can be from 1 to 780 at the most.
Can you think of a way to increase the speed of what I am doing? or do you know of a different way I should be approaching this that would make more sense?
I am trying to do some analysis on a customer's sales history to identify what types of product they are buying.
I have the list of products each customer buys. I have a list of the items and the categories of those products, and my aim is to step through each customer's sales history, identify the main categories they buy from, then move to the next customer.
I have it working, but I suspect that I am doing it in a very inefficient way, currently, it takes 1 minute to analyze each item they have bought, some customers will by in excess of 200 products, and there are in the region of 2000 customers. Assuming no crashes that would be in the region of 9 months to run the report
The process I am using is:
identify a unique list of all potential categories (600+)
make these column headings on the sales history sheet
set the area under those column headings as a range (down to the last item)
run the below code to identify if that product is listed with that category.
Code:
For Each cell In rng
With awo.Sheets("Misccat")
.AutoFilterMode = False
.UsedRange.AutoFilter
.UsedRange.AutoFilter field:=1, Criteria1:=ash2.Cells(cell.Row, 1).Value
.UsedRange.AutoFilter field:=2, Criteria1:=ash2.Cells(1, cell.Column).Value 'filters the catagory list
If .AutoFilter.Range.Rows.SpecialCells(xlCellTypeVisible).count > 3 Then cell.Value = 1
End With
Next
I then sum these columns and do a left to right sort so the categories that are most bought from are to the left.
Seeing how long this took, i thought I could make this better by just using a formula (all that filtering!), so worked out a sumproduct formula to count how many items matched the item code and the category (would only be one if it did match and 0 if not), then looped through each cell in the range entering the formula and then replacing it with the value of the formula. It took 50 minutes to run on 25 items, so doubled the time taken ....
Code:
cell.Formula = "=SUMPRODUCT(--(Misccat!$A:$A=""" & Cells(cell.Row, 1).Value & """),--(Misccat!$B:$B=""" & Cells(1, cell.Column).Value & """))"
cell.Value = cell.Value
My list of categories are laid out as per below, there may be multiple lines for each item if that item has multiple categories.
https://drive.google.com/open?id=0B66AzgJiYSXsYTlWWmZXdWdyNjA
this table is 144000 rows (and steadily growing each day as more items are added and categorised)
the customer's sales history is laid out as below
https://drive.google.com/file/d/0B66AzgJiYSXsZS1zNHBnSkRPdms/view?usp=sharing
row 1 goes on for 600+ column headings, the list of titles averages out at around 180 to each customer but can be from 1 to 780 at the most.
Can you think of a way to increase the speed of what I am doing? or do you know of a different way I should be approaching this that would make more sense?