Hi Guys. Sorry, I've been on holiday, hence the silence. I don't want to waste too much of your time, but could you give me a steer on how big a job I'm looking at? My big problem is that there is no "customerIdentifier" and I'm struggling to think of how to make one. First, though, a couple of strategic questions. I did the select distinct for the invoice static data and below is a typical output.
InvNum | RegNum   | Salut | Surname | ADDR1                | ADDR2 | ADDR3 | Postcode | Mobile       | Email
-------|----------|-------|---------|----------------------|-------|-------|----------|--------------|----------------------
123456 | OE08 DYO | Mrs   | Smith   | Flat 15, 1 Jones Ave | Stoke |       | ST1 4RQ  | 07901 533985 | sally.smith@gmail.com
123457 | OE08 DYO | Mr    | Smith   | Flat 15 1 Jones Ave  | Stoke |       | ST1 4RQ  | 07977 659867 | tim.smith@gmail.com
123458 | OE08 DYO | ,     | Smith   | Flat 15, 1 Jones Ave | Stoke |       | ST14RQ   | 7901533985   | sally.smith@gmail.com
123459 | OE08 DYO | Mrs   | ?       | ,,                   | Stoke |       | ST1 4RQ  | 07901 533985 | sally.smith.gmail.com
As you can see, the data is sometimes incomplete (blanks or special characters), invalid (no @ in the email) or just formatted differently (use of commas or spaces). We do not always have the car reg either. It is obvious to the human eye that this is one household with 2 people bringing in the same vehicle. So I have 2 strategic questions:
Clearly we need to improve the quality of the data capture, but we have no control over the level of validation at the input stage. We are looking at changing systems by the end of the year, but in the short term I have to work with what we've got. The question is: what level of investment in time will I need to make, firstly to learn the skills and then to implement a solution, and how good could it realistically be if the raw data is as above? Basically, do I accept that I see 4 unique customers, with the downside of undervaluing them and also overmarketing to them?
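To make the question concrete, here is a rough sketch in Python of the sort of clean-up I imagine a "customerIdentifier" would need. It is only an illustration under several assumptions: the function names are made up, UK-style postcodes and mobile numbers are assumed, and using postcode plus mobile as the matching key is just one guess at a rule (it would not help on rows where the mobile is missing).

```python
import re

def norm_postcode(value):
    """Upper-case and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", (value or "").upper())

def norm_mobile(value):
    """Keep digits only; restore a leading 0 that a numeric column dropped."""
    digits = re.sub(r"\D", "", value or "")
    if len(digits) == 10 and digits.startswith("7"):
        digits = "0" + digits
    return digits

def norm_email(value):
    """Lower-case; return '' when there is no '@' (treated as invalid)."""
    email = (value or "").strip().lower()
    return email if "@" in email else ""

# The four sample rows from the table above (only the fields used here).
rows = [
    {"InvNum": 123456, "Postcode": "ST1 4RQ", "Mobile": "07901 533985",
     "Email": "sally.smith@gmail.com"},
    {"InvNum": 123457, "Postcode": "ST1 4RQ", "Mobile": "07977 659867",
     "Email": "tim.smith@gmail.com"},
    {"InvNum": 123458, "Postcode": "ST14RQ", "Mobile": "7901533985",
     "Email": "sally.smith@gmail.com"},
    {"InvNum": 123459, "Postcode": "ST1 4RQ", "Mobile": "07901 533985",
     "Email": "sally.smith.gmail.com"},
]

def group_by_key(rows):
    """Group invoice numbers under a hypothetical (postcode, mobile) key."""
    groups = {}
    for row in rows:
        key = (norm_postcode(row["Postcode"]), norm_mobile(row["Mobile"]))
        groups.setdefault(key, []).append(row["InvNum"])
    return groups

groups = group_by_key(rows)
for key, invoices in groups.items():
    print(key, invoices)
```

On the sample rows this collapses the 4 invoices into 2 people who share one normalised postcode, which matches what the human eye sees. Whether a rule this simple holds up across the full data set is exactly what I can't judge.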