Shortmeister1
Board Regular
- Joined
- Feb 19, 2008
- Messages
- 207
- Office Version
- 365
- 2016
- 2013
- 2010
- 2007
- 2003 or older
- Platform
- Windows
Hi all
I've got an interesting problem for you that I am feeling fairly clueless about. I'm trying to create a single dataset from all of my personal PDF bank statements. The data is a little messy, to put it mildly, and I am definitely out of my depth on this one.
My first attempt was relatively successful. I took all of the bank statements up to June '23, combined them and did an extensive but somewhat painful clean-up of the irrelevant table data.
Then Nat West Bank (bless them!) decided to radically change their statement table format from July. Once again, in most cases I'm capable of cleaning the data, although it is an even more complex procedure - combining all of the columns into one string and then picking out the relevant data. Where I've come unstuck is that sometimes an individual transaction sits in one distinct record and sometimes it has spilled onto a second row.
Clear as mud? Sigh!
If anyone knows how to deal with something like this, then I'd love you to respond, but I don't think it can be done as there are way too many random variables.
Thanks
Martin
(This is an excerpt from my real bank statement, but it is heavily anonymised. Please excuse the garish colours, but it does make it easier to see which transaction is which.)
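To make the spill-over problem concrete, here is a minimal sketch in Python (not Excel) of one way the "combine the columns into one string" step could also stitch a spilled second row back onto its transaction. It rests on an assumption that is not confirmed by the statement itself: that the first row of every transaction starts with a date such as "03 Jul 2023" or "03 JUL", and that spilled rows do not. The function name `merge_spilled_rows`, the `DATE_START` pattern and the sample rows are all hypothetical.

```python
import re

# ASSUMPTION: a new transaction's first row begins with a date like
# "03 Jul 2023" or "03 JUL"; a spilled (continuation) row does not.
DATE_START = re.compile(r"^\d{1,2} [A-Za-z]{3}( \d{2,4})?\b")


def merge_spilled_rows(rows):
    """Join each row's cells into one string and append rows that do not
    start with a date to the previous transaction."""
    merged = []
    for cells in rows:
        # Combine all non-empty cells of the row into a single string.
        text = " ".join(
            str(c).strip() for c in cells if c is not None and str(c).strip()
        )
        if not text:
            continue  # skip completely blank rows
        if DATE_START.match(text) or not merged:
            merged.append(text)          # start of a new transaction
        else:
            merged[-1] += " " + text     # continuation of the previous one
    return merged


# Hypothetical, made-up rows: the second row is a spill-over of the first.
sample = [
    ["03 Jul 2023", "CARD PAYMENT", None, "12.50"],
    ["", "TO EXAMPLE SHOP", "", ""],
    ["04 Jul 2023", "DIRECT DEBIT", "40.00", ""],
]
transactions = merge_spilled_rows(sample)
```

If the real statement prints the date only once for several transactions on the same day, the date test alone would wrongly glue them together, so some other marker (an amount at the end of the string, for instance) would be needed instead.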