BuJay
Board Regular
- Joined
- Jun 24, 2020
- Messages
- 75
- Office Version
- 365
- 2019
- 2016
- 2013
- Platform
- Windows
I am seeing really, really strange behavior right now. Would appreciate any and all thoughts.
I created an Excel file with 2,500 columns and 100,000 rows and saved it as a CSV to practice loading large CSV files into Python using pandas. I was able to load the 100,000-row file with pandas successfully.
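For reference, the load itself is nothing fancy; a minimal sketch of what I'm doing (the filename is a placeholder, and chunksize is just a guess to keep memory in check on a file this wide):

```python
import pandas as pd

# Placeholder filename; the real file has 2,500 columns.
path = "rows_100k.csv"

# Reading in chunks keeps peak memory manageable on a multi-GB CSV.
chunks = pd.read_csv(path, chunksize=50_000)
df = pd.concat(chunks, ignore_index=True)

print(df.shape)  # expect (100000, 2500)
```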
Then I simply copied the 100,000 data rows (excluding headers) and pasted them into rows 100,001 through 200,000 to create a 200,000-row CSV. I was able to load the 200,000-row file with pandas successfully.
Then I copied another 100,000 rows (again excluding headers) and pasted them into rows 200,001 through 300,000 to create a 300,000-row CSV. I was able to load the 300,000-row file with pandas successfully.
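For what it's worth, the same doubling step can be expressed in Python as a streaming append; this is just a sketch of the idea with placeholder filenames, not what I actually ran (I did it by copy-paste in Excel):

```python
import shutil

def double_rows(src_path, dst_path):
    """Write src's header once, then its data rows twice."""
    with open(dst_path, "w", newline="") as dst:
        for pass_no in range(2):
            with open(src_path, newline="") as src:
                header = next(src)            # first line is the header
                if pass_no == 0:
                    dst.write(header)
                shutil.copyfileobj(src, dst)  # stream the remaining rows

double_rows("rows_100k.csv", "rows_200k.csv")
```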
Here is where it gets strange. The 300,000-row CSV is 4.19 GB.
When I open that file, add another 100,000 rows, and save it as a 400,000-row CSV, the file size remains 4.19 GB, and the file is corrupted: its structure appears to change, and I can no longer load it.
Since each 100,000-row block adds roughly 1.4 GB, the 400,000-row file should be around 5.6 GB, so I am deducing that something is corrupting it during the save process. Any thoughts?
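To pin down what "structure appears to change" means, here is a rough check I plan to run on the 300,000-row and 400,000-row files; it streams each file and tallies how many fields each row has (filenames are placeholders):

```python
import csv
from collections import Counter

def profile_csv(path):
    """Tally how many rows have each field count."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.reader(f):
            counts[len(row)] += 1
    return counts

# A healthy file should report 2,500 fields for every row;
# a truncated save usually leaves a single short final row.
print(profile_csv("rows_300k.csv"))
print(profile_csv("rows_400k.csv"))
```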
As an aside, I know there isn't any real reason to build large files this way; I get that. I am still curious about what is going on. Also, it is not a Python or pandas issue.
Thanks