Python Pandas: desktop RAM crashing when processing a big file (>600 MB) and doing groupby
I am processing a large CSV file (>600 MB) with pandas, running several groupby operations to get frequency stats on various variables in the dataset (similar to PROC FREQ in SAS). But pandas is bogging the whole system down: the Python process's RAM usage shoots up to about 4 GB.
I also have other columns (amount, date, etc.) on which I want to compute stats.
Is there a way to have pandas process the data from disk instead of loading it all into memory, so system performance doesn't suffer? Any other suggestions for processing this efficiently would also be appreciated.
Code below:
import pandas as pd

# convert_to_string is a helper defined elsewhere in my script
colNamesOutputFile=["PROGRAM_NAME", "TEST_GROUP", "NAME", "OFFER"]
inputDF=pd.read_csv(InputFile
, skiprows=1
, names=colNamesOutputFile
, converters={'PROGRAM_NAME': convert_to_string, 'TEST_GROUP': convert_to_string, 'NAME': convert_to_string, 'OFFER': convert_to_string}
, index_col=False)
inputDF1SUM = pd.DataFrame({'Count' : inputDF.groupby(['PROGRAM_NAME','TEST_GROUP']).size()}).reset_index()
inputDF2SUM = pd.DataFrame({'Count' : inputDF.groupby('NAME').size()}).reset_index()
inputDF3SUM = pd.DataFrame({'Count' : inputDF.groupby('OFFER').size()}).reset_index()
print(inputDF1SUM)
print(inputDF2SUM)
print(inputDF3SUM)
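For reference, here is a rough sketch of the chunked-reading direction I'm considering, so only one chunk of the file is in memory at a time. It assumes I only need frequency counts; the 100,000-row chunk size is an arbitrary guess, and dtype=str stands in for my convert_to_string converters. I'm not sure this is the right approach:

import pandas as pd

colNames = ["PROGRAM_NAME", "TEST_GROUP", "NAME", "OFFER"]

progGroupCounts = None
nameCounts = None
offerCounts = None

# Read the CSV in chunks of 100,000 rows instead of loading it all at once.
for chunk in pd.read_csv(InputFile, skiprows=1, names=colNames,
                         dtype=str, index_col=False, chunksize=100_000):
    # Count within the chunk, then fold into the running totals.
    part = chunk.groupby(['PROGRAM_NAME', 'TEST_GROUP']).size()
    progGroupCounts = part if progGroupCounts is None else progGroupCounts.add(part, fill_value=0)

    part = chunk.groupby('NAME').size()
    nameCounts = part if nameCounts is None else nameCounts.add(part, fill_value=0)

    part = chunk.groupby('OFFER').size()
    offerCounts = part if offerCounts is None else offerCounts.add(part, fill_value=0)

# fill_value=0 in add() keeps keys that only appear in some chunks.
inputDF1SUM = progGroupCounts.astype(int).rename('Count').reset_index()
inputDF2SUM = nameCounts.astype(int).rename('Count').reset_index()
inputDF3SUM = offerCounts.astype(int).rename('Count').reset_index()

print(inputDF1SUM)
print(inputDF2SUM)
print(inputDF3SUM)

Would this kind of chunked accumulation be the right way to go, or is there a better option (for example reading the string columns with dtype='category' to cut memory)?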