Wednesday, February 8, 2017

Python Pandas Desktop RAM crashing when processing a big file (>600 MB) and doing groupby


I am processing a big CSV file (>600 MB) with Pandas and running several groupby operations to get frequency stats on various variables in the dataset (similar to PROC FREQ in SAS). But Pandas is bogging the whole system down: Python's RAM usage shoots up to about 4 GB.

I also have other columns (like amount, date, etc.) on which I want to compute stats.

Is there a way to have Pandas avoid loading everything into memory and instead process the data on disk, so system performance doesn't suffer? Any other suggestions for processing this efficiently would be great.

Code below:

import pandas as pd

colNamesOutputFile = ["PROGRAM_NAME", "TEST_GROUP", "NAME", "OFFER"]
inputDF = pd.read_csv(InputFile,
                      skiprows=1,
                      names=colNamesOutputFile,
                      converters={'PROGRAM_NAME': convert_to_string,
                                  'TEST_GROUP': convert_to_string,
                                  'NAME': convert_to_string,
                                  'OFFER': convert_to_string},
                      index_col=False)

# Frequency counts (like PROC FREQ) for each grouping
inputDF1SUM = inputDF.groupby(['PROGRAM_NAME', 'TEST_GROUP']).size().reset_index(name='Count')
inputDF2SUM = inputDF.groupby('NAME').size().reset_index(name='Count')
inputDF3SUM = inputDF.groupby('OFFER').size().reset_index(name='Count')
print(inputDF1SUM)
print(inputDF2SUM)
print(inputDF3SUM)
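
In case it helps future readers: below is a minimal sketch of how the same frequency counts could be built with pandas' `chunksize` option, so only one chunk is held in memory at a time instead of the whole 600 MB file. The chunk size of 100,000 rows is an arbitrary example value, and `dtype=str` stands in for the question's `convert_to_string` converters; `InputFile` is the asker's own path.

import pandas as pd

# InputFile is the asker's CSV path; dtype=str replaces the convert_to_string converters.
colNames = ["PROGRAM_NAME", "TEST_GROUP", "NAME", "OFFER"]
chunks = pd.read_csv(InputFile,
                     skiprows=1,
                     names=colNames,
                     dtype=str,
                     index_col=False,
                     chunksize=100000)   # rows per chunk -- tune to available RAM

# Collect per-chunk counts, then combine them at the end.
counts_pg = []     # PROGRAM_NAME x TEST_GROUP
counts_name = []   # NAME
counts_offer = []  # OFFER
for chunk in chunks:
    counts_pg.append(chunk.groupby(['PROGRAM_NAME', 'TEST_GROUP']).size())
    counts_name.append(chunk.groupby('NAME').size())
    counts_offer.append(chunk.groupby('OFFER').size())

# Sum the partial counts across chunks by grouping on the index levels again.
inputDF1SUM = pd.concat(counts_pg).groupby(level=[0, 1]).sum().reset_index(name='Count')
inputDF2SUM = pd.concat(counts_name).groupby(level=0).sum().reset_index(name='Count')
inputDF3SUM = pd.concat(counts_offer).groupby(level=0).sum().reset_index(name='Count')
print(inputDF1SUM)
print(inputDF2SUM)
print(inputDF3SUM)

The per-chunk results are small Series of counts, so peak memory stays roughly at the size of one chunk plus the accumulated counts rather than the full file.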



