I'm currently working with pandas (0.14.1) in Python 3.4.2, importing data from a MongoDB database using pymongo (2.8). On a simple import,
cur = db.collection.find()
df = pd.DataFrame(list(cur))
I'm getting the following error:
InvalidBSON: 'utf-8' codec can't decode byte 0xed in position 3123: invalid continuation byte
Important note: Previously, I was doing the same task (importing the same collections into a pandas DataFrame for processing) in Python 2.7+, and all of the imports worked without issue. For other reasons, I would now prefer to stay in the 3.4+ environment.
While I cannot share the data, I can say it consists of UTF-8 encoded, line-delimited JSON documents that I bulk imported into MongoDB, which makes the error confusing. Some of the fields contain many Unicode characters. Until now, doing read-only tasks against the db from the mongo console and Python 2.7+, I had not run into this problem. As a check, after getting this error in Python 3.4, I ran the same code in 2.7 on the same collection and it imported fine.
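For anyone trying to reproduce the symptom without the data: one possible cause (an assumption, not confirmed for this collection) is that some strings contain UTF-16 surrogates written out as three UTF-8-style bytes beginning with 0xed. Python 2's UTF-8 decoder accepted such sequences, while Python 3's strict decoder rejects them with exactly this "invalid continuation byte" message. The sketch below uses a made-up byte string and a hypothetical helper, find_bad_utf8, to show the difference; it does not touch pymongo at all.

```python
# Made-up sample, not taken from the asker's data: valid UTF-8 text with a
# lone surrogate (U+D800) encoded as the three bytes ED A0 80 in the middle.
raw = b"caf\xc3\xa9 \xed\xa0\x80 end"


def find_bad_utf8(data):
    """Hypothetical helper: return (offset, reason) for the first strict
    UTF-8 decoding failure in `data`, or None if it decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# Strict decoding (the Python 3 default) reports where the bad byte sits.
print(find_bad_utf8(raw))

# "surrogatepass" keeps the surrogate as a code point in the resulting str.
lenient = raw.decode("utf-8", errors="surrogatepass")

# "replace" substitutes U+FFFD for the bytes that cannot be decoded.
replaced = raw.decode("utf-8", errors="replace")
print(repr(lenient), repr(replaced))
```

If the raw bytes of a failing field can be obtained, running a check like this on them would show whether surrogate bytes are the culprit or whether the field is in some other encoding entirely.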
Is anyone able to provide some insight into what is happening, and perhaps provide some support to remedy the problem? I am willing to provide any additional information I can.
InvalidBSON on MongoDB import - Pandas