Thursday, 28 August 2014

Python - Strange results opening/reading peculiar type of data file






Ready for a brain teaser?


I'm trying to crack a rather strange problem. I have a data file that serves as the database of a program for a scientific instrument (Mettler DSC / STARe software), and I'm trying to extract sample information from the experiments in it. It's a .t00 file, over 40 MB in size (it stores essentially all the data from the runs), and I know very little about its encoding. I can open the file in Wordpad and see the information I'm looking for (sample names, timestamps, experiment parameters), surrounded by lots of gobbledygook (¶+ú@”‹ø@ðßö@¨...).


I can read the file into Python with a basic file handler and use regular expressions to get some of the pieces of information I want.



def textOpenLines(filename, mode='r'):
    """Return the file's contents as a list of lines."""
    with open(filename, mode) as content_file:
        return [line for line in content_file]


I'm able to take that list, search it for relevant strings, and get the sample name from it. However, from looking at the file in Wordpad, I found that the sample name is listed twice, and the second time it is followed by the datestamp (e.g. 'Dibenzoylperoxid 120 C 03.05.1994 14:24:30'). In Python, I can't find this string. I can't even find the timestamp by itself. When I look at the line where it is supposed to occur, I get a bunch of seemingly random bytes. Opening the file in Notepad gives the same result as the Python output.
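One way to take text-mode handling out of the picture is to read the file as raw bytes and list the runs of printable ASCII it contains, with their byte offsets. The following is only a minimal sketch: the helper names `read_bytes` and `printable_runs` are hypothetical, and it assumes the strings of interest are stored as plain single-byte ASCII, which may not hold for this file format.

```python
import re


def read_bytes(filename):
    """Return the whole file as raw bytes (no decoding, no newline translation)."""
    with open(filename, "rb") as f:
        return f.read()


def printable_runs(data, min_len=4):
    """Yield (offset, text) for each run of at least min_len printable ASCII bytes.

    Assumption: the text of interest is single-byte ASCII (0x20 to 0x7E).
    """
    pattern = re.compile(b"[\\x20-\\x7e]{" + str(min_len).encode("ascii") + b",}")
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Calling `list(printable_runs(read_bytes(path)))` on the .t00 file would then give the readable fragments and where they sit in the file, so the position of the first sample name can be compared with the region where the timestamped copy is expected.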


I suspect it's an encoding issue. I've tried reading the file in as Unicode, and I've tried taking snippets of lines and reading those in, but I can't crack it. I'm stumped.
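To test the encoding suspicion directly, a known string such as the timestamp can be encoded under a few candidate encodings and searched for in the raw bytes. This is a sketch under stated assumptions: the function name `find_encoded` is hypothetical, and the candidate list (ASCII, cp1252, UTF-16 in both byte orders) is a guess, since nothing above confirms how the STARe file stores its text.

```python
def find_encoded(data, text,
                 encodings=("ascii", "cp1252", "utf-16-le", "utf-16-be")):
    """Return {encoding: [byte offsets]} for each encoding whose form of text occurs in data.

    Assumption: the candidate encodings are guesses, not known properties of the file.
    """
    hits = {}
    for enc in encodings:
        try:
            needle = text.encode(enc)
        except UnicodeEncodeError:
            continue  # text cannot be represented in this encoding
        offsets = []
        pos = data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(needle, pos + 1)
        if offsets:
            hits[enc] = offsets
    return hits
```

With the file contents read in binary mode (`open(path, "rb").read()`), a call like `find_encoded(data, "03.05.1994 14:24:30")` would show whether the timestamp is present at all and, if so, in which of the guessed encodings. An empty result would suggest that the text is stored in some other form, for example split across fields or compressed.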


Any thoughts on how to read this in so that it decodes correctly? Wordpad got it right the first time, though when I open the file in it again now, it looks like the Notepad output.


Thanks!!









