Tuesday, 22 March 2016

numpy UnicodeDecodeError: am I using the right approach with genfromtxt?

I am stuck. I want to read a simple CSV file into a NumPy array and seem to have dug myself into a hole. I am new to NumPy, and I am sure I have messed something up, because I can usually read CSV files easily in Python 3.4. I don't want to use Pandas; I thought using NumPy would broaden my skill set, but I am really not getting it.

Could someone tell me whether I am on the right track with genfromtxt, or whether there is an easier way, and give me a nudge in the right direction? I want to read in the CSV file, convert the datetime column to a plain date such as 8/4/2014, and then put it in a NumPy array together with the remaining columns.

Here is what I have so far, along with the error I am having trouble coding around. I can get the date part of the way there, but I don't see how to add date.strftime("%Y-%m-%d") to datefunc. I also don't see how to format the string for the SYM column to get around the error. Any help would be appreciated.

The data:

 2015-08-04 02:14:05.249392, AA, 0.0193103612, 0.0193515212, 0.0249713335, 30.6542480634, 30.7195875454, 39.640763021, 0.2131498442, 29.0406746589, 13524.5347810182, 89, 57, 99
 2015-08-04 02:14:05.325113, AAPL, 0.0170506271, 0.0137941891, 0.0105915637, 27.0670313481, 21.8975963326, 16.8135861893, -19.0986405157, -23.2172064279, 21.5647072302, 33, 26, 75 
 2015-08-04 02:14:05.415193, AIG, 0.0080808151, 0.0073296055, 0.0076213535, 12.8278962785, 11.635388035, 12.0985236788, -9.2962105215, 3.980405659, -142.8175077335, 71, 42, 33 
 2015-08-04 02:14:05.486185, AMZN, 0.0235649449, 0.0305828226, 0.0092703502, 37.4081902773, 48.5487257749, 14.7162247572, 29.7810062852, -69.6877219282, -334.0005615016, 2, 92, 10 

The "code" (sorry, still learning):

import numpy as np

from datetime import datetime
from datetime import date,time


# Convert the raw timestamp (bytes) in column 0 to a datetime object.
datefunc = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d %H:%M:%S.%f')
# Read the CSV into a structured array with one named field per column.
a = np.genfromtxt('/home/dave/Desktop/development/hvanal2016.csv', delimiter=',',
                  converters={0: datefunc},
                  dtype='object,str,float,float,float,float,float,float,float,float,float,float,float,float',
                  names=["date", "sym", "20sd", "10sd", "5sd", "hv20", "hv10", "hv5", "2010hv", "105hv", "abshv", "2010rank", "105rank", "absrank"])

print(a["date"])
print(a["sym"])
print(a["20sd"])
print(a["hv20"])
print(a["absrank"])
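On the strftime part of the question, one possible way to chain the two calls is to parse the timestamp and format it again inside a single converter. The sketch below is only an illustration: the name to_date_string is hypothetical, and the bytes check is there because older NumPy versions pass bytes to converters while newer ones may pass str.

```python
from datetime import datetime


def to_date_string(raw):
    """Turn '2015-08-04 02:14:05.249392' into '2015-08-04'."""
    # Older NumPy hands converters bytes; newer releases may hand them str.
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # Parse the full timestamp, then keep only the date part as text.
    stamp = datetime.strptime(raw.strip(), "%Y-%m-%d %H:%M:%S.%f")
    return stamp.strftime("%Y-%m-%d")


print(to_date_string(b"2015-08-04 02:14:05.249392"))   # 2015-08-04
print(to_date_string(" 2015-08-04 02:14:05.325113 "))  # 2015-08-04
```

If a converter like this replaces datefunc, the date column holds text instead of datetime objects, so a fixed-width string type such as 'U10' is probably a better fit for that field than 'object'.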

The error:

Python 3.4.3+ (default, Oct 14 2015, 16:03:50) 
[GCC 5.2.1 20151010] on linux
Type "copyright", "credits" or "license()" for more information.
>>> 
============================================================================== RESTART: /home/dave/3 9 15 my slope.py ===============================================================================
[datetime.datetime(2015, 8, 4, 2, 14, 5, 249392)
 datetime.datetime(2015, 8, 4, 2, 14, 5, 325113)
 datetime.datetime(2015, 8, 4, 2, 14, 5, 415193) ...,
 datetime.datetime(2016, 3, 18, 1, 0, 25, 925754)
 datetime.datetime(2016, 3, 18, 1, 0, 26, 26400)
 datetime.datetime(2016, 3, 18, 1, 0, 26, 114828)]
Traceback (most recent call last):
  File "/home/dave/3 9 15 my slope.py", line 19, in <module>
    print(a["sym"])
  File "/usr/lib/python3/dist-packages/numpy/core/numeric.py", line 1615, in array_str
    return array2string(a, max_line_width, precision, suppress_small, ' ', "", str)
  File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 454, in array2string
    separator, prefix, formatter=formatter)
  File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 328, in _array2string
    _summaryEdgeItems, summary_insert)[:-1]
  File "/usr/lib/python3/dist-packages/numpy/core/arrayprint.py", line 490, in _formatArray
    word = format_function(a[i]) + separator
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
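The traceback points at the "sym" field. One plausible cause, offered here as an assumption and not a confirmed diagnosis, is that 'str' in the dtype string carries no length, so the field has no room for the ticker text. The sketch below isolates that column and gives it an explicit width. It assumes NumPy 1.14 or later (for the encoding argument), reads from an in-memory copy of three sample rows trimmed to three columns instead of the real file, and picks 'U8' as a width that is merely assumed to be large enough for these symbols.

```python
import io

import numpy as np

# Three rows from the sample data, trimmed to the first three columns
# to keep the example short.
sample = io.StringIO(
    "2015-08-04 02:14:05.325113, AAPL, 0.0170506271\n"
    "2015-08-04 02:14:05.415193, AIG, 0.0080808151\n"
    "2015-08-04 02:14:05.486185, AMZN, 0.0235649449\n"
)

a = np.genfromtxt(
    sample,
    delimiter=",",
    usecols=(1, 2),         # skip the timestamp column in this sketch
    dtype="U8,f8",          # 'U8' = text of up to 8 characters (assumed width)
    names=["sym", "20sd"],
    autostrip=True,         # drop the space that follows each comma
    encoding="utf-8",       # assumes NumPy 1.14+
)

print(a["sym"])
print(a["20sd"])
```

The same idea should carry over to the full 14-column dtype string by swapping 'str' for a sized type such as 'U8'; autostrip=True is there so the symbols do not keep the leading space from the file.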


