3D grphique: Unusual behavior of sklearn.datasets.make

Monday, 1 September 2014

Unusual behavior of sklearn.datasets.make_classification

I have run into an unexpected error when using sklearn.datasets.make_classification, as follows:

Starting from the example script "plot_classifier_comparison.py", located at http://ift.tt/W3MYDD, I change the following statement (which runs fine)


X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                       random_state=1, n_clusters_per_class=1)

to this (i.e., just adding one more feature):


X, y = make_classification(n_features=3, n_redundant=0, n_informative=2,
                       random_state=1, n_clusters_per_class=1)

and receive the following error traceback (where the pathnames are of course local to my machine):


Traceback (most recent call last):
  File "F:/Python Packages/ChartyPy3/plot_classifier_comparison.py", line 94, in <module>
    Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
  File "F:\Anaconda\lib\site-packages\sklearn\neighbors\classification.py", line 190, in predict_proba
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "F:\Anaconda\lib\site-packages\sklearn\neighbors\base.py", line 311, in kneighbors
    return_distance=return_distance)
  File "binary_tree.pxi", line 1298, in sklearn.neighbors.kd_tree.BinaryTree.query (sklearn\neighbors\kd_tree.c:10427)
ValueError: query data dimension must match training data dimension
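
One possible reading of this traceback, offered as a sketch rather than a confirmed diagnosis: the failing line builds its query points with `np.c_[xx.ravel(), yy.ravel()]`, which always has two columns, while the classifier was fitted on three. The grid construction below (50 x 50 points via `np.linspace`) is an assumption for illustration and is not copied from the example script; the exact error message also depends on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Same call as in the question: three features in total.
X, y = make_classification(n_features=3, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
clf = KNeighborsClassifier(3).fit(X, y)

# Illustrative 2-D plotting grid (assumed resolution, not the script's own).
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 50),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 50))
grid = np.c_[xx.ravel(), yy.ravel()]

# Training data has 3 columns, the query grid has only 2.
print("training features:", X.shape[1], "grid features:", grid.shape[1])

# Querying with the 2-column grid raises a ValueError.
try:
    clf.predict_proba(grid)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```

If this reading is right, make_classification itself behaves as documented, and the mismatch comes from plotting a two-dimensional grid against a model trained on more than two features.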

I've determined that the first two data sets (i.e., "make_moons" and "make_circles") run fine through all the classifiers, but the third data set (i.e., "linearly_separable") does not: applying "KNeighborsClassifier(3)" to it produces the traceback above, raised from a call to sklearn.neighbors.kd_tree.BinaryTree.query. I also tried using all default values for make_classification, i.e.,


X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                       n_redundant=2, n_repeated=0, n_classes=2, n_clusters_per_class=2,
                       weights=None, flip_y=0.01, class_sep=1.0, hypercube=True, shift=0.0,
                       scale=1.0, shuffle=True, random_state=None)

but this too generated the same traceback, with the same error message, i.e., "ValueError: query data dimension must match training data dimension".

I can't understand why changing the total number of features, or using only the default values, as input to "make_classification" should generate this error. I am using Python 3.4.1 (64-bit implementation) along with the developer's 64-bit version of scikit-learn. Any guidance on this error, and/or on how to work around it, would be appreciated.
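
As a hedged sketch of two possible workarounds, assuming the goal is still a two-dimensional decision-boundary plot: either fit the classifier on only the two columns being plotted, or keep every feature and pad the grid so its width matches the training data. The choice of columns 0 and 1, the grid resolution, and the use of column means as padding are all illustrative assumptions; with `shuffle=True` (the default) the first two columns are not guaranteed to be the informative ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_features=3, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)

# Illustrative plotting grid over columns 0 and 1 (assumed choice).
xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 50),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 50))

# Option A: train on the two plotted columns only, so the grid matches.
clf_2d = KNeighborsClassifier(3).fit(X[:, :2], y)
grid_2d = np.column_stack([xx.ravel(), yy.ravel()])
Z_2d = clf_2d.predict_proba(grid_2d)[:, 1].reshape(xx.shape)

# Option B: train on all columns and hold the extra ones at their mean
# when building the query grid.
clf_full = KNeighborsClassifier(3).fit(X, y)
fill = np.tile(X[:, 2:].mean(axis=0), (xx.size, 1))
grid_full = np.column_stack([xx.ravel(), yy.ravel(), fill])
Z_full = clf_full.predict_proba(grid_full)[:, 1].reshape(xx.shape)

print(grid_2d.shape, grid_full.shape)
```

Option A changes what the model learns, while Option B shows only one slice through the higher-dimensional decision surface, so neither reproduces the original two-feature plot exactly.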

asked 34 secs ago

user3990797
