I have three text documents represented as vectors in a vector space model (VSM), where each entry is the tf-idf weight of a term. The three vectors have different lengths.
Q1: How does k-means use cosine similarity, and how are the clusters then constructed?
Q2: My TF-IDF calculation produces negative values. Is there a problem in my calculation?
Please use the following document vectors (tf-idf weights in the VSM, each with a different length) for explanation purposes.
Doc1 (0.134636045, -0.000281926, -0.000281926, -0.000281926, -0.000281926, 0)
Doc2 (-0.002354898, 0.012411358, 0.012411358, 0.09621575, 0.3815553)
Doc3 (-0.001838258, 0.009688438, 0.019376876, 0.05633028, 0.59569238, 0.103366223, 0)
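To make the question concrete, here is a minimal sketch of the pairwise cosine similarity computation on these three vectors. It rests on an assumption that is not established above: the shorter vectors are zero-padded on the right to a common length, which is only meaningful if position i refers to the same term in every document. The function name `cosine_similarity` is illustrative, not taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.

    Assumption: vectors of different length are zero-padded on the right,
    i.e. position i is taken to mean the same term in both documents.
    """
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

doc1 = [0.134636045, -0.000281926, -0.000281926, -0.000281926, -0.000281926, 0]
doc2 = [-0.002354898, 0.012411358, 0.012411358, 0.09621575, 0.3815553]
doc3 = [-0.001838258, 0.009688438, 0.019376876, 0.05633028, 0.59569238,
        0.103366223, 0]

pairs = {
    "Doc1-Doc2": (doc1, doc2),
    "Doc1-Doc3": (doc1, doc3),
    "Doc2-Doc3": (doc2, doc3),
}
for name, (u, v) in pairs.items():
    print(name, round(cosine_similarity(u, v), 4))
```

Under that padding assumption, Doc2 and Doc3 come out far more similar to each other than either is to Doc1, so a cosine-based k-means with two clusters would be expected to group Doc2 with Doc3.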
I would be grateful to anyone who can explain this.
How is cosine similarity used with the k-means algorithm?