Saturday, 11 April 2015

Good training data for LDA generic classification?


I'm using LDA to classify content into generic topics such as Music, Technology, Arts, and Science.


This is the process I'm using:


9 topics -> Music, Technology, Arts, Science, etc.


9 documents -> Music.txt, Technology.txt, Arts.txt, Science.txt, etc.


I've filled each document (.txt file) with about 10,000 lines of what I think is "pure" content for that category.


I then classify a test document to see how well the classifier has been trained.
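
To make the steps above concrete, here is a minimal sketch of the pipeline (one training text per topic, then score a test document). Note that this is not LDA itself: as a stand-in it uses a smoothed unigram (naive-Bayes-style) scorer with uniform priors, built only from the standard library. The tiny training strings are hypothetical placeholders for Music.txt, Technology.txt, etc., and the function names are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_texts):
    """Build one word-count table per label (one 'document' per topic)."""
    return {label: Counter(tokenize(text)) for label, text in labelled_texts.items()}


def classify(model, text, alpha=1.0):
    """Return the label with the highest add-alpha smoothed log-likelihood."""
    vocab = set().union(*model.values())
    # Words never seen in any training text carry no signal, so skip them.
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {}
    for label, counts in model.items():
        total = sum(counts.values()) + alpha * len(vocab)
        scores[label] = sum(math.log((counts[t] + alpha) / total) for t in tokens)
    return max(scores, key=scores.get)


# Hypothetical stand-ins for the contents of Music.txt, Technology.txt, ...
training = {
    "Music": "guitar melody chord album concert singer rhythm",
    "Technology": "software computer network processor code internet device",
    "Science": "experiment physics hypothesis molecule biology theory laboratory",
}

model = train(training)
print(classify(model, "The singer played a new melody at the concert"))
```

In the real setup each string would be read from the corresponding .txt file, and the test document would be held-out text that does not appear in any of the training files.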


My questions are:


a.) Is this an efficient way to classify text (using the above steps)?


b.) Where should I be looking for "pure" topical content to fill each of these files? I'd prefer sources that are not too large (i.e. not more than 1 GB of text data).









