Sunday, March 1, 2015

Word frequency table - different results






I'm trying to build a word frequency table on Linux using shell commands.


I'd like to know how many times "?xml" occurs in the file.


Option 1 is grep, sort, and uniq:



cat allWords.txt | grep "<?xml" | sort | uniq -c
1 Data=<?xml
12 'http://ift.tt/1AJdaVl'><bCard><?xml
1 <?xml?>
1099 <?xml
4 '<?xml'
3 '<?xml


which is correct.


Option 2 is to use awk, which is faster:



awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" allWords.txt | grep "<?xml"
554 <?xml
6 'http://ift.tt/1AJdaVl'><bCard><?xml
3 '<?xml'


which is incorrect for some words but correct for others. awk would be just fine for me if I could get it to work right.
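
For comparison, here is a minimal sketch of an awk variant that does not set RS at all and instead loops over the fields of each line with awk's default splitting. POSIX leaves the behavior of a multi-character RS unspecified, so this at least removes one variable. The sample file contents are made up for illustration, and whether the counts match the grep pipeline still depends on how allWords.txt is laid out.

```shell
# Hypothetical sample data standing in for allWords.txt.
tmp=$(mktemp)
printf '%s\n' '<?xml a <?xml' 'b <?xml' > "$tmp"

# Count every whitespace-separated word: iterate over all fields of each
# line instead of counting only $1 of each record.
counts=$(awk '{ for (i = 1; i <= NF; i++) a[$i]++ }
              END { for (k in a) print a[k], k }' "$tmp")

# Show only the entries containing the literal string "<?xml".
printf '%s\n' "$counts" | grep -F '<?xml'

rm -f "$tmp"
```

Here grep -F treats the pattern as a fixed string, so the "?" cannot be interpreted as a regex operator.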


So why is there a difference?



asked 19 secs ago

Guy L
