3D grphique: Word frequency table

Sunday, March 1, 2015

Word frequency table - different results

Vote count: 0

I'm trying to build a word frequency table on Linux using shell commands.

I'd like to know how many times "<?xml" occurs in the file.

Option 1 is grep, sort and uniq:


 cat allWords.txt | grep  "<?xml"  | sort | uniq -c  
      1 Data=<?xml
     12 'http://ift.tt/1AJdaVl'><bCard><?xml
      1 <?xml?>
   1099 <?xml
      4 '<?xml'
      3 '<?xml

That output is correct.

Option 2 is to use awk, which is faster:


awk '{a[$1]++}END{for(k in a)print a[k],k}' RS=" |\n" allWords.txt  | grep "<?xml"
554 <?xml
6 'http://ift.tt/1AJdaVl'><bCard><?xml
3 '<?xml'

That output is incorrect for some words but correct for others. awk would be just fine for me if I could get it to work right.

So why is there a difference?
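
One thing worth noting when comparing the two: they do not count the same units. The grep pipeline counts identical matching lines, while the awk command counts only the first field ($1) of each record, and a multi-character RS such as " |\n" is treated as a regular expression by gawk and mawk but is left unspecified by POSIX. As a rough cross-check, and not a diagnosis of this particular file, the sketch below splits the input on any whitespace with tr before counting, so it depends on neither line layout nor RS handling. The function name count_tokens is hypothetical, introduced only for illustration.

```shell
# count_tokens is a hypothetical helper, not part of the original commands.
# $1 = literal string to look for, $2 = input file
count_tokens() {
    # Squeeze every run of whitespace (spaces, tabs, CRs, newlines) into a
    # single newline, so each token lands on its own line.
    tr -s '[:space:]' '\n' < "$2" |
        grep -F -- "$1" |   # -F: match the string literally, not as a regex
        sort |
        uniq -c             # prefix each distinct token with its count
}

# Usage (file name taken from the question):
#   count_tokens '<?xml' allWords.txt
```

If these counts agree with one of the two outputs above, that suggests which pipeline is splitting the file the way you expect.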

asked 19 secs ago

Guy L

685
