Saturday, 22 March 2014

Hadoop: Nested FOR loop using MapReduce?






I have two files with records and I want to do the following on Hadoop:


(Easy part)



For each record in both files:
    compute some values from the record and store them in an array representing the record


Then (the messy part)



For each record array computed in the previous step from fileA:
    For each record array computed in the previous step from fileB:
        IF they have X elements in common:
            print to output
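
For concreteness, the two steps above could be sketched on a single machine in plain Python roughly as follows. This is only an illustration of the intended logic, not a Hadoop solution. The names `record_to_array` and `matching_pairs` are hypothetical, the whitespace split is a placeholder for the real per-record computation, and "X elements in common" is assumed here to mean at least X distinct shared elements.

```python
def record_to_array(record):
    """Hypothetical stand-in for 'compute some values from record'.

    Here a record is simply split on whitespace; the real computation
    is not specified in the question.
    """
    return record.split()


def matching_pairs(records_a, records_b, x):
    """Yield (record_a, record_b) pairs whose arrays share >= x elements.

    Assumption: 'X elements in common' means at least x distinct values
    appear in both arrays.
    """
    # Easy part: compute one array per record, stored as a set so the
    # overlap can be measured with a set intersection.
    arrays_a = [(r, set(record_to_array(r))) for r in records_a]
    arrays_b = [(r, set(record_to_array(r))) for r in records_b]

    # Messy part: the nested loop over every (fileA, fileB) pair.
    for rec_a, set_a in arrays_a:
        for rec_b, set_b in arrays_b:
            if len(set_a & set_b) >= x:
                yield rec_a, rec_b


if __name__ == "__main__":
    file_a = ["a b c", "d e f"]
    file_b = ["a b z", "x y z"]
    for pair in matching_pairs(file_a, file_b, 2):
        print(pair)
```

The nested loop compares every record of fileA with every record of fileB, which is the part that is hard to spread across reducers.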


This is what I am trying to do with Hadoop. However, I have no idea how to do it efficiently without running the whole nested for loop in a single reducer.


Any suggestions/ideas on how best to go about such a task?


I would prefer to use Python with the Hadoop Streaming jar.


Thanks



asked 1 min ago

Mo.

2,513




