I have tab-delimited data that I am reading into an RDD; it looks roughly like the following. Let's say there are 12 rows for this example (one for each month in 2010).
Data read into myRDD
...
1/2010 Red 500 Up
2/2010 Blue 300 Left
3/2010 Red 650 Down
4/2010 Green 200 Left
5/2010 Blue 250 Right
6/2010 Blue 300 Up
... ... ... ...
I am trying to use this data to mock up a larger RDD by doing something like the following, effectively doubling the size.
var biggerRDD = myRDD.union(myRDD)
With the union of the RDD on itself, I want to increment the dates so the data appears to span not only the original 2010 dates but 2011 as well (essentially incrementing the dates in the second half by one year).
I am unsure of how to do this and have been unsuccessful with my attempts.
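One possible approach, sketched under assumptions: myRDD is taken to be an RDD[String] holding one raw tab-separated line per record, with the date in M/YYYY form as the first field. The helper name shiftYear and the object ExpandDates are hypothetical, made up for illustration. The idea is to map a year-shifting function over a second copy of the data before the union, rather than unioning the RDD with itself unchanged. A plain Seq stands in for the RDD so the sketch runs without a Spark installation.

```scala
object ExpandDates {
  // Shift the M/YYYY date in the first tab-separated field by `years`.
  // Assumption: every line starts with a date such as "1/2010" followed by a tab.
  def shiftYear(line: String, years: Int = 1): String = {
    val fields = line.split("\t", -1)               // -1 keeps trailing empty fields
    val Array(month, year) = fields(0).split("/")   // fails fast on an unexpected date format
    fields(0) = s"$month/${year.toInt + years}"
    fields.mkString("\t")
  }

  def main(args: Array[String]): Unit = {
    // A local Seq standing in for myRDD (assumed to be RDD[String]).
    val sample = Seq("1/2010\tRed\t500\tUp", "2/2010\tBlue\t300\tLeft")

    // Original rows followed by copies whose dates are moved forward one year.
    val bigger = sample ++ sample.map(shiftYear(_))
    bigger.foreach(println)

    // With Spark, the same function could be applied before the union:
    //   val biggerRDD = myRDD.union(myRDD.map(line => shiftYear(line)))
  }
}
```

If the lines have already been split into fields or parsed into a case class, the same idea applies: transform only the date field inside the map, then union the result with the original RDD.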
Data Expansion in Spark (Scala)