Tuesday, March 22, 2016

Data Expansion in Spark (Scala)

I have tab-delimited data that I am reading into an RDD; it looks roughly like the following. For this example, say there are 12 rows (one for each month of 2010).

Data read into myRDD...

1/2010    Red    500    Up
2/2010    Blue   300    Left
3/2010    Red    650    Down
4/2010    Green  200    Left
5/2010    Blue   250    Right
6/2010    Blue   300    Up
...       ...    ...    ...

I am trying to use this data to mock up a larger RDD by doing something like the following, effectively doubling the size.

val biggerRDD = myRDD.union(myRDD)

When I union the RDD with itself, I want to increment the dates in the second copy by one year, so the result appears to span not only the original 2010 dates but 2011 as well.

I am unsure of how to do this and have been unsuccessful with my attempts.
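
To make the goal concrete, here is a rough sketch of the transformation I have in mind. It assumes each element of myRDD is a raw tab-separated String whose first field has the form M/yyyy; the object DateShift and the function shiftYear are hypothetical names made up for illustration, not part of Spark. The helper itself is plain Scala, and the Spark usage is shown only as a comment.

```scala
// Hypothetical helper (not a Spark API): shifts the year in the first
// tab-separated field of a line, assuming that field looks like "M/yyyy".
object DateShift {
  def shiftYear(line: String, years: Int): String = {
    // limit -1 keeps trailing empty fields, so the column count is preserved
    val fields = line.split("\t", -1)
    // Assumption: the date field is exactly "month/year"; anything else
    // throws a MatchError here instead of being silently passed through.
    val Array(month, year) = fields(0).trim.split("/")
    val shifted = s"$month/${year.toInt + years}"
    (shifted +: fields.tail).mkString("\t")
  }
}

// Assuming myRDD is an RDD[String], the helper might be applied to the
// second copy before the union, for example:
//   val biggerRDD = myRDD.union(myRDD.map(line => DateShift.shiftYear(line, 1)))
```

If the rows have already been split into tuples or a case class, the same idea should apply with the date field updated directly rather than by re-joining the string.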


