Thursday, 29 January 2015

How is a stage split into tasks in Apache Spark?






At Spark Summit 2014, Aaron gave the talk A Deeper Understanding of Spark Internals. Page 17 of his slides shows a stage being split into 4 tasks, as below:

[Image: page 17 of the slides, showing a stage split into 4 tasks]


I would like to know three things about how a stage is split into tasks:




  1. In the example above, it seems that the number of tasks is based on the number of files. Am I right?




  2. If I'm right in point 1, and there were just 3 files under the directory names, would it create just 3 tasks?




  3. If I'm right in point 2, what happens if there is just one very large file? Is the stage split into just 1 task? And what happens when the data comes from a streaming data source?
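To make points 1 to 3 concrete, here is a minimal sketch of the rule I think applies. It assumes that Spark runs one task per partition of the stage's RDD, and that for file input the partitions follow the Hadoop input splits, so a file larger than the split size produces several tasks. The function name `estimate_task_count` and the 128 MB split size are hypothetical, chosen only for illustration. The sketch ignores details such as the minimum-partitions hint, unsplittable compressed files, and streaming sources.

```python
MB = 1024 * 1024


def estimate_task_count(file_sizes, split_size=128 * MB):
    """Estimate the tasks in a file-reading stage as one task per input split.

    Assumption: each non-empty file yields ceil(size / split_size) splits.
    The 128 MB default is illustrative, not a value read from any cluster.
    """
    total = 0
    for size in file_sizes:
        if size > 0:
            # Integer ceiling division: a partial trailing chunk still needs a task.
            total += (size + split_size - 1) // split_size
    return total


# Four small files, as in the slide: one split (and one task) per file.
four_small_files = estimate_task_count([10 * MB] * 4)

# Three small files: three tasks.
three_small_files = estimate_task_count([10 * MB] * 3)

# One 1 GB file: split into several tasks rather than a single one.
one_large_file = estimate_task_count([1024 * MB])
```

If this model is right, the task count matches the file count only while every file fits in a single split, and a single large file is still divided across several tasks.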




Thanks a lot. I am confused about how a stage is split into tasks.










