(Spark 1.0.1): I have two files on my local machine, t1 and t2.
t1:
a,1
a,2
a,3
a,4
b,6
b,7
b,8
b,8
b,10
t2:
a,matthew
b,crouse
My code (Python) looks like this:
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Load t1, split each line on the comma, and build dicts for schema inference
t1raw = sc.textFile("../../data/t1.csv")
t1fields = t1raw.map(lambda l: l.split(","))
t1 = t1fields.map(lambda p: {"alias": p[0], "qty": int(p[1])})
schemat1 = sqlContext.inferSchema(t1)
schemat1.registerAsTable("t1")

# Same for t2
t2raw = sc.textFile("../../data/t2.csv")
t2fields = t2raw.map(lambda l: l.split(","))
t2 = t2fields.map(lambda p: {"alias": p[0], "name": p[1]})
schemat2 = sqlContext.inferSchema(t2)
schemat2.registerAsTable("t2")

q = sqlContext.sql("SELECT * FROM t1")
q.collect()  # immediate
q = sqlContext.sql("SELECT * FROM t2")
q.collect()  # immediate
q = sqlContext.sql("SELECT * FROM t1 WHERE qty > 3")
q.collect()  # immediate
q = sqlContext.sql("SELECT t1.alias, t2.name, t1.qty FROM t1 INNER JOIN t2 ON t1.alias = t2.alias")
q.collect()  # takes about half a minute
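For reference, the result of the join itself is tiny. A plain-Python hash join over the same nine rows (data inlined here from the t1/t2 listings above) sketches what the query computes, which makes the half-minute runtime all the more surprising:

```python
# Plain-Python sketch of what the inner join computes.
# Data inlined from the t1/t2 listings above; this only shows the expected
# result, not how Spark executes the query.
t1 = [("a", 1), ("a", 2), ("a", 3), ("a", 4),
      ("b", 6), ("b", 7), ("b", 8), ("b", 8), ("b", 10)]
t2 = {"a": "matthew", "b": "crouse"}

# Hash join on alias: probe the small t2 lookup table for each t1 row
joined = [(alias, t2[alias], qty) for alias, qty in t1 if alias in t2]

print(len(joined))  # 9 rows
print(joined[0])    # ('a', 'matthew', 1)
```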
A SELECT * on a single table feels immediate, and a WHERE filter on a single table is also immediate. However, joining one table to the other takes about half a minute.
I was under the impression that it would be much faster. Is there something about inner joins that makes Spark SQL slow?
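One way to narrow this down, as a hedged sketch assuming the same `sc` and file paths as above: run the equivalent join on plain pair RDDs, bypassing Spark SQL entirely. If this is also slow, the cost is the shuffle and task scheduling that any join triggers, rather than the SQL layer itself.

```python
# Hedged comparison sketch: the same join via pair RDDs, without Spark SQL.
# Assumes the same SparkContext `sc` and CSV paths as in the question.
t1pairs = sc.textFile("../../data/t1.csv") \
            .map(lambda l: l.split(",")) \
            .map(lambda p: (p[0], int(p[1])))
t2pairs = sc.textFile("../../data/t2.csv") \
            .map(lambda l: l.split(",")) \
            .map(lambda p: (p[0], p[1]))

# join() shuffles both RDDs by key and yields (alias, (qty, name)) pairs
result = t1pairs.join(t2pairs).collect()
```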
Spark SQL Join Speed is Slow