lundi 13 février 2017

Optimize join query with SUM, Group By and ORDER By Clauses

Vote count: 0

I have the following database schema

keywords(id, keyword, lang) :( about 8M records)
topics(id, topic, lang) : ( about 2.6M records)
topic_features(topic_id, keyword_id, weight) : (200M records)

In a script, I have about 50-100 keywords with an additional field keyword_score and I want to retrieve the top 20 topics that corresponds to those keywords based on the following formula : SUM(keyword_score * topic_weight)

A solution I implemented currently in my script is :

  • I create a temporary table as follow temporary_keywords(keyword_id, keyword_score )
  • Insert all 50-100 keywords to it with their keyword_score
  • Then execute the following query to retrieve topics

    SELECT topic_id,  SUM(weight * keyword_score) AS score
    FROM temporary_keywords
    JOIN topic_features USING keyword_id
    GROUP BY topic_id
    ORDER BY score DESC
    LIMIT 20

This solution works, but it takes in some cases up to 3 seconds to execute, which is too much for me.

I'm asking if there is a way to optimize this query? or should I redesign the data structure into a NoSQL database?

Any other solutions or ideas beyond what is listed above are most appreciated

asked 1 min ago

Let's block ads! (Why?)

Optimize join query with SUM, Group By and ORDER By Clauses

Aucun commentaire:

Enregistrer un commentaire