Vote count: 0
I have the following database schema
keywords(id, keyword, lang) :( about 8M records)
topics(id, topic, lang) : ( about 2.6M records)
topic_features(topic_id, keyword_id, weight) : (200M records)
In a script, I have about 50-100 keywords with an additional field keyword_score
and I want to retrieve the top 20 topics that corresponds to those keywords based on the following formula : SUM(keyword_score * topic_weight)
A solution I implemented currently in my script is :
- I create a temporary table as follow
temporary_keywords(keyword_id, keyword_score )
- Insert all 50-100 keywords to it with their
keyword_score
-
Then execute the following query to retrieve topics
SELECT topic_id, SUM(weight * keyword_score) AS score FROM temporary_keywords JOIN topic_features USING keyword_id GROUP BY topic_id ORDER BY score DESC LIMIT 20
This solution works, but it takes in some cases up to 3 seconds to execute, which is too much for me.
I'm asking if there is a way to optimize this query? or should I redesign the data structure into a NoSQL database?
Any other solutions or ideas beyond what is listed above are most appreciated
asked 1 min ago
Optimize join query with SUM, Group By and ORDER By Clauses
Aucun commentaire:
Enregistrer un commentaire