Channel: Poor relevance (possibly mysql-8) | WordPress.org

Reply To: Poor relevance (possibly mysql-8)


Having a poke around with settings more like what we usually use. With content matching, the weirdness around NOT IN (SELECT id) seems to go away, but the matches are really different to the previous server. These results may not mean very much in isolation, but for interest, this query

SELECT 40775 AS reference_ID, ID, ROUND(0 + (MATCH (post_content) AGAINST ('solar cabbage brassicas caterpillars farms panels beneficial may plants butterflies chemical glucobrassicin eggs fields increase insects insect wildlife offer damaged')) * 1 + (MATCH (post_title) AGAINST ('butterfly updates')) * 1 + COUNT(DISTINCT IF( terms.term_taxonomy_id IN (919,920), terms.term_taxonomy_id, null )) * 1 + COUNT(DISTINCT IF( terms.term_taxonomy_id IN (4344,112,6708,6707,58,6703,6246,132,6706,6705,6712,6709,6711,6710,4213,6704,5901,6713), terms.term_taxonomy_id, null )) * 3,4) AS score
FROM blog_posts left join blog_term_relationships as terms on ( terms.object_id = blog_posts.ID )
WHERE post_status IN ( 'publish', 'static' )
AND post_password =''
AND post_type IN ('post')
AND blog_posts.ID NOT IN ( 40775)
GROUP BY ID
HAVING score >= 2.000000
AND ID != 0
AND bit_or(terms.term_taxonomy_id IN (792,15,4612,1354,791,9)) = 0 and COUNT(DISTINCT IF( terms.term_taxonomy_id IN (919,920), terms.term_taxonomy_id, null )) >= 1
ORDER BY score DESC;

produces quite different top matches on the old database

40775 33720 71.9465
40775 21211 67.4809
40775 29723 26.0072
40775 39918 22.9360
40775 24623 21.6051
40775 34554 21.5452
40775 39009 21.0883
40775 726 18.6029
40775 15599 18.4169
40775 21522 18.4026
40775 35919 17.5108
40775 1863 17.4167
40775 34839 16.8549
40775 23774 16.7163
40775 24606 16.6431
40775 5681 14.9384
40775 30694 14.2196
40775 33751 14.2144
40775 34244 14.0386

compared to the new database

40775 33706 102.2484
40775 11699 63.8313
40775 22225 33.9329
40775 39804 33.7642
40775 29721 32.4923
40775 34829 31.8152
40775 39001 27.5836
40775 36323 25.9082
40775 36239 25.4413
40775 24606 25.388
40775 35906 25.3144
40775 21492 25.0634
40775 40406 24.9763
40775 1848 23.8333
40775 5665 23.1519
40775 493 22.135
40775 33739 21.754
40775 20986 21.574
40775 38189 20.232

E.g. in the old database results the top hit, 33720, is about solar panels & butterflies (highly relevant), but in the new results the top hit, 33706, is about swifts & their nests, with none of the keywords present. It’s very weird!
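
One way to narrow this down might be to pull the two full-text parts of the score out on their own for the two top hits and run the same thing on both servers. This is only a sketch: it reuses the table, columns and keyword strings from the query above, assumes both post IDs exist on both databases, and the content_score / title_score aliases are just names picked for illustration.

```sql
-- Show the raw full-text relevance for each top hit, without the
-- taxonomy counts, so the content and title parts can be compared
-- between the old and new servers.
SELECT ID,
       MATCH (post_content) AGAINST ('solar cabbage brassicas caterpillars farms panels beneficial may plants butterflies chemical glucobrassicin eggs fields increase insects insect wildlife offer damaged') AS content_score,
       MATCH (post_title) AGAINST ('butterfly updates') AS title_score
FROM blog_posts
WHERE ID IN (33720, 33706);
```

If content_score for 33706 is high on the new server even though the post has none of the keywords, that would point at the full-text matching itself rather than the taxonomy part of the score.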

