With YARPP still set to consider only titles, I captured this query using a query monitor:
insert into blog_yarpp_related_cache (reference_ID,ID,score)
SELECT 40775 AS reference_ID, ID, ROUND(0 + (MATCH (post_title) AGAINST ('butterfly updates')) * 3,4) AS score
FROM blog_posts left join blog_term_relationships as terms on ( terms.object_id = blog_posts.ID )
WHERE post_status IN ( 'publish', 'static' )
AND post_password =''
AND post_type IN ('post','page','attachment')
AND blog_posts.ID NOT IN (40775)
GROUP BY ID
HAVING score >= 2.000000
AND ID != 0
AND bit_or(terms.term_taxonomy_id IN (792,15,4612,1354,791,9)) = 0
ORDER BY score DESC
LIMIT 5 on duplicate key update date = now()
I then noticed that the SELECT part of this query gives different results on MySQL 8 compared to MySQL 5.7:
SELECT ROUND(0 + (MATCH (post_title) AGAINST ('butterfly updates')) * 3,4) AS score
FROM blog_posts left join blog_term_relationships as terms on ( terms.object_id = blog_posts.ID )
WHERE post_status IN ( 'publish', 'static' )
AND post_password =''
AND post_type IN ('post','page','attachment')
AND blog_posts.ID NOT IN (40775)
GROUP BY ID
HAVING score >= 2.000000
AND ID != 0
AND bit_or(terms.term_taxonomy_id IN (792,15,4612,1354,791,9)) = 0
ORDER BY score DESC
In particular, the clause
AND blog_posts.ID NOT IN (40775)
appeared to behave oddly: with it present, the post with ID 40775 was excluded, but so were some other posts with different IDs. MySQL 5.7 returned 14 posts, MySQL 8 with the clause returned 11 posts, and MySQL 8 with the clause removed returned 15 posts (the 14 plus post 40775, which should be excluded). Amending the clause to
AND blog_posts.ID NOT IN (SELECT 40775)
appears to give the same results as MySQL 5.7. Whilst sort of interesting, this hasn't fixed my relevance issues, though!
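To see exactly which rows go missing, one option is to add ID and post_title to the select list and run the same statement on both servers, once with each form of the exclusion clause. This is only a rough diagnostic sketch: it assumes the same table names and IDs as the query above, and the choice of extra columns is mine, not something YARPP generates.

```sql
-- Diagnostic sketch (not YARPP's own query): same filters as above,
-- but returning ID and post_title so the result sets can be compared row by row.
SELECT ID,
       post_title,
       ROUND(0 + (MATCH (post_title) AGAINST ('butterfly updates')) * 3,4) AS score
FROM blog_posts
LEFT JOIN blog_term_relationships AS terms ON ( terms.object_id = blog_posts.ID )
WHERE post_status IN ( 'publish', 'static' )
  AND post_password = ''
  AND post_type IN ('post','page','attachment')
  -- Run once with this literal form, then again with: NOT IN (SELECT 40775)
  AND blog_posts.ID NOT IN (40775)
GROUP BY ID
HAVING score >= 2.000000
   AND ID != 0
   AND bit_or(terms.term_taxonomy_id IN (792,15,4612,1354,791,9)) = 0
ORDER BY score DESC, ID;
```

Prefixing either variant with EXPLAIN on each server should also show whether the two forms of the clause end up with different execution plans, which might account for the missing rows.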
The yarpp.php file says I’m using version 5.30.10, although the WordPress admin interface is offering me “beta (4.0.7b1)”.