I think recent reviews should carry more weight when finding leeches, so that the list reflects how well the user knows the word right now.
My suggestion: keep a “leech rating” for each word and use it to determine the top leeches.
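The leech rating is calculated as:
LR = sum over all past reviews of (failure_value * 0.9^position)
where failure_value is 1 if the review was failed and 0 if it was answered correctly, and position counts reviews backwards from the most recent one (position 0).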
So the most recent review of the word (position 0) has a weight of 100%, and failing it adds 1 to the leech rating. A failure two reviews ago (position 2) has a weight of 0.9^2 = 81%, so it adds 0.81 to the leech rating.
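A minimal Python sketch of the formula above, computing the rating directly from a stored review history. The function name and the history format (a list of booleans, True meaning the review was failed, ordered from the most recent review backwards) are assumptions for illustration only.

```python
DECAY = 0.9  # per-review decay factor from the proposal


def leech_rating_from_history(history: list[bool]) -> float:
    """Hypothetical helper: history[i] is True if the review at position i failed.

    Position 0 is the most recent review; a failure at position i adds 0.9**i.
    Correct answers contribute 0 (failure_value = 0).
    """
    return sum(DECAY ** position for position, failed in enumerate(history) if failed)


# Most recent review failed, the one before it passed, the one before that failed.
print(round(leech_rating_from_history([True, False, True]), 2))  # prints 1.81
```

This matches the worked example: 1 for the failure at position 0 plus 0.81 for the failure at position 2.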
You don’t need to store the review history to calculate the rating, though. Storing only the current leech rating is enough, because it can be updated with a simple online algorithm:
For each new word, initialize the leech rating to 0.
Each time the user answers a review of the word correctly, the leech rating is multiplied by 0.9 (a 10% decrease).
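Each time the user answers a review incorrectly, the leech rating changes to:
LR := 1 + LR*0.9
This reproduces the sum above: a new review pushes every earlier review back by one position, which multiplies its weight by another 0.9, and a failure at the new position 0 adds a weight of 1.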
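A minimal Python sketch of the online update, assuming reviews are replayed oldest first and that each word's rating is kept in a plain dict. The names update_leech_rating and top_leeches are hypothetical, not part of any existing API. The script also checks that the online value equals the direct sum computed from the full history.

```python
DECAY = 0.9  # per-review decay factor from the proposal


def update_leech_rating(rating: float, failed: bool) -> float:
    """Apply one new review to the stored leech rating (new words start at 0)."""
    if failed:
        return 1.0 + rating * DECAY  # wrong answer: LR := 1 + LR*0.9
    return rating * DECAY            # correct answer: LR := LR*0.9


def top_leeches(ratings: dict[str, float], n: int) -> list[str]:
    """Hypothetical helper: the n words with the highest leech rating."""
    return sorted(ratings, key=lambda word: ratings[word], reverse=True)[:n]


# Reviews of one word, oldest first: fail, fail, pass.
oldest_first = [True, True, False]

rating = 0.0
for failed in oldest_first:
    rating = update_leech_rating(rating, failed)

# Direct sum: positions count back from the newest review, so reverse the list.
direct = sum(DECAY ** pos for pos, f in enumerate(reversed(oldest_first)) if f)

print(round(rating, 2), abs(rating - direct) < 1e-12)  # prints 1.71 True
```

Only one float per word needs to be stored, and ranking the top leeches is just a sort over those stored values.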