Hi, I noticed that the “known words” percentage for my texts seemed a bit off and did a little digging. Looking at the “wordlist” returned by the internal API, there seem to be parsing issues related to special characters. For example, for the text "「 !? 大学…… 名前!」 " (it has no meaning; I just crafted it as a demonstration), the word list returned by the API is:
"々",
"!?",
"大学々々",
"名前々々"
even though “々” never appears in the text. The text is marked as 25% known, even though it contains no unknown words (大学 and 名前 in particular are known). In my actual texts, many entries in the word list have one or more “々” characters appended for no apparent reason other than there being special characters nearby, which makes those words count as unknown. Could these special characters be excluded from the known-words parsing? Otherwise I could of course pre-process my texts before uploading them to Kitsun so that they contain no special characters, but that would also hurt readability.
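
For reference, here is a rough sketch of the pre-processing workaround I have in mind, in case it helps narrow down which characters trigger the problem. It is only my own guess at a filter, not anything Kitsun does: the function name `strip_special` is made up, and I am assuming that everything in the Unicode punctuation and symbol categories counts as a "special character". A genuine “々” (as in 人々) is classified as a letter, so it is left alone.

```python
import unicodedata


def strip_special(text: str) -> str:
    """Replace punctuation and symbol characters with spaces.

    Assumption: "special characters" means Unicode general categories
    starting with P (punctuation) or S (symbol). Letters, including the
    iteration mark 々 (category Lm), are kept unchanged.
    """
    cleaned = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith(("P", "S")):
            cleaned.append(" ")  # keep a separator so words do not merge
        else:
            cleaned.append(ch)
    # Collapse runs of whitespace left behind by the removed characters.
    return " ".join("".join(cleaned).split())


if __name__ == "__main__":
    print(strip_special("「 !? 大学…… 名前!」 "))
```

With the demonstration text above, this leaves only 大学 and 名前 separated by a space, which is what I would expect the word list to contain. The obvious downside is the one already mentioned: the uploaded text loses its quotation marks and punctuation.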