Heya
I’m afraid that functionality isn’t implemented yet. It’s still on my priority list for the short-term future, but to keep the audio generation bills from becoming larger than the actual server costs, there has to be a set limit on the number of characters first.
This is because audio generation has a cost per character. That’s fine for vocabulary, but with custom cards it would be easy to generate audio for complete sentences, paragraphs or even books.
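
To illustrate the kind of limit meant here, a minimal sketch follows. The price, the limit and the function names are all made-up assumptions for illustration, not Kitsun’s actual code or any provider’s real pricing:

```python
# Hypothetical values, for illustration only.
COST_PER_CHAR = 0.000016   # assumed price per generated character
MAX_CHARS_PER_CARD = 100   # assumed per-card character limit


def audio_cost(text: str) -> float:
    """Estimated cost of generating audio for `text` at a flat per-character rate."""
    return len(text) * COST_PER_CHAR


def can_generate_audio(text: str, max_chars: int = MAX_CHARS_PER_CARD) -> bool:
    """Allow generation only for non-empty text within the character limit."""
    return 0 < len(text) <= max_chars
```

With a check like this, a single vocabulary word passes easily, while a pasted paragraph or book chapter is rejected before it ever reaches the paid service.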
That said, I’ve been looking at another audio generation service, which claims to be able to produce pitch-accurate audio for Japanese. I might end up replacing the current audio generation service in the near future while working on custom audio. To do so, though, I’d need to add a pitch accent database to Kitsun first, which would slow down the schedule slightly. I’ll let you know once I’ve got it implemented!