Oh, I’ve just realized that this also works when the video doesn’t have a transcript. You can download the audio, feed it into OpenAI Whisper (probably one of the best audio transcription models available right now), and pass the resulting transcript to the LLM. And Whisper isn’t even too expensive.
Not sure about the legality of it though.
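
For anyone curious, here's a rough sketch of what that pipeline could look like. It's only my guess at one way to do it: I'm assuming yt-dlp for the audio download and the open-source `openai-whisper` package running locally (instead of the paid API), neither of which is confirmed as what the bot would actually use. The function names, the prompt wording, and the `max_chars` cutoff are all made up for illustration.

```python
from pathlib import Path


def download_audio(video_url: str, out_dir: str = "audio") -> Path:
    """Download the best audio-only stream (assumes yt-dlp: pip install yt-dlp)."""
    import yt_dlp  # imported lazily so the rest of the module works without it

    Path(out_dir).mkdir(exist_ok=True)
    opts = {
        "format": "bestaudio/best",              # prefer an audio-only stream
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # file is named after the video id
        "quiet": True,
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(video_url, download=True)
        return Path(ydl.prepare_filename(info))


def transcribe(audio_path: Path, model_name: str = "base") -> str:
    """Transcribe locally with Whisper (assumes: pip install openai-whisper, plus ffmpeg)."""
    import whisper

    model = whisper.load_model(model_name)  # larger models are slower but more accurate
    result = model.transcribe(str(audio_path))
    return result["text"].strip()


def build_summary_prompt(transcript: str, max_chars: int = 12000) -> str:
    """Build the text to send to the LLM; max_chars is an arbitrary guard on prompt size."""
    clipped = transcript[:max_chars]
    return "Summarize the following video transcript in a few bullet points:\n\n" + clipped


def video_to_prompt(video_url: str) -> str:
    """Chain the three steps: download audio, transcribe it, build the LLM prompt."""
    audio_path = download_audio(video_url)
    return build_summary_prompt(transcribe(audio_path))
```

You'd call `video_to_prompt(url)` and hand the returned string to whatever LLM you're using. Simply cutting the transcript off at a character limit is crude; for long videos you'd probably want to summarize in chunks instead.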
This one:
https://github.com/SleeplessOne1917/lemmy-bot