As usual with these things: it is impressive that stuff like this can be generated so quickly at all, but the content is very superficial and often wrong or at least misleading. It's unusable for learning, but great for spamming platforms, just like NotebookLM for instance.
As an example, I asked about the Cantor function. It generated a 1:24 video, which is laughably short. It explained correctly how the Cantor set is defined but showed a flawed visual representation, then skipped over how the Cantor function is constructed and simply stated the basic properties. Sorry, but this is garbage content.
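For anyone wondering what the skipped construction step looks like, here is a minimal sketch, assuming the usual ternary-digit definition: read the base-3 digits of x, stop at the first digit 1, turn every 2 into a 1, and read the result as a binary number. The function name `cantor` and the `depth` cutoff are my own choices for illustration, and floating point limits the accuracy.

```python
def cantor(x: float, depth: int = 40) -> float:
    """Approximate the Cantor function on [0, 1] from the ternary digits of x."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 1.0:
        return 1.0  # 1 = 0.222... in base 3, which maps to 0.111... = 1 in base 2

    result = 0.0
    scale = 0.5  # value of the next binary digit
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x sits in a removed middle third: the function is constant here.
            return result + scale
        result += scale * (digit // 2)  # ternary 2 becomes binary 1, 0 stays 0
        scale /= 2
    return result


if __name__ == "__main__":
    for value in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(value, cantor(value))
```

The early return on a digit 1 is what produces the flat plateaus over the removed middle thirds, which is the part a visual explainer would presumably want to show.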
Those videos are apex quality videos. You might as well ask for Nobel prize literature quality essays from ChatGPT.
You can probably imitate the structure/scaffolding of a 3b1b video in a cargo cult way, but you are losing domain expert level verification of quality (which is why AI fails, because it's not a domain expert).
So here's how I'm hearing your question, and it answers itself: "how do I get domain expert quality from a non-domain expert AI?"...
It's a Bible reference. Jesus supposedly said that "A prophet is honored everywhere except in his own hometown and among his own family". When you grew up with someone everything will be colored by your past experience with them. You remember all the grandiose stupid things they said in the past. If they are saying actually wise things now you will not believe them as much as you should. On the other hand if they are still saying stupid things other people have not yet learned caution.
I guess the connection is that when you're far away from home you can project an aura of confidence/respectability that you may not have at home where people know you since the days you were not yet an expert.
Of course people can't be born experts and for every expert there must be a prior step in their personal growth when they were less expert. But that doesn't prevent people from using an imperfect heuristic for judging whether you can trust someone's expertise.
It's possible. I don't know. Sometimes people just come up with the same ideas over and over.
A classic example is the pyramid structures around the world. The fact that people found that it's easy to pile rocks that way in order to build tall stable structures doesn't mean that there was a single culture that spanned the entire globe.
I'm not really a bear on LLMs, but using them for explanations in this way needs a lot of caution. It's very easy to give an argument that "seems" right at first glance, e.g. https://xkcd.com/803/
Totally fair! I like the XKCD comic as well because it hints at a potential solution - even if you can't always be correct, how you respond to critical questions can really help. I'm working on a feature for users to ask follow-up questions and am definitely going to consider how to make it as honest and curious as possible.
In his last lecture he mentions that the best part of teaching is when a student sends them a photo of a rainbow 10 years after graduating, saying that they remember the whole physics behind it.
Same for me. I think of the cones of light being formed when I see the rainbow and that amazing lecture.
The voice is just OpenAI’s default TTS voice. I agree that the Veritasium video is an incredible work and the AI version is absurd by comparison!
This is mostly a proof of concept that this is possible at all, and as LLMs get smarter it’ll be interesting to see if the quality automatically improves. For now, the tool is really only useful for very specific or personal questions that wouldn’t already exist on YouTube.
Well, I'm blown away. "Show me how information propagates through a neural net."
I feel like this is the one thing that's been missing from all the LLMs: actual visual explainers, whether image or video. Python plots only get you so far, and all the diffusion stuff is nonsensical. This is amazing.
I have to give you props for not requiring me to sign up. I’ve seen many Show HN posts lately that unnecessarily require me to create an account, which always prompts me to close the tab immediately.
These are amazing examples! Thanks for all the feedback, detailed info, and persistence in trying! HN hug of death means I'm running into Gemini rate limits unfortunately :( will def make that more clear when it happens in the UI and try to find some workarounds.
The other issues are bugs with my streaming logic retrying clips which failed to generate. LLMs aren't yet perfect at writing Manim, so to keep things smooth I try to skip clips which fail to render properly. Still also have layout issues which are hard to automatically detect.
I expect with a few more generations of LLM updates, prompt iterating, and better streaming/retrying logic on my end this will become more reliable
There is a job queue on the backend with statuses; it's just not worth breaking the streaming experience to ask the LLM to rewrite broken Manim segments out of order.
MathGPT also has this (exactly in the same 3blue1brown style, so I guess they also use manim), and in my experience it does actually work better and tries to explain math and write the equations.
I think they use some extremely cheap model for writing the code, probably 4o-mini or similar.
Whether I click an existing example or type in my own it doesn't seem to work. A dialog pops up for a second saying 'generating video' and then disappears.
The Manim output is not great (words overlapping, etc.) and the page itself needs a lot of work: it says it's generating and then nothing happens. It also seems that in a lot of cases your backend workflows have issues, with descriptions starting and then ending after about 30s when they need to go on for at least another 1-2 minutes.
Wow, this is awesome! Thanks for building. I didn't realize there was a protocol for streaming while rendering, though I noticed sumo.ai doing something similar for audio. Gemini with grounding is new to me also, very nice!
Haha that’s funny, the beehive one lays out the hexagons as if they were squares - so they overlap and have empty space lol! But still it’s a promising concept.
Btw for some reason on iOS I had to download to view the video
Hmm the initial version of the app only took me about a day to get something working, but that version took minutes to generate a single video and even then only worked a third of the time. It took a solid 2 weeks from there to add all the edge cases to the prompt to increase reliability, add GPU rendering and streaming to improve performance/latency, and shore up the infra for scaling.
"An expert is a person who is far away from home and gives advice"
Compared to https://www.youtube.com/watch?v=24GfgNtnjXc this video is absurdly limited https://tma.live/video/9c8e725e-ec21-41a7-984a-317d84216497
https://youtu.be/pj0wXRLXai8
1. Doesn't work at all on Firefox 133.0.3 (64-bit)
2. Works on Chrome 131.0.6778.205 (Official Build) (64-bit)
3. No existing links do anything except show a sub-second "Generating" message, which quickly disappears
4. Does not work in Incognito on Chrome 131.0.6778.205 (Official Build) (64-bit)
My prompt kind of worked but ended at 48 seconds
Prompt: "Describe a honeybee learning which flower to land on in a field using Markov Decision Process with Bellman updating":
https://tma.live/video/88f535b5-0e5f-41ca-9bd8-e35e7aa8a95a
I ran it a second time and got a longer video of 1:55 but it primarily just created Text. It also didn't explain Bellman's equation and wrote it incorrectly:
https://tma.live/video/88f535b5-0e5f-41ca-9bd8-e35e7aa8a95a
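For reference, the Bellman optimality equation for a discounted MDP is usually written as below. This is standard textbook notation, not taken from the video: V* is the optimal value function, P the transition probabilities, R the reward, and gamma the discount factor with 0 <= gamma < 1.

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]
```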
The second prompt kind of worked but ends at 47 seconds and then loops the final 4 seconds forever.
Prompt: "Describe how the OODA Loop, MDP and SPA learning approaches are the same"
https://tma.live/video/ee7b5048-3fde-4f1a-8ec1-c8bb48883c75
Overall this worked as described. It's more than fast enough, but fails to deliver on consistency and graphics.
A few more iterations and fine tuning and you'll have a solid Beta. I can see this being very useful after another year or so of use and tuning.
Great work and congrats on shipping.
I asked a history question: "tell me about Reddy kings rule". It made up a physics rule and started talking about electrons.
Although it is pretty impressive for what an LLM can generate these days.
- The LLM is prompted to generate an explainer video as a sequence of small Manim scene segments with corresponding voiceovers
- The LLM streams its response token-by-token as Server-Sent Events
- Whenever a complete Manim segment is finished, it is sent to Modal to start rendering
- The rendered partial video files from Manim are streamed via HLS as they are generated
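The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual code: the `SEGMENT_END` delimiter, `split_segments`, `render_stream`, and the `render` callable are all hypothetical names. A local thread pool stands in for the remote Modal render jobs, and HLS packaging is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List

# Hypothetical marker that the prompt would ask the LLM to emit after each segment.
SEGMENT_END = "# --- end segment ---"


def split_segments(tokens: Iterable[str], delimiter: str = SEGMENT_END) -> Iterator[str]:
    """Yield each complete segment as soon as its closing delimiter has streamed in."""
    buffer = ""
    for token in tokens:
        buffer += token
        while delimiter in buffer:
            segment, buffer = buffer.split(delimiter, 1)
            if segment.strip():
                yield segment.strip()
    # Whatever remains in the buffer is an unfinished segment and is dropped.


def render_stream(
    tokens: Iterable[str],
    render: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Submit segments for rendering as they complete; skip the ones that fail."""
    clips: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consuming the token stream here submits each segment the moment it is
        # complete, so rendering starts while later tokens are still arriving.
        futures = [pool.submit(render, seg) for seg in split_segments(tokens)]
        for future in futures:  # collect in original order to keep the video coherent
            try:
                clips.append(future.result())
            except Exception:
                continue  # a segment that fails to render is skipped
    return clips


if __name__ == "__main__":
    def fake_render(code: str) -> str:
        # Stand-in for a real render job: fails on one segment on purpose.
        if "boom" in code:
            raise RuntimeError("render failed")
        return "clip:" + code.split("(")[0]

    stream = ["class A", "(Scene): pass\n# --- end", " segment ---\nclass B",
              "(Scene): boom\n# --- end segment ---", "class C"]
    print(render_stream(stream, fake_render))
```

Collecting results in submission order keeps the clips in narrative order. A real streaming version would presumably hand each clip to the HLS playlist as soon as it is ready rather than returning a list at the end.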