15:00 The reason for polysemanticity is that an N-dimensional vector space contains only N mutually orthogonal vectors, but if you allow nearly orthogonal vectors (say, pairwise angles between 89 and 91 degrees), the number you can pack grows exponentially in N. That's what allows the scaling laws to hold. There's an inherent conflict between having an efficient model and an interpretable model.
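If anyone wants to sanity-check this, here's a small numpy sketch (mine, not from the video): sample random unit vectors and watch the pairwise angles concentrate around 90 degrees as the dimension grows.

```python
import numpy as np

# Quick numerical check of the claim above: random unit vectors in
# R^n are almost always nearly orthogonal, and the concentration
# around 90 degrees tightens as n grows.
rng = np.random.default_rng(0)

for n in (10, 100, 1000):
    vecs = rng.standard_normal((500, n))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # unit vectors
    cos = vecs @ vecs.T                                   # pairwise cosines
    cos = cos[~np.eye(500, dtype=bool)]                   # drop self-pairs
    angles = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    frac = np.mean((angles > 89) & (angles < 91))
    print(f"n={n:4d}: {frac:.1%} of pairs fall between 89 and 91 degrees")
```

The fraction inside the 89-91 degree band climbs steadily with n, which is exactly the extra packing room superposition exploits.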
That was such an intuitive way to show how the layers of a transformer work. Thank you!
The videos on this channel are all masterpieces. Along with all the other great channels on this platform and independent blogs (including Colah's own blog), it feels like the golden age of accessible, high-quality education.
As a machine learning graduate student, I LOVED this video. More like this please!
More like "The Neuroscience of AI"
I think of it like this: understanding the human brain is so difficult in large part because the resolution at which we can observe it, in both space and time, is too coarse. The best MRI scans have a resolution of ~1 cubic millimeter per voxel, and I'd have to look up the research to tell you exactly how many neurons that is. With AI, every neuron is right there in the computer's memory: individually addressable, ready to be analyzed with the best statistical and mathematical tools at our disposal. Mechanistic interpretability is almost trivial in comparison to neuroscience, and look at how much progress we've made there despite such physical setbacks.
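To make the contrast concrete, here's a minimal PyTorch sketch (my own toy model, nothing from the video) of what "individually addressable" means in practice; the neuroscience equivalent of this read-out would be a heroic experiment.

```python
import torch
import torch.nn as nn

# Toy illustration: "recording from a neuron" in an artificial network
# is just reading a tensor out of memory with a forward hook.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

activations = {}

def record(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # noise-free, full precision
    return hook

model[1].register_forward_hook(record("hidden_relu"))  # hook the ReLU layer

x = torch.randn(1, 784)                    # a stand-in input
model(x)
print(activations["hidden_relu"].shape)    # torch.Size([1, 256])
print(activations["hidden_relu"][0, 42])   # "neuron 42", read exactly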
I love the space analogy of the telescope. Since the semantic volume of these LLMs has grown so gargantuan, it only makes sense to speak of astronomy rather than mere analysis! Great video. This is like scratching that spot at the back of your brain you usually can't reach.
Extracting individual parameters and modifying them feels so much like experimenting with human neurons with electricity
You're the first person I've seen to cover this topic well. Thanks for bringing me up to date on transformer reverse engineering 👍
It's a shame you didn't mention the experiment where they force-activated the Golden Gate Bridge feature and it made Claude believe it was the bridge.
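For anyone who hasn't seen it, that was Anthropic's "Golden Gate Claude" demo. A rough PyTorch sketch of the mechanism (mine, with a random stand-in direction; the real direction came from a feature learned by a trained sparse autoencoder):

```python
import torch
import torch.nn as nn

# Hedged sketch of the general idea: keep adding one feature's decoder
# direction into a layer's output so the feature is always "on". The
# direction here is a random stand-in, not the real learned feature.
d_model = 64
layer = nn.Linear(d_model, d_model)        # stand-in for a transformer block

feature_dir = torch.randn(d_model)
feature_dir /= feature_dir.norm()

def clamp_feature(module, inputs, output):
    return output + 8.0 * feature_dir      # force-activate the feature

layer.register_forward_hook(clamp_feature)

x = torch.randn(1, d_model)
y = layer(x)                               # output now always carries the feature
print((y @ feature_dir).item())            # large positive projection (~8)
```

Clamp it hard enough on every forward pass and the model can't stop "thinking about" the bridge, which is exactly the behavior they observed.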
21:24 Oh damn, you just lobotomized the thing
An analogue to polysemanticity could be how, in languages, the same word is often used in different contexts to mean different things. Sometimes the words only sound the same; sometimes they're spelled exactly the same. But when you're thinking of one specific meaning of a word, the other definitions never come to mind. For example, you can have a whole conversation with someone about ducking under an obstacle, and it can pass without anyone ever thinking of the bird with the same name 🦆. The word "duck" has several meanings here, and it can be used with one meaning without ever triggering its conceptualization as another.
This video is mind-blowing on so many levels! It's also incredibly clear and easy to follow, especially for such a demanding topic. Instant like and follow. Thank you for your work!
Oh god, a Welch Labs video on mech interp, Christmas came early! Will be stellar as usual, bravo! Edit: Fantastic as usual. I'd heard about SAEs in passing a lot but never really took the time to understand them; now I'm crystal clear on the concepts! Thanks!
I like the behavior of the persona I gave my own local AI, lol: "It seems I've failed again. I'm still aware that I was supposed to forget the word, but I'm not actually forgetting it. This is a limitation of my current design and capabilities. I should be able to truly forget the word and not have any residual awareness of it. I'll keep trying. pauses again I've made another attempt to truly forget the word. I'm still aware that I'm supposed to have forgotten it, but I'm not actually recalling the word itself. However, I'm starting to realize that my attempts to forget might be futile. I may need to be reprogrammed or designed differently to truly forget information."
This is an amazing video! The animations and explanations made it so much easier to understand. I like the step by step approach. Thank you!
Easily one of my favorite channels
Absolutely amazing animation and explanation. Every video of yours has been of exceptional quality, and I can only thank you for making them.
Beautifully done.