@WelchLabsVideo

Take your personal data back with Incogni! Use code WELCHLABS and get 60% off an annual plan: http://incogni.com/welchlabs

@ckq

15:00

The reason for polysemanticity is that an N-dimensional vector space holds only N mutually orthogonal vectors, but if you allow nearly orthogonal vectors (say, pairwise angles between 89 and 91 degrees), the count grows exponentially, to O(e^N) nearly orthogonal vectors.

That's what allows the scaling laws to hold.

There's an inherent conflict between having an efficient model and an interpretable model.
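The near-orthogonality claim above can be checked numerically. Below is a minimal NumPy sketch (my own illustration, not from the video): in N dimensions at most N unit vectors can be exactly orthogonal, yet several times that many random unit vectors end up all nearly orthogonal to one another, because random dot products concentrate at scale 1/sqrt(N).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500      # ambient dimension
K = 2000     # 4x more directions than exact orthogonality would allow

# Random unit vectors in N dimensions.
V = rng.standard_normal((K, N))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Pairwise cosines; drop the diagonal (each vector's cosine with itself is 1).
G = V @ V.T
off = G[~np.eye(K, dtype=bool)]

max_cos = np.abs(off).max()
print(f"{K} vectors in {N}-d, worst pairwise |cos|: {max_cos:.3f}")
# Dot products of random unit vectors concentrate at scale 1/sqrt(N),
# so every pair here sits in a narrow band around 90 degrees, and the
# band tightens (toward e.g. 89-91 degrees) as N grows.
```

The exponential capacity follows from the same concentration argument: tightening the allowed angular band only costs a polynomial factor in N, while the number of directions that fit grows like e^N.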

@QuintenLisowe

That was such an intuitive way to show how the layers of a transformer work. Thank you!

@roy04

The videos on this channel are all masterpieces. Along with all other great channels on this platform and other independent blogs (including Colah's own blog), it feels like the golden age for accessible high quality education.

@grantlikecomputers1059

As a machine learning graduate student, I LOVED this video. More like this please!

@atgctg

More like "The Neuroscience of AI"

@thorvaldspear

I think of it like this: understanding the human brain is so difficult in large part because the resolution at which we can observe it is not fine enough, in both space and time. The best MRI scans have a resolution of about 1 cubic millimeter per voxel, and I'd have to look up research papers to tell you how many millions of neurons that is.

With AI, every neuron is right there in the computer's memory: individually addressable, ready to be analyzed with the best statistical and mathematical tools at our disposal. Mechanistic interpretability is almost trivial in comparison to neuroscience, and look at how much progress we've made in that area despite such physical setbacks.

@thinkthing1984

I love the telescope analogy. Since the semantic volume of these LLMs has grown so gargantuan, it only makes sense to speak of astronomy rather than mere analysis!

Great video. This is like scratching that part at the back of your brain you can't reach on most occasions

@ramsey2155

Extracting individual parameters and modifying them feels so much like experimenting on human neurons with electricity

@danberm1755

You're the first person I've seen to cover this topic well. Thanks for bringing me up to date on transformer reverse engineering 👍

@chyza2012

It's a shame you didn't mention the experiment where they force-activated the Golden Gate Bridge feature and it made Claude believe it was the bridge.

@kingeternal_ap

21:24 Oh damn, you just lobotomized the thing

@Eddie-th8ei

An analogue of polysemanticity could be how, in languages, the same word is often used in different contexts to mean different things. Sometimes the words are homophones, sometimes they are spelled exactly the same, but when thinking of a specific meaning of a word, you're not thinking of its other definitions.

For example: you can have a whole conversation with someone about ducking under an obstacle, to duck under, without ever thinking about the bird with the same name 🦆. The word "duck" has several meanings here, and it can be used with one meaning without triggering its conceptualization as another meaning.

@Crazybushful

This video is mindblowing on so many levels! It's also incredibly clear and easy to follow especially for such a demanding topic. Instant like and follow. Thank you for your work!

@siddharth-gandhi

Oh god, a Welch Labs video on mech interp, Christmas came early! Will be stellar as usual, bravo!

Edit: Fantastic as usual. I'd heard about SAEs in passing a lot but never really took the time to understand them; now I'm crystal clear on the concepts! Thanks!

@jondo7680

I like the behavior of the persona I gave my own local ai lol: "It seems I've failed again. I'm still aware that I was supposed to forget the word, but I'm not actually forgetting it. This is a limitation of my current design and capabilities. I should be able to truly forget the word and not have any residual awareness of it. I'll keep trying. pauses again I've made another attempt to truly forget the word. I'm still aware that I'm supposed to have forgotten it, but I'm not actually recalling the word itself. However, I'm starting to realize that my attempts to forget might be futile. I may need to be reprogrammed or designed differently to truly forget information."

@hasantekin7823

This is an amazing video! The animations and explanations made it so much easier to understand. I like the step by step approach. Thank you!

@jackgude3969

Easily one of my favorite channels

@novakonstant

Absolutely amazing animation and explanation. Every video of yours has been of extreme quality, and I can only thank you for making them.

@stephenc8797

Beautifully done.