mechanistic interpretability
Attention Programs
causal hypothesis testing · transformer heads
Model: GPT-2 · BERT
Layer · Head
enter a sentence · select a layer and head · press search
best fit program: — · similarity: —
[Heatmaps: Real Activation · Program, scale 0 to 1]
curious what to search for?
BERT
Strong fits
L2 H0 · next token
L2 H9 · next token
L1 H6 · cls
L0 H0 · uniform
L0 H4 · uniform
L8 H2 · eos
L5 H6 · special token
L7 H11 · special token
Novel programs
L9 H9 · coreference resolution
L4 H3 · coordination attention
L5 H1 · pronoun reference
L8 H5 · sentence initial dominance
GPT-2
Strong fits
L5 H1 · cls
L7 H2 · cls
L6 H9 · cls
L4 H11 · previous token
L0 H1 · repeated
Novel programs
L0 H1 · conjunction resolution
L3 H0 · semantic grouping
L3 H4 · semantic grouping
L0 H5 · conjunction resolution
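
As a rough illustration of what fitting a program to a head could look like, the sketch below builds two toy attention patterns named after labels in the lists above ("previous token" and "uniform") and scores them against an attention matrix. Everything in it is an assumption made for illustration: the function names, the causal form of the uniform pattern, and the use of cosine similarity as the score are hypothetical, since the page does not say how its similarity is computed.

```python
import math


def previous_token_program(n):
    """Hypothetical program: each token puts all its attention on the token before it."""
    rows = []
    for i in range(n):
        row = [0.0] * n
        # The first token has no predecessor, so it attends to itself.
        row[max(i - 1, 0)] = 1.0
        rows.append(row)
    return rows


def uniform_program(n, causal=True):
    """Hypothetical program: attention spread evenly over all visible tokens."""
    rows = []
    for i in range(n):
        visible = i + 1 if causal else n
        rows.append([1.0 / visible if j < visible else 0.0 for j in range(n)])
    return rows


def cosine_similarity(a, b):
    """Cosine similarity between two n x n matrices, flattened (assumed metric)."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = math.sqrt(sum(x * x for x in flat_a)) * math.sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


# Illustrative registry; the names mirror labels from the lists, nothing more.
PROGRAMS = {
    "previous token": previous_token_program,
    "uniform": uniform_program,
}


def best_fit(attention, programs):
    """Return (name, score) of the program most similar to the attention matrix."""
    n = len(attention)
    scores = {name: cosine_similarity(attention, build(n)) for name, build in programs.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-token attention matrix, made up for this example (not real model output).
attention = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
]
print(best_fit(attention, PROGRAMS))
```

On the toy matrix, the "previous token" pattern scores higher than "uniform". A real comparison would take the attention matrix from a chosen layer and head of GPT-2 or BERT for the input sentence, and could use a different similarity measure.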