Attention Programs

Model

Layer

Head

enter a sentence · select a layer and head · press search

best fit program

Real Activation

Program

program definition

curious what to search for?

BERT

Strong fits

L2 H0 next token · L2 H9 next token · L1 H6 cls · L0 H0 uniform · L0 H4 uniform · L8 H2 eos · L5 H6 special token · L7 H11 special token

Novel programs

L9 H9 coreference resolution · L4 H3 coordination attention · L5 H1 pronoun reference · L8 H5 sentence initial dominance

GPT-2

Strong fits

L5 H1 cls · L7 H2 cls · L6 H9 cls · L4 H11 previous token · L0 H1 repeated

Novel programs

L0 H1 conjunction resolution · L3 H0 semantic grouping · L3 H4 semantic grouping · L0 H5 conjunction resolution