Back to Foundation TrackIn Progress
Project P11
Mini Attention Notebook
Concept: Dot Product Self-Attention Calculations
Problem Statement
Self-attention calculations are frequently treated as opaque library calls, making it hard to debug vector representations.
What Concept It Teaches
It teaches Query/Key/Value projection transforms, scaling factors normalization, and Softmax attention mapping.
Why This Matters
Self-attention is the core mathematical mechanism driving sequence modeling in transformers, determining how tokens attend to each other.
System Architecture
Dynamic math workbook running forward projections, scaling matrix outputs, and plotting soft heatmaps.
Character Tokens ArrayQKV Projections WeightsQK Dot Product ScoreScaling Normalized MatrixAttention Weights Heatmap
Execution Data Flow
- 1. User enters 3 token words.
- 2. System displays projection matrix weights values.
- 3. Matrix multiplication resolves alignment scores.
- 4. Softmax displays connection strengths.
Tech Stack
ReactTypeScriptMath.js
Implementation Plan
- 1.1. Write custom matrix multiplication functions.
- 2.2. Build dynamic coordinate grid visualizers.
- 3.3. Animate cell changes during matrix multiplications.
Technical Interview Defense
Defense Concept:
Why is self-attention scaled by the square root of key dimension dimension sizes?
Verification Audit
Repository Checked: Yes
Repository Exists: Yes
Live Demo Verified: Yes
Demo Exists: Yes