Project P11

Mini Attention Notebook

Concept: Dot Product Self-Attention Calculations

Problem Statement

Self-attention calculations are frequently treated as opaque library calls, making it hard to debug vector representations.

What Concept It Teaches

It teaches Query/Key/Value projection transforms, scaling factors normalization, and Softmax attention mapping.

Why This Matters

Self-attention is the core mathematical mechanism driving sequence modeling in transformers, determining how tokens attend to each other.

System Architecture

Dynamic math workbook running forward projections, scaling matrix outputs, and plotting soft heatmaps.

Character Tokens ArrayQKV Projections WeightsQK Dot Product ScoreScaling Normalized MatrixAttention Weights Heatmap

Execution Data Flow

  • 1. User enters 3 token words.
  • 2. System displays projection matrix weights values.
  • 3. Matrix multiplication resolves alignment scores.
  • 4. Softmax displays connection strengths.

Tech Stack

ReactTypeScriptMath.js

Implementation Plan

  • 1.1. Write custom matrix multiplication functions.
  • 2.2. Build dynamic coordinate grid visualizers.
  • 3.3. Animate cell changes during matrix multiplications.

Technical Interview Defense

Defense Concept:

Why is self-attention scaled by the square root of key dimension dimension sizes?

Source & Deployment Links

GitHub Repo:GitHub
Live Demo:Live Demo
Verification Audit
Repository Checked: Yes
Repository Exists: Yes
Live Demo Verified: Yes
Demo Exists: Yes