TRANSFORMER
An intuitive look at how transformer models such as LLMs are trained, learn attention patterns, and refine their internal knowledge via backpropagation, with a nod to stochastic techniques such as Monte Carlo methods.
Some notes on how attention heads in a transformer model develop through training, how they are used within the model, and how their outputs are combined to produce the final weighted output; a small sketch of that combination step follows.
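As a rough illustration of that last point, here is a minimal NumPy-only sketch of multi-head attention: each head computes its own attention pattern, the per-head outputs are concatenated, and an output projection mixes them into a single result. All names, shapes, and the random toy weights below are assumptions for illustration, not code from these notes or any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model). Illustrative only."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project the input, then split the model dimension into independent heads.
    q = (x @ Wq).reshape(seq_len, n_heads, d_head)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head)

    outputs = []
    for h in range(n_heads):
        # Scaled dot-product attention for one head.
        scores = q[:, h] @ k[:, h].T / np.sqrt(d_head)
        weights = softmax(scores, axis=-1)      # this head's attention pattern
        outputs.append(weights @ v[:, h])       # (seq_len, d_head)

    # The heads' outputs are concatenated and mixed by the output projection Wo.
    concat = np.concatenate(outputs, axis=-1)   # (seq_len, d_model)
    return concat @ Wo

# Toy usage with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 8, 2, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads)
print(out.shape)  # (4, 8)
```

In a trained model the projection matrices are learned by backpropagation; the sketch only shows how the per-head results are combined, not how the weights themselves are obtained.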