TRANSFORMER
An intuitive look at how transformer models such as LLMs are trained, learn attention patterns, and refine their internal knowledge via backpropagation, with a nod to stochastic techniques such as Monte Carlo methods.
Some notes on how attention heads in a transformer model develop through training, how they are used within the model, and how their outputs are combined to produce the final weighted output; a small sketch of that combination step follows.
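As a rough illustration of that last point, here is a minimal NumPy-only sketch of multi-head attention: each head computes its own attention pattern, the per-head outputs are concatenated, and an output projection mixes them into a single result. All names, shapes, and the random toy weights below are assumptions for illustration, not code from these notes or any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model). Illustrative only."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    # Project the input, then split the model dimension into independent heads.
    q = (x @ Wq).reshape(seq_len, n_heads, d_head)
    k = (x @ Wk).reshape(seq_len, n_heads, d_head)
    v = (x @ Wv).reshape(seq_len, n_heads, d_head)

    outputs = []
    for h in range(n_heads):
        # Scaled dot-product attention for one head.
        scores = q[:, h] @ k[:, h].T / np.sqrt(d_head)
        weights = softmax(scores, axis=-1)      # this head's attention pattern
        outputs.append(weights @ v[:, h])       # (seq_len, d_head)

    # The heads' outputs are concatenated and mixed by the output projection Wo.
    concat = np.concatenate(outputs, axis=-1)   # (seq_len, d_model)
    return concat @ Wo

# Toy usage with random weights standing in for trained parameters.
rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 8, 2, 4
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads)
print(out.shape)  # (4, 8)
```

In a trained model the projection matrices are learned by backpropagation; the sketch only shows how the per-head results are combined, not how the weights themselves are obtained.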