AI > Sarathi > Flashcards
What percent of compute time during LLM inference is spent on attention?
5-10%
ffn_ln1 stands for
Feed-forward network layer normalization 1