[LG] Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size [UC Berkeley & Microsoft Research] arxiv.org