The paper introduces "InterPLM," a systematic framework for interpreting protein language models (PLMs) using sparse autoencoders (SAEs). The method extracts thousands of interpretable features from PLMs such as ESM-2. These features reveal biological concepts, such as binding sites and functional domains, that are stored in superposition across the model's neurons. The research shows that SAE features align significantly more strongly with known biological annotations than individual neurons do, and that larger PLMs capture a broader range of concepts. The framework also uses large language models to generate and validate feature descriptions automatically. Finally, the authors show that feature activations can flag annotations missing from protein databases and can be used to steer sequence generation in a targeted way.
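To make the SAE idea concrete, here is a minimal, pure-Python sketch of a standard ReLU sparse autoencoder forward pass and its training objective (reconstruction error plus an L1 sparsity penalty), followed by an F1-style score comparing one feature's per-residue activations with a binary annotation. This is an illustration under stated assumptions, not InterPLM's actual code: the function names (`sae_forward`, `sae_loss`, `feature_annotation_f1`), the exact architecture, and the choice of F1 as the alignment metric are all hypothetical here. In practice the input `x` would be a per-residue hidden state from a PLM layer (for example, ESM-2), and the weights would be learned by gradient descent over many such embeddings.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: matrix[row][col]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Multiply an (out x in) matrix by a length-`in` vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                w_dec: Matrix, b_dec: Vector):
    # Encode (assumed standard SAE form): f = ReLU(W_enc (x - b_dec) + b_enc).
    # The dictionary (len(b_enc)) is larger than the embedding, and the
    # L1 penalty pushes most entries of f to zero.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    f = relu([z + b for z, b in zip(matvec(w_enc, centered), b_enc)])
    # Decode: rebuild the embedding as a sparse sum of dictionary columns.
    x_hat = [z + b for z, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x: Vector, x_hat: Vector, f: Vector, l1_coeff: float) -> float:
    # Squared reconstruction error plus an L1 penalty on feature activations.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return recon + l1_coeff * sparsity


def feature_annotation_f1(activations: Vector, labels: List[bool],
                          threshold: float) -> float:
    # Hypothetical alignment score: binarize one feature's per-residue
    # activations and compare them with a per-residue annotation
    # (e.g. "binding site") using F1.
    preds = [a > threshold for a in activations]
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with a 2-dimensional toy embedding `x = [1.0, 0.0]` and a 4-feature dictionary whose encoder rows are the axis directions `[1, 0], [0, 1], [-1, 0], [0, -1]` (decoder = encoder transpose, zero biases), only the first feature fires and the input is reconstructed exactly, so the loss reduces to the L1 term. Running the same kind of comparison for individual neurons instead of SAE features is one way to frame the paper's "features versus neurons" claim.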
References:
* Simon E, Zou J. InterPLM: Discovering interpretable features in protein language models via sparse autoencoders[J]. arXiv:2412.12101, 2024. URL: arxiv.org/abs/2412.12101.

Sharing research articles, tracking the latest developments

Paper Talk

222-InterPLM: Interpretable Protein Language Models