Computer Science > Machine Learning

arXiv:2403.07809 (cs)

[Submitted on 12 Mar 2024]

Title: pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Authors: Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts


Abstract: Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce $\textbf{pyvene}$, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. $\textbf{pyvene}$ supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how $\textbf{pyvene}$ provides a unified and extensible framework for performing interventions on neural models and sharing the intervened-upon models with others. We illustrate the power of the library via interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at this https URL.
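To make the idea of an intervention on a model-internal state concrete, the sketch below swaps one hidden unit of a toy two-layer model with the value it takes on a different input (an interchange-style intervention). This is an illustration only: it uses plain Python rather than pyvene or PyTorch, and the names `layer1`, `layer2`, `run`, and `interchange` are hypothetical and do not reflect the library's API.

```python
from typing import Callable, List, Optional

Vector = List[float]


def layer1(x: Vector) -> Vector:
    """Toy first layer: produces a two-unit hidden state."""
    return [x[0] + x[1], x[0] - x[1]]


def layer2(h: Vector) -> float:
    """Toy second layer: maps the hidden state to a scalar output."""
    return 2.0 * h[0] + 1.0 * h[1]


def run(x: Vector, intervention: Optional[Callable[[Vector], Vector]] = None) -> float:
    """Run the toy model, optionally rewriting the hidden state in flight."""
    h = layer1(x)
    if intervention is not None:
        h = intervention(h)  # the model-internal state is edited here
    return layer2(h)


def interchange(base: Vector, source: Vector, index: int) -> float:
    """Run on `base`, but replace hidden unit `index` with its value
    from a separate run on `source`."""
    source_hidden = layer1(source)  # capture the source activation

    def swap(h: Vector) -> Vector:
        patched = list(h)  # copy so the original state is not mutated
        patched[index] = source_hidden[index]
        return patched

    return run(base, swap)


base, source = [1.0, 2.0], [4.0, 0.0]
clean_output = run(base)                      # no intervention
patched_output = interchange(base, source, 0)  # hidden unit 0 taken from source
```

Comparing `clean_output` with `patched_output` shows how much the output depends on the swapped unit, which is the basic measurement behind analyses such as causal abstraction.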

Comments:	8 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2403.07809 [cs.LG]
	(or arXiv:2403.07809v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.07809

Submission history

From: Zhengxuan Wu
[v1] Tue, 12 Mar 2024 16:46:54 UTC (1,242 KB)
