Xudong Zhu

I am a Ph.D. student in Computer Science at The Ohio State University, advised by Prof. Zhihui Zhu. My research centers on the mechanistic interpretability of large language models, with a current focus on new tools for feature discovery and representation analysis.

Research Interests

My current research focuses on:

  • Mechanistic Interpretability: Understanding how large language models work internally
  • Sparse Autoencoders (SAEs): Using SAEs for feature discovery and representation analysis
  • Difference-in-Means Techniques: Combining statistical methods with SAEs to better characterize and validate learned features
  • Model Geometry: Exploring the internal geometry and reasoning processes of neural networks

Recent Work

I have been investigating how difference-in-means techniques can be combined with SAEs to better characterize and validate the learned features. My goal is to advance methods that not only improve the interpretability of model internals but also provide insights into their geometry and reasoning processes.

Background

I received my B.S. in Computer Science from the University of Electronic Science and Technology of China in 2024, graduating with a GPA of 3.98. I am currently pursuing my Ph.D. at The Ohio State University and expect to complete it in 2029.

Contact