CV
Xudong Zhu
Summary
I am a PhD student in Computer Science at The Ohio State University, advised by Prof. Zhihui Zhu. My research focuses on the mechanistic interpretability of large language models, with particular interest in understanding the Linear Representation Hypothesis and the geometric structure of learned representations. I investigate why semantic features and behaviors align with linear directions in representation space, and how this structure enables effective interpretation and control through linear steering. My work combines sparse autoencoders and geometric analysis to uncover and characterize the internal structure of model representations.
Education
- Ph.D. in Computer Science, The Ohio State University (expected 2029)
- B.S. in Computer Science, University of Electronic Science and Technology of China, 2024. GPA: 3.98
Publications
- AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features (ICLR 2026). Sparse autoencoders (SAEs) are widely used for LLM interpretability, but existing variants often impose non-negativity that prevents single features from representing bidirectional concepts. We derive SAE variants from unrolled proximal gradient updates, identify this structural limitation, and propose AbsTopK SAE with magnitude-based hard thresholding. Across four LLMs and seven probing/steering tasks, AbsTopK improves reconstruction and interpretability while enabling single features to encode contrasting concepts. OpenReview: https://openreview.net/forum?id=EEs6I4cO7S. A minimal sketch of the magnitude-based thresholding step appears after this list.
- From Emergence to Control: Probing and Modulating Self-Reflection in Language Models (arXiv 2025). We study the emergence and control of self-reflection in large language models. Our probing method reveals that pretrained models already contain a latent capacity for reflection, which can be amplified without additional training. By identifying and manipulating a “self-reflection vector” in activation space, we achieve bidirectional control over reflective behavior, improving reasoning accuracy or reducing computation as needed. This work deepens understanding of self-reflection and demonstrates how model internals can enable precise behavioral control. A sketch of this steering mechanism appears after this list.
- Alleviating subgraph-induced oversmoothing in link prediction via coarse graining (Neurocomputing 2025). We address the oversmoothing problem in link prediction caused by repetitive high-degree nodes across subgraphs. Our method introduces a coarse-graining strategy that merges strongly correlated nodes, yielding more diverse receptive fields and reducing subgraph size. This not only mitigates oversmoothing but also improves the scalability and efficiency of GNN-based link prediction. A toy coarse-graining sketch appears after this list.
- FCDS: Fusing Constituency and Dependency Syntax into Document-Level Relation Extraction (COLING 2024). We introduce FCDS, a document-level relation extraction model that fuses constituency and dependency syntax. By combining sentence-level aggregation from constituency trees with dependency-based graph reasoning, FCDS better captures cross-sentence relations between entities. Experiments across multiple domains show significant performance gains, highlighting the effectiveness of integrating both syntactic views. A toy two-view fusion sketch appears after this list.
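
The magnitude-based hard thresholding behind AbsTopK can be illustrated in a few lines. The sketch below is my own minimal rendering of that step, not the paper's reference implementation; the function name `abstopk` and the toy tensor are illustrative.

```python
import torch

def abstopk(z: torch.Tensor, k: int) -> torch.Tensor:
    """Keep each row's k largest-magnitude activations, signs
    included, and zero out everything else."""
    idx = z.abs().topk(k, dim=-1).indices      # select by |value|
    out = torch.zeros_like(z)
    out.scatter_(-1, idx, z.gather(-1, idx))   # restore signed values
    return out

# Because selection is by absolute value, one feature can fire with
# either sign, encoding a contrast (a concept and its opposite).
z = torch.tensor([[0.9, -1.4, 0.1, 0.3]])
print(abstopk(z, k=2))  # tensor([[ 0.9000, -1.4000, 0.0000, 0.0000]])
```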
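The "bidirectional control" in the self-reflection paper is a form of activation steering: adding a scaled direction vector to hidden states at inference time. Below is a minimal sketch using a PyTorch forward hook, under stated assumptions: the layer path `model.transformer.h[12]`, the vector `v_reflect`, and the scale `alpha` are placeholders, not values from the paper.

```python
import torch

def add_steering_hook(block: torch.nn.Module, v: torch.Tensor, alpha: float):
    """Shift a transformer block's hidden states along direction v.
    alpha > 0 amplifies the steered behavior; alpha < 0 suppresses it."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v  # broadcast over batch and positions
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return block.register_forward_hook(hook)

# Hypothetical usage for a GPT-2-style module layout:
#   handle = add_steering_hook(model.transformer.h[12], v_reflect, alpha=4.0)
#   ... run generation with steering active ...
#   handle.remove()  # restore unsteered behavior
```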
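The coarse-graining idea of merging strongly correlated nodes can be sketched generically. In the sketch below, Jaccard overlap of neighborhoods with threshold `tau` is an assumption standing in for the paper's correlation measure; this is not the published algorithm.

```python
import networkx as nx

def coarse_grain(G: nx.Graph, tau: float = 0.8) -> nx.Graph:
    """Contract adjacent node pairs whose neighborhoods overlap
    strongly, shrinking the subgraph and diversifying receptive
    fields."""
    H = G.copy()
    merged = True
    while merged:
        merged = False
        for u, v in list(H.edges()):
            if u not in H or v not in H:  # endpoint already merged away
                continue
            nu, nv = set(H[u]) - {v}, set(H[v]) - {u}
            union = nu | nv
            if union and len(nu & nv) / len(union) >= tau:
                H = nx.contracted_nodes(H, u, v, self_loops=False)
                merged = True
    return H
```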
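As a rough picture of fusing the two syntactic views in FCDS, the toy module below runs one graph-convolution step over a dependency adjacency matrix, pools tokens into sentence nodes (as constituency-derived spans would), and gates the two views together. All layer names, shapes, and the gating scheme are my assumptions, not the FCDS architecture.

```python
import torch
import torch.nn as nn

class TwoViewFusion(nn.Module):
    """Toy fusion of dependency and constituency views."""
    def __init__(self, d: int):
        super().__init__()
        self.dep_gcn = nn.Linear(d, d)    # one dependency-graph conv step
        self.sent_proj = nn.Linear(d, d)  # sentence-level projection
        self.gate = nn.Linear(2 * d, d)   # gated fusion of the two views

    def forward(self, h, A_dep, sent_mask):
        # h: (n_tokens, d); A_dep: (n, n) dependency adjacency;
        # sent_mask: (n_sents, n) 0/1 token-to-sentence membership.
        dep = torch.relu(self.dep_gcn(A_dep @ h))
        pooled = sent_mask @ h / sent_mask.sum(-1, keepdim=True).clamp(min=1)
        sent = sent_mask.t() @ torch.relu(self.sent_proj(pooled))
        g = torch.sigmoid(self.gate(torch.cat([dep, sent], dim=-1)))
        return g * dep + (1 - g) * sent
```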