CV

Xudong Zhu

zhu.3944@osu.edu
Columbus, Ohio, US

Summary

I am a PhD student in Computer Science at The Ohio State University, advised by Prof. Zhihui Zhu. My research focuses on the mechanistic interpretability of large language models, with particular interest in understanding the Linear Representation Hypothesis and the geometric structure of learned representations. I investigate why semantic features and behaviors align with linear directions in representation space, and how this structure enables effective interpretation and control through linear steering. My work combines sparse autoencoders and geometric analysis to uncover and characterize the internal structure of model representations.

Education

  • Ph.D. in Computer Science
    Expected 2029
    The Ohio State University
  • B.S. in Computer Science
    2024
    University of Electronic Science and Technology of China
    GPA: 3.98

Publications

  • AbsTopK: Rethinking Sparse Autoencoders For Bidirectional Features
    2026
    ICLR 2026
    Sparse autoencoders (SAEs) are widely used for LLM interpretability, but existing variants often impose non-negativity that prevents single features from representing bidirectional concepts. We derive SAE variants from unrolled proximal gradient updates, identify this structural limitation, and propose AbsTopK SAE with magnitude-based hard thresholding. Across four LLMs and seven probing/steering tasks, AbsTopK improves reconstruction and interpretability while enabling single features to encode contrasting concepts. OpenReview: https://openreview.net/forum?id=EEs6I4cO7S
  • From Emergence to Control: Probing and Modulating Self-Reflection in Language Models
    2025
    arXiv 2025
    We study the emergence and control of self-reflection in large language models. Our probing method reveals that pretrained models already contain a latent capacity for reflection, which can be amplified without additional training. By identifying and manipulating a “self-reflection vector” in activation space, we achieve bidirectional control over reflective behavior, improving reasoning accuracy or reducing computation as needed. This work deepens understanding of self-reflection and demonstrates how model internals can enable precise behavioral control.
  • Alleviating subgraph-induced oversmoothing in link prediction via coarse graining
    2025
    Neurocomputing 2025
    We address the oversmoothing problem in link prediction caused by repetitive high-degree nodes across subgraphs. Our method introduces a coarse-graining strategy that merges strongly correlated nodes, yielding more diverse receptive fields and smaller subgraphs. This not only mitigates oversmoothing but also improves the scalability and efficiency of GNN-based link prediction.
  • FCDS: Fusing Constituency and Dependency Syntax into Document-Level Relation Extraction
    2024
    COLING 2024
    We introduce FCDS, a document-level relation extraction model that fuses constituency and dependency syntax. By combining sentence-level aggregation from constituency trees with dependency-based graph reasoning, FCDS better captures cross-sentence relations between entities. Experiments across multiple domains show significant performance gains, highlighting the effectiveness of integrating both syntactic views.
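The magnitude-based hard thresholding at the core of the AbsTopK paper above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the function name `abstopk` and the dense code vector `z` are placeholders. The idea is to keep the k entries with the largest absolute value while preserving their signs, so that one feature can fire in both a positive and a negative direction, unlike a standard ReLU-based TopK that keeps only non-negative activations.

```python
import numpy as np

def abstopk(z, k):
    """Illustrative AbsTopK-style sparsification step:
    keep the k entries of z with largest |value|, preserving signs,
    so a single feature can represent a bidirectional concept."""
    idx = np.argsort(np.abs(z))[-k:]   # indices of the k largest magnitudes
    out = np.zeros_like(z)
    out[idx] = z[idx]                  # signed values survive thresholding
    return out

z = np.array([0.9, -1.5, 0.2, -0.1, 0.7])
print(abstopk(z, 2))  # keeps -1.5 and 0.9, zeroing the rest
```

In a full sparse autoencoder this step would sit between the encoder projection and the decoder reconstruction; here it is isolated to show only the selection rule.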
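The bidirectional control described in the self-reflection paper above follows the standard activation-steering recipe: add a scaled direction vector to a hidden state. The sketch below is a generic illustration under that assumption; `steer`, `hidden`, and `direction` are hypothetical names, not the paper's API, and the actual self-reflection vector would be extracted from model activations.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Illustrative activation steering: shift a hidden state along a
    normalized direction. alpha > 0 amplifies the associated behavior
    (e.g. reflection); alpha < 0 suppresses it."""
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit

h = np.array([0.5, -0.2, 1.0])   # toy hidden state
v = np.array([0.0, 1.0, 0.0])    # toy steering direction
print(steer(h, v, 2.0))          # second coordinate shifted by +2
```

The sign and magnitude of `alpha` give the bidirectional knob: one scalar trades off more reflective reasoning against less computation.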
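The coarse-graining idea in the Neurocomputing paper above can be illustrated with a toy merge step. This sketch assumes node correlations are precomputed and merges pairs above a threshold with union-find, then rewrites edges over the merged super-nodes; `coarse_grain` and its signature are hypothetical, not the paper's method as implemented.

```python
def coarse_grain(edges, corr, n, tau=0.9):
    """Toy coarse-graining: merge node pairs whose (assumed precomputed)
    correlation exceeds tau, then rewrite the edge list over the merged
    super-nodes. Union-find makes the merges transitive."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (u, v), c in corr.items():
        if c >= tau:
            parent[find(u)] = find(v)      # merge correlated pair

    # keep only edges between distinct super-nodes
    return {(find(u), find(v)) for u, v in edges if find(u) != find(v)}

# triangle where nodes 0 and 1 are strongly correlated -> merged
print(coarse_grain([(0, 1), (1, 2), (0, 2)], {(0, 1): 0.95}, 3))
```

Merging collapses repeated high-degree neighborhoods, which is what shrinks the subgraphs and diversifies receptive fields in the paper's setting.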