
Jason Zhang
Hi! I am a Computer Science student at Stanford University with a deep love for AI research. Currently a member of technical staff at Stanford's NLP Group, advised by Zhengxuan Wu and Hao Zhu. Recently, I've been focusing on model interpretability and social intelligence.
Featured Projects
Uncovering Latent CoT Vectors in Language Models [ICLR WSL 2025]
Apply steering vectors to elicit chain-of-thought (CoT) reasoning. Show that models can be steered toward CoT structure while maintaining competitive performance on reasoning benchmarks. Read the arXiv preprint here.
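For readers unfamiliar with the mechanics, here is a minimal sketch of activation steering, not the paper's implementation: a fixed vector is added to a transformer layer's hidden states during generation via a forward hook. The layer index, scale, and `cot_vector` below are illustrative placeholders.

```python
import torch

def make_steering_hook(steering_vector: torch.Tensor, scale: float = 4.0):
    """Return a forward hook that adds a steering vector to a layer's output."""
    def hook(module, inputs, output):
        # For typical HF decoder blocks, output is a tuple whose first element
        # is the hidden states of shape (batch, seq_len, d_model).
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + scale * steering_vector.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Hypothetical usage with a HuggingFace-style model (names are assumptions):
# layer = model.model.layers[12]
# handle = layer.register_forward_hook(make_steering_hook(cot_vector))
# outputs = model.generate(**inputs)
# handle.remove()
```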
The Structural Safety Generalization Problem [NeurIPS SafeGenAI 2024]
Introduce a new subclass of AI safety problems: the failure of current safety techniques to generalize over prompt structure, despite semantic equivalence. Read here.
Empirical Insights into Feature Geometry in Sparse Autoencoders [LessWrong]
Interpretability research with sparse autoencoders (SAEs), conducted under Zhengxuan Wu in Chris Potts's lab. Present the first demonstration that semantically related concepts lack corresponding geometric relationships in SAE feature space. Read here.
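To give a flavor of the kind of measurement involved (a sketch, not the post's actual code): if semantically related concepts had geometric structure in SAE feature space, their decoder directions should reflect it, for instance in cosine similarity. The decoder matrix and feature indices below are random stand-ins.

```python
import torch
import torch.nn.functional as F

def feature_cosine(W_dec: torch.Tensor, i: int, j: int) -> float:
    """Cosine similarity between decoder rows i and j of an SAE.

    W_dec has shape (n_features, d_model); each row is the direction a
    feature writes into the residual stream when it fires.
    """
    return F.cosine_similarity(W_dec[i], W_dec[j], dim=0).item()

# Example with a random stand-in decoder matrix (not a trained SAE):
W_dec = F.normalize(torch.randn(16384, 768), dim=-1)
print(feature_cosine(W_dec, 101, 2048))  # near 0 for unrelated directions
```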
Building Better Benchmarks: Towards Standardized AI Evaluation
Blog post on how the field of AI benchmarking can move forward. TL;DR: we need standardization. Read here.