Jiaqi Leng (冷家祺)

I am an incoming PhD student at NYU, currently completing my undergraduate degree in Computer Science and Technology at Fudan University.

I am currently working with Prof. Yucheng Lu on efficient language modeling, with a particular emphasis on byte-level architectures. Previously, I was a research intern at Ant Group, where I studied sparse attention mechanisms for large language models.

My research interests include:

  • Efficient deep learning and model architectures
  • Long-context modeling and length extrapolation
  • Sparse attention mechanisms

In Fall 2024, I was an exchange student at The University of Texas at Austin.

I am always happy to discuss research and explore opportunities for academic collaboration.


News

Mar 2026 I will be attending ICLR and presenting our work on length-generalizable sparse attention. See you in Brazil!
Mar 2026 I will join NYU as a PhD student in Fall 2026.

Selected Publications

  1. ICLR 2026
    Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
    In Proceedings of the 14th International Conference on Learning Representations, 2026
  2. Preprint
    Distilling Token-Trained Models into Byte-Level Models
    Zishuo Bao, Jiaqi Leng, Junxiong Wang, Bowen Peng, and Yucheng Lu
    arXiv preprint, 2026
  3. NeurIPS 2025
    Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
    Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, and Wei Wu
    In Proceedings of the 39th Conference on Neural Information Processing Systems, 2025