Publications

2026

  1. ICLR 2026
    Understanding and Improving Length Generalization in Hierarchical Sparse Attention Models
    In Proceedings of the 14th International Conference on Learning Representations, 2026
  2. Preprint
    Distilling Token-Trained Models into Byte-Level Models
    Zishuo Bao, Jiaqi Leng, Junxiong Wang, Bowen Peng, and Yucheng Lu
    arXiv preprint, 2026

2025

  1. NeurIPS 2025
    Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
    Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, and Wei Wu
    In Proceedings of the 39th Conference on Neural Information Processing Systems, 2025