The last 5 uploaded publications
Locality-aware Fair Scheduling in LLM Serving
Shiyi Cao, Yichuan Wang, Ziming Mao, P.-h.J. Hsu, Liangsheng Yin, Tian Xia, Dacheng Li, Shu Liu, Yuanhang Zhang, Yang Zhou, Ying Sheng, Joseph E. Gonzalez, Ion Stoica (2025). Locality-aware Fair Scheduling in LLM Serving. DOI: https://doi.org/10.48550/arxiv.2501.14312.
Preprint, 85 days ago
SGLang: Efficient Execution of Structured Language Model Programs
Clark Barrett, Shiyi Cao, Joseph E. Gonzalez, Jeff Huang, Christos Kozyrakis, Ying Sheng, Ion Stoica, Chuyue Sun, Zhiqiang Xie, Liangsheng Yin, Chengtao Yu, Lianmin Zheng (2024). SGLang: Efficient Execution of Structured Language Model Programs. DOI: https://doi.org/10.52202/079017-2000.
Article, 85 days ago
Post-Training Sparse Attention with Double Sparsity
Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng (2024). Post-Training Sparse Attention with Double Sparsity. DOI: https://doi.org/10.48550/arxiv.2408.07092.
Preprint, 85 days ago
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
Shiyi Cao, Shu Liu, Tyler Griggs, Peter Schafhalter, Xiaoxuan Liu, Ying Sheng, Joseph E. Gonzalez, Matei Zaharia, Ion Stoica (2024). MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs. DOI: https://doi.org/10.48550/arxiv.2411.11217.
Preprint, 85 days ago
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nick Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica (2023). S-LoRA: Serving Thousands of Concurrent LoRA Adapters. DOI: https://doi.org/10.48550/arxiv.2311.03285.
Preprint, 85 days ago