The last 5 uploaded publications
BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Yilong Zhao, Shuo Yang, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yang Zhou, Jiarong Xing, Ion Stoica (2024). BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching. DOI: https://doi.org/10.48550/arxiv.2411.16102.
Preprint, 72 days ago

SGLang: Efficient Execution of Structured Language Model Programs
Clark Barrett, Shiyi Cao, Joseph E. Gonzalez, Jeff Huang, Christos Kozyrakis, Ying Sheng, Ion Stoica, Chuyue Sun, Zhiqiang Xie, Liangsheng Yin, Chengtao Yu, Lianmin Zheng (2024). SGLang: Efficient Execution of Structured Language Model Programs. DOI: https://doi.org/10.52202/079017-2000.
Article, 72 days ago

Post-Training Sparse Attention with Double Sparsity
Shuo Yang, Ying Sheng, Joseph E. Gonzalez, Ion Stoica, Lianmin Zheng (2024). Post-Training Sparse Attention with Double Sparsity. DOI: https://doi.org/10.48550/arxiv.2408.07092.
Preprint, 72 days ago

S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nick Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica (2023). S-LoRA: Serving Thousands of Concurrent LoRA Adapters. DOI: https://doi.org/10.48550/arxiv.2311.03285.
Preprint, 72 days ago

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving
Zhuohan Li, Lianmin Zheng, Yinmin Zhong, Vincent Liu, Ying Sheng, Xin Jin, Yanping Huang, Zhifeng Chen, Hao Zhang, Joseph E. Gonzalez, Ion Stoica (2023). AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving. DOI: https://doi.org/10.48550/arxiv.2302.11665.
Preprint, 72 days ago