The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution

DOI: 10.48550/arxiv.2501.12407 (arxiv.org/abs/2501.12407)

Ion Stoica
University of California, Berkeley

Frank Sifei Luan
Ziming Mao
R. Wang

Abstract

While ML model training and inference are both GPU-intensive, CPU-based data processing is often the bottleneck. Distributed data processing systems based on the batch or stream processing models assume homogeneous resource requirements. They excel at CPU-based computation but either under-utilize heterogeneous resources or impose high overheads on failure and reconfiguration. We introduce the streaming batch model, a hybrid of batch and streaming that enables efficient and fault-tolerant heterogeneous execution. The key idea is to use partitions as the unit of execution to achieve elasticity, but to allow partitions to be dynamically created and streamed between heterogeneous operators for memory-efficient pipelining. We present Ray Data, a streaming batch system that improves throughput on heterogeneous batch inference pipelines by 2.5-12× compared to traditional batch and stream processing systems. By leveraging heterogeneous clusters, Ray Data improves training throughput for multimodal models such as Stable Diffusion by 31% compared to single-node ML data loaders.
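
The key idea in the abstract, partitions created lazily and streamed between operators with different resource needs, can be illustrated with a small standard-library sketch. This is not Ray Data's API or the authors' implementation. The names make_partitions, cpu_stage, gpu_stage, run_pipeline, and max_in_flight are hypothetical, and the "GPU" stage is a stand-in that only sums numbers. A bounded queue limits how many partitions are held in memory at once, which is one simple way to get pipelining with backpressure.

```python
import queue
import threading

SENTINEL = None  # marks the end of the partition stream


def make_partitions(rows, partition_size):
    """Yield partitions (lists of rows) lazily instead of materializing all of them."""
    for start in range(0, len(rows), partition_size):
        yield rows[start:start + partition_size]


def cpu_stage(rows, partition_size, out_q):
    """Hypothetical CPU operator: preprocess each partition and stream it downstream."""
    for part in make_partitions(rows, partition_size):
        # put() blocks when the queue is full, so the producer cannot run
        # arbitrarily far ahead of the consumer (backpressure).
        out_q.put([x * 2 for x in part])
    out_q.put(SENTINEL)


def gpu_stage(in_q, results):
    """Hypothetical GPU operator: stand-in 'inference' that sums each partition."""
    while True:
        part = in_q.get()
        if part is SENTINEL:
            break
        results.append(sum(part))


def run_pipeline(rows, partition_size=3, max_in_flight=2):
    """Run both stages concurrently with at most max_in_flight queued partitions."""
    q = queue.Queue(maxsize=max_in_flight)
    results = []
    producer = threading.Thread(target=cpu_stage, args=(rows, partition_size, q))
    consumer = threading.Thread(target=gpu_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(list(range(10))))
```

Because a partition is the unit of work, a real system could retry or reschedule a single partition after a failure rather than restarting the whole pipeline. The sketch omits that part entirely.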

How to cite this publication

Frank Sifei Luan, Ziming Mao, R. Wang, Chi‐Wei Lin, Amog Kamsetty, Hao Chen, Cheng Su, Balaji Veeramani, Scott Lee, SangBin Cho, Clark Zinzow, Eric Liang, Ion Stoica, Stephanie Wang (2025). The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution. DOI: https://doi.org/10.48550/arxiv.2501.12407.

Publication Details

Type: Preprint
Year: 2025
Authors: 14
Datasets: 0
Total Files: 0
Language: en
DOI: https://doi.org/10.48550/arxiv.2501.12407
