Rethinking Collective Communication for CPU–GPU–DPU Heterogeneous Systems
Designed by the PADSYS Lab at the University of Florida
Why HCCL?
Traditional collectives assume homogeneous nodes and devices
NCCL and MPI do not deeply model:
DPU/SmartNIC offloading
Heterogeneous capabilities
Multi-rail topologies
Hierarchical AI workloads
Large-scale LLM training exposes:
Cross-device imbalance
Congestion
Inefficient gradient aggregation
What HCCL Provides
Heterogeneity-aware scheduling
Multi-rail path optimization
DPU/SmartNIC-offloaded computation and reduction
Topology-aware collective decomposition
Performance-model-guided algorithm selection (LogGP-based)
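To illustrate how a LogGP-style cost model can drive topology-aware algorithm selection, here is a minimal sketch in Python. The parameter values, function names, and the flat-ring vs. hierarchical comparison are illustrative assumptions, not HCCL's actual API; the LogGP gap term g is omitted for brevity.

```python
# Sketch: LogGP-guided selection between a flat ring allreduce and a
# topology-aware hierarchical decomposition. All names and numbers here
# are hypothetical, chosen only to demonstrate the modeling idea.
from dataclasses import dataclass

@dataclass
class LogGP:
    L: float  # one-way latency (us)
    o: float  # per-message send/receive overhead (us)
    G: float  # gap per byte (us/byte), roughly 1/bandwidth

def ring_allreduce_cost(p: int, m: float, link: LogGP) -> float:
    """LogGP estimate for ring allreduce on p ranks over m bytes:
    2(p-1) steps, each moving an m/p-byte chunk."""
    step = link.L + 2 * link.o + (m / p) * link.G
    return 2 * (p - 1) * step

def hierarchical_allreduce_cost(nodes: int, gpus: int, m: float,
                                intra: LogGP, inter: LogGP) -> float:
    """Topology-aware decomposition: allreduce inside each node over the
    fast fabric, then allreduce across one leader rank per node."""
    return (ring_allreduce_cost(gpus, m, intra) +
            ring_allreduce_cost(nodes, m, inter))

def select_allreduce(nodes: int, gpus: int, m: float,
                     intra: LogGP, inter: LogGP) -> str:
    """Pick whichever schedule the model predicts to be cheaper."""
    flat = ring_allreduce_cost(nodes * gpus, m, inter)
    hier = hierarchical_allreduce_cost(nodes, gpus, m, intra, inter)
    return "hierarchical" if hier < flat else "flat-ring"

# Example: 4 nodes x 8 GPUs, 100 MB gradient buffer, NVLink-class
# intra-node fabric vs. a much slower inter-node network.
nvlink = LogGP(L=2.0, o=1.0, G=1.0 / 300_000)  # ~300 GB/s
rdma = LogGP(L=5.0, o=1.0, G=1.0 / 12_500)     # ~12.5 GB/s
print(select_allreduce(4, 8, 100e6, nvlink, rdma))
```

Under these assumed parameters, the hierarchical schedule wins because the inter-node ring shrinks from 32 ranks to 4, while the extra intra-node phase runs over the fast fabric.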
Contact [email] for more information about the project.