The mission of the HCCL project is to build a next-generation Heterogeneity-Enriched Collective Communication Library (HCCL) for multi-rail, multi-accelerator, and heterogeneous computing systems. This framework aims to significantly improve communication efficiency in modern high-performance computing (HPC) and artificial intelligence (AI) workloads.
By orchestrating data movement across complex and diverse hardware resources, HCCL seeks to eliminate communication bottlenecks and accelerate scientific discovery and large-scale AI applications, advancing state-of-the-art cyberinfrastructure for both academia and industry.
Building on communication models such as the α–β model, LogGP, and other state-of-the-art approaches, we aim to develop analytical models and simulation frameworks for collective communication in heterogeneous systems that support AI and HPC workloads.
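As a concrete illustration of the kind of analytical modeling involved, the sketch below estimates the cost of a ring allreduce under the classic α–β model, in which transmitting a message of n bytes costs α + βn. This is a minimal sketch: the function name and the latency/bandwidth values are hypothetical placeholders, not parameters of HCCL or of any measured system.

```python
# Minimal sketch: alpha-beta cost model for a ring allreduce (illustrative only).
# alpha = per-message latency (s); beta = per-byte transfer time (s/B).

def ring_allreduce_cost(n_bytes: int, p: int, alpha: float, beta: float) -> float:
    """Estimated time for a ring allreduce of n_bytes across p ranks.

    The ring algorithm takes 2(p-1) steps, each moving n_bytes/p, giving
    T = 2(p-1)*alpha + 2*((p-1)/p)*n_bytes*beta.
    """
    return 2 * (p - 1) * alpha + 2 * ((p - 1) / p) * n_bytes * beta

if __name__ == "__main__":
    # Hypothetical parameters: 5 us latency, 100 GB/s links (beta = 1e-11 s/B).
    t = ring_allreduce_cost(n_bytes=1 << 30, p=16, alpha=5e-6, beta=1e-11)
    print(f"Estimated 1 GiB allreduce across 16 ranks: {t * 1e3:.2f} ms")
```

Models such as LogGP refine this picture by separating per-message sender/receiver overhead from the network gap, a distinction that becomes especially important in the heterogeneous settings HCCL targets.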
To transform how collective communication is analyzed and tuned, we are developing novel tools and agentic AI systems that perform autonomous, end-to-end optimization in heterogeneous computing environments.
We aim to develop HCCL to support diverse accelerators, including GPUs and emerging AI hardware, enabling efficient and scalable collective communication across complex heterogeneous environments.
To support real-world deployment of HCCL, we are developing benchmarks and application frameworks that enable systematic evaluation and optimization for large-scale AI and HPC workloads.
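As a taste of what such benchmarking looks like in practice, below is a minimal allreduce micro-benchmark sketch assuming a PyTorch + NCCL environment; the script name, message sizes, and iteration counts are illustrative assumptions, not components of HCCL's benchmark suite.

```python
# Minimal sketch of a GPU allreduce micro-benchmark (assumes PyTorch + NCCL).
# Launch with, e.g.:  torchrun --nproc_per_node=4 allreduce_bench.py
import time
import torch
import torch.distributed as dist

def bench_allreduce(n_bytes: int, iters: int = 50, warmup: int = 10) -> float:
    """Return the mean all-reduce time in seconds for a float32 buffer of n_bytes."""
    x = torch.ones(n_bytes // 4, dtype=torch.float32, device="cuda")
    for _ in range(warmup):          # warm up NCCL communicators and caches
        dist.all_reduce(x)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):           # buffer contents are irrelevant to timing
        dist.all_reduce(x)
    torch.cuda.synchronize()         # wait for all queued collectives to finish
    return (time.perf_counter() - start) / iters

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
    for size in (1 << 20, 1 << 24, 1 << 28):  # 1 MiB, 16 MiB, 256 MiB
        t = bench_allreduce(size)
        if dist.get_rank() == 0:
            print(f"{size:>10} B: {t * 1e3:.3f} ms")
    dist.destroy_process_group()
```

Timing around explicit device synchronization like this captures end-to-end collective latency; a production-grade harness would also sweep algorithms, topologies, and data types.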
[JCST'26] HEC: Heterogeneity-Enriched Communication for AI Symphony
Xiaoyi Lu
Journal of Computer Science and Technology, 2026 (Invited Paper for the 40th Anniversary of JCST)
[ICSE'26] CCLInsight: Unveiling Insights in GPU Collective Communication Libraries via Primitive-Centric Analysis
Liuyao Dai, Adam Weingram, Weicong Chen, Xiaoyi Lu
Proceedings of the ACM/IEEE International Conference on Software Engineering (ICSE), 2026
[Paper]
[ICS'25] Understanding the Idiosyncrasies of Emerging BlueField DPUs
Arjun Kashyap, Yuke Li, Darren Ng, Xiaoyi Lu
Proceedings of the 39th ACM International Conference on Supercomputing (ICS), 2025
[Paper]
[IEEE Micro'24] High-Speed Data Communication with Advanced Networks in Large Language Model Training
Liuyao Dai, Hao Qi, Weicong Chen, Xiaoyi Lu
IEEE Micro, 2024
[Paper]
[IPDPS'24] Accelerating Lossy and Lossless Compression on Emerging BlueField DPU Architectures
Yuke Li, Arjun Kashyap, Weicong Chen, Yanfei Guo, Xiaoyi Lu
Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2024
[Paper]
[HotI'23] Performance Characterization of Large Language Models on High-Speed Interconnects
Hao Qi, Liuyao Dai, Weicong Chen, Zhen Jia, Xiaoyi Lu
Proceedings of the 30th IEEE Hot Interconnects Symposium (HotI), 2023
[Paper]
[JCST'23] xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning
Adam Weingram, Yuke Li, Hao Qi, Darren Ng, Liuyao Dai, Xiaoyi Lu
Journal of Computer Science and Technology, 2023
[Paper]
We gratefully acknowledge NVIDIA for their generous donation of DPUs.
We gratefully acknowledge AWS for their generous provision of AWS credits.
We gratefully acknowledge Google for their generous provision of Google Cloud credits.
This work is primarily supported by NSF Research Grants.
This work was partially conducted using resources supported by DOE Research Grants.
We sincerely thank the University of Florida for its support.