FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters

W Jiang, Z He, S Zhang, K Zeng, L Feng, J Zhang, T Liu, Y Li, J Zhou, C Zhang, G Alonso

Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021•dl.acm.org

We present FleetRec, a high-performance, scalable recommendation inference system that operates within tight latency constraints. FleetRec takes advantage of heterogeneous hardware, including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory onto different types of hardware and connecting them over a high-speed network, FleetRec gains the best of both worlds and can naturally scale out by adding nodes to the cluster. Experiments on three production models of up to 114 GB show that FleetRec outperforms an optimized CPU baseline by more than one order of magnitude in throughput while achieving significantly lower latency.


Cited by 55
