Cited By
- Kim S, Sim E, Shin Y, Cho Y, Baek W (2024). Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 78-90. doi:10.1145/3656019.3676945. Online publication date: 14-Oct-2024.
- Panda D, Chaudhary V, Fosler‐Lussier E, Machiraju R, Majumdar A, Plale B, Ramnath R, Sadayappan P, Savardekar N, Tomko K (2024). Creating intelligent cyberinfrastructure for democratizing AI. AI Magazine 45:1, pp. 22-28. doi:10.1002/aaai.12166. Online publication date: 10-Mar-2024.