FLASHATTENTION: fast and memory-efficient exact attention with IO-awareness
Abstract
Supplementary Material
- Download
- 2.05 MB
References
Index Terms
- FLASHATTENTION: fast and memory-efficient exact attention with IO-awareness
Recommendations
I/O complexity of attention, or how optimal is flashattention?
ICML'24: Proceedings of the 41st International Conference on Machine LearningAttention is at the heart of the popular Transformer architecture, yet suffers from quadratic time and memory complexity. In a recent significant development, FlashAttention shows that the I/O complexity of attention is the true bottleneck in scaling ...
Mellow writes: extending lifetime in resistive memories through selective slow write backs
ISCA'16Emerging resistive memory technologies, such as PCRAM and ReRAM, have been proposed as promising replacements for DRAM-based main memory, due to their better scalability, low standby power, and non-volatility. However, limited write endurance is a major ...
Mellow writes: extending lifetime in resistive memories through selective slow write backs
ISCA '16: Proceedings of the 43rd International Symposium on Computer ArchitectureEmerging resistive memory technologies, such as PCRAM and ReRAM, have been proposed as promising replacements for DRAM-based main memory, due to their better scalability, low standby power, and non-volatility. However, limited write endurance is a major ...
Comments
Please enable JavaScript to view thecomments powered by Disqus.Information & Contributors
Information
Published In
- Editors:
- S. Koyejo,
- S. Mohamed,
- A. Agarwal,
- D. Belgrave,
- K. Cho,
- A. Oh
Publisher
Curran Associates Inc.
Red Hook, NY, United States
Publication History
Qualifiers
- Research-article
- Research
- Refereed limited
Contributors
Other Metrics
Bibliometrics & Citations
Bibliometrics
Article Metrics
- 0Total Citations
- 0Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0