research-article

TinyPredNet: A Lightweight Framework for Satellite Image Sequence Prediction

Published: 22 January 2024

Abstract

Satellite image sequence prediction aims to infer future satellite image frames from historical observations, a significant and challenging dense prediction task. Although existing deep learning models deliver promising performance on this task, they incur high training costs, especially in training time and GPU memory, because they model temporal variations inefficiently. This cost seriously limits lightweight deployment on satellites, for example in space-borne forecast models. In this article, we propose TinyPredNet, a lightweight framework for satellite image sequence prediction in which a spatial encoder and decoder model intra-frame appearance features and a temporal translator captures inter-frame motion patterns. To model the temporal evolution of satellite image sequences efficiently, we design a multi-scale temporal-cascaded structure and a channel attention-gated structure in the temporal translator. Comprehensive experiments on the FengYun-4A (FY-4A) satellite dataset show that the proposed framework achieves competitive performance at a much lower computation cost than state-of-the-art methods. In addition, interpretability experiments show how the designed structures work. We believe the proposed method can serve as a solid lightweight baseline for satellite image sequence prediction.
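
The abstract names a channel attention-gated structure inside the temporal translator but does not specify it. As a rough illustration of the general idea only, the sketch below applies a simplified squeeze-and-excitation style gate to frame features stacked along the channel axis. The function names, the stacking step, and the per-channel weights and biases are all assumptions made for this example; they are not taken from the article and may differ from what TinyPredNet actually does.

```python
import math


def stack_frames(frames):
    """Concatenate T frames of [C][H][W] features into one [T*C][H][W] list.

    Assumption for illustration: the temporal module sees time folded
    into the channel axis.
    """
    return [channel for frame in frames for channel in frame]


def channel_attention_gate(features, weights, biases):
    """Rescale each channel by a scalar gate in (0, 1).

    Hypothetical, simplified gate: global-average "squeeze" per channel,
    a per-channel affine map, a sigmoid, then channel-wise rescaling.
    """
    gated = []
    for channel, w, b in zip(features, weights, biases):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)                # squeeze step
        gate = 1.0 / (1.0 + math.exp(-(w * mean + b)))  # sigmoid gate
        gated.append([[v * gate for v in row] for row in channel])
    return gated


# Two toy frames, each with one 2x2 channel.
frames = [
    [[[1.0, 1.0], [1.0, 1.0]]],
    [[[2.0, 2.0], [2.0, 2.0]]],
]
stacked = stack_frames(frames)
# Zero weights and biases make every gate equal sigmoid(0) = 0.5.
gated = channel_attention_gate(stacked, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

In a trained model the gate parameters would be learned, so channels (and, under the stacking assumption, time steps) that carry useful motion cues could be emphasized and the rest suppressed at the cost of only one scalar per channel.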


Cited By

  • (2024) MEHGNet: a multi-feature extraction and high-resolution generative network for satellite cloud image sequence prediction. Earth Science Informatics 17:5 (4931-4948). DOI: 10.1007/s12145-024-01432-1. Online publication date: 6-Aug-2024

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
May 2024
650 pages
EISSN: 1551-6865
DOI: 10.1145/3613634
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 January 2024
Online AM: 28 December 2023
Accepted: 23 December 2023
Revised: 27 September 2023
Received: 16 July 2023
Published in TOMM Volume 20, Issue 5

Author Tags

  1. Satellite image sequence prediction
  2. deep learning
  3. lightweight
  4. interpretability

Qualifiers

  • Research-article

Funding Sources

  • Shenzhen Science and Technology Program
  • FengYun Application Pioneering Project
  • Science and Technology Innovation Team Project of Guangdong Meteorological Bureau
  • Innovation and Development Project of China Meteorological Administration
  • NSFC

Article Metrics

  • Downloads (last 12 months): 377
  • Downloads (last 6 weeks): 50
Reflects downloads up to 16 Nov 2024
