Transformers in multivariate time series forecasting: a review

Document Type: Review Paper

Authors

1 Department of Computer Science, Faculty of Mathematical Sciences, Yazd University, Yazd, Iran

2 Department of Computer Science, Faculty of Mathematical Sciences, Yazd University, Yazd, Iran

DOI: 10.22108/msci.2025.144922.1740

Abstract

Long-term forecasting of multivariate time series is a fundamental challenge in machine learning, with critical applications in domains such as energy, transportation, and financial markets. The inherent complexity of such data, which stems from seasonal patterns, non-stationary trends, and interdependencies among variables, limits the effectiveness of traditional forecasting methods. The problem matters for strategic decision-making because the accuracy and efficiency of predictions directly affect energy resource management, traffic control, and financial risk analysis.
 
This review article provides a comprehensive framework for analyzing state-of-the-art Transformer-based architectures for multivariate time series forecasting. The framework is structured around three main aspects: (1) quantitative evaluation of model performance on standard datasets, (2) analysis of computational considerations, including time and memory complexity, and (3) examination of the models' ability to handle high-frequency time series. Accordingly, prominent models such as Autoformer, iTransformer, GTformer, FEDformer, ETSformer, and Pathformer are reviewed and compared.
 
The findings of this review indicate that recent innovations, including frequency-domain processing, auto-correlation mechanisms, graph-based learning, and multi-scale architectures, have improved both forecasting accuracy and computational efficiency. These advances outline a clear research path toward future architectures with greater interpretability, better scalability, and broader generalization capabilities.
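
As a rough illustration of how frequency-domain processing and auto-correlation relate, the sketch below computes the autocorrelation of a single series through the fast Fourier transform (the Wiener-Khinchin relation) and reads off the dominant period. It is a minimal sketch under stated assumptions, not the implementation of any model reviewed here: the function names `autocorrelation_fft` and `dominant_period`, the `min_lag` parameter, and the synthetic 24-step series are all hypothetical and chosen only for illustration.

```python
import numpy as np


def autocorrelation_fft(x):
    """Normalized autocorrelation of a 1-D series, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove the mean so lag 0 reflects the variance
    n = len(x)
    # Zero-pad to length 2n so circular correlation equals linear correlation.
    spectrum = np.fft.rfft(x, n=2 * n)
    power = spectrum * np.conj(spectrum)  # power spectrum
    acf = np.fft.irfft(power, n=2 * n)[:n]  # inverse transform gives the ACF
    return acf / acf[0]  # normalize so that lag 0 equals 1


def dominant_period(x, min_lag=2):
    """Lag with the highest autocorrelation, searched over [min_lag, n // 2)."""
    acf = autocorrelation_fft(x)
    max_lag = len(acf) // 2
    return int(np.argmax(acf[min_lag:max_lag]) + min_lag)


# Synthetic example (assumed data): a noisy signal that repeats every 24 steps.
t = np.arange(480)
rng = np.random.default_rng(0)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

period = dominant_period(series)
print(period)
```

Computing the autocorrelation this way costs O(L log L) for a series of length L, compared with O(L^2) for evaluating every lag directly, which is one reason frequency-domain formulations are attractive for long input sequences.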
