A COMPARATIVE ANALYSIS OF PERFORMANCE AND ACCURACY AMONG CNN, LSTM, RNN, GRU, AND GAN ARCHITECTURES ON MNIST DATASET, AND CIFAR-10 DATASET
DOI: https://doi.org/10.61841/b3k8gh96

Keywords: Deep Learning, Image Classification, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), Generative Adversarial Networks (GAN), MNIST, CIFAR-10, Performance Evaluation

Abstract
Image categorization has been transformed by deep learning architectures, yet thorough comparisons between models are still essential for directing methodological decisions. Five well-known neural network architectures—Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), Recurrent Neural Networks (RNN), Gated Recurrent Units (GRU), and Generative Adversarial Networks (GAN)—are systematically and rigorously compared in this study using the popular MNIST and CIFAR-10 datasets. A variety of performance indicators, such as accuracy, precision, recall, F1-score, and training duration, are used to evaluate models that use consistent data preprocessing, augmentation methods, and architecture-specific hyperparameter tuning.
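The evaluation metrics named above can all be derived from a model's predicted versus true labels. As a minimal sketch (using made-up labels for a three-class problem, not data from the study), per-class precision and recall can be combined into a macro-averaged F1-score as follows:

```python
def macro_f1(y_true, y_pred, labels):
    """Average the per-class F1-scores (macro averaging treats all classes equally)."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0   # precision for class c
        rec = tp / (tp + fn) if tp + fn else 0.0    # recall for class c
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Accuracy is simply the fraction of correct predictions.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = macro_f1(y_true, y_pred, [0, 1, 2])
```

In practice a library routine such as scikit-learn's `f1_score` with `average="macro"` computes the same quantity; the explicit loop is shown only to make the definition concrete.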
The results show that CNN outperforms the other architectures, reaching a test accuracy of 99.27% on MNIST and an F1-score of 0.79 on CIFAR-10, demonstrating its efficacy in spatial feature extraction. LSTM and RNN models perform poorly on these tasks, with test accuracies of 47.89% and 10.00%, respectively, while GRU exhibits modest improvements over them. Notably, the GAN, although primarily intended for generative tasks, shows promise when adapted for classification, achieving a reasonable F1-score of 0.57 on CIFAR-10.
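CNN's advantage on these image tasks stems from the convolution operation, which slides a small kernel across the image and responds to local spatial patterns regardless of position. A minimal pure-Python sketch of this core operation (with a hand-crafted vertical-edge kernel standing in for the filters a CNN would learn from data):

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the building block of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A tiny 5x5 "image" with a vertical edge: zeros on the left, ones on the right.
img = [[0, 0, 0, 1, 1] for _ in range(5)]

# A Sobel-like kernel that responds strongly to vertical edges.
kernel = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]

response = conv2d(img, kernel)  # large values mark where the edge lies
```

In a real CNN, many such kernels are learned jointly by gradient descent and stacked into layers; recurrent models like LSTM and RNN lack this built-in spatial inductive bias, which is consistent with their weaker results on image classification.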
This thorough comparison clarifies the relative advantages and disadvantages of each architecture under uniform experimental settings, providing practitioners and researchers with important information to help them choose the best deep learning models for a range of intelligent systems applications. The results also point to areas for further research on transfer learning, real-world deployment, and model resilience.
License
Copyright (c) 2025 Peter Makieu, Mohamed Jalloh, Jackline Mutwiri, Andrew Success Howe

This work is licensed under a Creative Commons Attribution 4.0 International License.
You are free to:
- Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
