EXPLORING BANGLADESHI CULTURAL HERITAGE THROUGH IMAGE CAPTIONING WITH A DEEP CONVOLUTIONAL NEURAL NETWORK
Abstract
Image captioning is a significant task at the intersection of computer vision and natural language processing that aims to automatically generate concise textual descriptions of images. Although seemingly simple for humans, the task is difficult for machines because it requires both accurate image analysis and the generation of semantically coherent sentences. Recent advances in encoder–decoder architectures, which combine convolutional neural networks for feature extraction with recurrent or transformer-based networks for sentence generation, have achieved promising results in this domain. In this work, we apply image captioning to represent Bangladeshi culture, traditions, foods, and heritage sites, an area largely overlooked in existing research. To this end, we build a novel encoder–decoder model based on a deep convolutional neural network, trained on a curated heritage dataset of images of historical landmarks, cultural events, and traditional foods of Bangladesh. The proposed model generates culturally enriched captions that describe not only the visual content but also its cultural and historical significance. The system can serve as a digital bridge for promoting Bangladeshi culture, benefiting travelers, researchers, and enthusiasts while contributing to cultural preservation. Ultimately, this study demonstrates how image captioning can extend beyond visual description to support heritage promotion and global cultural engagement.
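The encoder–decoder pipeline the abstract describes can be sketched in miniature: an encoder compresses the image into a feature vector, and a decoder emits caption tokens one step at a time, conditioned on that vector. The sketch below is purely illustrative; it stands in for a CNN with global average pooling plus a linear projection, and for the recurrent decoder with a single greedy RNN loop. All names, dimensions, and weights here are assumptions for demonstration, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image, W_enc):
    """Toy stand-in for a CNN encoder: global average pooling over the
    spatial dimensions, followed by a linear projection to an embedding."""
    pooled = image.mean(axis=(1, 2))          # (channels,)
    return W_enc @ pooled                     # (embed_dim,)

def decode(feat, W_h, W_x, W_out, embed, bos=0, steps=5):
    """Toy greedy RNN decoder: the image feature initialises the hidden
    state, and each step picks the highest-scoring vocabulary token."""
    h, tok, caption = np.tanh(feat), bos, []
    for _ in range(steps):
        h = np.tanh(W_h @ h + W_x @ embed[tok])   # recurrent update
        tok = int(np.argmax(W_out @ h))           # greedy token choice
        caption.append(tok)
    return caption

# Illustrative sizes: 3 colour channels, 16-dim state, 50-word vocabulary.
C, D, V = 3, 16, 50
W_enc = rng.normal(size=(D, C))
W_h = rng.normal(size=(D, D)) * 0.1
W_x = rng.normal(size=(D, D)) * 0.1
W_out = rng.normal(size=(V, D))
embed = rng.normal(size=(V, D))

image = rng.random((C, 32, 32))               # dummy 32x32 RGB image
caption = decode(encode(image, W_enc), W_h, W_x, W_out, embed)
print(len(caption))                           # 5 token ids
```

In a real system the random weights would be learned end-to-end on image–caption pairs, the encoder would be a deep CNN, and the decoder an LSTM or transformer with learned word embeddings, but the data flow is the same.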