Abstract
The field of Natural Language Processing (NLP) has undergone significant transformations, driven primarily by Large Language Models (LLMs). However, an inherent limitation of these models is their inability to process data modalities beyond text. To address this, various Multimodal Large Language Models (MLLMs) have been proposed in recent years to extend LLMs to other modalities fur…