Research and Innovation: Research on Chatbot Large Language Models to Optimize Customer Acquisition Systems and Customer Relationship Management in the Real Estate Market

Abstract

ApeReal has a comprehensive platform where customers can complete their entire home-buying journey, with a focus on the Minha Casa Minha Vida program. In this context, the objective of this research is to explore Large Language Model (LLM) algorithms for interpreting Brazilian Portuguese, taking into account the specific linguistic nuances of the real estate sector in Brazil. The study will involve an in-depth investigation and fine-tuning of models such as Mistral, ChatGPT 4.0, Gemini, and Llama 3, among others, to evaluate which is most effective at understanding the language and interacting with users. The ultimate goal is to create an intelligent chatbot that will replace the company's internal processes for selecting and qualifying leads for the sales process.

The database will consist of recorded telephone conversations between SDRs (Sales Development Representatives) and customers, as well as conversations via messaging apps, which will be processed, cleaned, and handled using ETL tools. Additionally, the research team will test specific software tools (e.g., ChatML) to standardize these messages into a format suitable for fine-tuning LLMs, such as the instruction-input-output method, JSONL, or SQuAD. The standardized messages will then be tokenized, a step in which tokenization schemes based on words, subwords, or hybrids will be studied. The tokens will then be grouped into batches and fed into the training algorithm.

With the messages in the appropriate format, Python libraries will be used to run the LLM training, using techniques such as quantization, PEFT (Parameter-Efficient Fine-Tuning), and LoRA (Low-Rank Adaptation). These techniques make it possible to customize a subset of the LLM's parameters, creating a specific, proprietary model for the company that is tailored to its business model.
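As a rough illustration of the parameter-subset idea behind LoRA, the sketch below keeps a weight matrix W frozen and represents the adaptation as a low-rank product, W + (alpha / r) * B * A, where only A and B would be trained. It is written in plain Python rather than with any particular training library, and every name and dimension in it (`lora_effective_weight`, `trainable_fraction`, the 6 x 8 layer, rank 2) is a hypothetical choice made for illustration, not something specified by the project.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where r is the rank (number of rows of A)."""
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update with the same shape as W
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_fraction(d_out, d_in, r):
    """Share of parameters trained by LoRA relative to the full d_out x d_in matrix."""
    return r * (d_in + d_out) / (d_out * d_in)


# Illustrative sizes only (assumption): a 6 x 8 layer adapted with rank 2.
d_out, d_in, rank, alpha = 6, 8, 2, 4.0
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]      # frozen base weights
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]    # trainable, r x d_in
b = [[0.0] * rank for _ in range(d_out)]                                # trainable, d_out x r, starts at zero

# With B initialized to zero, the adapted layer starts out identical to the base layer.
w_eff = lora_effective_weight(w, a, b, alpha)
```

For a square 4096 x 4096 layer and rank 8, `trainable_fraction(4096, 4096, 8)` comes to roughly 0.4% of the layer's parameters, which is why this family of techniques keeps repeated fine-tuning rounds affordable.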
Customizing a subset of the LLM's parameters in this way is the main advance and scientific contribution of this research, going beyond a simple practical application of LLMs. Training only this parameter subset will make the model more adaptable and faster to train over multiple rounds of messages. For the training, custom scripts optimized for large language models will be implemented. Data will be loaded in small batches to allow efficient parameter updates. The loss function will be cross-entropy, which measures the difference between the model's predictions and the true labels prepared for this comparison. Model parameters will be optimized using algorithms such as Adam or SGD (Stochastic Gradient Descent).

At the end of this stage, the expected result is the best-performing LLM, with hyperparameters tuned for the real estate domain. In the next phase, the research will move on to experimental studies of prompt engineering techniques to improve how the models interact with users, increasing the accuracy and suitability of responses to the needs of the real estate sector. Additionally, a context expansion system based on retrieval-augmented generation (RAG) will be developed using a database of documents on real estate, developers, and the real estate market.

Finally, each iteration of this process will be carefully evaluated for each of the trained LLMs. Domain experts will examine each LLM's output on a set of evaluation questions and score it against predetermined criteria, or a golden set, using metrics such as perplexity, BLEU, ROUGE, and diversity, but relying primarily on human evaluation.

Once the model has been selected and the training stage completed, a dedicated architecture will be built to integrate the chatbot with ApeReal's current customer management system through APIs. (AU)
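The cross-entropy loss used in training and the perplexity metric used in evaluation are closely related: perplexity is the exponential of the mean cross-entropy over the target tokens. The sketch below shows that relationship on a toy example. The four-token vocabulary, the probability values, and the function names are assumptions made for illustration only; a real training script would obtain these probabilities from the model's output layer.

```python
import math


def cross_entropy(probs_per_step, target_ids):
    """Mean negative log-probability assigned to the correct token at each step."""
    losses = [-math.log(probs[t]) for probs, t in zip(probs_per_step, target_ids)]
    return sum(losses) / len(losses)


def perplexity(mean_cross_entropy):
    """Perplexity is exp(mean cross-entropy); lower means more confident predictions."""
    return math.exp(mean_cross_entropy)


# Toy example (assumption): a 4-token vocabulary and three prediction steps.
predicted = [
    [0.70, 0.10, 0.10, 0.10],  # confident and correct (target is token 0)
    [0.25, 0.25, 0.25, 0.25],  # completely uncertain (target is token 2)
    [0.10, 0.10, 0.10, 0.70],  # confident and correct (target is token 3)
]
targets = [0, 2, 3]

loss = cross_entropy(predicted, targets)
ppl = perplexity(loss)
print(f"cross-entropy: {loss:.4f}, perplexity: {ppl:.4f}")
```

A model that guesses uniformly over a vocabulary of size V has perplexity exactly V, so the metric can be read as the effective number of tokens the model is choosing between at each step.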

Grant number:	24/14052-7
Support Opportunities:	Research Grants - Innovative Research in Small Business - PIPE
Start date:	January 01, 2026
End date:	September 30, 2026
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computer Systems

Principal Investigator:	Pedro Luiz Nani Costa
Grantee:	Pedro Luiz Nani Costa

Company:
CNAE:	Development and licensing of customizable computer programs; Brokerage in the purchase, sale, and rental of real estate

Associated researchers:	Bruno Barbieri de Pontes Cafeo ; Elder José Reioli Cirilo
