
Using Persistent Memory to Accelerate Large Language Model Inference

Grant number: 24/02372-7
Support Opportunities: Scholarships in Brazil - Master
Start date: April 01, 2024
End date: April 30, 2025
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal Investigator: Alexandro José Baldassin
Grantee: Pedro Luis Cattai
Host Institution: Instituto de Geociências e Ciências Exatas (IGCE). Universidade Estadual Paulista (UNESP). Campus de Rio Claro. Rio Claro, SP, Brazil
Associated research grant: 18/15519-5 - Performance optimizations for multicore architectures, AP.JP2

Abstract

Large Language Models (LLMs) have gained a lot of attention given recent technological advances. However, they have large memory requirements and demand high processing power, which is particularly challenging for systems with a limited amount of DRAM and slow storage devices. Often, the data needed by the models cannot fit entirely in main memory (DRAM), requiring external storage that invariably slows the entire system down. The goal of this project is to devise a solution that allows LLMs to be used efficiently on systems with a modest amount of DRAM. The main idea is to use Persistent Memory (PM), a new non-volatile, byte-addressable memory technology, to optimize LLMs, particularly in the inference stage.
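To illustrate the general approach described in the abstract, the sketch below shows one common way to exploit byte-addressable PM for model weights that do not fit in DRAM: memory-mapping a weights file stored on a DAX-mounted PM filesystem so that tensor data is read in place from PM rather than copied into main memory. This is a minimal sketch, not the project's actual design; the file path, mount point, tensor shape, and dtype are assumptions made for illustration.

```python
# Minimal sketch (assumptions): model weights stored in a raw file on a
# persistent-memory filesystem mounted with DAX (e.g. /mnt/pmem). With DAX,
# mmap'd loads go directly to PM, bypassing the page cache, so the weights
# need not be copied into DRAM before inference.
import mmap
import numpy as np

PM_WEIGHTS_PATH = "/mnt/pmem/model.bin"  # hypothetical PM-resident weights file

with open(PM_WEIGHTS_PATH, "rb") as f:
    # Map the whole file read-only; pages are fetched from PM on demand.
    buf = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)

# View the mapped bytes as a weight matrix without materializing it in DRAM.
# Shape and dtype here are illustrative only.
weights = np.frombuffer(buf, dtype=np.float16).reshape(4096, 4096)

def linear(x: np.ndarray) -> np.ndarray:
    """One inference step: weight pages are read byte-addressably from PM."""
    return x @ weights
```

In this scheme only the activations and a small working set of weight pages occupy DRAM, which is the scenario the project targets for inference on memory-constrained systems.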
