Inner alignment in artificial intelligence systems

Grant number:	23/15356-7
Support Opportunities:	Scholarships in Brazil - Scientific Initiation
Start date:	March 01, 2024
End date:	August 31, 2024
Field of knowledge:	Physical Sciences and Mathematics - Computer Science - Computer Systems

Principal Investigator:	Jacques Wainer
Grantee:	Gabriel Antunes Rodrigues

Host Institution:	Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil

Abstract

In this work, we investigate the recently formalised inner alignment problem. In broad terms, to align an artificial intelligence system is to construct or adjust it so that its outputs accord with human preferences. Inner alignment is a subproblem of this task, in which the system is treated as an optimiser that is itself optimised by another optimiser. We aim to assess environments in which inner alignment failures occur and to identify potential causes of this phenomenon. Our intention is to elucidate the nature of the problem, which we regard as important for the future of the field.
