Clarice.ai: a linguistic intelligence aid to writing nonfiction based on web

Grant number: 18/22511-0
Support type: Research Grants - Innovative Research in Small Business - PIPE
Duration: August 01, 2019 - April 30, 2020
Field of knowledge: Linguistics, Literature and Arts - Linguistics - Applied Linguistics
Principal Investigator: Felipe Iszlaji de Albuquerque
Grantee: Felipe Iszlaji de Albuquerque
Companies: Company to be determined
Clarice Inteligência Artificial Ltda
CNAE: Web portals, content providers and other internet information services
Other information service activities not elsewhere classified
City: São Paulo
Associated scholarship(s): 19/18786-7 - Clarice.ai: a linguistic intelligence aids to writing nonfiction based on web, BP.TT
19/18923-4 - Clarice.ai: a linguistic intelligence aids to writing nonfiction based on web, BP.TT
19/17221-6 - This research project proposes the construction of a linguistic intelligence that can assist, in real time, users who are writing non-fiction texts, BP.PIPE

Abstract

This research project proposes the construction of a linguistic intelligence that can assist users, in real time, as they write nonfiction texts. It can be compared to the spelling and grammar checkers already built into text editors such as Microsoft Word. What sets this project apart, however, is that its linguistic intelligence works not at the spelling and grammar level but at the level of style used in certain nonfiction genres. The technology should help users with writing techniques in the following categories: i) text writability; ii) visual aspects of texts; iii) phonic stylistics; iv) syntactic stylistics; and v) lexical-semantic stylistics. Typical deviations at this level of writing include: excessive use of adjectives and adverbs; excessive use of the passive voice; overly long sentences; clichés and catchphrases; word repetition; excessive use of conjunctions such as "that" and "and"; and cacophony, echoes, and alliterations. Similar tools already exist for other languages, especially English. Two factors could explain the emergence of this kind of tool. First, advances in natural language processing techniques make the technology feasible. Second, a commercial application has been identified in two emerging industries: EdTech (technologies for education) and Content Marketing (content produced to rank brands and products in search engines). The Content Marketing industry is large and still growing: a single company, Rock Content, produces over 10,000 articles per month. That volume of text demands an increasing number of writers. There are an estimated 100,000 freelance writers in Brazil, and one proofreader is needed for every 20 writers. This demand exceeds the capacity to train good writers and proofreaders.
That is why the proposed tool has the potential to contribute to this industry, improving the quality of the articles produced while reducing costs for content producers. The intended result is therefore a cloud-based text editor, accessible on the web from different devices, that activates an artificial linguistic intelligence to give real-time feedback with tips on "how to write well." As for the research methodology, the project scope has been narrowed to the subfield of Natural Language Processing (NLP) and, within it, to research that designs and develops writing aid tools. One of the main strategies of this project is to use linguistic resources developed by research labs to advance the construction of an innovative, commercially valuable product for mass use. The methodology can be summarized as: i) automatically detecting language or stylistic deviations; ii) using available or purpose-built computational linguistic resources; and iii) providing the appropriate tip to the user as real-time feedback. We will start by detecting the 80 most common deviations for the genre "500-to-1,000-word article for internet Content Marketing," and the beta version will be launched with that value proposition. (AU)
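The three-step methodology (detect a deviation, draw on a linguistic resource, return a tip) can be illustrated with a small rule-based checker. This is a minimal sketch under stated assumptions, not the project's implementation. The thresholds, the two-word Portuguese conjunction list, and the names `analyze` and `Tip` are hypothetical. It covers only three of the surface-level deviations named in the abstract: long sentences, conjunction overuse, and word repetition. Deviations such as passive voice or adjective overuse would need a part-of-speech tagger or parser, which is not shown.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds and resources; the project's real values are not published here.
MAX_SENTENCE_WORDS = 30      # "sentences that are too long"
MAX_CONJUNCTIONS = 3         # maximum conjunctions tolerated per sentence
CONJUNCTIONS = {"que", "e"}  # toy lexicon: Portuguese "that" and "and"
MIN_REPEAT_LEN = 4           # ignore short function words when checking repetition

SENTENCE_RE = re.compile(r"[^.!?]+[.!?]*")  # naive sentence splitter
WORD_RE = re.compile(r"\w+")                # Unicode-aware word tokens


@dataclass
class Tip:
    rule: str              # which deviation was detected
    span: tuple[int, int]  # character offsets of the sentence in the text
    message: str           # feedback shown to the writer


def analyze(text: str) -> list[Tip]:
    """Detect deviations sentence by sentence (step i) and attach a tip (step iii)."""
    tips: list[Tip] = []
    for match in SENTENCE_RE.finditer(text):
        sentence = match.group()
        if not sentence.strip():
            continue  # skip whitespace-only fragments
        words = [w.lower() for w in WORD_RE.findall(sentence)]
        span = (match.start(), match.end())

        if len(words) > MAX_SENTENCE_WORDS:
            tips.append(Tip("long_sentence", span,
                            f"This sentence has {len(words)} words; consider splitting it."))

        # Step ii): here the "resource" is only a word list; a real system
        # would more likely rely on a tagger or a curated lexicon.
        n_conj = sum(w in CONJUNCTIONS for w in words)
        if n_conj > MAX_CONJUNCTIONS:
            tips.append(Tip("conjunctions", span,
                            f"{n_conj} conjunctions in one sentence; try shorter clauses."))

        # Flag longer words that occur more than once in the same sentence.
        counts = Counter(w for w in words if len(w) >= MIN_REPEAT_LEN)
        for word, n in counts.items():
            if n > 1:
                tips.append(Tip("repetition", span,
                                f"'{word}' appears {n} times in this sentence; consider a synonym."))
    return tips


if __name__ == "__main__":
    sample = ("A escrita clara exige escrita revisada. "
              "O texto que escrevi e que revisei e que publiquei ficou bom.")
    for tip in analyze(sample):
        print(tip.rule, tip.span, tip.message)
```

In a real-time editor, one plausible design is to re-run a checker like this only on the paragraph currently being edited, so feedback stays fast as the document grows. That design choice is likewise an assumption, not something the abstract specifies.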