PRIN 2022 CIP - Corpus of Italian language for Preschoolers. Lexicon directed to Italian preschool children from 3 to 6 years collected from heterogeneous sources in Italian and Italian Sign Language

Start date
1 September 2023
Duration (months)
Human Sciences
Principal investigators (or local contacts)
Majorano Marinella
Keywords
language, developmental psychology, preschool

This project aims to create a novel collection of linguistic resources for the lexical investigation of the language directed to preschool children aged 3-6 years (i.e., before they enter primary school). These resources will include a corpus, the tools needed to access and extract the information it contains, and a sample dictionary. The data will be collected from a diverse array of contexts and situations, including adult-child interactions, children's books, cartoons, and media. The resources will cover both Italian and Italian Sign Language (LIS). Besides not being currently available for this age range, these resources are innovative for several reasons, which constitute the rationale of the project.

Lexical exposure plays an important role in the first years of life: the quantity and quality of the lexicon children are exposed to are crucial for their linguistic development and affect many other areas of their cognitive and social development, primarily school and academic skills. Hence, the empirical investigation of the child-directed lexicon is important, first, to document, analyze, and understand its structure and usage from a psychological and linguistic perspective and, second, to leverage this knowledge in applied fields such as speech-language therapy, language teaching, and children’s media design.

The resources will be tailored to different types of users, such as researchers, teachers and educators, clinicians, media designers, and parents. They will allow more in-depth investigations of the role of linguistic input across developmental stages and of the composition of children's vocabulary. At the same time, they will be a valuable source for creating empirical tools that support professionals and researchers involved in children's education and care, and that offer suggestions to creators of media for preschoolers. The sample dictionary, presented in a child- and layperson-friendly manner, will serve as the prototype of a tool useful not only for teachers and educators but also for parents. Since sources in both Italian and Italian Sign Language will be included, the tools will allow data to be compared across the two languages.

The collected data will be processed within the CLARIN methodological framework using the ELAN software, while lexical features will be analyzed using the ILC-LinguA software. The information gathered and processed will feed a user-friendly front-end platform for quick and easy consultation.
The project aims will be achieved through close, goal-oriented collaboration between specialists from different domains, including psychology, speech-language therapy, computational and applied linguistics, and statistics, who share an interest in empirically based research on child-directed spoken and signed languages. The tools to be developed will contribute to some of the PNRR's missions, e.g., M4C1.3, M4C2.3, MCC2.1.

Project participants

Elena Florit
Fixed-term researcher
Marinella Majorano
Associate professor
Research areas involved in the project
Education and organizations
Psychology, Developmental