Hosting corpora and datasets

To host data and corpora of atypical communication and make them accessible in a FAIR manner, CLST has established a close collaboration with The Language Archive (TLA), which is situated at the Max Planck Institute for Psycholinguistics (MPI) in Nijmegen. TLA is a CLARIN B Centre whose goal is to provide a unique record of how people around the world use language in everyday life. It focuses on collecting spoken and signed language materials in audio and video form, along with transcriptions, analyses, annotations and other relevant material such as photos and accompanying notes. TLA offers storage of sensitive data (speech, audio and transcripts) and supports the CMDI metadata framework. It also supports strong authentication procedures, layered access to data, and persistent identification.

For corpora of speech from people with language disorders, ACE works closely with the DELAD initiative. For this type of resource in particular, there is a close collaboration with CMU’s Talkbank / Clinical banks. This collaboration allows data to be registered at Talkbank, which provides the metadata and landing page on the Talkbank website, while storage of the ‘raw’ data (typically audio and video) and authentication of access to it are handled at TLA. To support access to critical data, ACE is also involved in the SSHOC project, in which Task 5.4 is devoted to making an inventory of systems and technologies suitable for conducting research on critical data. This inventory is relevant for offering various ways of accessing critical data, whether stored at central repositories from which they can be downloaded or at shielded repositories where they can only be accessed remotely.