Preserving records of endangered languages through digital archives

Image: Associate Professor Nick Thieberger. Credit: The University of Melbourne.

Associate Professor Nick Thieberger is a linguistics academic and an Australian Research Council (ARC) Future Fellow, based at The University of Melbourne, who is digitising and archiving the records of our most at-risk indigenous languages.

Professor Thieberger and his team are building and populating the databases that hold these records, and developing methodology that will allow the public, and particularly speakers of endangered languages, to have greater access to the raw materials of language research, including transcripts, wordlists, and recordings made in the field.

The newest example of this work, and a direct outcome of Professor Thieberger’s Future Fellowship, is the online language archive Digital Daisy Bates, which was launched on 12 June 2018 at the National Library of Australia. The website is the result of the complete digitisation of historical documents prepared in the 1900s by the Irish-Australian author and ethnographer Daisy Bates. It contains 23,000 pages of wordlists of Australian languages, mostly from the western half of Australia, and is fully accessible and easily searchable by the public and researchers.

Professor Thieberger also engaged a Text Encoding Initiative (TEI) specialist in Australia to assist with encoding the texts.

“The TEI technology is something that is typically used on medieval manuscripts, and has never previously been applied to Australian Indigenous languages,” says Professor Thieberger. “People have put images of early documents on the internet, which is great. However, I wanted to explore how technology such as TEI could improve our access to these materials by including the text as well as the images.”

“In the sciences, taxonomy is a legitimate activity, but in the humanities people don’t seem to regard it so highly. Yet as linguists we have to know the range and possibilities of human language, and to do that we have to have primary records of as many languages as possible.”

Professor Thieberger says that when properly indexed and searchable, the material opens doors not only for linguists, but also for researchers in other disciplines who are seeking ‘latent data’ that was captured incidentally to the main purpose of the research, such as biologists who are interested in indigenous plant names.

“There might even be ornithologists who are interested in bird calls which are recorded in the background of a language researcher’s tape,” says Professor Thieberger.

An important aspect of these digital collections is that they always present records in their original state, as they were collected and without embellishment. Any transcriptions and annotations are presented alongside the originals, not in place of them.

“For linguists, analysis at this moment in time is great, but it is always based on current knowledge and fashions in our discipline, and there are some things we don’t look at. So the importance of preserving primary records is that in the future someone can go back through them with new eyes and find what we don’t even know is in there.”

Professor Thieberger is also the Director of a digital archiving project funded through several rounds of the ARC’s Linkage Infrastructure, Equipment and Facilities scheme—the Pacific and Regional Archive for Digital Sources in Endangered Cultures, or PARADISEC, which archives Australian researchers’ recordings from the Asia-Pacific region.

“PARADISEC provides a citable form of primary data, and a way for people to verify that the data cited in analysis actually exists. Its catalog is accessible to people on their phones, often the only means of internet access in remote areas.

“Current students sometimes take it for granted that they have this technology, but when I was doing my PhD in 1991, I had to build a digital archive from primary sources as I went, just so that I could cite my examples of primary data—I presented my thesis with a DVD of the archive.”

PARADISEC is helping to change methodology in language description, supporting re-usability of primary research records, both as a basis for future research, and for the people recorded and their descendants.

As a Chief Investigator at the ARC Centre of Excellence for the Dynamics of Language, Professor Thieberger is also overseeing the digital preservation of the enormous amount of new language data being produced by researchers at the Centre.

“It’s a logical fit for my Future Fellowship, which is about developing the digital archive technology, and the Centre of Excellence recognised the need for a repository to archive the outcome of their primary data—all their language recordings—for future use,” says Professor Thieberger.

“In the past, language researchers might only archive their materials on retirement or it was done after their death by executors. Now, to ensure these records survive and are useful to other researchers, we are making sure that researchers archive as they go.”

Through the dedication of Professor Thieberger and his team, the precious resources that flow from the work of our language researchers are preserved for the benefit of future generations.

