No. 29




Estonian Web Archive Preserving National Cultural Heritage

  • Jaanus Kõuts


    National Library of Estonia, web archiving senior specialist

The cultural heritage being created today is partly digital, and a portion of it is published only on the web.

Almost every aspect of the modern e-state, its society and its culture is reflected on the web. Unless websites are preserved, there will be significant gaps in the Estonian national cultural heritage. Imagine a researcher in the year 2114 who finds, in the printed sources and TV and radio broadcasts of our time, many links pointing to websites, but cannot use those sources because they were never archived or preserved.

The National Library of Estonia began studying the possibility of creating a web archive as early as 1997. A pilot project was launched in 2000 and was converted into a web archive in 2005. The Legal Deposit Act, passed on 1 June 2006, entitled the National Library of Estonia to archive web publications as legal deposits and to make them publicly accessible. The owner of archived material has the right to restrict access to his or her publications. Only a few countries in the world have such advanced legislation: Iceland, Slovenia, Croatia and Portugal.

The Estonian Web Archive was opened to the public in November 2013. It contains 1.6 TB of data (31 million URLs) collected from 2010 to the end of 2013. Because selection-based archiving can capture only a small share of valuable sites, archiving of the entire Estonian web domain (.ee) is planned for 2014.

Web technologies are evolving fast and archiving software cannot keep pace, so there is a strong need for an IT specialist able to solve emerging technical challenges. However, the budget of the National Library of Estonia allows only 2.5 positions to be dedicated to web archiving, which is not enough for such a task.

The National Library of Estonia has been a member of the International Internet Preservation Consortium since the beginning of 2012. The consortium works to improve the tools, standards and best practices of web archiving.

In other countries, social scientists, linguists and computer scientists use web archive collections as big data to gain new knowledge. Researcher Kalev H. Leetaru (Georgetown University, USA) has suggested that Estonian universities use web archive materials as big data in data mining exercises. There has also been a request to make public sector websites available as open data.

The researchers of tomorrow need the sources of today. Our mission is to collect and preserve this heritage and pass it on to the next generation.

Full article in Estonian