27 February 2013 | Published: Projects
The « HDA-Lab » site is the result of a collaboration between the Institut de Recherche et d'Innovation (IRI) and the Department of Digital Programs (DPN) of the Ministry of Culture and Communication. This Research & Development project is intended to demonstrate the heuristic potential of semantic tagging.
The Histoiredesarts corpus
The corpus used for this project is extracted from Histoiredesarts. It currently contains about 5,000 descriptive notes that point, through deep links, to the same number of online resources. Each Histoiredesarts note contains keywords, initially produced as simple tags.
« HDA-BO », semantic tagging module
Technically, the keywords used to re-index the corpus are borrowed from Wikipedia: they are the titles of the encyclopedia's articles, for example « Quentin de La Tour », « Vallée des rois (Valley of the kings) », « IVe siècle av. J.-C. (4th century B.C.) », etc.
This tool offers a function that links to Wikipedia: for each tag, the complete list of articles from the encyclopedia is available. Re-indexing consists simply of replacing the tag with its equivalent among the Wikipedia articles. The module then imports the label and the URI from Wikipedia. It also imports a link to DBpedia, the Semantic Web version of Wikipedia.
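As an illustration of what re-indexing a tag might produce, here is a minimal sketch in Python. It is not the HDA-BO code: the function names and the record layout are hypothetical. The sketch also assumes DBpedia's usual convention that a resource name mirrors the Wikipedia article title, with language chapters hosted at `<lang>.dbpedia.org`.

```python
from urllib.parse import urlsplit, unquote

WIKI_PREFIX = "/wiki/"


def wikipedia_to_dbpedia(wikipedia_url: str) -> str:
    """Derive a DBpedia resource URI from a Wikipedia article URL.

    Assumption: DBpedia resource names mirror Wikipedia article titles,
    and non-English chapters live at <lang>.dbpedia.org.
    """
    parts = urlsplit(wikipedia_url)
    host = parts.netloc.lower()
    if not host.endswith(".wikipedia.org") or not parts.path.startswith(WIKI_PREFIX):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url}")
    lang = host.split(".")[0]
    title = parts.path[len(WIKI_PREFIX):]
    dbpedia_host = "dbpedia.org" if lang == "en" else f"{lang}.dbpedia.org"
    return f"http://{dbpedia_host}/resource/{title}"


def reindex_tag(free_tag: str, wikipedia_url: str) -> dict:
    """Replace a free-text tag with a record built from the chosen article.

    The record layout (hypothetical) keeps the original tag and adds the
    article label, the Wikipedia URI and the derived DBpedia URI.
    """
    raw_title = urlsplit(wikipedia_url).path[len(WIKI_PREFIX):]
    label = unquote(raw_title).replace("_", " ")
    return {
        "original_tag": free_tag,
        "label": label,
        "wikipedia_uri": wikipedia_url,
        "dbpedia_uri": wikipedia_to_dbpedia(wikipedia_url),
    }
```

For instance, `reindex_tag("vallee des rois", "https://fr.wikipedia.org/wiki/Vall%C3%A9e_des_Rois")` would return a record whose label is the article title and whose URIs serve as shared identifiers for the tag.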
This approach offers numerous advantages, in particular:
- Disambiguation: it becomes possible, for example, to distinguish « Roman » in the sense of Romanesque art from « Roman » in the sense of a novel (the French word covers both).
- Universal interoperability of keywords: the Wikipedia URI of the article gives each keyword a universal identifier. Any institution adopting this procedure would be interoperable with the Histoiredesarts corpus.
- Automatic enrichment of metadata: certain data contained in the Wikipedia articles can be extracted automatically to enrich the indexing. This makes it possible, for example, to query Histoiredesarts in a foreign language, to locate a monument on a map automatically, to associate images or definitions with a search, or even to generate thematic indexes (index of writers, of painters…).
- Richer search features: the logical relationships underlying Wikipedia content (for example, inclusion relationships between French cities, departments and regions) make it possible to extend searches, such as finding all cities belonging to one region.
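The last point, searching through inclusion relationships, can be sketched as follows. This is only an illustration under stated assumptions: the `PART_OF` table is a small hand-written sample standing in for relationships that would be extracted from Wikipedia or DBpedia, and the function names are hypothetical.

```python
# Hand-written sample of "is part of" links (city -> department -> region).
# In practice these links would be extracted from Wikipedia/DBpedia data.
PART_OF = {
    "Rennes": "Ille-et-Vilaine",
    "Saint-Malo": "Ille-et-Vilaine",
    "Brest": "Finistère",
    "Nantes": "Loire-Atlantique",
    "Ille-et-Vilaine": "Bretagne",
    "Finistère": "Bretagne",
}

CITIES = {"Rennes", "Saint-Malo", "Brest", "Nantes"}


def ancestors(place: str) -> list[str]:
    """Return every territory that contains `place`, nearest first."""
    chain = []
    seen = set()
    # Follow the "is part of" links upward; `seen` guards against cycles.
    while place in PART_OF and place not in seen:
        seen.add(place)
        place = PART_OF[place]
        chain.append(place)
    return chain


def cities_in(territory: str) -> list[str]:
    """Find all known cities located, directly or indirectly, in `territory`."""
    return sorted(city for city in CITIES if territory in ancestors(city))


print(cities_in("Bretagne"))  # ['Brest', 'Rennes', 'Saint-Malo']
```

A note tagged only with a city can thus be retrieved by a search on its department or region, without anyone having added those tags by hand.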
Thanks to this tool, the 350 partner institutions of the project will be able to enrich and update their own data independently.
« HDA-Lab », proof of concept
« HDA-Lab » is a search and navigation interface for the semantically enriched version of the Histoiredesarts corpus.
The first features available online today (not yet final) focus on searching by facets: time (timeline), space (world map), arts (list of fields), thematic keywords (tag cloud) and their sharing on the
Through the month of June, « HDA-Lab » will be enriched with other features: searching with an increasingly complete list, multilingual access, heuristic map, index of authors, etc. The corpus, which is currently being processed, will ultimately be fully re-indexed.
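Searching by facets, as offered by the first online features (time, space, arts, thematic keywords), could be sketched as below. This is a hedged illustration, not the HDA-Lab implementation: the `Note` fields, the `facet_search` function and the sample notes are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """Hypothetical, simplified descriptive note with one value per facet."""
    title: str
    year: int            # negative values stand for years B.C.
    country: str
    art_field: str
    tags: set = field(default_factory=set)


def facet_search(notes, year_range=None, country=None, art_field=None, tags=None):
    """Keep the notes matching every facet that was supplied.

    A facet left as None is ignored; `tags` must all be present on the note.
    """
    results = []
    for note in notes:
        if year_range is not None and not (year_range[0] <= note.year <= year_range[1]):
            continue
        if country is not None and note.country != country:
            continue
        if art_field is not None and note.art_field != art_field:
            continue
        if tags is not None and not set(tags) <= note.tags:
            continue
        results.append(note)
    return results


# Invented sample notes, for illustration only.
SAMPLE_NOTES = [
    Note("Note A", 1750, "France", "Visual arts", {"Portrait", "Pastel"}),
    Note("Note B", -1300, "Egypt", "Architecture", {"Tomb"}),
    Note("Note C", 1760, "France", "Music", {"Opera"}),
]

matches = facet_search(SAMPLE_NOTES, year_range=(1700, 1800), country="France")
print([note.title for note in matches])  # ['Note A', 'Note C']
```

Each facet narrows the result set independently, so a timeline, a map and a tag cloud can all drive the same filter.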
This experimental prototype, a Research & Development initiative, is not intended to replace the Histoiredesarts directory, but to explore the new paths opened by Web 3.0 and thus encourage cultural institutions to adopt these new technologies.
This proof of concept is firmly oriented toward the end user: users will be able to compare, online, the classic features of the original interface with the enhanced features of the proof of concept. The objective is to demonstrate the feasibility of semantic tagging and to raise end users' awareness of the challenges of Web data.
For more information: