RTVE-Grafo of a Semantic Interoperability Project
A Semantic Interoperability Project
RTVE Grafo is a semantic interoperability project that integrates and enriches audiovisual content with metadata, facilitating data exchange and enhancing the user experience through contextual and precise searches.
RTVE-Grafo is a Semantic Interoperability project.
RTVE Play's technical platform integrates with multiple systems, both internal and external, including historical content from the Archivo RTVE, as well as the latest RTVE productions and broadcasts from in-house production or acquisitions.
The integration and consolidation of information from audiovisual production, both video and audio, is a complex process that requires the contribution of third parties, the consolidation of multiple data sources and, finally, the enrichment with metadata to provide the content with complete information.
To improve the exchange and enrichment of information and the interoperability of data, RTVE, as a member of the European Broadcasting Union (EBU, also known by the acronym UER), participates in metadata standardization processes and in specific groups such as EBU-AIM, which manages and promotes the use of standard metadata systems. Within the work of this group, RTVE has adopted the EBUCorePlus ontology as the pivot ontology for building a knowledge graph through the RTVE Grafo project. EBUCorePlus is an ontology developed by the EBU to represent information from media companies and organizations. It is developed as a strictly semantic open source project whose mission is to improve interoperability between systems.
Semantic interoperability is defined as the ability of computer systems to exchange data while maintaining a clear and unambiguous meaning. For systems to understand each other, it is not enough that data are shared between different systems or applications; the data must also be symbolically interpretable in a way that preserves their meaning. In this context, RTVE has launched RTVE Grafo as a semantic interoperability project for the exchange of information between heterogeneous systems, guaranteeing that the data published by RTVE can be interpreted semantically.
This will simplify the exchange of information between RTVE systems and with third parties and will result in a more effective, simple and semantically aware integration and cooperation between them.
RTVE Grafo is not only a semantic project for machines; it is also a project on which services are built for people, with the aim of improving their experience when they engage with RTVE's content and its information and entertainment offering.
RTVE Grafo makes it possible to generate simpler, more accurate, useful and intuitive models of interaction, interrogation and conversation between people and machines, exploiting the capabilities of inference and knowledge discovery, which facilitates the findability of the content.
In short, RTVE Grafo makes it possible to create utilities for people through advanced semantic technologies.
Knowledge representation standards
The adoption of semantic standards in RTVE Grafo enhances interoperability and data enrichment, creating a knowledge model adaptable to the Spanish media sector based on European practices.
The RTVE-Grafo ontology framework: Knowledge representation standards and the RTVE ontology
The adoption of semantic standards in RTVE has allowed the improvement of internal and external interoperability, facilitating the linking of data and enabling the enrichment of information based on the linking and hybridization with other content representation ontologies.
The RTVE Grafo project sought to implement a European standard in the Spanish media sector. Because RTVE is an EBU member and a participant in the EBU-AIM group, which manages and promotes the use of standard metadata systems and specifically EBUCorePlus, it was agreed that this standard would be used, adapted to the particularities of RTVE's audiovisual production.
EBUCorePlus is a media ontology, developed by the EBU Metadata Modeling Working Group as an open source project, and is the continuation of two EBU ontologies: EBUCore and CCDM (Class Conceptual Data Model).
Ontologies allow machines to recognize entities in the world and, in general, to structure and organize information in a symbolic and precise way, which in turn enables more effective data processing, as well as building systems and applications that are more humanized and useful for people.
Thus, an ontology is a formal representation of a set of concepts and their relationships within a domain of knowledge, written in a formal language that machines can process. In practice it functions for machines as the implicit system of classes used by the human mind to identify, distinguish and classify entities in the world. The ontology that RTVE has built on top of EBUCorePlus for the RTVE Grafo project defines concepts, along with their relationships and properties, and includes axioms that establish rules about these concepts. RTVE's ontology uses languages and standards established by the W3C, such as OWL and RDF, and will be key for RTVE's artificial intelligence and knowledge representation program, as well as for its participation in building the semantic web and the Linked Open Data Web.
RTVE publishes its ontology so that the detailed description of each of the classes, relations and attributes that make up the knowledge domain of RTVE Play can be consulted. The technical documentation website also offers an interactive visualization of the ontology and its download in OWL format.
https://www.rtve.es/en/graph/ontologies/rtveplay/
Given RTVE's position in the audiovisual market in Spain and in the Spanish language, the adoption and localization of an ontological standard of reference such as EBUCorePlus will help to standardize and improve the quality and interoperability of data, not only with the aim of improving integration and access to its own content, but also with the aim of raising the standards of the audiovisual industry in Spain.
The Knowledge Graph
RTVE's Knowledge Graph is a system that structures information by understanding the relationships among various audiovisual content as well as any potentially linked objects. It unifies data, making it accessible and comprehensible for both machines and people.
RTVE Grafo is RTVE's operable Knowledge Graph.
RTVE's knowledge graph
The fact that ontologies are a knowledge representation model independent of any system is what makes it possible to represent and consolidate data from diverse and heterogeneous systems in a knowledge graph. RTVE Grafo is an advanced data structure that organizes and relates information in a more intuitive and efficient way. This graph has been populated with the contents of RTVE Play, which allows a better organization and access to the vast amount of RTVE audiovisual material.
The RTVE Knowledge Graph, RTVE Grafo, is a representation system of the set of its contents and digital resources that understands facts related to programs, audiovisual contents, seasons, genres, themes, as well as any object potentially linked to them. When we say that it is a system that “understands” we must assume that it is a system written in a technical language that makes it possible for machines or systems to “understand” and correctly treat the set of entities to which we have referred in order to collaborate with people in their processes of interrogation, information retrieval and knowledge discovery.
Semantic annotation and population of the RTVE graph
Probably the most outstanding and far-reaching result of this digital project has been the consolidation of the contents from the Spanish Public Radio and Television into a large unified knowledge graph, extensible, expressive and interrogable by machines and people, making it easier for users to retrieve these resources according to any interest or intention.
Consolidating all RTVE Play data in the unified knowledge graph required designing and developing a synchronization process. This process collects data online from RTVE systems and annotates them semantically according to the EBUCorePlus-based ontology and the adopted term vocabularies, such as ESCORT 2007 (EBU System of Classification Of Radio and Television Programs). The annotated data are represented as triples (predicative sentences of the form subject+predicate+object) and deposited in the semantic store (a graph database) at the heart of RTVE's new semantic AI platform.
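The annotation step described above can be illustrated with a minimal sketch. It is not RTVE's synchronization process: the record fields, the `ex:` identifiers and the `TripleStore` class are all hypothetical, and a toy in-memory set stands in for the graph database.

```python
from typing import Iterable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def annotate(record: dict) -> List[Triple]:
    """Turn one flat source record into subject+predicate+object statements."""
    subject = f"ex:{record['id']}"  # hypothetical identifier scheme
    return [
        Triple(subject, "rdf:type", "ex:Video"),
        Triple(subject, "ex:title", record["title"]),
        Triple(subject, "ex:partOfProgram", f"ex:{record['program']}"),
        Triple(subject, "ex:genre", f"ex:{record['genre']}"),
    ]


class TripleStore:
    """Toy in-memory stand-in for a semantic store (graph database)."""

    def __init__(self) -> None:
        self._triples = set()

    def add_all(self, triples: Iterable[Triple]) -> None:
        # A set keeps each statement once, so re-synchronizing is harmless.
        self._triples.update(triples)

    def match(self, subject=None, predicate=None, obj=None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )

    def __len__(self) -> int:
        return len(self._triples)


# Hypothetical record, as it might arrive from a source system.
record = {
    "id": "video-1",
    "title": "Example episode",
    "program": "program-9",
    "genre": "documentary",
}
store = TripleStore()
store.add_all(annotate(record))
genres = [t.obj for t in store.match(predicate="ex:genre")]
```

Querying by pattern, with any position left open, is the basic operation that a real graph store exposes through a query language; the sketch only shows the shape of the data.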
Correct semantic annotation required aligning the existing contents in the RTVE databases with the classes and attributes defined in the RTVE ontology and, in some cases, improving the metadata of those contents in the source systems. A crucial objective of the RTVE Grafo project was to improve this metadata, that is, the way in which RTVE contents are labeled and described. More accurate and detailed metadata facilitates the search for and access to information. From an internal point of view, the project therefore aimed to develop a fine-tuned system for the annotation and semantic representation of contents, one that would shorten the distance between RTVE and the varied audiences that a public institution has to address and speak for. For this purpose, and beyond its public use, RTVE Grafo is used to annotate, organize and present information in a meaningful way, gathering, for example, all the relevant information related to each content item in that item's record.
The RTVE knowledge graph integrates some 2,000,000 digital resources, 26 million entities, some 85 million relationships between these objects and entities, and 167 million triples. These are used to understand the meaning of the term that the user enters in the search, and also to offer a way of exploring the collection and, in general, all the resources. This exploration is based on a faceted search engine, among other utilities, which gives the user every possible way of browsing this set of entities. The ontology and RTVE Grafo will make it possible to represent the contents in a more precise, detailed, exhaustive and expressive way, and will facilitate more natural and conversational forms of interaction between users and those contents.
Exploiting a knowledge graph of these characteristics will make it possible in the future to hybridize it with other artificial intelligence technologies in order to develop advanced services for different groups of users. This new way of being present on the Web ultimately aims to create and use the knowledge base of the Spanish audiovisual heritage in an intensive and efficient way.
Knowledge graphs represent the structure of reality and the mode of operation of our cognition.
RTVE Grafo wants to be useful to people and their demands for knowledge expressed through their questioning processes. To this end, the project has transformed the massive data and knowledge of RTVE Play into fast and accurate answers to complex questions in a scenario that assumes the need for the answers to be explainable, using artificial intelligence based on the emulation of human-like reasoning (semantic reasoning) operated with high-performance knowledge graph technology.
The RTVE Grafo knowledge graph is operated by AI for some specific purposes, such as the unification of the thesauri of the RTVE Archive, a key project for the enrichment of the metadata of its contents.
The interrogation system
RTVE has developed a semantic search system that uses its knowledge graph to provide a contextual and enriched search experience. This allows users to access relevant and organized content, facilitating navigation and information retrieval on any device.
RTVE-Grafo makes all RTVE contents accessible from a single point of interrogation.
RTVE Play's contextual information interrogation, search and retrieval system.
A knowledge graph is not only a means to integrate heterogeneous and distributed information, to improve the interoperability of systems or to facilitate the representation of knowledge for documentalists and other expert personnel; it is also a means to build intelligent utilities for people. For this purpose, and on the basis of the RTVE Grafo knowledge graph, RTVE has implemented a faceted semantic search engine that allows textual search, contextual search and advanced search by entities, as well as the generation of pages with enriched information. In short, it has built a more intuitive, simple, natural and contextual system for interrogating and retrieving RTVE Play information. In practice, this means that searches are based not only on keywords but on the meaning and context of the terms, which makes them more efficient and relevant.
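As a rough illustration of what faceting over a graph involves, the sketch below filters a handful of triples by entity values and counts the remaining results per facet. The data, the predicate names and both functions are hypothetical; the source does not describe how the RTVE search engine is implemented.

```python
from collections import Counter

# Hypothetical (subject, predicate, object) triples; all names are illustrative.
TRIPLES = [
    ("video/1", "type", "Video"),
    ("video/1", "genre", "Documentary"),
    ("video/1", "program", "Program A"),
    ("video/2", "type", "Video"),
    ("video/2", "genre", "News"),
    ("video/2", "program", "Program B"),
    ("audio/1", "type", "Audio"),
    ("audio/1", "genre", "Documentary"),
    ("audio/1", "program", "Program A"),
]


def select(triples, filters):
    """Return the subjects that satisfy every predicate=value filter."""
    subjects = {s for s, _, _ in triples}
    for predicate, value in filters.items():
        subjects &= {s for s, p, o in triples if p == predicate and o == value}
    return subjects


def facet_counts(triples, subjects, predicate):
    """Count the values of one facet across the selected subjects."""
    return Counter(o for s, p, o in triples if p == predicate and s in subjects)


# A user picks the "Documentary" genre; the "type" facet is then recomputed
# over the remaining results so it can be offered as the next refinement.
hits = select(TRIPLES, {"genre": "Documentary"})
type_facet = facet_counts(TRIPLES, hits, "type")
```

Because every result is an entity linked to other entities, each facet is simply another relationship in the graph, which is what lets results be grouped and refined by entity rather than by keyword alone.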
The RTVE Grafo project aims to provide the best possible experience to its digital visitors, offering a search engine that interrogates a knowledge graph where audiovisual resources are linked to each other, making it possible to present results well organized by entities, as well as enriched and contextualized. In short, one of the aims of the project has been to provide the public with a more intuitive, intelligent, personalized, semantically meaningful and effective browsing and search experience.
All of this is delivered so that the new experience of knowledge discovery and navigation through the contents of Spanish Radio and Television works equally well on any kind of device, so that all users can access what interests them and act as they wish within RTVE Grafo at any time and place.
RTVE Grafo is built on technology from GNOSS, a company that for two decades has been developing its own capabilities for the construction and exploitation of knowledge graphs. A knowledge graph is a mathematical structure, computable by machines and interrogable by people, that represents well both the structure of reality and that of our cognition and is therefore at the core of the Symbolic Artificial Intelligence program. GNOSS is also a pioneer in the construction of the Semantic Web in Spain and in the wider Spanish-speaking world. In 2010, GNOSS became the first Spanish company to link its projects, GNOSS and DIDACTALIA, with Freebase, the New York Times and DBpedia and, as a consequence, to take part in the global project of building the semantic web with open and linkable data, the Linked Open Data Cloud.
To make its purpose a reality, GNOSS has developed GNOSS Semantic AI Platform, a development platform for building Symbolic or Semantic Artificial Intelligence projects, which includes an ecosystem of Artificial Intelligence Cognitive Services. The Artificial Intelligence family of services in GNOSS Semantic AI Platform consists of two components. GNOSS Sherlock NLP-NLU encompasses a set of natural language processing services oriented to the recognition and extraction of entities and topics in texts and their subsequent consolidation in a knowledge graph. GNOSS LOKUT is a natural language interrogation system that hybridizes knowledge graphs with LLMs in order to ensure auditable, traceable and reproducible AI output, the three attributes of an Explainable AI.
Semantic Artificial Intelligence
RTVE's unified knowledge system identifies entities in the audiovisual field, allowing users to interact with the machines in an intuitive way. This facilitates a deeper and more contextualised exploration of the contents produced by Spanish television, ensuring a proper semantic understanding.
Semantic Artificial Intelligence able to operate in a "common sense" framework with humans
The distinguishing technological feature of the RTVE Grafo project is that all the information generated by RTVE is integrated and consolidated in a unified knowledge graph that can be interrogated by machines and people from a single point. This graph works as the cognitive artifact of the project for publishing and broadcasting Spanish television on the Internet.
The unified knowledge graph of Spanish radio and television recognizes the entities of the audiovisual world. This enables people to converse with machines in a common sense framework and to inquire about any content produced by Spanish television in a deeper, contextual and semantically aware way.
The RTVE Grafo information interrogation, search and retrieval system operates within a semantically interpreted Artificial Intelligence program, that is, one based on the exploitation by humans and machines of the possibilities inherent in linking data in a knowledge graph.
This is the condition not only for systems to be able to interpret the knowledge generated by RTVE, but also for linking its items with each other and, in the future, with third-party sources that can enrich and contextualize this content, thanks to the Contextual Artificial Intelligence framework that RTVE Grafo provides.
RTVE are conversations
RTVE transforms into a space for dialogue, where each visitor is unique and seeks something different. The RTVE Play Graph stands as the bridge connecting the platform with its diverse audience, creating meaningful and personalized conversations.
RTVE Play's new Semantic Digital Ecosystem enables a conversational, meaningful, contextual and more personal relationship between RTVE and its users.
RTVE are conversations. Personal conversations, because each user who visits it is different, wants different things, builds different preferences and has different aspirations. RTVE must be able to talk to everyone. Talking meaningfully, at the right moment and usefully with each person who approaches it is the goal of RTVE Play's graph, a platform with a strong inclusive vocation that integrates everyone (citizens, regular users, teachers, researchers, students, documentary makers) and takes their inherent diversity into account.
The digital project RTVE Grafo makes it possible to generate that story, that personal conversation with different audiences, providing a conversation that is useful, contextual and rich, but above all personal.
The ontological model of RTVE
A domain ontology manages specific concepts within a field, helping to clarify terms and represent specialized knowledge. In the context of RTVE's Knowledge Graph, the EBUCorePlus ontology was used, expanded with other vocabularies to improve the organization of audiovisual information and interoperability between systems.
How the RTVE Knowledge Graph was made: The ontological model
Semantic standards and linked data
The RTVE Play Knowledge Graph has been built on semantic web standards and according to the principles of the Linked Data Web, which has made it possible to:
- Connect Televisión Española's audiovisual resources and documentation management systems with the publication of the digital space RTVE Grafo.
- Optimize the use of these documentation systems, adding value to the work of all areas of the corporation.
- Convert RTVE's information system into a Knowledge Graph that is expressed by means of a Linked Data Web.
- Develop modes of interrogation and visualization of this Graph adapted to different audiences and oriented to maximize the satisfaction of their interests, offering data explicitly related to those results that satisfy the user's questions.
- Build thematic web pages based on a data set or subgraph that meets certain requirements.
- Build a semantically aware experience of exploration, discovery, questioning and search through RTVE's contents, which makes it possible to explore in depth and in a contextual way any topic related to the digital resources that make up the world of television.
All the contents of this website are represented and published according to W3C standards for the semantic web and in accordance with the principles promoted by the Linking Open Data Project in order to promote and facilitate the publication and linking of data on the web. These semantic metadata generate, as we have already pointed out, a unified knowledge graph that is exploited in the first instance, although not only, on the web itself through the interrogation and recommendation systems, offering users a superior experience.
RTVE's ontological model
A domain ontology (or domain-specific ontology) represents concepts that belong to a specific part of the world; it can therefore be considered as managing highly specialized knowledge. The ontological aspirations of information sciences and technologies tend to close and control vocabularies as far as possible, so that the particular meaning of a term belonging to that domain is provided by the ontology in a precise and unambiguous manner. The main ontology or specific vocabulary used in this project has been the reference model EBUCorePlus. It provides the descriptions and the formal structure needed to describe the explicit and implicit concepts, and their relationships, used in the documentation domain of the audiovisual world. In practice, and with the necessary adjustments, this makes it possible to represent adequately the information contained in RTVE's information systems.
Domain ontologies represent the concepts of their scope in a very specific, bounded and closed way, as we have already pointed out. However, reality as a whole shows a remarkable propensity for continuity, and the domains in which the world is organized are often less pure or more mixed than our controlled vocabularies. This is why systems that cover a whole world of activity, as a public broadcaster can be considered to be, need hybrid ontologies, which result from mixing and integrating different domain ontologies into a more general representation.
The ontological project developed in RTVE Play for the construction of its Knowledge Graph has extended the EBUCorePlus domain ontology and hybridized it with metadata schemas and general-purpose vocabularies such as Dublin Core (dc) and Schema.org (schema). These are integrated into a common ontological framework that represents the set of activities carried out in the audiovisual field, understood as the set of techniques, practices and processes related to the operation of an audiovisual entity.
In the following section we explain the ontological extension and hybridization process carried out in the RTVE Grafo digital semantic platform project. The ontological model is used not only to generate a reusable dataset, but also to solve the set of operations and interrogations that different groups of users may want to perform on the knowledge thus represented.
The RTVE ontology network
With this project, RTVE addresses the creation of an ontology that can operate through a knowledge graph. Its main purposes are to improve the semantic interoperability of the new RTVE Grafo platform with various systems, market participants and the RTVE Archive; to implement a European reference standard within the Spanish audiovisual sector; and to make all the audiovisual heritage of the interactive area available to the general public through a single point of interrogation.
RTVE Play Ontology. Modeling principles
RTVE, as a member of the European Broadcasting Union (EBU), is part of standardization initiatives, such as the group EBU-AIM, which oversees and promotes the implementation of standard metadata systems, such as EBUCorePlus.
The integration of standards in the field of knowledge representation within an organization leads to an improvement in both internal and external interoperability, simplifies data linking and strengthens the connection with other ontologies that represent the content. For this reason, within the scope of this project, the following design principles have been used as a starting point for the development of the RTVE ontology:
- Use EBUCorePlus as the reference ontology.
- Adherence to the reference ontology whenever possible. EBUCorePlus classes, attributes and relationships are adopted in preference whenever they are semantically compatible with the objects of the domain to be modeled.
- Extension mechanisms: when the reference model does not meet the needs required for the modeling of the domain (i.e. in those cases in which specific properties arise that refine a class, when the cardinality of a property changes or when it is semantically relevant), the model will be extended by resorting to inheritance mechanisms. The new classes and attributes are housed in a proper namespace named RTVE Play.
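The inheritance-based extension principle can be sketched as a small table of subclass relations. The class names follow the model described in this document (Program is modeled on EditorialGroup, which inherits from EditorialObject; Agent extends the EBUCorePlus Agent), but the `ec:` and `rtveplay:` prefixes and the two helper functions are assumptions made for illustration, not the published ontology's identifiers.

```python
# child class -> parent class; prefixes are illustrative, not official.
SUBCLASS_OF = {
    "ec:EditorialGroup": "ec:EditorialObject",
    "rtveplay:Program": "ec:EditorialGroup",  # extension in RTVE's own namespace
    "rtveplay:Agent": "ec:Agent",             # extension in RTVE's own namespace
}


def ancestors(cls):
    """Walk up the subclass chain, nearest parent first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(cls, candidate):
    """True if cls is the candidate class or inherits from it."""
    return cls == candidate or candidate in ancestors(cls)
```

Under this reading, anything typed with an extended class is still recognized as an instance of the reference class, so a system that only knows EBUCorePlus can interpret the data while RTVE-specific properties live in the extension.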
The modeled domain corresponds to the totality of the contents that RTVE has available online (more than two million multimedia resources). These contents are exposed through the RTVE Play website. In those cases where it is possible to make direct use of the reference ontology class, this has been chosen; however, in other cases, an extension mechanism of the reference class has been chosen in order to accommodate the specific properties of RTVE's business model.
Detailed exposition of the RTVE Play ontological model.
The RTVE Play ontology to which we refer has been consolidated in what we call the RTVE Play Ontological Model, which is composed of a set of vocabularies articulated around the EBUCorePlus model.
EBUCorePlus is an extension of the EBUCore specification, which is a standard developed by the European Broadcasting Union (EBU) for the description and exchange of audiovisual content metadata. EBUCore provides a metadata model covering various aspects of content, such as identification, description, rights and technical characteristics.
EBUCorePlus builds on EBUCore and adds additional capabilities to support specific needs of the audiovisual industry. These extensions include:
- Improvements in content description: Adding more details and categories for a more accurate description of audiovisual content.
- Support for new types of content: Adaptation to new forms of media and formats that may arise.
- Improved interoperability: Facilitating the exchange of metadata between different systems and platforms in a more efficient way.
- Integration with other standards: Better compatibility and integration with other metadata models and technology standards.
- Advances in rights management and content protection: Providing structures to better manage copyright and content distribution.
EBUCorePlus is used primarily by broadcasters, content producers, audiovisual archives, and other players in the media value chain to ensure that metadata associated with audiovisual content is accurate, complete, and useful for a variety of applications, from production and distribution to archiving and retrieval.
As we have said, in the case of the RTVE Play graph, we started from EBUCorePlus extending the ontology, providing it with new classes, attributes and relations in those cases where the specific needs of the project required it, either to give semantic precision to the classes according to the contents handled by RTVE or to refine the cardinality of some of its attributes.
The following image represents the class model defined in the RTVE Play ontology.
A simplification of the general class model that allows us to identify the main classes of the model is represented in the following diagram:
The main entities of the semantic web RTVE Play: Program, Season, Video, Audio, Genre, Agent, are represented according to the above mentioned EBUCorePlus reference model or else extension mechanisms are used for those cases in which the reference model does not cover the needs required for the modeling of the domain (i.e. in those cases in which specific properties arise that refine a class, when the cardinality of some property changes or when it is semantically relevant).
Among the main elements of the domain we find, in the first place, the program, understood as a container of information that relates different elements that coherently form a logical unit of broadcasting. Programs, which can be television or radio programs, are those that group video and audio contents. To model Program we start from the reference ontology class EditorialGroup, which is defined in EBUCorePlus as “a collection/group of media resources”. EditorialGroup covers the concept of program, and has some subclasses (Series, Serial, Collection, etc.) that will allow us to classify the programs conveniently.
As mentioned in the previous paragraph, programs can be TV or Radio (and have associated video and audio resources). To model both video and audio, we start from the Programme class, using TVProgramme for videos (a TVProgramme is “a program for distribution on TV channels”) and RadioProgramme for audios (a RadioProgramme is “a program for distribution on radio channels”).
The programs, audios and videos are related to people and organizations, insofar as they participate in the production of the audiovisual material (actors, directors, producers, etc.). To model this, the EBUCorePlus Agent class has been extended to the namespace of RTVE Play and called Agent, since it is expected that there may be people and organizations coming from different sources, which will need RTVE's own information attributes. Agent is defined in EBUCorePlus as “a contact, person or organization to which is associated a role corresponding to the contribution that the ‘Agent’ brings to the realization of a MediaResource or EditorialObject”. Recall that EditorialObject is the parent class from which EditorialGroup inherits, which is the one we have used as a basis for modeling Program by means of an extension mechanism.
In terms of genres, three of the seven dimensions used in ESCORT2007, the EBU classification system for radio and television programs, have been implemented for programs, audios and videos. Specifically, the dimensions Intention, Format and Content have been used.
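A minimal sketch of how a genre annotation restricted to these three dimensions might be validated and grouped is shown below. Only the dimension names come from the text; the `GenreTerm` class, the helper function and the term labels are hypothetical and are not actual ESCORT 2007 entries.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# The three ESCORT 2007 dimensions that the text says are implemented.
IMPLEMENTED_DIMENSIONS = ("Intention", "Format", "Content")


@dataclass(frozen=True)
class GenreTerm:
    dimension: str
    label: str  # illustrative label, not an official ESCORT term

    def __post_init__(self) -> None:
        # Reject terms from dimensions outside the implemented three.
        if self.dimension not in IMPLEMENTED_DIMENSIONS:
            raise ValueError(f"dimension not implemented: {self.dimension}")


def group_by_dimension(terms: Iterable[GenreTerm]) -> Dict[str, List[str]]:
    """Group the labels attached to one resource by their dimension."""
    grouped: Dict[str, List[str]] = {d: [] for d in IMPLEMENTED_DIMENSIONS}
    for term in terms:
        grouped[term.dimension].append(term.label)
    return grouped


annotation = group_by_dimension([
    GenreTerm("Intention", "inform"),
    GenreTerm("Format", "documentary"),
    GenreTerm("Content", "history"),
    GenreTerm("Content", "science"),
])
```

Keeping the dimensions separate means a program, video or audio can be described along each axis independently, and each axis can be offered as its own filter.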
In order to ensure greater control of the vocabulary used for the description of the objects, an OWL ontology has been implemented for the semantic project of RTVE Play and for each of the objects mentioned.