How the RTVE Knowledge Graph was made: The ontological model
Semantic standards and linked data
The RTVE Play Knowledge Graph is built on semantic web standards and follows the principles of the Linked Data Web. This has made it possible to:
- Connect Televisión Española's audiovisual resources and documentation management systems with the publication of the digital space RTVE Grafo.
- Optimize the use of these documentation systems, adding value to the work of all areas of the corporation.
- Convert RTVE's information system into a Knowledge Graph expressed as a Linked Data Web.
- Develop ways of querying and visualizing this Graph that are adapted to different audiences and aimed at satisfying their interests, offering data explicitly related to the results that answer the user's questions.
- Build thematic web pages based on a data set or subgraph that meets certain requirements.
- Build a semantically aware experience of exploration, discovery, questioning and search through RTVE's contents, making it possible to explore in depth and in context any topic related to the digital resources that make up the world of television.
All the contents of this website are represented and published according to W3C standards for the semantic web and in accordance with the principles promoted by the Linking Open Data Project, which aim to facilitate the publication and linking of data on the web. These semantic metadata generate a unified knowledge graph that is exploited first, although not only, on the website itself through its query and recommendation systems, offering users a better experience.
RTVE's ontological model
A domain ontology (or domain-specific ontology) represents concepts that belong to a specific part of the world; it can therefore be considered as managing highly specialized knowledge. In information science and technology, ontologies aim to close and control vocabularies as far as possible, so that the ontology gives each term of the domain a precise and unambiguous meaning. The main ontology, or specific vocabulary, used in this project is the EBUCorePlus reference model. It provides the descriptions and the formal structure needed to describe the explicit and implicit concepts, and their relationships, used in the domain of audiovisual documentation. In practice, and with the necessary adjustments, this makes it possible to represent adequately the information contained in RTVE's information systems.
Domain ontologies represent the concepts of their scope in a very specific, bounded and closed way, as we have already pointed out. However, reality as a whole shows a remarkable propensity for continuity, and the domains in which the world is organized are often less pure, or more mixed, than our controlled vocabularies. This is why systems that span many domains, as a public television broadcaster can be considered to do, need hybrid ontologies, which result from mixing and integrating different domain ontologies into a more general representation.
The ontological project developed in RTVE Play for the construction of its Knowledge Graph has extended the EBUCorePlus domain ontology and hybridized it with metadata schemas and general-purpose vocabularies such as Dublin Core (dc) and Schema.org (schema). These are integrated into a common ontological framework that represents the activities carried out in the audiovisual field, understood as the set of techniques, practices and processes related to the operation of an audiovisual entity.
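As a rough illustration of what hybridization means at the data level, the sketch below describes one resource with terms drawn from several vocabularies at once. The Dublin Core and Schema.org namespace URIs are the published ones; the EBUCorePlus namespace URI, the resource URI and the choice of properties are assumptions made for this example and are not taken from the RTVE model.

```python
# Namespaces. "dcterms", "schema" and "rdf" are the published URIs.
# The "ec" (EBUCorePlus) URI is an assumption for illustration only.
PREFIXES = {
    "ec": "https://www.ebu.ch/metadata/ontologies/ebucoreplus#",
    "dcterms": "http://purl.org/dc/terms/",
    "schema": "https://schema.org/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie):
    """Expand a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# Hypothetical resource URI, not a real RTVE identifier.
resource = "https://example.org/rtve-play/video/1"

# One resource described with terms from three vocabularies.
triples = [
    (resource, expand("rdf:type"), expand("ec:TVProgramme")),
    (resource, expand("dcterms:title"), "Example title"),
    (resource, expand("schema:inLanguage"), "es"),
]


def vocabularies_used(triples):
    """Return the prefixes whose namespace appears as a predicate or object URI."""
    used = set()
    for _, predicate, obj in triples:
        for term in (predicate, obj):
            for prefix, namespace in PREFIXES.items():
                if term.startswith(namespace):
                    used.add(prefix)
    return used


print(sorted(vocabularies_used(triples)))
```

Plain tuples are used here instead of an RDF library so that the example has no dependencies; a real pipeline would normally rely on an RDF toolkit and a triple store.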
In the following section we explain the ontological extension and hybridization process carried out in the RTVE Grafo digital semantic platform project. The ontological model is used not only to generate a reusable dataset, but also to solve the set of operations and interrogations that different groups of users may want to perform on the knowledge thus represented.
The RTVE ontology network
With this project, RTVE addresses the creation of an ontology that can operate through a knowledge graph. Its main purposes are to improve the semantic interoperability of the new RTVE Grafo platform with various systems, market participants and the RTVE Archive, and to implement a European reference standard within the Spanish audiovisual sector. It also aims to make all the audiovisual heritage of the interactive area available to the general public through a single point of query.
RTVE Play Ontology. Modeling principles
RTVE, as a member of the European Broadcasting Union (EBU), is part of standardization initiatives, such as the group EBU-AIM, which oversees and promotes the implementation of standard metadata systems, such as EBUCorePlus.
The integration of standards in the field of knowledge representation within an organization leads to an improvement in both internal and external interoperability, simplifies data linking and strengthens the connection with other ontologies that represent the content. For this reason, within the scope of this project, the following design principles have been used as a starting point for the development of the RTVE ontology:
- Use EBUCorePlus as the reference ontology.
- Adherence to the reference ontology whenever possible. EBUCorePlus classes, attributes and relationships are adopted by preference in the modeling of the domain whenever they are semantically compatible with the objects to be modeled.
- Extension mechanisms: when the reference model does not meet the needs of the domain (that is, when specific properties arise that refine a class, when the cardinality of a property changes, or when it is semantically relevant), the model is extended by means of inheritance mechanisms. The new classes and attributes are housed in a namespace of their own, named RTVE Play.
The modeled domain corresponds to the totality of the contents that RTVE has available online (more than two million multimedia resources). These contents are exposed through the RTVE Play website. In those cases where it is possible to make direct use of the reference ontology class, this has been chosen; however, in other cases, an extension mechanism of the reference class has been chosen in order to accommodate the specific properties of RTVE's business model.
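The design principles above amount to a simple decision rule: reuse the reference class when it fits, and create a subclass in the project's own namespace when specific properties or a cardinality change require it. The following is a minimal sketch of that rule, not the project's implementation. The prefixes `ec:` (EBUCorePlus) and `rtve:` (the RTVE Play namespace), the function name `model_concept` and the property `rtve:exampleAttribute` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassDecision:
    name: str                      # class used in the graph
    parent: Optional[str] = None   # superclass, set only when an extension is created
    own_properties: List[str] = field(default_factory=list)


def model_concept(concept, reference_class, own_properties=(), cardinality_changed=False):
    """Reuse the reference class unless the domain needs an extension.

    An extension is a new class in the (hypothetical) rtve: namespace that
    inherits from the reference class.
    """
    if not own_properties and not cardinality_changed:
        return ClassDecision(name=reference_class)
    return ClassDecision(
        name=f"rtve:{concept}",
        parent=reference_class,
        own_properties=list(own_properties),
    )


# Direct reuse: the reference class already covers the concept (illustrative case).
reused = model_concept("Series", "ec:Series")

# Extension: the concept needs attributes of its own (hypothetical attribute name).
extended = model_concept("Agent", "ec:Agent", own_properties=["rtve:exampleAttribute"])

print(reused)
print(extended)
```

In an OWL serialization, the `parent` field would correspond to an `rdfs:subClassOf` statement linking the new class to the reference class.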
Detailed description of the RTVE Play ontological model
The RTVE Play ontology to which we refer has been consolidated in what we call the RTVE Play Ontological Model, which is composed of a set of vocabularies articulated around the EBUCorePlus model.
EBUCorePlus is an extension of the EBUCore specification, a standard developed by the European Broadcasting Union (EBU) for the description and exchange of audiovisual content metadata. EBUCore provides a metadata model covering various aspects of content, such as identification, description, rights and technical characteristics.
EBUCorePlus builds on EBUCore and adds additional capabilities to support specific needs of the audiovisual industry. These extensions include:
- Improvements in content description: Adding more details and categories for a more accurate description of audiovisual content.
- Support for new types of content: Adaptation to new forms of media and formats that may arise.
- Improved interoperability: Facilitating the exchange of metadata between different systems and platforms in a more efficient way.
- Integration with other standards: Better compatibility and integration with other metadata models and technology standards.
- Advances in rights management and content protection: Providing structures to better manage copyright and content distribution.
EBUCorePlus is used primarily by broadcasters, content producers, audiovisual archives, and other players in the media value chain to ensure that metadata associated with audiovisual content is accurate, complete, and useful for a variety of applications, from production and distribution to archiving and retrieval.
As we have said, in the case of the RTVE Play graph, we started from EBUCorePlus extending the ontology, providing it with new classes, attributes and relations in those cases where the specific needs of the project required it, either to give semantic precision to the classes according to the contents handled by RTVE or to refine the cardinality of some of its attributes.
The following image represents the class model defined in the RTVE Play ontology.
A simplification of the general class model that allows us to identify the main classes of the model is represented in the following diagram:
The main entities of the RTVE Play semantic web (Program, Season, Video, Audio, Genre and Agent) are represented according to the EBUCorePlus reference model mentioned above. Where the reference model does not cover the needs of the domain (that is, when specific properties arise that refine a class, when the cardinality of a property changes, or when it is semantically relevant), extension mechanisms are used instead.
Among the main elements of the domain we find, in the first place, the program, understood as a container of information that relates different elements which together form a coherent logical unit of broadcasting. Programs, which can be television or radio programs, group video and audio contents. To model Program we start from the reference ontology class EditorialGroup, which is defined in EBUCorePlus as “a collection/group of media resources”. EditorialGroup covers the concept of program and has several subclasses (Series, Serial, Collection, etc.) that allow us to classify programs conveniently.
As mentioned in the previous paragraph, programs can be TV or radio programs, with associated video and audio resources. To model both video and audio, we start from the Programme class, using TVProgramme for videos (a TVProgramme is “a program for distribution on TV channels”) and RadioProgramme for audio (a RadioProgramme is “a program for distribution on radio channels”).
Programs, audios and videos are related to people and organizations insofar as these participate in the production of the audiovisual material (actors, directors, producers, etc.). To model this, the EBUCorePlus Agent class has been extended in the RTVE Play namespace, keeping the name Agent, since people and organizations are expected to come from different sources and will need information attributes specific to RTVE. Agent is defined in EBUCorePlus as “a contact, person or organization to which is associated a role corresponding to the contribution that the ‘Agent’ brings to the realization of a MediaResource or EditorialObject”. Recall that EditorialObject is the parent class of EditorialGroup, the class we have used as the basis for modeling Program by means of an extension mechanism.
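The class relationships described in the last three paragraphs can be summarized as a small hierarchy. The sketch below is only an approximation: it uses prefixed names as plain strings, with `ec:` standing for EBUCorePlus and `rtve:` for the RTVE Play namespace. The `rtve:` class names, the `ex:` instance identifiers, and the assumption that TVProgramme and RadioProgramme are direct subclasses of Programme are illustrative choices, not statements taken from the published ontology.

```python
# Subclass links (child -> parent), read as rdfs:subClassOf.
SUBCLASS_OF = {
    "ec:EditorialGroup": "ec:EditorialObject",  # stated in the text
    "ec:Series": "ec:EditorialGroup",           # stated in the text
    "rtve:Program": "ec:EditorialGroup",        # extension (hypothetical name)
    "ec:TVProgramme": "ec:Programme",           # assumed direct subclass
    "ec:RadioProgramme": "ec:Programme",        # assumed direct subclass
    "rtve:Agent": "ec:Agent",                   # extension (hypothetical name)
}

# Hypothetical instances and their declared types.
TYPE_OF = {
    "ex:program/1": "rtve:Program",
    "ex:video/1": "ec:TVProgramme",
    "ex:audio/1": "ec:RadioProgramme",
    "ex:agent/1": "rtve:Agent",
}


def superclasses(cls):
    """Follow subclass links upward and return every ancestor, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_instance_of(resource, cls):
    """True if the resource's declared type is cls or one of its subclasses."""
    declared = TYPE_OF[resource]
    return declared == cls or cls in superclasses(declared)


print(superclasses("rtve:Program"))
print(is_instance_of("ex:video/1", "ec:Programme"))
print(is_instance_of("ex:agent/1", "ec:Agent"))
```

Because the extended classes inherit from the reference classes, a query written against `ec:EditorialGroup` or `ec:Agent` would still find the RTVE-specific instances, which is one way extension by inheritance preserves interoperability.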
In terms of genres, three of the seven dimensions used in ESCORT2007, the EBU classification system for radio and television programs, have been implemented for programs, audios and videos. Specifically, the dimensions Intention, Format and Content have been used.
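One way to picture a genre classification limited to these three dimensions is as a small validated data structure. This is a sketch under assumptions: the class `GenreTerm`, the function `genres_by_dimension` and the placeholder labels are hypothetical, and no actual ESCORT terms are reproduced here.

```python
from dataclasses import dataclass

# The three dimensions named in the text.
DIMENSIONS = ("Intention", "Format", "Content")


@dataclass(frozen=True)
class GenreTerm:
    dimension: str
    label: str

    def __post_init__(self):
        # Reject any dimension outside the three that are implemented.
        if self.dimension not in DIMENSIONS:
            raise ValueError(f"unsupported dimension: {self.dimension}")


def genres_by_dimension(terms):
    """Group genre labels under their dimension, keeping the dimension order."""
    grouped = {dimension: [] for dimension in DIMENSIONS}
    for term in terms:
        grouped[term.dimension].append(term.label)
    return grouped


# Placeholder labels, not real classification terms.
terms = [
    GenreTerm("Format", "example-format"),
    GenreTerm("Intention", "example-intention"),
    GenreTerm("Content", "example-content"),
]

print(genres_by_dimension(terms))
```

Validating the dimension at construction time mirrors the idea of a controlled vocabulary: a term that does not belong to one of the implemented dimensions cannot enter the data.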
To ensure tighter control over the vocabulary used to describe these objects, an OWL ontology has been implemented for the RTVE Play semantic project, covering each of the objects mentioned.