(by Alessandro Capezzuoli, ISTAT official and manager of the Aidr data observatory on professions and skills) There is always too little talk of metadata, perhaps because the prefix "meta" is unconsciously associated with its original meaning (μετά, "with, after") and consequently connotes ethereal and elusive fields such as metaphysics or metaphor. Probably the word itself, metadata, does not arouse as much interest as words like blockchain, big data and machine learning. The reductive definition usually given of metadata, "the information that describes data", does not help to fully understand their function: it suggests something secondary to the data, something one could do without. The definition should add that without metadata, data lose their meaning, lose their consistency and can no longer be read correctly.
Reducing the function of metadata to a purely "descriptive" role is a dangerous underestimation. First, because the descriptive function covers not one but several aspects, which may include the content, the structure and the context of the data. Precisely for this reason there is no single type of metadata. Descriptive metadata consist of a set of normalized descriptions, useful for identifying data and for semantic search systems that make use of Linked Open Data. Structural metadata, on the other hand, describe the architecture of the data and its internal relationships, and are essential for using the data correctly. Finally there are management (administrative) metadata, which include technical information such as formats or the technological environment adopted.
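The three kinds of metadata can be made concrete with a minimal sketch in Python. The record below is entirely invented for illustration (field names, codelist and values are assumptions, not any real ISTAT schema); it shows how descriptive, structural and management metadata sit alongside the data, and how the structural part is what makes a raw value readable at all.

```python
# A hypothetical metadata record for a small statistical dataset.
# All names and values are invented for illustration.
metadata = {
    "descriptive": {                       # what the data is about
        "title": "Resident population by sex",
        "keywords": ["population", "demography"],
    },
    "structural": {                        # how the data is laid out
        "columns": [
            {"name": "sex", "type": "code",
             "codelist": {"1": "Male", "2": "Female"}},
            {"name": "count", "type": "integer"},
        ],
    },
    "management": {                        # technical details
        "format": "CSV",
        "encoding": "UTF-8",
    },
}

def decode_row(row, meta):
    """Use the structural metadata to turn a raw row into labelled values."""
    out = {}
    for value, col in zip(row, meta["structural"]["columns"]):
        if col["type"] == "code":
            out[col["name"]] = col["codelist"][value]
        elif col["type"] == "integer":
            out[col["name"]] = int(value)
    return out

print(decode_row(["1", "29500000"], metadata))
# Without the codelist in the metadata, the raw value "1" would be meaningless.
```

The point of the sketch is the last line: the value "1" carries no information by itself; only the structural metadata turns it into "Male".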
This general overview, together with, for insiders, words such as XSD and JSON objects, is enough to grasp the enormous potential associated with metadata. If it is relatively easy to imagine a data-driven system, it is not so easy to imagine using metadata to make decisions. Imagination, however, can take a precise shape when supported by a practical example. Suppose, hypothetically, that somewhere in the world there is an out-of-control epidemic, and that this phenomenon is measured through a rigorous scientific method that records the number and dynamics of infections and deaths. Suppose further that these "numbers" show a high risk of contagion in restaurants, and that restaurants are mainly frequented by a specific segment of the population made up of males over 70.
To reduce the risks, one could think of closing the restaurants, or of denying entry to the segment of patrons most at risk. In the first case, metadata would be needed to describe economic activities, so as to identify the companies that supply food and drink. In the second case, a population register would be needed from which to extract a list of names to whom to send the message "No entry to restaurants". In both cases, good-quality descriptive and structural metadata would be needed to make the decision. This example, certainly a simplification, allows us to start numerous reflections on the role of metadata.

The closure of restaurants in the period of maximum Covid risk was in fact decided through the ATECO statistical classification, a set of classes and descriptors that identify, more or less precisely, the economic activities carried out by companies. One may argue about the "statistical purity" of a classification system, but it cannot be denied that the closure of restaurants was guided by metadata. Similarly, had it been decided to ban a certain segment of the population from restaurants, metadata would have played a key role in selecting the individuals. The two scenarios bring out aspects that are not currently part of the public debate: the governance of metadata, the adoption of shared "languages" to describe data, or more generally scientific phenomena, and the quality of metadata. The decision to close restaurants was possible for essentially one reason: every company has an ATECO code that refers to a single shared classification system, so it was relatively easy to identify the companies associated with code 56.10.11, "Ristorazione con somministrazione" (restaurants serving food and drink on the premises). The same provision would have been inapplicable in a context in which each region had adopted a different classification system, perhaps less rigorous and decontextualized from the others.
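The mechanism behind the restaurant closure can be sketched in a few lines of Python. The business register below is invented, but the logic is the one described above: because every firm carries a code from the same shared classification, selecting the affected firms is a trivial filter on ATECO code 56.10.11.

```python
# Toy sketch: a shared classification code drives a decision.
# The register entries are invented; only the ATECO code is real.
ATECO_RESTAURANTS = "56.10.11"

business_register = [
    {"name": "Trattoria Roma", "ateco": "56.10.11"},
    {"name": "Officina Meccanica Bianchi", "ateco": "45.20.10"},
    {"name": "Pizzeria Napoli", "ateco": "56.10.11"},
]

# The filter is only meaningful because every firm uses the same codelist.
to_close = [b["name"] for b in business_register
            if b["ateco"] == ATECO_RESTAURANTS]
print(to_close)
```

Had each region used its own incompatible classification, this one-line filter would become a mapping problem with no guaranteed solution, which is exactly the scenario the article warns against.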
Those who deal with metadata are well aware of the difficulties encountered in integrating different databases in which, for example, gender is coded differently (M/F, Male/Female, 0/1, 1/2), or the territory is coded according to classifications that differ in methodology and time reference. Unfortunately, it is not always possible to build a uniform metadata system: sometimes this depends on the closed-mindedness of data producers towards the outside world, sometimes on real or presumed claims that one set of metadata is more (or less) scientifically rigorous than another, and sometimes on the adoption of procedures or time series that cannot be interrupted.
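The integration problem just described can be sketched as a metadata-driven harmonization step. The source names and mappings below are invented; the idea is that each source's coding scheme, recorded as metadata, maps its values onto one shared codelist.

```python
# Minimal sketch of harmonization across sources that encode the same
# concept differently (M/F, Male/Female, 1/2). Source names are invented.
SOURCE_CODINGS = {
    "registry_a": {"M": "male", "F": "female"},
    "registry_b": {"Male": "male", "Female": "female"},
    "registry_c": {"1": "male", "2": "female"},
}

def harmonise(source, value):
    """Map a source-specific code onto the shared codelist."""
    try:
        return SOURCE_CODINGS[source][value]
    except KeyError:
        raise ValueError(f"unknown code {value!r} for source {source!r}")

records = [("registry_a", "F"), ("registry_b", "Male"), ("registry_c", "2")]
print([harmonise(s, v) for s, v in records])
```

The mapping table is itself metadata: without it, the three databases cannot be merged; with it, integration reduces to a lookup. The hard part in practice is not this code but agreeing on the shared codelist in the first place.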
The shared use of quality metadata is far from trivial and is often hampered by political rather than methodological issues. If we limit the scope to the labor market and professions, a bleak scenario emerges: on the one hand there is the international classification ISCO (International Standard Classification of Occupations), which would be very well suited to describing professions, and representing them under multiple aspects, in a shared, high-quality language; on the other hand there are partisan interests, castes, egocentrisms and poor knowledge of the subject, which hinder its adoption. As a result, recruiting, particularly in the public sector, has suffered from a structural deficiency for many years now, at a time when this can no longer be afforded. For this reason, it would be desirable for the item "Metadata: governance, sharing and quality" to be placed on the agenda of the "digital transformation" debate.