Modelling Vagueness and Uncertainty in DH
Digital Humanities (DH) aims not only to archive and make available materials (in particular historical artefacts) but also to bring greater scientific reflection into the humanities by promoting computational methods. However, more than ten years of consistent use of computer-aided research have not led to a hermeneutically adequate digital modelling of historical objects. In most DH attempts, the main crux remains the storage of objects in database architectures designed for natural-science applications, annotation with very general metadata, mark-up with shallow linguistic information regardless of the language or the purpose of the document, and quantitative analysis. Not only do images and texts become artificially precise, but the mutual illumination of texts and other media loses its traditional hermeneutic power.
Vagueness is one of the most important, most significant, but most difficult features of historical objects, especially texts and images. Whereas ambiguity (several distinct but clear meanings) and uncertainty (conceptually clear but unknown or forgotten data) are relatively well-describable phenomena, vagueness is not defined by semantics or pragmatics.
This workshop aims to bring together, for the first time, experts in the representation of vagueness and uncertainty and scholars from DH who have gone beyond the state of the art in their research and tried to apply existing theories such as fuzzy logic in their work.
Programme & Abstracts 09.07.2020
10:00 – 10:15 Opening session
10:15 – 11:15 Session 1 Chair: Walther v. Hahn
Invited Talk: Manfred Thaller, “The Fog of History”
Historians frequently have to make decisions based on evidence which is incomplete, contradictory and anything but precise: the “Fog of History”. Historical information systems, and the software systems used for their implementation, must therefore be based upon a model of information which reflects these properties of the available sources.
The presentation starts by categorizing the problems contributing to this situation:
Examples for each of these problems are briefly discussed.
We propose that technical support for these problems can be provided by solutions for four technical challenges:
Examples for each type of solution are given, referring to particularly promising branches of computer science.
Finally, a rough sketch of a structural model to integrate these solutions is given. It assumes that the basic data structure needed for such a model is graph-oriented, using graphs not only for the data to be held by the information system but also for most of the supporting objects representing the knowledge needed for interpretation. The implementation of these graphs has to embed the mathematical structure of the graph into an environment which, however, goes beyond the classical definition of a graph.
11:15 – 11:30 Break
11:30 – 12:15 Session 2 Chair: Michael Piotrowski
Cristina Vertan, Walther v. Hahn, University of Hamburg: „Detecting, Processing and Visualising Vagueness and Uncertainty Sources in Multilingual Historical Data Collections“
Digital methods can facilitate analysis of the reliability of translations as well as of the historical facts claimed by the author. In order to be effective, these methods must consider an intrinsic feature of natural language: the ability to produce vague utterances. The project HerCoRe (Hermeneutic and Computer-based Approaches for Investigating Reliability, Consistency and Vagueness in Historical Sources) aims at modelling and annotating five levels of vague assertions.
We develop an annotation formalism which allows:
The knowledge-base backbone is a fuzzy ontology modelled in OWL 2. We distinguish between fixed concepts and relations (like geographical elements: river, mountain, island) and notions for which several “contexts” can be defined. E.g. a geographical notion like “Danube” is, within one historical context, a border of the administrative notion “Ottoman Empire”, and in another one the border of the so-called administrative notion “Roman Empire”. The historical contexts are specified by further fuzzy data properties (e.g. time, placement).
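The idea of context-dependent fuzzy assertions can be sketched in a few lines of Python. This is a hypothetical illustration only: the context labels and membership degrees are invented, and the project's actual OWL 2 ontology is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FuzzyAssertion:
    subject: str
    relation: str
    obj: str
    context: str   # historical context in which the assertion holds
    degree: float  # membership degree in [0, 1]

class FuzzyKB:
    """Toy store for context-dependent fuzzy assertions."""
    def __init__(self):
        self.assertions = []

    def add(self, *args):
        self.assertions.append(FuzzyAssertion(*args))

    def degree(self, subject, relation, obj, context):
        """Degree to which an assertion holds in a given context (0.0 if absent)."""
        for a in self.assertions:
            if (a.subject, a.relation, a.obj, a.context) == (subject, relation, obj, context):
                return a.degree
        return 0.0

kb = FuzzyKB()
# "Danube" borders different polities depending on the historical context:
kb.add("Danube", "borderOf", "Ottoman Empire", "ctx_ottoman_17c", 0.9)
kb.add("Danube", "borderOf", "Roman Empire", "ctx_roman_2c", 0.8)

print(kb.degree("Danube", "borderOf", "Ottoman Empire", "ctx_ottoman_17c"))  # 0.9
print(kb.degree("Danube", "borderOf", "Ottoman Empire", "ctx_roman_2c"))     # 0.0
```

The same query about the Danube returns different degrees depending on which historical context is supplied, which is the behaviour the context mechanism above is meant to capture.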
For the detection of linguistic vagueness we follow a multilingual approach. We initially collected lists of indicators in the three languages involved in the project (Latin, German and Romanian). Based on (Pinkal 1980) and (Pinkal 1985), we distinguish between:
The initial collections of linguistic indicators are enriched through synsets in the corresponding Wordnets.
In this contribution we will present the general set-up of the system, the annotation framework, as well as the computer-based approach for marking different types of vagueness and uncertainty.
12:15 – 12:30 Discussions
12:30 – 13:30 Lunch Break
13:30 – 15:45 Session 3 Chair: Cristina Vertan
13:30 – 14:00 Wieslawa Duzy, Polish Academy of Sciences: „Unclarity and Uncertainty of Historical Spatial Data from Poland“
The paper presents experiences and case studies analysed and elaborated in the Department of Historical Atlas, Institute of History, Polish Academy of Sciences. The projects discussed focus mostly on spatial data, covering, among others, place names, types of settlements, and their locality. Our research questionnaire also includes mereological issues and relations, as well as the identification and georeferencing of historical settlements. Data sets are collected from various historical sources, i.e. 16th-century tax registers, 18th- and 19th-century cadastres, and 20th-century national surveys. Last but not least, historical maps are essential for our projects.
The paper will focus on the unclarity and uncertainty of historical sources, with some insight into the discussion on building stable identifiers and semantics. It will discuss the whole process of collecting historical data and preparing it to be explained and understood properly, and to be joined with external data sets. Some of our former results, outcomes and ongoing research are presented online:
14:00 – 14:30 Davor Lauc, University of Zagreb: „Reasoning about inexact temporal expressions using fuzzy logic and deep learning“
The problem of temporal reasoning has been known since antiquity. The contemporary logical analysis began in the second half of the twentieth century, when Arthur Prior developed temporal logic as a variant of modal logic. An alternative approach, based on classical predicate logic, was devised by the philosopher Donald Davidson. Davidson's proposal was an inspiration for the event calculus as well as the interval algebra developed by James Allen. To this day, many other approaches to representation and reasoning in this domain have been developed, such as temporal databases, temporal logic programming and many others.
All these approaches represent temporal determinants either as a point on the timeline or as a uniform interval. Many temporal expressions occurring in a range of social sciences and humanities, particularly history, as well as in everyday reasoning, cannot be adequately represented in this way. For example, if something happened during the Industrial Revolution, it is possible that it happened in 1745, but less likely than in 1801. Also, if we have an exact record that a person became a mother in 1848, she could have been born on any day from 1790 to 1839, but is more likely to have been born in 1828 than in 1800. A second level of uncertainty stems from the credibility or reliability of the source. For example, when verifying whether two events in two historical references are concurrent, and there is no exact overlap, we will be inclined to regard an event that occurred in the temporal vicinity as more likely the same, given the possibility of error in the source.
In this research, we investigate possibilities of representing the semantics of inexact temporal expressions using fuzzy sets. For the purpose of empirical validation of the reasoning models, the first part of the research includes developing a NER system for recognising inexact dates in text. Although there are well-developed models for identifying temporal expressions, such as SUTime and HeidelTime, their support for vague temporal expressions is limited. The developed transformer-based model is applied to the English Wikipedia with a satisfactory F-score. The second part of the research is the development of a neural model for generating a fuzzy set representing the meaning of a temporal expression. Fuzzy sets representing temporal data at the granularity of dates are huge, so a model for dimensionality reduction is developed to facilitate efficient storage and manipulation of these sets. The third part of the research is the development of different formal models for relations and operations on inexact dates. The fundamental and most challenging relation is that of similarity between two imprecise dates. It is modelled using a fuzzy-logic t-norm operation, with parameters learned from empirical data. The fourth part of the research includes a logical and philosophical analysis of the problem of similarity in the context of temporal reasoning.
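The kind of representation described above can be illustrated with a minimal Python sketch, assuming trapezoidal membership functions over years and the minimum t-norm. The year ranges below are invented for illustration and are not the project's learned parameters.

```python
# An inexact date as a trapezoidal fuzzy set over years, and a
# possibility-style similarity using the minimum t-norm:
#   sim(A, B) = max_x min(mu_A(x), mu_B(x))

def trapezoid(a, b, c, d):
    """Membership function: rises on a..b, plateau on b..c, falls on c..d."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu

# "During the Industrial Revolution" (boundary years are assumptions):
industrial_rev = trapezoid(1740, 1780, 1820, 1850)
# "Around 1801":
around_1801 = trapezoid(1796, 1800, 1802, 1806)

def similarity(mu_a, mu_b, domain):
    """Degree of overlap between two fuzzy dates over a discrete domain."""
    return max(min(mu_a(x), mu_b(x)) for x in domain)

years = range(1700, 1900)
print(industrial_rev(1745))  # possible, but low degree (0.125)
print(industrial_rev(1801))  # full membership (1.0)
print(similarity(industrial_rev, around_1801, years))
```

Note how 1745 receives a small but non-zero degree while 1801 lies on the plateau, matching the intuition in the abstract that both years are possible but not equally likely.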
14:30 – 15:00 Dimitar Iliev, University of Sofia: „What’s in a Name? Encoding Ambiguity in Ancient Greek Inscriptions from Bulgaria“
The significant ancient epigraphic heritage in Bulgaria includes more than 5,000 Greek inscriptions. A selection of them is currently being encoded in EpiDoc TEI XML and visualized and indexed through a customized EFES platform for the purposes of the Telamon epigraphic collection created by the University of Sofia in the framework of the National CLaDA-BG Consortium (CLARIN+DARIAH). The corpus aims mainly at the encoding and representation of historical figures and events, local names and places, dates, numbers, and sacred or administrative offices. In short, rather than creating a linguistic corpus of inscriptional texts, the focus of the collection is on different sorts of Named Entities. Among them, attested names of persons are probably the most interesting for the target scholarly audience of the collection. They are also the most difficult to approach. On the one hand, this is partly due to the ambiguity of reference inherent in proper names by definition. When dealing with historical names, editors of texts, including those of digital corpora, face various methodological problems concerning the identification of a historical person behind a name, or approaching names according to their onomastic and prosopographical value at the same time. The overlapping linguistic, cultural and political traditions of the Roman province of Thrace in the first centuries CE also contribute to the exciting complexity of its naming system. The extent of one's name and the comparative relevance of its different elements and accompanying nicknames or titles is a constantly open issue requiring different approaches to the encoding and processing of digital inscriptions. Last but not least, monuments on stone often provide texts open to different interpretations due to physical damage caused by time or human influence.
Thus, the inevitable lacunae are frequently filled with the aid of differing research hypotheses, constructs and paradigms: a range of alternative information that has to be encoded and modelled by the digital editors. The current talk will present various issues connected with these ambiguities and uncertainties around attested names in the ancient Greek inscriptions from Bulgaria. It will also examine the proposed approaches towards encoding them, creating RDF representations based on them, and indexing them in the front-end service applied in the work on the Telamon corpus (current repository: https://github.com/DHLabUniSofia/Telamon-EFES).
15:00 – 15:45 Discussions
15:45 – 16:00 Break
16:00 – 17:15 Session 4 Chair: Cristina Vertan
16:00 – 16:30 Alexandra Poulovassilis, Birkbeck Knowledge Lab: „Managing missing and uncertain data on the UK museum sector“
There is a general consensus that the UK museum sector has a problem with data. Over the last forty years, numerous reports have commented on the lack of an authoritative list of museums: there is no way of identifying how many museums there are in the UK, what they are, when they opened, and what levels of visitors they have. The Mapping Museums project is in part a response to this issue. The project aims to analyse the emergence and development of the UK museum sector from 1960 until the present day, with particular emphasis on the wave of small independent museums that opened from the mid-1970s onwards. The project has involved extensive archival research, capturing data on over 4,000 museums, conceptualizing that information, and designing ways of searching and visualizing the ensuing knowledge base. Here, we focus on the techniques we adopted within the knowledge base for managing the missing and uncertain data that was encountered during the project's two-year data collection process.
16:30 – 17:00 Marc Kupietz, IDS Mannheim, „Coping with Uncertainty in Synchronous Corpus Linguistics“
Corpus linguistics, like most empirical disciplines, is fundamentally affected by uncertainty. The typical modus operandi of the empirical disciplines is to extrapolate from observations on sample data in order to draw more general conclusions about the properties of the population under study. If the sample does not coincide with the population, such inferences are always subject to uncertainty. However, especially in synchronous corpus linguistics there are further, particular sources of uncertainty. In contrast to historical corpus linguistics, where the known set of published documents can be regarded as the population, but above all in contrast to other disciplines, synchronous corpus linguistics has the fundamental problem that the population typically cannot be defined in a way that is even remotely operationalizable. This means that the quality of a corpus cannot in general be assessed in terms of how well it represents the population as a sample, to what extent it allows inferences, or whether inferential statistics is applicable at all (Koplenig 2017). Further problems arise because 1.) language has aspects of an artifact, so that systematicity is only of limited use as a criterion for plausibility (Keibel & Kupietz 2009), 2.) the context variables that influence language are unknown and variable, so that the application of stratified sampling techniques to avoid sampling errors is difficult (Kupietz 2016), and 3.) interpretative categories such as parts of speech need to be used in the form of automatic annotations that are not necessarily empirically adequate (Belica et al. 2011). In my paper I will discuss some approaches to dealing with these difficulties and to how, in the context of digital humanities and science, as well as sometimes arts and crafts, synchronous corpus linguistics can be performed with a prospect of knowledge gain, despite the many sources of uncertainty.
17:00 – 17:15 Discussions
Programme & Abstracts 10.07.2020
10:00 – 10:45 Session 5 Chair: Walther v. Hahn
Invited Talk: Michael Piotrowski, University of Lausanne: What Are We Uncertain About? The Challenge of Historiographical Uncertainty
When people talk about uncertainty in a historical context in digital humanities, most of the time they talk about questions such as the exact date of birth of a person, whether two names refer to one or two persons, what geographical location a place name refers to, or the location of a person at a specific point in time. These are important questions, and it is important to find ways to computationally model the associated uncertainty. However, history is ultimately not about drawing exact maps or timelines, even if those can certainly help: history is about causality. In this talk, I would like to reflect on some of the issues on the level of historical interpretation, i.e., historiographical rather than historical uncertainty.
10:45 – 11:00 Break
11:00 – 12:30 Session 6 Chair: Manfred Thaller
11:00 – 11:30 Fairouz Zendaoui, Ecole Nationale Supérieure d'Informatique, Alger: „Quantifying and Representing Uncertainty of Historical Information“
The digital humanities are a field of research, teaching and engineering at the crossroads of computer science and the arts, literature, and the human and social sciences. Historical disciplines increasingly turn to digital tools, especially databases. Current interests and efforts focus on the representation of historical knowledge in order to facilitate the diffusion, sharing and exploitation of collective knowledge. Simplifying and structuring qualitatively complex knowledge, and quantifying it in a certain way to make it reusable and easily accessible, are all aspects that are not new to historians. Computer science is currently approaching a solution to some of these issues, or at least making it easier to work with historical data.
11:30 – 12:00 Umberto Straccia, Italian National Research Council, „Fuzziness in Semantic Web languages“
I present the state of the art in representing and reasoning with fuzzy knowledge in Semantic Web languages such as the triple languages RDF/RDFS, the conceptual languages of the OWL 2 family, and rule languages.
12:00 – 12:30 Francesca Lisi, University of Bari: „Representing Fuzzy Quantified Sentences in OWL 2“
Fuzzy quantifiers, such as "many" and "most", are used to express imprecise properties of fuzzy information granules. Examples of fuzzy quantified sentences are the following:
During the talk I will report on a method for representing both kinds of fuzzy quantified sentences in OWL 2 ontologies.
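One standard way to evaluate such sentences, which the talk's OWL 2 encoding may or may not follow, is Zadeh's proportion-based semantics for relative quantifiers. The following Python sketch uses an invented piecewise-linear membership function for "most" and illustrative membership degrees.

```python
def mu_most(p):
    """Fuzzy quantifier 'most': 0 below a proportion of 0.3,
    1 above 0.8, linear in between (thresholds are assumptions)."""
    if p <= 0.3:
        return 0.0
    if p >= 0.8:
        return 1.0
    return (p - 0.3) / 0.5

def truth_most(x_degrees, y_degrees):
    """Truth of 'most X are Y' as mu_most applied to the proportion
    sum(min(x, y)) / sum(x) of membership degrees (Zadeh-style)."""
    num = sum(min(x, y) for x, y in zip(x_degrees, y_degrees))
    den = sum(x_degrees)
    return mu_most(num / den)

# "Most tall people are heavy", with illustrative degrees per person:
tall = [1.0, 0.8, 0.6, 0.2]
heavy = [0.9, 0.8, 0.3, 0.1]
print(truth_most(tall, heavy))
```

The sentence's truth value is itself a degree in [0, 1], which is what makes such quantified statements candidates for representation in a fuzzy extension of OWL 2.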
12:30 – 12:45 Discussions
12:45 – 13:45 Lunch Break
13:45 – 15:30 Session 7 Chair: Cristina Vertan
13:45 – 14:15 Fernando Bobillo, University of Zaragoza: „Fuzzy ontology reasoning and applications“
Fuzzy ontologies have proved to be very useful in many application domains. One of the reasons for their success is the availability of fuzzy ontology reasoners, i.e., software tools that are able to discover implicit knowledge that can be derived from the axioms of a fuzzy ontology. In this talk, we will first examine the fuzzy ontology reasoner fuzzyDL and then overview some important examples of applications using fuzzy ontologies.
The first objective of this talk is to provide an overview of fuzzyDL, which is probably the most mature fuzzy ontology reasoner. We will discuss the fuzzy ontology elements that can be represented, as fuzzyDL supports fuzzy extensions of OWL 2 constructors and axioms, but also some traditional fuzzy logic operators, such as aggregation or defuzzification. We will also overview the supported reasoning tasks (some of them specific to the fuzzy case) and their mutual relationships. The different ways to interact with the tool will be reviewed, as different languages (a native syntax and Fuzzy OWL 2) and interfaces (terminal mode, graphical interface, and a Java API) are supported. Finally, we will provide some key implementation details (such as the underlying reasoning algorithm) and analyze some notable differences with other fuzzy ontology reasoners (paying special attention to DeLorean).
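As a generic illustration of one of the operations mentioned above, defuzzification, here is a minimal centre-of-gravity sketch in Python. This is textbook fuzzy logic, not fuzzyDL's native syntax or API, and the membership function is invented.

```python
def centroid_defuzzify(mu, domain):
    """Centre-of-gravity defuzzification: the crisp value is the
    average of the domain weighted by membership degree."""
    num = sum(x * mu(x) for x in domain)
    den = sum(mu(x) for x in domain)
    return num / den

def triangle(x):
    """Symmetric triangular fuzzy set centred on 30 with width 10."""
    return max(0.0, 1.0 - abs(x - 30) / 10)

# Discretise the domain 15.0 .. 45.0 in steps of 0.1:
xs = [x / 10 for x in range(150, 451)]
print(round(centroid_defuzzify(triangle, xs), 2))  # 30.0 (symmetric set)
```

Because the fuzzy set is symmetric about 30, the centroid coincides with its centre; for skewed sets the defuzzified value shifts towards the heavier side, which is why the choice of defuzzification operator matters in a reasoner.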
The second objective of this talk is to show some notable use cases of fuzzy ontologies in real-world problems. In particular, we will focus on some recent developments in which the speaker has been involved. This includes recommender systems (where the use of linguistic variables allows users to submit flexible queries), human gait recognition (where fuzzy sets make it possible to deal with the imprecision of the sensors that capture the motion), and blockchain smart contracts (where fuzzy ontologies make it possible to obtain partial agreements between two or more parties).
14:15 – 14:45 Jesús Medina, University of Cadiz, DIGital FORensics: evidence Analysis via intelligent Systems and Practices (DigForASP). COST Action 17124. Goals and intermediate achievements.
COST Action DigForASP (digforasp.uca.es) focuses on fostering synergy between security forces, related agencies, institutions of the European Union and neighbouring countries, associations, and companies in the field, in order to establish a solid network for introducing new technologies based on Artificial Intelligence and Automated Reasoning into the digital analysis of evidence, improving processes and obtaining more direct and efficient results. In this talk, we will present the main goals and the current achievements obtained by different members of the Action and the research team Mathematics for Computational Intelligence Systems (M·CIS, harmonic.uca.es/mcis).
14:45 – 15:15 Franziska Diehr, Free University Berlin: „Modelling vagueness in the deciphering of Classic Mayan hieroglyphs - A criteria-based approach for the qualitative assessment of reading proposals“
The project ‘Text Database and Dictionary of Classic Mayan’ aims at creating a machine-readable corpus of all Maya texts and compiling a dictionary on this basis. The characteristics of this complex writing system pose particular challenges to research, resulting in contradictory and ambiguous deciphering hypotheses. With our approach, we present a system for the qualitative evaluation of reading proposals that is integrated into a digital Sign Catalogue for Mayan hieroglyphs, establishing a novel concept for sign systematisation and classification. In the presentation we focus in particular on the modelling process and thus emphasize the role of knowledge representation in digital humanities research.
15:15 – 15:30 Discussions
15:30 – 15:45 Break
15:45 – 16:15 Final Discussions
16:15 – Closing Session