People

Latest Research Publications:

My Master’s research involves conducting surveys to find patterns of human reasoning and using these findings to inform and evaluate the reasoning processes of Artificial Intelligence systems.
I would describe myself as a critical thinker who is well-organised and thorough. I hold the esteemed Vice-Chancellor's Master’s Research Scholarship, awarded for academic excellence.
I have co-authored a paper, "Cognitive Defeasible Reasoning: the Extent to which Forms of Defeasible Reasoning Correspond with Human Reasoning", which will be published by Springer in the CCIS series in 2020.
To get in touch, email me via <bkrcla003@myuct.ac.za> or connect with me on LinkedIn.
Latest Research Publications:

A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.
Marelie Davel, Marthinus Theunissen, Arnold Pretorius, Etienne Barnard. "DNNs as layers of cooperating classifiers." The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), New York, 2020, pp. 3725-3732.
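To make the "two systems" idea concrete, here is a minimal sketch (not the authors' implementation) of the discrete side: binarise the hidden-node activations of a fully connected network and classify a test sample by the label of the training sample whose activation pattern is nearest in Hamming distance. The architecture, data and assumed training step are placeholders.

```python
# Sketch: binary activation patterns as a discrete classifier (illustrative only).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: 200 training points, 2 classes, 20 features.
X_train, y_train = torch.randn(200, 20), torch.randint(0, 2, (200,))
X_test = torch.randn(10, 20)

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)
        self.fc2 = nn.Linear(64, 64)
        self.out = nn.Linear(64, 2)

    def forward(self, x, return_hidden=False):
        h1 = torch.relu(self.fc1(x))
        h2 = torch.relu(self.fc2(h1))
        logits = self.out(h2)
        return (logits, [h1, h2]) if return_hidden else logits

model = MLP()  # assumed to have been trained on (X_train, y_train) beforehand

def binary_pattern(x):
    """Concatenate per-layer on/off (ReLU active) indicators into one bit vector per sample."""
    with torch.no_grad():
        _, hidden = model(x, return_hidden=True)
    return torch.cat([(h > 0).float() for h in hidden], dim=1)

train_patterns = binary_pattern(X_train)   # the "discrete system": one code per training sample
test_patterns = binary_pattern(X_test)

# Discrete classifier: label of the training sample with the nearest pattern (Hamming distance).
hamming = (test_patterns.unsqueeze(1) != train_patterns.unsqueeze(0)).float().sum(dim=-1)
discrete_pred = y_train[hamming.argmin(dim=1)]

# Continuous classifier: the usual softmax output of the same network.
continuous_pred = model(X_test).argmax(dim=1)
print(discrete_pred, continuous_pred)
```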
The understanding of generalization in machine learning is in a state of flux. This is partly due to the relatively recent revelation that deep learning models are able to completely memorize training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about generalization. The phenomenon was brought to light and discussed in a seminal paper by Zhang et al. [24]. We expand upon this work by discussing local attributes of neural network training within the context of a relatively simple and generalizable framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the global deep learning model to generalize in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterized multilayer perceptrons and controlled noise in the training data. The main insights are that deep learning models are optimized for training data modularly, with different regions in the function space dedicated to fitting distinct kinds of sample information. Detrimental overfitting is largely prevented by the fact that different regions in the function space are used for prediction based on the similarity between new input data and that which has been optimized for.
Marthinus Theunissen, Marelie Davel, Etienne Barnard. "Insights regarding overfitting on noise in deep learning." South African Forum for Artificial Intelligence Research (FAIR), Cape Town, South Africa, 2019, pp. 49-63.
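A minimal sketch of the kind of controlled label noise such experiments rely on: corrupt a chosen fraction of labels while remembering which samples were altered. The helper below is an illustrative assumption, not the paper's code.

```python
# Sketch: inject controlled label noise, remembering which samples were corrupted.
import numpy as np

def corrupt_labels(y, noise_fraction, num_classes, seed=0):
    """Reassign a random fraction of labels to a different class; return noisy labels and a mask."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    idx = rng.choice(len(y), size=int(noise_fraction * len(y)), replace=False)
    for i in idx:
        y_noisy[i] = rng.choice([c for c in range(num_classes) if c != y[i]])
    mask = np.zeros(len(y), dtype=bool)
    mask[idx] = True
    return y_noisy, mask

y = np.random.randint(0, 10, size=1000)
y_noisy, corrupted = corrupt_labels(y, noise_fraction=0.2, num_classes=10)
print(int(corrupted.sum()), "of", len(y), "labels corrupted")
```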
The generalization capabilities of deep neural networks are not well understood, and in particular, the influence of activation functions on generalization has received little theoretical attention. Phenomena such as vanishing gradients, node saturation and network sparsity have been identified as possible factors when comparing different activation functions [1]. We investigate these factors using fully connected feedforward networks on two standard benchmark problems, and find that the most salient differences between networks with sigmoidal and ReLU activations relate to the way that class-distinctive information is propagated through a network.
Arnold Pretorius, Etienne Barnard, Marelie Davel. "ReLU and sigmoidal activation functions." South African Forum for Artificial Intelligence Research (FAIR), CEUR Workshop Proceedings, Cape Town, South Africa, 2019, pp. 37-48.
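A minimal sketch of the kind of activation diagnostics the abstract alludes to: per layer, measure the fraction of inactive ReLU nodes versus the fraction of saturated sigmoid nodes on a batch of inputs. The architecture, input batch and saturation threshold are assumptions.

```python
# Sketch: per-layer activation statistics for ReLU vs. sigmoidal networks (illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 20)                                   # placeholder input batch

def make_mlp(activation):
    widths, layers = [20, 64, 64, 64], []
    for i in range(len(widths) - 1):
        layers += [nn.Linear(widths[i], widths[i + 1]), activation()]
    return nn.Sequential(*layers)

def layer_stats(model, x):
    stats, h = [], x
    with torch.no_grad():
        for layer in model:
            h = layer(h)
            if isinstance(layer, nn.ReLU):
                stats.append((h == 0).float().mean().item())                    # inactive ReLU nodes
            elif isinstance(layer, nn.Sigmoid):
                stats.append(((h < 0.05) | (h > 0.95)).float().mean().item())   # saturated nodes
    return stats

print("ReLU: fraction inactive per layer:    ", layer_stats(make_mlp(nn.ReLU), x))
print("Sigmoid: fraction saturated per layer:", layer_stats(make_mlp(nn.Sigmoid), x))
```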
Estimating the joint probability density function of a dataset is a central task in many machine learning applications. In this work we address the fundamental problem of kernel bandwidth estimation for variable kernel density estimation in high-dimensional feature spaces. We derive a variable kernel bandwidth estimator by minimizing the leave-one-out entropy objective function and show that this estimator is capable of performing estimation in high-dimensional feature spaces with great success. We compare the performance of this estimator to state-of-the art maximum likelihood estimators on a number of representative high-dimensional machine learning tasks and show that the newly introduced minimum leave-one-out entropy estimator performs optimally on a number of high-dimensional datasets considered.
Christiaan Van der Walt, Etienne Barnard. "Variable Kernel Density Estimation in High-dimensional Feature Spaces." AAAI Conference on Artificial Intelligence (AAAI-17), 2017, pp. 2674-2680.
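A minimal sketch of the minimum leave-one-out entropy idea, here for a single global Gaussian bandwidth rather than the paper's variable, per-point bandwidth; the synthetic data and the bounded scalar optimiser are assumptions.

```python
# Sketch: choose a Gaussian KDE bandwidth by minimising leave-one-out entropy (illustrative).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                  # placeholder data: n=300, d=5
n, d = X.shape
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances

def loo_entropy(h):
    """Estimate entropy as the mean negative log of the leave-one-out kernel density."""
    K = np.exp(-sq_dists / (2 * h**2)) / ((2 * np.pi * h**2) ** (d / 2))
    np.fill_diagonal(K, 0.0)                   # leave-one-out: exclude each point's own kernel
    f_loo = K.sum(axis=1) / (n - 1)
    return -np.mean(np.log(f_loo + 1e-300))

res = minimize_scalar(loo_entropy, bounds=(0.05, 5.0), method="bounded")
print("bandwidth minimising leave-one-out entropy:", res.x)
```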
Automatic speech recognition (ASR) technology has matured over the past few decades and has made significant impacts in a variety of fields, from assistive technologies to commercial products. However, ASR system development is a resource intensive activity and requires language resources in the form of text annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category and due to this resource scarcity the deployment of ASR systems in the developing world is severely inhibited. One approach to assist with resource-scarce ASR system development, is to select 'useful' training samples which could reduce the resources needed to collect new corpora. In this work, we propose a new data selection framework which can be used to design a speech recognition corpus. We show for limited data sets, independent of language and bandwidth, the most effective strategy for data selection is frequency-matched selection and that the widely-used maximum entropy methods generally produced the least promising results. In our model, the frequency-matched selection method corresponds to a logarithmic relationship between accuracy and corpus size; we also investigated other model relationships, and found that a hyperbolic relationship (as suggested from simple asymptotic arguments in learning theory) may lead to somewhat better performance under certain conditions.
Neil Kleynhans, Etienne Barnard. "Efficient data selection for ASR." Language Resources and Evaluation 49(2), Springer Science+Business Media, Dordrecht, 2015, pp. 327-353. doi:10.1007/s10579-014-9285-0
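A minimal sketch of one way frequency-matched selection can be realised: greedily add utterances whose unit counts keep the selected subset's unigram distribution closest (by KL divergence) to that of the full corpus. The data structures and divergence measure are assumptions, not the paper's exact procedure.

```python
# Sketch: greedy frequency-matched utterance selection (illustrative, not the paper's exact method).
from collections import Counter
import math

def kl(p_counts, q_counts, smooth=1e-6):
    """KL divergence D(p || q) over the union of observed units, with light smoothing."""
    units = set(p_counts) | set(q_counts)
    p_tot = sum(p_counts.values()) + smooth * len(units)
    q_tot = sum(q_counts.values()) + smooth * len(units)
    return sum(((p_counts[u] + smooth) / p_tot) *
               math.log(((p_counts[u] + smooth) / p_tot) / ((q_counts[u] + smooth) / q_tot))
               for u in units)

def frequency_matched_select(utterances, budget):
    """Greedily pick utterances whose units best match the full-corpus unit frequencies."""
    target = Counter(u for utt in utterances for u in utt)     # full-corpus distribution
    selected, sel_counts, remaining = [], Counter(), list(range(len(utterances)))
    for _ in range(budget):
        best_i = min(remaining, key=lambda i: kl(sel_counts + Counter(utterances[i]), target))
        selected.append(best_i)
        sel_counts += Counter(utterances[best_i])
        remaining.remove(best_i)
    return selected

# Toy example: utterances as lists of phone labels.
utts = [list("abba"), list("cab"), list("dad"), list("bcd"), list("aacc")]
print(frequency_matched_select(utts, budget=2))
```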
DEGREES LINKED TO THIS RESEARCH GROUP:
1) 2018-current PhD (Philosophy): 'Interfaces between Knowledge Representation and Reasoning and Political Philosophy: The Symbiotic Relationship Between Morality and Justice'.
TALKS:
1) 'Philosophy in/as Translation' (PSSA 2019);
2) 'Decolonizing Knowledge' (PPA 2019);
3) 'AI and the Social Good' (4th CAIR/UP Symposium 2019);
4) 'Decolonization and Alterity: Intersecting Theories and Praxis' (PPA 2018).
Latest Research Publications:
In recent work, we addressed an important limitation in previous extensions of description logics to represent defeasible knowledge, namely the restriction in the semantics of defeasible concept inclusion to a single preference order on objects of the domain. Syntactically, this limitation translates to a context-agnostic notion of defeasible subsumption, which is quite restrictive when it comes to modelling different nuances of defeasibility. Our point of departure in our recent proposal allows for different orderings on the interpretation of roles. This yields a notion of contextual defeasible subsumption, where the context is informed by a role. In the present paper, we extend this work to also provide a proof-theoretic counterpart and associated results. We define a (naïve) tableau-based algorithm for checking preferential consistency of contextual defeasible knowledge bases, a central piece in the definition of other forms of contextual defeasible reasoning over ontologies, notably contextual rational closure.
Katarina Britz, Ivan Varzinczak. "Preferential tableaux for contextual defeasible ALC." 28th International Conference on Automated Reasoning with Analytic Tableaux and Related Methods (TABLEAUX), Springer LNAI 11714, 2019, pp. 39-57. ISBN 978-3-030-29026-9. https://www.springer.com/gp/book/9783030290252
Description logics have been extended in a number of ways to support defeasible reasoning in the KLM tradition. Such features include preferential or rational defeasible concept inclusion, and defeasible roles in complex concept descriptions. Semantically, defeasible subsumption is obtained by means of a preference order on objects, while defeasible roles are obtained by adding a preference order to role interpretations. In this paper, we address an important limitation in defeasible extensions of description logics, namely the restriction in the semantics of defeasible concept inclusion to a single preference order on objects. We do this by inducing a modular preference order on objects from each modular preference order on roles, and using these to relativise defeasible subsumption. This yields a notion of contextualised rational defeasible subsumption, with contexts described by roles. We also provide a semantic construction for rational closure and a method for its computation, and present a correspondence result between the two.
Katarina Britz, Ivan Varzinczak. "Contextual rational closure for defeasible ALC." Annals of Mathematics and Artificial Intelligence 87(1-2), 2019, pp. 83-108. doi:10.1007/s10472-019-09658-2
In this paper we present an approach to defeasible reasoning for the description logic ALC. The results discussed here are based on work done by Kraus, Lehmann and Magidor (KLM) on defeasible conditionals in the propositional case. We consider versions of a preferential semantics for two forms of defeasible subsumption, and link these semantic constructions formally to KLM-style syntactic properties via representation results. In addition to showing that the semantics is appropriate, these results pave the way for more effective decision procedures for defeasible reasoning in description logics. With the semantics of the defeasible version of ALC in place, we turn to the investigation of an appropriate form of defeasible entailment for this enriched version of ALC. This investigation includes an algorithm for the computation of a form of defeasible entailment known as rational closure in the propositional case. Importantly, the algorithm relies completely on classical entailment checks and shows that the computational complexity of reasoning over defeasible ontologies is no worse than that of the underlying classical ALC. Before concluding, we take a brief tour of some existing work on defeasible extensions of ALC that go beyond defeasible subsumption.
Katarina Britz, Giovanni Casini, Thomas Meyer, Ivan Varzinczak. "A KLM Perspective on Defeasible Reasoning for Description Logics." In: Description Logic, Theory Combination, and All That, Springer, 2019, pp. 147-173. ISBN 978-3-030-22101-0. doi:10.1007/978-3-030-22102-7_7
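A minimal propositional sketch of the rational closure procedure described above, driven entirely by classical entailment checks (here via sympy's SAT-based satisfiable); the ranking loop and the penguin example are illustrative, not the chapter's description-logic algorithm.

```python
# Sketch: rational closure for propositional defeasible conditionals (illustrative).
from sympy import symbols, And, Not, Implies
from sympy.logic.inference import satisfiable

def entails(formulas, phi):
    """Classical entailment: formulas |= phi iff (AND of formulas) & ~phi is unsatisfiable."""
    return not satisfiable(And(*formulas, Not(phi)))

def exceptionality_ranks(conditionals):
    """Partition conditionals (a, b), read 'a typically implies b', into exceptionality ranks."""
    ranks, current = [], list(conditionals)
    while current:
        materialised = [Implies(a, b) for a, b in current]
        exceptional = [(a, b) for a, b in current if entails(materialised, Not(a))]
        if len(exceptional) == len(current):      # fixpoint: these get infinite rank
            return ranks, current
        ranks.append([c for c in current if c not in exceptional])
        current = exceptional
    return ranks, []

def rational_closure_entails(conditionals, a, b):
    """Is 'a typically implies b' in the rational closure of the knowledge base?"""
    ranks, infinite = exceptionality_ranks(conditionals)
    inf_mat = [Implies(x, y) for x, y in infinite]
    for i in range(len(ranks) + 1):               # drop the most-normal ranks one by one
        mat = [Implies(x, y) for level in ranks[i:] for x, y in level] + inf_mat
        if not entails(mat, Not(a)):              # antecedent now compatible with what remains
            return entails(mat, Implies(a, b))
    return True                                   # antecedent never compatible: trivially entailed

# Classical example: birds typically fly, penguins are typically birds, penguins typically don't fly.
bird, penguin, flies = symbols("bird penguin flies")
kb = [(bird, flies), (penguin, bird), (penguin, Not(flies))]
print(rational_closure_entails(kb, penguin, Not(flies)))   # True
print(rational_closure_entails(kb, penguin, flies))        # False
print(rational_closure_entails(kb, bird, flies))           # True
```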
ConceptCloud is a flexible interactive tool for exploring, visualising, and analysing semi-structured data sets. It uses a combination of an intuitive tag cloud visualisation with an underlying concept lattice to provide a formal structure for navigation through a data set. ConceptCloud 2.0 extends the tool with an integrated map view to exploit the geolocation aspect of data. The tool’s implementation of exploratory search does not require prior knowledge of the structure of the data or compromise on scalability, and provides seamless navigation through the tag cloud and the map viewer.
Tiaan Du Toit, Joshua Berndt, Katarina Britz, Bernd Fischer. "ConceptCloud 2.0: Visualisation and exploration of geolocation-rich semi-structured data sets." ICFCA 2019 Conference and Workshops, CEUR-WS, 2019. http://ceur-ws.org/Vol-2378/
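For readers unfamiliar with the "underlying concept lattice", here is a minimal sketch of formal concept analysis on a toy object/attribute context; the example data and the brute-force enumeration are assumptions and unrelated to ConceptCloud's actual implementation.

```python
# Sketch: the formal concepts of a tiny object/attribute context (the structure behind a concept lattice).
from itertools import combinations

# Toy context: objects tagged with attributes, as a tag-cloud backend might store them.
context = {
    "cafe_alpha": {"coffee", "wifi"},
    "cafe_beta":  {"coffee", "outdoor"},
    "park_gamma": {"outdoor"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects having every attribute in attrs."""
    return {o for o, tags in context.items() if attrs <= tags}

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

# Brute-force enumeration: every (extent(B), intent(extent(B))) pair is a formal concept.
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        objs = extent(set(attrs))
        concepts.add((frozenset(objs), frozenset(intent(objs))))

for objs, attrs in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(objs), "<->", sorted(attrs))
```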
The W3C standardized Semantic Web languages enable users to capture data without a schema in a manner which is intuitive to them. The challenge is that for the data to be useful, it should be possible to query the data and to query it efficiently, which necessitates a schema. Understanding the structure of data is thus important to both users and storage implementers: the structure of the data gives insight to users in how to query the data while storage implementers can use the structure to optimize queries. In this paper we propose that data mining routines can be used to infer candidate n-ary relations with related uniqueness- and null-free constraints, which can be used to construct an informative Armstrong RDF dataset. The benefit of an informative Armstrong RDF dataset is that it provides example data based on the original data which is a fraction of the size of the original data, while capturing the constraints of the original data faithfully. A case study on a DBPedia person dataset showed that the associated informative Armstrong RDF dataset contained 0.00003% of the statements of the original DBPedia dataset.
Henriette Harmse, Katarina Britz, Aurona Gerber. "Informative Armstrong RDF datasets for n-ary relations." Formal Ontology in Information Systems: 10th International Conference, Cape Town, South Africa, IOS Press, 2018, pp. 187-198.

Latest Research Publications:
Propositional KLM-style defeasible reasoning involves a core propositional logic capable of expressing defeasible (or conditional) implications. The semantics for this logic is based on Kripke-like structures known as ranked interpretations. KLM-style defeasible entailment is referred to as rational whenever the defeasible entailment relation under consideration generates a set of defeasible implications all satisfying a set of rationality postulates known as the KLM postulates. In a recent paper Booth et al. proposed PTL, a logic that is more expressive than the core KLM logic. They proved an impossibility result, showing that defeasible entailment for PTL fails to satisfy a set of rationality postulates similar in spirit to the KLM postulates. Their interpretation of the impossibility result is that defeasible entailment for PTL need not be unique.
In this paper we continue the line of research in which the expressivity of the core KLM logic is extended. We present the logic Boolean KLM (BKLM) in which we allow for disjunctions, conjunctions, and negations, but not nesting, of defeasible implications. Our contribution is twofold. Firstly, we show (perhaps surprisingly) that BKLM is more expressive than PTL. Our proof is based on the fact that BKLM can characterise all single ranked interpretations, whereas PTL cannot. Secondly, given that the PTL impossibility result also applies to BKLM, we adapt the different forms of PTL entailment proposed by Booth et al. to apply to BKLM.
Guy Paterson-Jones, Giovanni Casini, Thomas Meyer. "BKLM - An expressive logic for defeasible reasoning." 18th International Workshop on Non-Monotonic Reasoning, 2020.
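A minimal sketch of the ranked-interpretation semantics mentioned above: assign every propositional valuation a rank (or mark it impossible), and accept a defeasible implication when all minimal-rank worlds satisfying the antecedent also satisfy the consequent. The atoms and the ranking function are illustrative.

```python
# Sketch: a ranked interpretation over propositional valuations (illustrative).
from itertools import product

ATOMS = ("b", "p", "f")          # b = bird, p = penguin, f = flies

def all_valuations():
    for bits in product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def example_rank(v):
    """Assign each valuation a rank (0 = most typical); None marks impossible worlds."""
    if v["p"] and not v["b"]:
        return None                  # penguins that are not birds: ruled out entirely
    if v["p"] and v["f"]:
        return 2                     # flying penguins: very atypical
    if v["b"] and not v["f"]:
        return 1                     # non-flying birds: atypical
    return 0

def satisfies_defeasible(rank, antecedent, consequent):
    """All minimal-rank worlds satisfying the antecedent must also satisfy the consequent."""
    worlds = [v for v in all_valuations() if rank(v) is not None and antecedent(v)]
    if not worlds:
        return True
    lowest = min(rank(v) for v in worlds)
    return all(consequent(v) for v in worlds if rank(v) == lowest)

print(satisfies_defeasible(example_rank, lambda v: v["b"], lambda v: v["f"]))       # birds typically fly
print(satisfies_defeasible(example_rank, lambda v: v["p"], lambda v: not v["f"]))   # penguins typically don't
```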
We present a formal framework for modelling belief change within a non-monotonic reasoning system. Belief change and non-monotonic reasoning are two areas that are formally closely related, with recent attention being paid towards the analysis of belief change within a non-monotonic environment. In this paper we consider the classical AGM belief change operators, contraction and revision, applied to a defeasible setting in the style of Kraus, Lehmann, and Magidor. The investigation leads us to the formal characterisation of a number of classes of defeasible belief change operators. For the most interesting classes we need to consider the problem of iterated belief change, generalising the classical work of Darwiche and Pearl in the process. Our work involves belief change operators aimed at ensuring logical consistency, as well as the characterisation of analogous operators aimed at obtaining coherence, an important notion within the field of logic-based ontologies.
Giovanni Casini, Thomas Meyer, Ivan Varzinczak. "Rational Defeasible Belief Change." 17th International Conference on Principles of Knowledge Representation and Reasoning (KR 2020), 2020, pp. 213-222. doi:10.24963/kr.2020/22
In this paper we present an approach to defeasible reasoning for the description logic ALC. The results discussed here are based on work done by Kraus, Lehmann and Magidor (KLM) on defeasible conditionals in the propositional case. We consider versions of a preferential semantics for two forms of defeasible subsumption, and link these semantic constructions formally to KLM-style syntactic properties via representation results. In addition to showing that the semantics is appropriate, these results pave the way for more effective decision procedures for defeasible reasoning in description logics. With the semantics of the defeasible version of ALC in place, we turn to the investigation of an appropriate form of defeasible entailment for this enriched version of ALC. This investigation includes an algorithm for the computation of a form of defeasible entailment known as rational closure in the propositional case. Importantly, the algorithm relies completely on classical entailment checks and shows that the computational complexity of reasoning over defeasible ontologies is no worse than that of the underlying classical ALC. Before concluding, we take a brief tour of some existing work on defeasible extensions of ALC that go beyond defeasible subsumption.
Katarina Britz, Giovanni Casini, Thomas Meyer, Ivan Varzinczak. "A KLM Perspective on Defeasible Reasoning for Description Logics." In: Description Logic, Theory Combination, and All That, Springer, 2019, pp. 147-173. ISBN 978-3-030-22101-0. doi:10.1007/978-3-030-22102-7_7
We present a systematic approach for extending the KLM framework for defeasible entailment. We first present a class of basic defeasible entailment relations, characterise it in three distinct ways and provide a high-level algorithm for computing it. This framework is then refined, with the refined version being characterised in a similar manner. We show that the two well-known forms of defeasible entailment, rational closure and lexicographic closure, fall within our refined framework, that rational closure is the most conservative of the defeasible entailment relations within the framework (with respect to subset inclusion), but that there are forms of defeasible entailment within our framework that are more “adventurous” than lexicographic closure.
Giovanni Casini, Thomas Meyer, Ivan Varzinczak. "Taking Defeasible Entailment Beyond Rational Closure." European Conference on Logics in Artificial Intelligence, Springer, 2019, pp. 182-197. ISBN 978-3-030-19569-4. doi:10.1007/978-3-030-19570-0_12
In this paper we introduce and investigate a very basic semantics for conditionals that can be used to define a broad class of conditional reasoning systems. We show that it encompasses the most popular kinds of conditional reasoning developed in logic-based KR. It turns out that the semantics we propose is appropriate for a structural analysis of those conditionals that do not satisfy the property of Right Weakening. We show that it can be used for the further development of an analysis of the notion of relevance in conditional reasoning.
Giovanni Casini, Thomas Meyer, Ivan Varzinczak. "Simple Conditionals with Constrained Right Weakening." International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 1632-1638. doi:10.24963/ijcai.2019/226
Latest Research Publications:

Marelie obtained her undergraduate degrees (Computer Science & Mathematics) from Stellenbosch University, receiving the Dean’s medal as the best student in the Stellenbosch Faculty of Science at the end of her Honours degree. Prior to joining NWU, Marelie was a principal researcher and research group leader at the South African CSIR, involved in technology-oriented research and development. Her research group focussed on speech technology development in under-resourced environments; in 2005, she received her PhD from the University of Pretoria (UP), with a thesis on bootstrapping pronunciation models, at the time one of the core ‘missing’ components when developing speech recognition for South African languages.
In 2011, Marelie joined NWU, becoming the Director of MuST in 2014. MuST is a focussed research environment with an emphasis on postgraduate training and on delivering externally-focussed projects. Recent projects include the development of an automatic speech transcription platform for the South African government, the development of a new multilingual text-to-speech corpus in collaboration with Google, and participation in the winning consortium of the BABEL project: a five-year international collaborative challenge aimed at solving the spoken term detection task for under-resourced languages.
Over the past few years, Marelie has supervised 23 post-graduate students, all producing research related to the theory and applications of machine learning. She frequently serves on national and international scientific committees (AAAI, IJCAI, Interspeech, SLT, MediaEval, ICASSP, SLTU), is the NWU group representative at the national Centre for Artificial Intelligence Research (CAIR), and is an NRF-rated researcher. Since 2003, she has published 100 peer-reviewed papers related to machine learning; she has an h-index of 21 and an i10-index of 37.
Latest Research Publications:
Each node in a neural network is trained to activate for a specific region in the input domain. Any training samples that fall within this domain are therefore implicitly clustered together. Recent work has highlighted the importance of these clusters during the training process but has not yet investigated their evolution during training. Towards this goal, we train several ReLU-activated MLPs on a simple classification task (MNIST) and show that a consistent training process emerges: (1) sample clusters initially increase in size and then decrease as training progresses, (2) the size of sample clusters in the first layer decreases more rapidly than in deeper layers, (3) binary node activations, especially of nodes in deeper layers, become more sensitive to class membership as training progresses, (4) individual nodes remain poor predictors of class membership, even if accurate when applied as a group. We report on the detail of these findings and interpret them from the perspective of a high-dimensional clustering process.
Daniël Haasbroek, Marelie Davel. "Exploring neural network training dynamics through binary node activations." Southern African Conference for Artificial Intelligence Research, South Africa, 2020, pp. 304-320. ISBN 978-0-620-89373-2. https://sacair.org.za/proceedings/
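A minimal sketch of the sample clusters studied here: group training samples by identical binary (on/off) activation patterns at each ReLU layer and record cluster sizes. The untrained placeholder network and synthetic inputs stand in for the trained MLPs used in the paper.

```python
# Sketch: sizes of per-layer sample clusters defined by identical binary activation patterns.
from collections import Counter
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20)                                  # placeholder training inputs
model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU())        # assumed trained elsewhere

def cluster_sizes_per_layer(model, x):
    sizes, h = [], x
    with torch.no_grad():
        for layer in model:
            h = layer(h)
            if isinstance(layer, nn.ReLU):
                patterns = [tuple(row) for row in (h > 0).to(torch.int8).tolist()]
                sizes.append(sorted(Counter(patterns).values(), reverse=True))
    return sizes

for i, s in enumerate(cluster_sizes_per_layer(model, X)):
    print(f"ReLU layer {i}: {len(s)} clusters, largest cluster has {s[0]} samples")
```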
When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.
Arthur Venter, Marthinus Theunissen, Marelie Davel. "Pre-interpolation loss behavior in neural networks." Southern African Conference for Artificial Intelligence Research, Springer, 2020, pp. 296-309. doi:10.1007/978-3-030-66151-9_19
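A minimal sketch of the per-sample view taken in the paper: compute individual test losses and compare the mean, which a few extreme samples can pull up, against the median and a high quantile. The model and data below are placeholders assumed to be trained and held out.

```python
# Sketch: per-sample test losses; the mean can rise while accuracy holds, driven by a few samples.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X_test = torch.randn(2000, 20)                   # placeholder held-out data
y_test = torch.randint(0, 10, (2000,))
model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 10))  # assumed trained

with torch.no_grad():
    logits = model(X_test)
    per_sample_loss = F.cross_entropy(logits, y_test, reduction="none")

print("mean loss:      ", per_sample_loss.mean().item())
print("median loss:    ", per_sample_loss.median().item())
print("99th percentile:", per_sample_loss.quantile(0.99).item())
print("accuracy:       ", (logits.argmax(dim=1) == y_test).float().mean().item())
```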
Although Convolutional Neural Networks (CNNs) are widely used, their translation invariance (ability to deal with translated inputs) is still subject to some controversy. We explore this question using translation-sensitivity maps to quantify how sensitive a standard CNN is to a translated input. We propose the use of cosine similarity as sensitivity metric over Euclidean distance, and discuss the importance of restricting the dimensionality of either of these metrics when comparing architectures. Our main focus is to investigate the effect of different architectural components of a standard CNN on that network’s sensitivity to translation. By varying convolutional kernel sizes and amounts of zero padding, we control the size of the feature maps produced, allowing us to quantify the extent to which these elements influence translation invariance. We also measure translation invariance at different locations within the CNN to determine the extent to which convolutional and fully connected layers, respectively, contribute to the translation invariance of a CNN as a whole. Our analysis indicates that both convolutional kernel size and feature map size have a systematic influence on translation invariance. We also see that convolutional layers contribute less than expected to translation invariance, when not specifically forced to do so.
Johannes Myburgh, Coenraad Mouton, Marelie Davel. "Tracking translation invariance in CNNs." Southern African Conference for Artificial Intelligence Research, Springer, 2020, pp. 282-295. doi:10.1007/978-3-030-66151-9_18
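A minimal sketch of a translation-sensitivity map: shift the input, run both versions through the same network, and record the cosine similarity between the resulting feature vectors for each shift. The tiny network is a placeholder, and circular shifting via torch.roll is a simplification of true image translation.

```python
# Sketch: translation-sensitivity map via cosine similarity between original and shifted inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(4), nn.Flatten())     # placeholder feature extractor
img = torch.randn(1, 1, 28, 28)                                  # placeholder input image

def sensitivity_map(model, img, max_shift=4):
    with torch.no_grad():
        base = model(img)
        grid = torch.zeros(2 * max_shift + 1, 2 * max_shift + 1)
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                # torch.roll shifts circularly; a simplification of true translation with padding.
                shifted = torch.roll(img, shifts=(dy, dx), dims=(2, 3))
                grid[dy + max_shift, dx + max_shift] = F.cosine_similarity(base, model(shifted)).item()
    return grid

print(sensitivity_map(model, img))
```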
Convolutional Neural Networks have become the standard for image classification tasks; however, these architectures are not invariant to translations of the input image. This lack of invariance is attributed to the use of stride, which subsamples the input, resulting in a loss of information, and fully connected layers, which lack spatial reasoning. We show that stride can greatly benefit translation invariance given that it is combined with sufficient similarity between neighbouring pixels, a characteristic which we refer to as local homogeneity. We also observe that this characteristic is dataset-specific and dictates the relationship between pooling kernel size and stride required for translation invariance. Furthermore, we find that a trade-off exists between generalization and translation invariance in the case of pooling kernel size, as larger kernel sizes lead to better invariance but poorer generalization. Finally, we explore the efficacy of other solutions proposed, namely global average pooling, anti-aliasing, and data augmentation, both empirically and through the lens of local homogeneity.
Coenraad Mouton, Johannes Myburgh, Marelie Davel. "Stride and translation invariance in CNNs." Southern African Conference for Artificial Intelligence Research, Springer, 2020, pp. 267-281. doi:10.1007/978-3-030-66151-9_17
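A minimal sketch of one possible proxy for the "local homogeneity" referred to above: the mean absolute difference between neighbouring pixel values (lower means more homogeneous). The exact definition used in the paper may differ.

```python
# Sketch: a simple proxy for local homogeneity (lower neighbour differences = more homogeneous).
import torch
import torch.nn.functional as F

def neighbour_difference(images):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    dh = (images[..., :, 1:] - images[..., :, :-1]).abs().mean()
    dv = (images[..., 1:, :] - images[..., :-1, :]).abs().mean()
    return (0.5 * (dh + dv)).item()

torch.manual_seed(0)
noise = torch.rand(16, 1, 28, 28)                                  # pure noise: low local homogeneity
smooth = F.avg_pool2d(noise, kernel_size=5, stride=1, padding=2)   # blurred: higher homogeneity
print("noise:  ", neighbour_difference(noise))
print("smooth: ", neighbour_difference(smooth))
```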
The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance trade-off in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
Marthinus Theunissen, Marelie Davel, Etienne Barnard. "Benign interpolation of noise in deep learning." South African Computer Journal 32(2), 2020, pp. 80-101. doi:10.18489/sacj.v32i2.833
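A minimal sketch of the measurement behind "models tend to fit uncorrupted samples first": train on partly corrupted labels and track training accuracy separately on the clean and corrupted subsets after each epoch. Data, model and hyperparameters are placeholders.

```python
# Sketch: track how quickly clean vs. label-corrupted training samples are fitted (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(2000, 20)
y_clean = (X[:, 0] > 0).long()                  # simple synthetic labelling rule
y = y_clean.clone()
corrupt = torch.rand(len(y)) < 0.2              # flip 20% of the (binary) labels
y[corrupt] = 1 - y[corrupt]

model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(30):                         # full-batch training, for brevity
    opt.zero_grad()
    loss = F.cross_entropy(model(X), y)
    loss.backward()
    opt.step()
    with torch.no_grad():
        pred = model(X).argmax(dim=1)
        acc_clean = (pred[~corrupt] == y[~corrupt]).float().mean().item()
        acc_corrupt = (pred[corrupt] == y[corrupt]).float().mean().item()
    print(f"epoch {epoch:2d}  clean acc {acc_clean:.2f}  corrupted acc {acc_corrupt:.2f}")
```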