People

Latest Research Publications:
When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.
@article{484, author = {Arthur Venter and Marthinus Theunissen and Marelie Davel}, title = {Pre-interpolation loss behaviour in neural networks}, abstract = {When training neural networks as classifiers, it is common to observe an increase in average test loss while still maintaining or improving the overall classification accuracy on the same dataset. In spite of the ubiquity of this phenomenon, it has not been well studied and is often dismissively attributed to an increase in borderline correct classifications. We present an empirical investigation that shows how this phenomenon is actually a result of the differential manner by which test samples are processed. In essence: test loss does not increase overall, but only for a small minority of samples. Large representational capacities allow losses to decrease for the vast majority of test samples at the cost of extreme increases for others. This effect seems to be mainly caused by increased parameter values relating to the correctly processed sample features. Our findings contribute to the practical understanding of a common behaviour of deep neural networks. We also discuss the implications of this work for network optimisation and generalisation.}, year = {2020}, journal = {Communications in Computer and Information Science}, volume = {1342}, pages = {296--309}, publisher = {Southern African Conference for Artificial Intelligence Research}, address = {South Africa}, isbn = {978-3-030-66151-9}, doi = {10.1007/978-3-030-66151-9_19}, }
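A minimal Python sketch of the effect described above, using made-up per-sample probabilities rather than data from the paper. Nine test samples become more confident and one becomes extremely wrong, so mean cross-entropy rises even though accuracy also rises. The names `cross_entropy` and `summarise` are illustrative only.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Per-sample cross-entropy, given the probability assigned to the true class."""
    return -math.log(p_true)


def summarise(probs: list[float]) -> tuple[float, float]:
    """Return (mean loss, accuracy) for a binary task, where each entry is the
    probability the model assigns to the correct class (> 0.5 counts as correct)."""
    losses = [cross_entropy(p) for p in probs]
    accuracy = sum(p > 0.5 for p in probs) / len(probs)
    return sum(losses) / len(losses), accuracy


# Hypothetical probabilities for the same 10 test samples at two checkpoints.
early = [0.70] * 8 + [0.40, 0.40]
late = [0.99] * 9 + [1e-6]  # nine samples improve, one becomes extremely wrong

early_loss, early_acc = summarise(early)
late_loss, late_acc = summarise(late)
improved = sum(cross_entropy(l) < cross_entropy(e) for e, l in zip(early, late))

print(f"early: loss={early_loss:.3f}, acc={early_acc:.1f}")  # loss=0.469, acc=0.8
print(f"late:  loss={late_loss:.3f}, acc={late_acc:.1f}")    # loss=1.391, acc=0.9
print(f"samples whose loss decreased: {improved}/10")        # 9/10
```

The single sample at probability 1e-6 contributes about 13.8 to the summed loss, which outweighs the gains on the other nine.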
The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance trade-off in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.
@article{394, author = {Marthinus Theunissen and Marelie Davel and Etienne Barnard}, title = {Benign interpolation of noise in deep learning}, abstract = {The understanding of generalisation in machine learning is in a state of flux, in part due to the ability of deep learning models to interpolate noisy training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about the bias-variance trade-off in learning. We expand upon relevant existing work by discussing local attributes of neural network training within the context of a relatively simple framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the deep learning model to generalise in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterised multilayer perceptrons and controlled training data noise. The main insights are that deep learning models are optimised for training data modularly, with different regions in the function space dedicated to fitting distinct types of sample information. Additionally, we show that models tend to fit uncorrupted samples first. Based on this finding, we propose a conjecture to explain an observed instance of the epoch-wise double-descent phenomenon. Our findings suggest that the notion of model capacity needs to be modified to consider the distributed way training data is fitted across sub-units.}, year = {2020}, journal = {South African Computer Journal}, volume = {32}, pages = {80--101}, number = {2}, publisher = {South African Institute of Computer Scientists and Information Technologists}, issn = {1015-7999 (print), 2313-7835 (online)}, doi = {10.18489/sacj.v32i2.833}, }
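The experiments above rely on controlled training-data noise. The sketch below shows one way to inject label noise at a chosen rate while recording which samples were corrupted, so that fit rates on clean and corrupted samples can be tracked separately during training. It is illustrative only: the paper's exact corruption procedure is not described here, and `corrupt_labels` and `fit_rates` are hypothetical helper names.

```python
import random


def corrupt_labels(labels, fraction, num_classes, seed=0):
    """Return (noisy_labels, corrupted_indices): a copy of `labels` in which
    round(fraction * len(labels)) entries are replaced by a different class."""
    rng = random.Random(seed)
    n_corrupt = round(fraction * len(labels))
    corrupted = set(rng.sample(range(len(labels)), n_corrupt))
    noisy = list(labels)
    for i in corrupted:
        # Exclude the true class so that every corrupted label is actually wrong.
        wrong = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(wrong)
    return noisy, corrupted


def fit_rates(predictions, noisy_labels, corrupted):
    """Fraction of clean and of corrupted training samples whose (noisy) label
    the model currently reproduces."""
    clean_idx = [i for i in range(len(noisy_labels)) if i not in corrupted]

    def rate(idx):
        return sum(predictions[i] == noisy_labels[i] for i in idx) / len(idx) if idx else 0.0

    return rate(clean_idx), rate(sorted(corrupted))


labels = [i % 10 for i in range(100)]
noisy, corrupted = corrupt_labels(labels, fraction=0.2, num_classes=10)

# A model that has so far captured only the true signal fits every clean
# sample and none of the corrupted ones.
print(fit_rates(labels, noisy, corrupted))  # (1.0, 0.0)
```

Logging both rates once per epoch is a simple way to see whether uncorrupted samples are fitted before corrupted ones.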
A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.
@inproceedings{236, author = {Marelie Davel and Marthinus Theunissen and Arnold Pretorius and Etienne Barnard}, title = {DNNs as layers of cooperating classifiers}, abstract = {A robust theoretical framework that can describe and predict the generalization ability of deep neural networks (DNNs) in general circumstances remains elusive. Classical attempts have produced complexity metrics that rely heavily on global measures of compactness and capacity with little investigation into the effects of sub-component collaboration. We demonstrate intriguing regularities in the activation patterns of the hidden nodes within fully-connected feedforward networks. By tracing the origin of these patterns, we show how such networks can be viewed as the combination of two information processing systems: one continuous and one discrete. We describe how these two systems arise naturally from the gradient-based optimization process, and demonstrate the classification ability of the two systems, individually and in collaboration. This perspective on DNN classification offers a novel way to think about generalization, in which different subsets of the training data are used to train distinct classifiers; those classifiers are then combined to perform the classification task, and their consistency is crucial for accurate classification.}, year = {2020}, booktitle = {The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)}, pages = {3725--3732}, month = {07/02-12/02/2020}, address = {New York}, }
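One way to read off the discrete, on/off behaviour of hidden nodes in a fully-connected network is to record, for each input, which ReLU nodes have a positive activation. The pure-Python sketch below assumes ReLU activations and uses hand-picked toy weights. It is not the authors' code; `relu_layer` and `activation_pattern` are illustrative names.

```python
def relu_layer(x, weights, biases):
    """Affine map followed by ReLU; `weights` holds one row of input weights per node."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, z) for z in pre]


def activation_pattern(x, layers):
    """Return the on/off pattern (1 = active) of every hidden ReLU node for input `x`,
    one tuple per layer. The activation values themselves form the continuous view."""
    pattern = []
    h = list(x)
    for weights, biases in layers:
        h = relu_layer(h, weights, biases)
        pattern.append(tuple(int(v > 0) for v in h))
    return tuple(pattern)


# Toy single hidden layer: 2 inputs, 3 nodes (weights chosen by hand for illustration).
layers = [([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.5])]

print(activation_pattern((1.0, -1.0), layers))   # ((1, 0, 1),)
print(activation_pattern((-1.0, -1.0), layers))  # ((0, 0, 1),)
```

All inputs that share a pattern pass through the same linear map, which is what makes grouping samples by pattern a natural discrete view of the network.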
The understanding of generalization in machine learning is in a state of flux. This is partly due to the relatively recent revelation that deep learning models are able to completely memorize training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about generalization. The phenomenon was brought to light and discussed in a seminal paper by Zhang et al. [24]. We expand upon this work by discussing local attributes of neural network training within the context of a relatively simple and generalizable framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the global deep learning model to generalize in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterized multilayer perceptrons and controlled noise in the training data. The main insights are that deep learning models are optimized for training data modularly, with different regions in the function space dedicated to fitting distinct kinds of sample information. Detrimental overfitting is largely prevented by the fact that different regions in the function space are used for prediction based on the similarity between new input data and that which has been optimized for.
@inproceedings{284, author = {Marthinus Theunissen and Marelie Davel and Etienne Barnard}, title = {Insights regarding overfitting on noise in deep learning}, abstract = {The understanding of generalization in machine learning is in a state of flux. This is partly due to the relatively recent revelation that deep learning models are able to completely memorize training data and still perform appropriately on out-of-sample data, thereby contradicting long-held intuitions about generalization. The phenomenon was brought to light and discussed in a seminal paper by Zhang et al. [24]. We expand upon this work by discussing local attributes of neural network training within the context of a relatively simple and generalizable framework. We describe how various types of noise can be compensated for within the proposed framework in order to allow the global deep learning model to generalize in spite of interpolating spurious function descriptors. Empirically, we support our postulates with experiments involving overparameterized multilayer perceptrons and controlled noise in the training data. The main insights are that deep learning models are optimized for training data modularly, with different regions in the function space dedicated to fitting distinct kinds of sample information. Detrimental overfitting is largely prevented by the fact that different regions in the function space are used for prediction based on the similarity between new input data and that which has been optimized for.}, year = {2019}, booktitle = {South African Forum for Artificial Intelligence Research (FAIR)}, pages = {49--63}, address = {Cape Town, South Africa}, }
TALKS:
1) 'Toward a Coherent Account of Moral Agency' (FAIR 2019);
2) 'Functional Moral Agency' (4IR: Philosophical, Ethical and Legal Perspectives 2019).
PUBLICATIONS:
1) Tollon, Fabio. 2020. The Artificial View: toward a non-anthropocentric account of moral patiency. Ethics and Information Technology. https://doi.org/10.1007/s10676-020-09540-4;
2) Tollon, Fabio. 2019. Moral Agents or Mindless Machines? A critical appraisal of agency in artificial systems. Hungarian Philosophical Review 63(4), pp. 9-23;
3) Tollon, Fabio. 2019. Toward a Coherent Account of Moral Agency. Proceedings of the South African Forum for Artificial Intelligence Research. Vol 2540. http://ceur-ws.org/Vol-2540/.
Latest Research Publications:
Artificial Intelligence (AI) systems are ubiquitous. From social media timelines, video recommendations on YouTube, and the kinds of adverts we see online, AI, in a very real sense, filters the world we see. More than that, AI is being embedded in agent-like systems, which might prompt certain reactions from users. Specifically, we might find ourselves feeling frustrated if these systems do not meet our expectations. In normal situations, this might be fine, but with the ever-increasing sophistication of AI-systems, this might become a problem. While it seems unproblematic to realize that being angry at your car for breaking down is unfitting, can the same be said for AI-systems? In this paper, therefore, I will investigate the so-called “reactive attitudes”, and their important link to our responsibility practices. I then show how within this framework there exist exemption and excuse conditions, and test whether our adopting the “objective attitude” toward agential AI is justified. I argue that such an attitude is appropriate in the context of three distinct senses of responsibility (answerability, attributability, and accountability), and that, therefore, AI-systems do not undermine our responsibility ascriptions.
@article{487, author = {Fabio Tollon}, title = {Responsibility gaps and the reactive attitudes}, abstract = {Artificial Intelligence (AI) systems are ubiquitous. From social media timelines, video recommendations on YouTube, and the kinds of adverts we see online, AI, in a very real sense, filters the world we see. More than that, AI is being embedded in agent-like systems, which might prompt certain reactions from users. Specifically, we might find ourselves feeling frustrated if these systems do not meet our expectations. In normal situations, this might be fine, but with the ever-increasing sophistication of AI-systems, this might become a problem. While it seems unproblematic to realize that being angry at your car for breaking down is unfitting, can the same be said for AI-systems? In this paper, therefore, I will investigate the so-called “reactive attitudes”, and their important link to our responsibility practices. I then show how within this framework there exist exemption and excuse conditions, and test whether our adopting the “objective attitude” toward agential AI is justified. I argue that such an attitude is appropriate in the context of three distinct senses of responsibility (answerability, attributability, and accountability), and that, therefore, AI-systems do not undermine our responsibility ascriptions.}, year = {2022}, journal = {AI and Ethics}, publisher = {Springer}, url = {https://link.springer.com/article/10.1007/s43681-022-00172-6}, doi = {10.1007/s43681-022-00172-6}, }
Up to 70% of all watch time on YouTube is due to the suggested content of its recommender system. This system has been found, by virtue of its design, to be promoting conspiratorial content. In this paper, the author firstly critiques the value neutrality thesis regarding technology, showing it to be philosophically untenable. This means that technological artefacts can influence what people come to value (or perhaps even embody values themselves) and change the moral evaluation of an action. Secondly, he introduces the concept of an affordance, borrowed from the literature on ecological psychology. This concept allows him to make salient how technologies come to solicit certain kinds of actions from users, making such actions more or less likely, and in this way influencing the kinds of things one comes to value. Thirdly, he critically assesses the results of a study by Alfano et al. He makes use of the literature on affordances, introduced earlier, to shed light on how these technological systems come to mediate our perception of the world and influence action.
@article{415, author = {Fabio Tollon}, title = {Designed to Seduce: Epistemically Retrograde Ideation and YouTube's Recommender System}, abstract = {Up to 70% of all watch time on YouTube is due to the suggested content of its recommender system. This system has been found, by virtue of its design, to be promoting conspiratorial content. In this paper, the author firstly critiques the value neutrality thesis regarding technology, showing it to be philosophically untenable. This means that technological artefacts can influence what people come to value (or perhaps even embody values themselves) and change the moral evaluation of an action. Secondly, he introduces the concept of an affordance, borrowed from the literature on ecological psychology. This concept allows him to make salient how technologies come to solicit certain kinds of actions from users, making such actions more or less likely, and in this way influencing the kinds of things one comes to value. Thirdly, he critically assesses the results of a study by Alfano et al. He makes use of the literature on affordances, introduced earlier, to shed light on how these technological systems come to mediate our perception of the world and influence action.}, year = {2021}, journal = {International Journal of Technoethics (IJT)}, volume = {12}, number = {2}, publisher = {IGI Global}, isbn = {9781799861492}, url = {https://www.igi-global.com/gateway/article/281077}, doi = {10.4018/IJT.2021070105}, }
In this paper I critically evaluate the value neutrality thesis regarding technology, and find it wanting. I then introduce the various ways in which artifacts can come to influence moral value, and our evaluation of moral situations and actions. Here, following van de Poel and Kroes, I introduce the idea of value sensitive design. Specifically, I show how by virtue of their designed properties, artifacts may come to embody values. Such accounts, however, have several shortcomings. In agreement with Michael Klenk, I raise epistemic and metaphysical issues with respect to designed properties embodying value. The concept of an affordance, borrowed from ecological psychology, provides a more philosophically fruitful grounding to the potential way(s) in which artifacts might embody values. This is due to the way in which it incorporates key insights from perception more generally, and how we go about determining possibilities for action in our environment specifically. The affordance account as it is presented by Klenk, however, is insufficient. I therefore argue that we understand affordances based on whether they are meaningful, and, secondly, that we grade them based on their force.
@article{386, author = {Fabio Tollon}, title = {Artifacts and affordances: from designed properties to possibilities for action}, abstract = {In this paper I critically evaluate the value neutrality thesis regarding technology, and find it wanting. I then introduce the various ways in which artifacts can come to influence moral value, and our evaluation of moral situations and actions. Here, following van de Poel and Kroes, I introduce the idea of value sensitive design. Specifically, I show how by virtue of their designed properties, artifacts may come to embody values. Such accounts, however, have several shortcomings. In agreement with Michael Klenk, I raise epistemic and metaphysical issues with respect to designed properties embodying value. The concept of an affordance, borrowed from ecological psychology, provides a more philosophically fruitful grounding to the potential way(s) in which artifacts might embody values. This is due to the way in which it incorporates key insights from perception more generally, and how we go about determining possibilities for action in our environment specifically. The affordance account as it is presented by Klenk, however, is insufficient. I therefore argue that we understand affordances based on whether they are meaningful, and, secondly, that we grade them based on their force.}, year = {2021}, journal = {AI \& SOCIETY Journal of Knowledge, Culture and Communication}, volume = {36}, number = {1}, publisher = {Springer}, url = {https://link.springer.com/article/10.1007/s00146-021-01155-7}, doi = {10.1007/s00146-021-01155-7}, }
In this paper I provide an exposition and critique of the Organic View of Ethical Status, as outlined by Torrance (2008). A key presupposition of this view is that only moral patients can be moral agents. It is claimed that because artificial agents lack sentience, they cannot be proper subjects of moral concern (i.e. moral patients). This account of moral standing in principle excludes machines from participating in our moral universe. I will argue that the Organic View operationalises anthropocentric intuitions regarding sentience ascription, and by extension how we identify moral patients. The main difference between the argument I provide here and traditional arguments surrounding moral attributability is that I do not necessarily defend the view that internal states ground our ascriptions of moral patiency. This is in contrast to views such as those defended by Singer (1975, 2011) and Torrance (2008), where concepts such as sentience play starring roles. I will raise both conceptual and epistemic issues with regard to this sense of sentience. While this does not preclude the usage of sentience outright, it suggests that we should be more careful in our usage of internal mental states to ground our moral ascriptions. Following from this I suggest other avenues for further exploration into machine moral patiency which may not have the same shortcomings as the Organic View.
@article{387, author = {Fabio Tollon}, title = {The artificial view: toward a non‑anthropocentric account of moral patiency}, abstract = {In this paper I provide an exposition and critique of the Organic View of Ethical Status, as outlined by Torrance (2008). A key presupposition of this view is that only moral patients can be moral agents. It is claimed that because artificial agents lack sentience, they cannot be proper subjects of moral concern (i.e. moral patients). This account of moral standing in principle excludes machines from participating in our moral universe. I will argue that the Organic View operationalises anthropocentric intuitions regarding sentience ascription, and by extension how we identify moral patients. The main difference between the argument I provide here and traditional arguments surrounding moral attributability is that I do not necessarily defend the view that internal states ground our ascriptions of moral patiency. This is in contrast to views such as those defended by Singer (1975, 2011) and Torrance (2008), where concepts such as sentience play starring roles. I will raise both conceptual and epistemic issues with regard to this sense of sentience. While this does not preclude the usage of sentience outright, it suggests that we should be more careful in our usage of internal mental states to ground our moral ascriptions. Following from this I suggest other avenues for further exploration into machine moral patiency which may not have the same shortcomings as the Organic View.}, year = {2020}, journal = {Ethics and Information Technology}, volume = {22}, number = {4}, publisher = {Springer}, url = {https://link.springer.com/article/10.1007/s10676-020-09540-4}, doi = {10.1007/s10676-020-09540-4}, }
Current research focus: modelling and analysis of surface electromyographic signals.
Latest Research Publications:
The output size problem, for a string-to-tree transducer, is to determine the asymptotic behavior of the function describing the maximum size of output trees, with respect to the length of input strings. We show that the problem to determine, for a given regular expression, the worst-case matching time of a backtracking regular expression matcher, can be reduced to the output size problem. The latter can, in turn, be solved by determining the degree of ambiguity of a non-deterministic finite automaton.
Keywords: string-to-tree transducers, output size, backtracking regular expression matchers, NFA ambiguity
@article{201, author = {Martin Berglund and F. Drewes and Brink van der Merwe}, title = {The Output Size Problem for String-to-Tree Transducers}, abstract = {The output size problem, for a string-to-tree transducer, is to determine the asymptotic behavior of the function describing the maximum size of output trees, with respect to the length of input strings. We show that the problem to determine, for a given regular expression, the worst-case matching time of a backtracking regular expression matcher, can be reduced to the output size problem. The latter can, in turn, be solved by determining the degree of ambiguity of a non-deterministic finite automaton.}, keywords = {string-to-tree transducers, output size, backtracking regular expression matchers, NFA ambiguity}, year = {2018}, journal = {Journal of Automata, Languages and Combinatorics}, volume = {23}, pages = {19--38}, number = {1}, publisher = {Institut für Informatik, Justus-Liebig-Universität Giessen}, address = {Germany}, issn = {2567-3785}, url = {https://www.jalc.de/issues/2018/issue_23_1-3/jalc-2018-019-038.php}, }
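To make the link to NFA ambiguity concrete, the sketch below counts the accepting runs of an epsilon-free NFA on a given input. It is not the reduction from the paper. A backtracking matcher may, in the worst case, explore each such run, so polynomial versus exponential growth in the run count hints at the corresponding matching-time behaviour. The dictionary encoding of transitions and the two example automata are assumptions made for brevity.

```python
from collections import defaultdict


def count_accepting_runs(transitions, initial, finals, word):
    """Count the accepting runs of an epsilon-free NFA on `word`.

    `transitions` maps (state, symbol) to a list of successor states.
    Runs are counted by dynamic programming over input positions, so the count
    can be exponential in len(word) without enumerating the runs one by one."""
    runs = {initial: 1}
    for symbol in word:
        nxt = defaultdict(int)
        for state, count in runs.items():
            for succ in transitions.get((state, symbol), []):
                nxt[succ] += count
        runs = nxt
    return sum(count for state, count in runs.items() if state in finals)


# NFA resembling a*a*: the switch point between the two loops is a free choice.
linear = {(0, "a"): [0, 1], (1, "a"): [1]}
# NFA resembling (a|a)*: every symbol can be read in two ways.
exponential = {(0, "a"): [0, 1], (1, "a"): [0, 1]}

for n in range(1, 6):
    word = "a" * n
    print(n,
          count_accepting_runs(linear, 0, {0, 1}, word),       # n + 1
          count_accepting_runs(exponential, 0, {0, 1}, word))  # 2 ** n
```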
Modern regular expression matching software features many extensions, some general and some very narrowly specified. Here we consider the generalization of adding a class of operators which can be described by, e.g., finite-state transducers. Combined with backreferences they enable new classes of languages to be matched. The addition of finite-state transducers is shown to make membership testing undecidable. Following this result, we study the complexity of membership testing for various restricted cases of the model.
@inproceedings{199, author = {Martin Berglund and F. Drewes and Brink van der Merwe}, title = {On Regular Expressions with Backreferences and Transducers}, abstract = {Modern regular expression matching software features many extensions, some general and some very narrowly specified. Here we consider the generalization of adding a class of operators which can be described by, e.g., finite-state transducers. Combined with backreferences they enable new classes of languages to be matched. The addition of finite-state transducers is shown to make membership testing undecidable. Following this result, we study the complexity of membership testing for various restricted cases of the model.}, year = {2018}, booktitle = {10th Workshop on Non-Classical Models of Automata and Applications (NCMA 2018)}, pages = {1--19}, month = {21/08-22/08}, }
Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.
@inproceedings{196, author = {Brink van der Merwe and Martin Berglund and Willem Bester}, title = {Formalising Boost POSIX Regular Expression Matching}, abstract = {Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.}, year = {2018}, booktitle = {International Colloquium on Theoretical Aspects of Computing}, pages = {99--115}, month = {17/02}, publisher = {Springer}, isbn = {978-3-030-02508-3}, url = {https://link.springer.com/chapter/10.1007/978-3-030-02508-3_6}, }
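The difference between leftmost-greedy (Perl-style) and leftmost-longest (POSIX-style) semantics can be shown with Python's `re` module. It is a backtracking engine that tries alternatives in order. The leftmost-longest function below is a brute-force sketch that only maximises the overall match. It does not model the POSIX or Boost rules for capturing groups that the paper formalises.

```python
import re


def perl_style_match(pattern, text):
    """Leftmost match as found by Python's backtracking `re` engine, which tries
    alternatives in order (leftmost-first, as in Perl-compatible matchers)."""
    m = re.search(pattern, text)
    return m.group() if m else None


def leftmost_longest_match(pattern, text):
    """Brute-force leftmost-longest overall match: at the leftmost start position
    where some match exists, return the longest substring matched in full.
    Only the overall match is maximised; capture-group rules are ignored."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None


print(perl_style_match(r"a|ab", "xab"))        # a
print(leftmost_longest_match(r"a|ab", "xab"))  # ab
```

Both functions agree on where the match starts. They differ only in how far it extends, because the backtracking engine commits to the first alternative that succeeds.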
No Abstract
@inproceedings{178, author = {Brink van der Merwe and N. Weideman and Martin Berglund}, title = {Turning evil regexes harmless}, abstract = {No Abstract}, year = {2017}, booktitle = {Conference of South African Institute of Computer Scientists and Information Technologists (SAICSIT'17)}, month = {26/09-28/09}, publisher = {ACM}, url = {https://dl.acm.org/citation.cfm?id=3129416}, }
Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to a variable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.
@inproceedings{176, author = {Martin Berglund and Brink van der Merwe}, title = {Regular Expressions with Backreferences Re-examined}, abstract = {Most modern regular expression matching libraries (one of the rare exceptions being Google’s RE2) allow backreferences, operations which bind a substring to a variable allowing it to be matched again verbatim. However, different implementations not only vary in the syntax permitted when using backreferences, but both implementations and definitions in the literature offer up a number of different variants on how backreferences match. Our aim is to compare the various flavors by considering the formal languages that each can describe, resulting in the establishment of a hierarchy of language classes. Beyond the hierarchy itself, some complexity results are given, and as part of the effort on comparing language classes new pumping lemmas are established, and old ones extended to new classes.}, year = {2017}, booktitle = {The Prague Stringology Conference (PSC 2017)}, pages = {30--41}, month = {28/08-30/08}, address = {Czech Technical University in Prague}, isbn = {978-80-01-06193-0}, }
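Backreferences take regular expressions beyond the regular languages. In Python's `re`, the pattern `(a*)b\1` accepts exactly the strings aⁿbaⁿ, a language that is not regular. The final check illustrates one point where implementations differ. Python fails a backreference to a group that did not take part in the match. JavaScript, as we understand its specification, lets such a reference match the empty string. `in_copy_language` is an illustrative name.

```python
import re

# (a*) captures a block of a's; \1 must then repeat exactly the captured text,
# so the accepted language is { a^n b a^n : n >= 0 }, which is not regular.
COPY = re.compile(r"(a*)b\1")


def in_copy_language(word):
    return COPY.fullmatch(word) is not None


for w in ["b", "aba", "aabaa", "aaba", "abaa"]:
    print(w, in_copy_language(w))  # True, True, True, False, False in that order

# Group 1 does not participate when the "b" branch is taken. Python then fails
# the backreference, so "b" is rejected; an engine that treats an unset group
# as the empty string would accept it.
print(re.fullmatch(r"(?:(a)|b)\1", "b"))  # None
```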