Research Publications

2018

Watson B. The impact of using a contract-driven, test-interceptor based software development approach. Annual conference of The South African Institute of Computer Scientists and Information Technologists (SAICSIT 2018). 2018. https://doi.org/10.475/123_4.

A contract-driven development approach requires the formalization of component requirements in the form of a component contract. The Use Case, Responsibility Driven Analysis and Design (URDAD) methodology is based on the contract-driven development approach and uses contracts to capture user requirements and perform a technology-neutral design across layers of granularity. This is achieved by taking use-case based functional requirements through an iterative design process and generating various levels of granularity iteratively. In this project, the component contracts that were captured by utilizing the URDAD approach are used to generate test interceptors which validate whether, in the context of rendering component services, the component contracts are satisfied. To achieve this, Java classes and interfaces are annotated with pre- and postconditions to represent the contracts in code. Annotation processors are then used to automatically generate test-interceptor classes by processing the annotations. The test-interceptor classes encapsulate test logic and are interface-compatible with their underlying component counterparts. This enables test-interceptors to intercept service requests to the underlying counterpart components in order to verify contract adherence. The generated test interceptors can be used for unit testing as well as real-time component monitoring. This development approach, utilized within the URDAD methodology, would then result in unit and integration tests across levels of granularity. Empirical data from actual software development projects will be used to assess the impact of introducing such a development approach in real software development projects. In particular, the study assesses the impact on the quality attributes of the software development process, as well as the qualities of the software produced by the process. Process qualities measured include development productivity (the rate at which software is produced), correctness (the rate at which the produced software meets the client's requirements) and the certifiability of the software development process (which certifiability requirements are fully or partially addressed by the URDAD development approach). Software qualities measured include reusability (empirical and qualitative), simplicity (the inverse of the complexity measure) and bug density (number of defects in a module). The study aims to show conclusively how the approach impacts the creation of correct software which meets the client's requirements, how productivity is affected and whether the approach enhances or hinders certifiability. The study also aims to determine whether test-interceptors are a viable mechanism to produce high-quality tests that contribute to the creation of correct software. Furthermore, the study aims to determine whether the software produced by applying this approach yields improved reusability, whether the software becomes more or less complex and whether more or fewer bugs are introduced.

@proceedings{211,
  author = {Bruce Watson},
  title = {The impact of using a contract-driven, test-interceptor based software development approach},
  abstract = {A contract-driven development approach requires the formalization of component requirements in the form of a component contract. The Use Case, Responsibility Driven Analysis and Design (URDAD) methodology is based on the contract-driven development approach and uses contracts to capture user requirements and perform a technology-neutral design across layers of granularity. This is achieved by taking use-case based functional requirements through an iterative design process and generating various levels of granularity iteratively.
In this project, the component contracts that were captured by utilizing the URDAD approach are used to generate test interceptors which validate whether, in the context of rendering component services, the component contracts are satisfied. To achieve this, Java classes and interfaces are annotated with pre- and postconditions to represent the contracts in code. Annotation processors are then used to automatically generate test-interceptor classes by processing the annotations. The test-interceptor classes encapsulate test logic and are interface-compatible with their underlying component counterparts. This enables test-interceptors to intercept service requests to the underlying counterpart components in order to verify contract adherence. The generated test interceptors can be used for unit testing as well as real-time component monitoring. This development approach, utilized within the URDAD methodology, would then result in unit and integration tests across levels of granularity.
Empirical data from actual software development projects will be used to assess the impact of introducing such a development approach in real software development projects. In particular, the study assesses the impact on the quality attributes of the software development process, as well as the qualities of the software produced by the process.
Process qualities measured include development productivity (the rate at which software is produced), correctness (the rate at which the produced software meets the client's requirements) and the certifiability of the software development process (which certifiability requirements are fully or partially addressed by the URDAD development approach). Software qualities measured include reusability (empirical and qualitative), simplicity (the inverse of the complexity measure) and bug density (number of defects in a module).
The study aims to show conclusively how the approach impacts the creation of correct software which meets the client's requirements, how productivity is affected and whether the approach enhances or hinders certifiability. The study also aims to determine whether test-interceptors are a viable mechanism to produce high-quality tests that contribute to the creation of correct software. Furthermore, the study aims to determine whether the software produced by applying this approach yields improved reusability, whether the software becomes more or less complex and whether more or fewer bugs are introduced.},
  year = {2018},
  journal = {Annual conference of The South African Institute of Computer Scientists and Information Technologists (SAICSIT 2018)},
  pages = {322-326},
  month = {26/09-28/09},
  publisher = {ACM},
  address = {New York},
  isbn = {123-4567-24-567/08/06},
  url = {https://doi.org/10.475/123_4},
}
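
To make the interceptor idea above concrete, here is a minimal Python sketch of a contract-checking wrapper; the paper itself generates Java test-interceptor classes from pre-/postcondition annotations via annotation processors, so the decorator and the service function below are purely hypothetical illustrations, not the authors' implementation.

# Hypothetical sketch: a contract-checking interceptor as a decorator.
# The paper generates Java test-interceptor classes from annotations;
# this Python analogue only illustrates the pre-/postcondition check.

def contract(pre, post):
    """Wrap a service so every call is checked against its contract."""
    def wrap(service):
        def interceptor(*args, **kwargs):
            assert pre(*args, **kwargs), "precondition violated"
            result = service(*args, **kwargs)
            assert post(result, *args, **kwargs), "postcondition violated"
            return result
        return interceptor
    return wrap

@contract(pre=lambda amount: amount > 0,
          post=lambda result, amount: result >= amount)
def add_interest(amount):
    # Underlying component service being monitored.
    return amount * 1.05

if __name__ == "__main__":
    print(add_interest(100.0))   # passes both checks
    # add_interest(-5) would raise: precondition violated
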
Watson B. Three Strategies for the Dead-Zone String Matching Algorithm. The Prague Stringology Conference. 2018. http://www.stringology.org/.

No Abstract

@proceedings{210,
  author = {Bruce Watson},
  title = {Three Strategies for the Dead-Zone String Matching Algorithm},
  abstract = {No Abstract},
  year = {2018},
  journal = {The Prague Stringology Conference},
  pages = {117-128},
  month = {27/08-28/08},
  publisher = {Prague Stringology Club},
  address = {Prague, Czech Republic},
  isbn = {978-80-01-06484-9},
  url = {http://www.stringology.org/},
}
Watson B. Modelling the sensory space of varietal wines: Mining of large, unstructured text data and visualisation of style patterns. Scientific Reports. 2018;8(4987). www.nature.com/scientificreports.

No Abstract

@article{209,
  author = {Bruce Watson},
  title = {Modelling the sensory space of varietal wines: Mining of large, unstructured text data and visualisation of style patterns},
  abstract = {No Abstract},
  year = {2018},
  journal = {Scientific Reports},
  volume = {8},
  pages = {1-13},
  issue = {4987},
  publisher = {Springer Nature},
  url = {www.nature.com/scientificreports},
}
Watson B. Using CSP to Develop Quality Concurrent Software. In: Principled Software Development. Switzerland: Springer; 2018. https://doi.org/10.1007/978-3-319-98047-8.

A method for developing concurrent software is advocated that centres on using CSP to specify the behaviour of the system. A small example problem is used to illustrate the method. The problem is to develop a simulation system that keeps track of and reports on the least unique bid of multiple streams of randomly generated incoming bids. The problem’s required high-level behaviour is specified in CSP, refined down to the level of interacting processes and then verified for refinement and behavioural correctness using the FDR refinement checker. Heuristics are used to map the CSP processes to a Go implementation. Interpretive reflections are offered of the lessons learned as a result of the exercise.

@inbook{208,
  author = {Bruce Watson},
  title = {Using CSP to Develop Quality Concurrent Software},
  abstract = {A method for developing concurrent software is advocated that centres on using CSP to specify the behaviour of the system. A small example problem is used to illustrate the method. The problem is to develop a simulation system that keeps track of and reports on the least unique bid of multiple streams of randomly generated incoming bids. The problem’s required high-level behaviour is specified in CSP, refined down to the level of interacting processes and then verified for refinement and behavioural correctness using the FDR refinement checker. Heuristics are used to map the CSP processes to a Go implementation. Interpretive reflections are offered of the lessons learned as a result of the exercise.},
  year = {2018},
  journal = {Principled Software Development},
  pages = {165-184},
  publisher = {Springer},
  address = {Switzerland},
  isbn = {978-3-319-98046-1},
  url = {https://doi.org/10.1007/978-3-319-98047-8},
}
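
The chapter's running example, reporting the least unique bid among streams of randomly generated bids, reduces to a small computation. The sketch below is a plain sequential Python illustration under that reading of the abstract; it does not reproduce the CSP specification, the FDR checks, or the Go implementation discussed above.

# Minimal sketch of the "least unique bid" computation: among all bids seen
# so far, report the smallest value that occurs exactly once (None if there
# is no unique bid).
import random
from collections import Counter

def least_unique_bid(bids):
    counts = Counter(bids)
    unique = [b for b, n in counts.items() if n == 1]
    return min(unique) if unique else None

if __name__ == "__main__":
    random.seed(1)
    stream = [random.randint(1, 10) for _ in range(20)]  # simulated bid stream
    print(stream, "->", least_unique_bid(stream))
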
Meyer T, Leenen L. Semantic Technologies and Big Data Analytics for Cyber Defence. In: Information Retrieval and Management: Concepts, Methodologies, Tools, and Applications. IGI Global; 2018. https://researchspace.csir.co.za/dspace/bitstream/handle/10204/8932/Leenen_2016.pdf?sequence=1.

Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to enable intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets, and to understand the significance of detected patterns, is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technology is a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of the decisions that have to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.

@inbook{206,
  author = {Thomas Meyer and Louise Leenen},
  title = {Semantic Technologies and Big Data Analytics for Cyber Defence},
  abstract = {Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to enable intelligent decision making. Due to the vast amounts of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to present advance warning of possible threats. The ability to detect patterns in vast data sets, and to understand the significance of detected patterns, is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for the processing and understanding of the huge amounts of information in the cyber environment.
The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types. The purpose is to detect patterns, correlations, trends and other useful information. Semantic technology is a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules. These rules are generally not sophisticated enough to deal with the complexity of the decisions that have to be made. The incorporation of semantic information allows for increased understanding and sophistication in cyber defence systems.
This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.},
  year = {2018},
  journal = {Information Retrieval and Management: Concepts, Methodologies, Tools, and Applications},
  pages = {1375-1388},
  publisher = {IGI Global},
  isbn = {9781522551911},
  url = {https://researchspace.csir.co.za/dspace/bitstream/handle/10204/8932/Leenen_2016.pdf?sequence=1},
}
Botha L, Meyer T, Peñaloza R. The Bayesian Description Logic BALC. International Workshop on Description Logics. 2018. http://ceur-ws.org/Vol-2211/.

Description Logics (DLs) that support uncertainty are not as well studied as their crisp alternatives, thereby limiting their use in real-world domains. The Bayesian DL BEL and its extensions have been introduced to deal with uncertain knowledge without assuming (probabilistic) independence between axioms. In this paper we combine the classical DL ALC with Bayesian Networks. Our new DL includes a solution to the consistency checking problem and changes to the tableaux algorithm that are not a part of BEL. Furthermore, BALC also supports probabilistic assertional information which was not studied for BEL. We present algorithms for four categories of reasoning problems for our logic: two versions of concept satisfiability (referred to as total concept satisfiability and partial concept satisfiability, respectively), knowledge base consistency, subsumption, and instance checking. We show that all reasoning problems in BALC are in the same complexity class as their classical variants, provided that the size of the Bayesian Network is included in the size of the knowledge base.

@proceedings{205,
  author = {Leonard Botha and Thomas Meyer and Rafael Peñaloza},
  title = {The Bayesian Description Logic BALC},
  abstract = {Description Logics (DLs) that support uncertainty are not as well studied as their crisp alternatives, thereby limiting their use in real-world domains. The Bayesian DL BEL and its extensions have been introduced to deal with uncertain knowledge without assuming (probabilistic) independence between axioms. In this paper we combine the classical DL ALC with Bayesian Networks. Our new DL includes a solution to the consistency checking problem and changes to the tableaux algorithm that are not a part of BEL. Furthermore, BALC also supports probabilistic assertional information which was not studied for BEL. We present algorithms for four categories of reasoning problems for our logic: two versions of concept satisfiability (referred to as total concept satisfiability and partial concept satisfiability, respectively), knowledge base consistency, subsumption, and instance checking. We show that all reasoning problems in BALC are in the same complexity class as their classical variants, provided that the size of the Bayesian Network is included in the size of the knowledge base.},
  year = {2018},
  journal = {International Workshop on Description Logics},
  month = {27/10-29/10},
  url = {http://ceur-ws.org/Vol-2211/},
}
de Waal A, Yoo K. Latent Variable Bayesian Networks Constructed Using Structural Equation Modelling. 2018 21st International Conference on Information Fusion (FUSION). 2018. https://ieeexplore.ieee.org/abstract/document/8455240.

Bayesian networks in fusion systems often contain latent variables. They play an important role in fusion systems as they provide context that leads to better choices of data sources to fuse. Latent variables in Bayesian networks are mostly constructed by means of expert knowledge modelling. We propose using theory-driven structural equation modelling (SEM) to identify and structure latent variables in a Bayesian network. The linking of SEM and Bayesian networks is motivated by the fact that both methods can be shown to be causal models. We compare this approach to a data-driven approach where latent factors are induced by means of unsupervised learning. We identify appropriate metrics for URREF ontology criteria for both approaches.

@proceedings{204,
  author = {Alta de Waal and Keunyoung Yoo},
  title = {Latent Variable Bayesian Networks Constructed Using Structural Equation Modelling},
  abstract = {Bayesian networks in fusion systems often contain latent variables. They play an important role in fusion systems as they provide context that leads to better choices of data sources to fuse. Latent variables in Bayesian networks are mostly constructed by means of expert knowledge modelling. We propose using theory-driven structural equation modelling (SEM) to identify and structure latent variables in a Bayesian network. The linking of SEM and Bayesian networks is motivated by the fact that both methods can be shown to be causal models. We compare this approach to a data-driven approach where latent factors are induced by means of unsupervised learning. We identify appropriate metrics for URREF ontology criteria for both approaches.},
  year = {2018},
  journal = {2018 21st International Conference on Information Fusion (FUSION)},
  pages = {688-695},
  month = {10/07-13/07},
  publisher = {IEEE},
  isbn = {978-0-9964527-6-2},
  url = {https://ieeexplore.ieee.org/abstract/document/8455240},
}
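
To illustrate what a latent variable in a Bayesian network contributes, here is a small numpy sketch with an invented structure (a hidden context node driving two observed reports); it is not the authors' SEM-derived model, and all names and probabilities are assumptions for illustration only.

# Toy latent-variable Bayesian network: a hidden binary "context" node with
# two observed children, sampled ancestrally with numpy. Everything here is
# illustrative: the structure and probabilities are made up.
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    context = rng.random(n) < 0.3                # latent: P(context=1) = 0.3
    p_a = np.where(context, 0.9, 0.2)            # P(sensor_a=1 | context)
    p_b = np.where(context, 0.7, 0.1)            # P(sensor_b=1 | context)
    sensor_a = rng.random(n) < p_a
    sensor_b = rng.random(n) < p_b
    return context, sensor_a, sensor_b

if __name__ == "__main__":
    context, a, b = sample(100_000)
    both = a & b
    # Observing both sensors "high" raises the posterior of the latent context.
    print("P(context=1)           ~", context.mean())
    print("P(context=1 | a=1,b=1) ~", context[both].mean())
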
Watson B. Using CSP to Develop Quality Concurrent Software. 2018. https://doi.org/10.1007/978-3-319-98047-8.

No Abstract

@proceedings{203,
  author = {Bruce Watson},
  title = {Using CSP to Develop Quality Concurrent Software},
  abstract = {No Abstract},
  year = {2018},
  address = {Switzerland},
  isbn = {978-3-319-98046-1},
  url = {https://doi.org/10.1007/978-3-319-98047-8},
}
Pretorius AP, Kroon S, Kamper H. Learning Dynamics of Linear Denoising Autoencoders. 35th International Conference on Machine Learning. 2018.

Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding of how the input noise influences learning is still lacking. Here we develop theory for how noise influences learning in DAEs. By focusing on linear DAEs, we are able to derive analytic expressions that exactly describe their learning dynamics. We verify our theoretical predictions with simulations as well as experiments on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise allows DAEs to ignore low variance directions in the inputs while learning to reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs to standard regularised autoencoders, we show that noise has a similar regularisation effect to weight decay, but with faster training dynamics. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.

@proceedings{202,
  author = {Arnold Pretorius and Steve Kroon and H. Kamper},
  title = {Learning Dynamics of Linear Denoising Autoencoders},
  abstract = {Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding of how the input noise influences learning is still lacking. Here we develop theory for how noise influences learning in DAEs. By focusing on linear DAEs, we are able to derive analytic expressions that exactly describe their learning dynamics. We verify our theoretical predictions with simulations as well as experiments on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise allows DAEs to ignore low variance directions in the inputs while learning to reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs to standard regularised autoencoders, we show that noise has a similar regularisation effect to weight decay, but with faster training dynamics. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.},
  year = {2018},
  journal = {35th International Conference on Machine Learning},
  pages = {4141-4150},
  month = {10/07-15/07},
  publisher = {Proceedings of Machine Learning Research (PMLR)},
  address = {Sweden},
  issn = {1938-7228},
}
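
A minimal numpy sketch of the object of study, a linear denoising autoencoder trained by gradient descent to reconstruct clean inputs from noise-corrupted copies. The synthetic data, initialisation and hyperparameters are arbitrary choices for illustration and do not reproduce the paper's analysis or experiments.

# Linear denoising autoencoder: x_hat = W2 @ W1 @ (x + noise), trained by
# full-batch gradient descent to reconstruct the *clean* input x. Pure numpy.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 20, 5                        # samples, input dim, hidden dim
signal = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) / np.sqrt(k)
X = signal + 0.1 * rng.standard_normal((n, d))   # low-rank data plus small noise

W1 = 0.1 * rng.standard_normal((d, k))       # encoder weights
W2 = 0.1 * rng.standard_normal((k, d))       # decoder weights
noise_std, lr = 0.5, 0.02

for step in range(401):
    Xn = X + noise_std * rng.standard_normal(X.shape)   # corrupt the input
    H = Xn @ W1
    X_hat = H @ W2
    E = X_hat - X                                       # error w.r.t. clean input
    if step % 100 == 0:
        print(f"step {step:3d}  reconstruction loss {np.mean(E ** 2):.4f}")
    gW2 = H.T @ E / n                                   # gradients of the squared error
    gW1 = Xn.T @ (E @ W2.T) / n
    W1 -= lr * gW1
    W2 -= lr * gW2
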
Berglund M, Drewes F, van der Merwe B. The Output Size Problem for String-to-Tree Transducers. Journal of Automata, Languages and Combinatorics. 2018;23(1). https://www.jalc.de/issues/2018/issue_23_1-3/jalc-2018-019-038.php.

The output size problem, for a string-to-tree transducer, is to determine the asymptotic behavior of the function describing the maximum size of output trees, with respect to the length of input strings. We show that the problem to determine, for a given regular expression, the worst-case matching time of a backtracking regular expression matcher, can be reduced to the output size problem. The latter can, in turn, be solved by determining the degree of ambiguity of a non-deterministic finite automaton. Keywords: string-to-tree transducers, output size, backtracking regular expression matchers, NFA ambiguity

@article{201,
  author = {Martin Berglund and F. Drewes and Brink van der Merwe},
  title = {The Output Size Problem for String-to-Tree Transducers},
  abstract = {The output size problem, for a string-to-tree transducer, is to determine the asymptotic behavior of the function describing the maximum size of output trees, with respect to the length of input strings. We show that the problem to determine, for a given regular expression, the worst-case matching time of a backtracking regular expression matcher, can be reduced to the output size problem. The latter can, in turn, be solved by determining the degree of ambiguity of a non-deterministic finite automaton. 
Keywords: string-to-tree transducers, output size, backtracking regular expression matchers, NFA ambiguity},
  year = {2018},
  journal = {Journal of Automata, Languages and Combinatorics},
  volume = {23},
  pages = {19-38},
  issue = {1},
  publisher = {Institut für Informatik, Justus-Liebig-Universität Giessen},
  address = {Germany},
  issn = {2567-3785},
  url = {https://www.jalc.de/issues/2018/issue_23_1-3/jalc-2018-019-038.php},
}
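
The worst-case matching time of a backtracking matcher that the reduction above targets can be observed directly with a highly ambiguous pattern; the timing sketch below uses Python's re module (a backtracking engine) on the classic exponential case and is only an illustration of the phenomenon, not of the paper's construction.

# Timing the worst case of a backtracking matcher: the ambiguous pattern
# (a+)+b forces exponential backtracking on inputs "aaa...a" with no "b".
import re
import time

pattern = re.compile(r"(a+)+b")

for n in range(16, 26, 2):
    text = "a" * n                      # almost matches, but never does
    start = time.perf_counter()
    pattern.match(text)                 # returns None after heavy backtracking
    print(f"n = {n:2d}  time = {time.perf_counter() - start:.3f}s")
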
Casini G, Meyer T, Varzinczak I. Defeasible Entailment: from Rational Closure to Lexicographic Closure and Beyond. 7th International Workshop on Non-Monotonic Reasoning (NMR 2018). 2018. http://orbilu.uni.lu/bitstream/10993/37393/1/NMR2018Paper.pdf.

In this paper we present what we believe to be the first systematic approach for extending the framework for defeasible entailment first presented by Kraus, Lehmann, and Magidor—the so-called KLM approach. Drawing on the properties for KLM, we first propose a class of basic defeasible entailment relations. We characterise this basic framework in three ways: (i) semantically, (ii) in terms of a class of properties, and (iii) in terms of ranks on statements in a knowledge base. We also provide an algorithm for computing the basic framework. These results are proved through various representation results. We then refine this framework by defining the class of rational defeasible entailment relations. This refined framework is also characterised in three ways: semantically, in terms of a class of properties, and in terms of ranks on statements. We also provide an algorithm for computing the refined framework. Again, these results are proved through various representation results. We argue that the class of rational defeasible entailment relations—a strengthening of basic defeasible entailment which is itself a strengthening of the original KLM proposal—is worthy of the term rational in the sense that all of them can be viewed as appropriate forms of defeasible entailment. We show that the two well-known forms of defeasible entailment, rational closure and lexicographic closure, fall within our rational defeasible framework. We show that rational closure is the most conservative of the defeasible entailment relations within the framework (with respect to subset inclusion), but that there are forms of defeasible entailment within our framework that are more “adventurous” than lexicographic closure.

@proceedings{200,
  author = {Giovanni Casini and Thomas Meyer and Ivan Varzinczak},
  title = {Defeasible Entailment: from Rational Closure to Lexicographic Closure and Beyond},
  abstract = {In this paper we present what we believe to be the first systematic approach for extending the framework for defeasible entailment first presented by Kraus, Lehmann, and Magidor—the so-called KLM approach. Drawing on the properties for KLM, we first propose a class of basic defeasible entailment relations. We characterise this basic framework in three ways: (i) semantically, (ii) in terms of a class of properties, and (iii) in terms of ranks on statements in a knowledge base. We also provide an algorithm for computing the basic framework. These results are proved through various representation results. We then refine this framework by defining the class of rational defeasible entailment relations. This refined framework is also characterised in three ways: semantically, in terms of a class of properties, and in terms of ranks on statements. We also provide an algorithm for computing the refined framework. Again, these results are proved through various representation results.
We argue that the class of rational defeasible entailment relations—a strengthening of basic defeasible entailment which is itself a strengthening of the original KLM proposal—is worthy of the term rational in the sense that all of them can be viewed as appropriate forms of defeasible entailment. We show that the two well-known forms of defeasible entailment, rational closure and lexicographic closure, fall within our rational defeasible framework. We show that rational closure is the most conservative of the defeasible entailment relations within the framework (with respect to subset inclusion), but that there are forms of defeasible entailment within our framework that are more “adventurous” than lexicographic closure.},
  year = {2018},
  journal = {7th International Workshop on Non-Monotonic Reasoning (NMR 2018)},
  pages = {109-118},
  month = {27/10-29/10},
  url = {http://orbilu.uni.lu/bitstream/10993/37393/1/NMR2018Paper.pdf},
}
Berglund M, Drewes F, van der Merwe B. On Regular Expressions with Backreferences and Transducers. 10th Workshop on Non-Classical Models of Automata and Applications (NCMA 2018). 2018.

Modern regular expression matching software features many extensions, some general while some are very narrowly specified. Here we consider the generalization of adding a class of operators which can be described by, e.g., finite-state transducers. Combined with backreferences they enable new classes of languages to be matched. The addition of finite-state transducers is shown to make membership testing undecidable. Following this result, we study the complexity of membership testing for various restricted cases of the model.

@proceedings{199,
  author = {Martin Berglund and F. Drewes and Brink van der Merwe},
  title = {On Regular Expressions with Backreferences and Transducers},
  abstract = {Modern regular expression matching software features many extensions, some general while some are very narrowly specified. Here we consider the generalization of adding a class of operators which can be described by, e.g., finite-state transducers. Combined with backreferences they enable new classes of languages to be matched. The addition of finite-state transducers is shown to make membership testing undecidable. Following this result, we study the complexity of membership testing for various restricted cases of the model.},
  year = {2018},
  journal = {10th Workshop on Non-Classical Models of Automata and Applications (NCMA 2018)},
  pages = {1-19},
  month = {21/08-22/08},
}
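
As a quick illustration of the extra power that backreferences contribute (the starting point for the extensions studied above), the Python sketch below matches exactly the non-regular language of squares ww; it uses Python's re syntax rather than the formal model of the paper.

# Backreferences match non-regular languages: the anchored pattern (.+)\1
# accepts exactly the "squares" ww, which no finite automaton can recognise.
import re

square = re.compile(r"^(.+)\1$")

for w in ["abab", "aa", "abcabc", "aba", "abcab"]:
    print(f"{w!r:10} -> {bool(square.match(w))}")
# abab, aa and abcabc match; aba and abcab do not.
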
Rens G, Meyer T, Nayak A. Maximizing Expected Impact in an Agent Reputation Network. 41st German Conference on AI, Berlin, Germany, September 24–28, 2018. 2018. https://www.springer.com/us/book/9783030001100.

We propose a new framework for reasoning about the reputation of multiple agents, based on the partially observable Markov decision process (POMDP). It is general enough for the specification of a variety of stochastic multi-agent system (MAS) domains involving the impact of agents on each other’s reputations. Assuming that an agent must maintain a good enough reputation to survive in the system, a method for an agent to select optimal actions is developed.

@proceedings{198,
  author = {Gavin Rens and Thomas Meyer and A. Nayak},
  title = {Maximizing Expected Impact in an Agent Reputation Network},
  abstract = {We propose a new framework for reasoning about the reputation of multiple agents, based on the partially observable Markov decision process (POMDP). It is general enough for the specification of a variety of stochastic multi-agent system (MAS) domains involving the impact of agents on each other’s reputations. Assuming that an agent must maintain a good enough reputation to survive in the system, a method for an agent to select optimal actions is developed.},
  year = {2018},
  journal = {41st German Conference on AI, Berlin, Germany, September 24–28, 2018},
  pages = {99-106},
  month = {24/09-28/09},
  publisher = {Springer},
  isbn = {978-3-030-00110-0},
  url = {https://www.springer.com/us/book/9783030001100},
}
Casini G, Fermé E, Meyer T, Varzinczak I. A Semantic Perspective on Belief Change in a Preferential Non-Monotonic Framework. 16th International Conference on Principles of Knowledge Representation and Reasoning. 2018. https://dblp.org/db/conf/kr/kr2018.html.

Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we investigate the integration of the two formalisms by studying belief change for a (preferential) non-monotonic framework. We show that the standard AGM approach to belief change can be transferred to a preferential non-monotonic framework in the sense that change operations can be defined on conditional knowledge bases. We take as a point of departure the results presented by Casini and Meyer (2017), and we develop and extend such results with characterisations based on semantics and entrenchment relations, showing how some of the constructions defined for propositional logic can be lifted to our preferential non-monotonic framework.

@proceedings{197,
  author = {Giovanni Casini and Eduardo Fermé and Thomas Meyer and Ivan Varzinczak},
  title = {A Semantic Perspective on Belief Change in a Preferential Non-Monotonic Framework},
  abstract = {Belief change and non-monotonic reasoning are usually viewed as two sides of the same coin, with results showing that one can formally be defined in terms of the other. In this paper we investigate the integration of the two formalisms by studying belief change for a (preferential) non-monotonic framework. We show that the standard AGM approach to belief change can be transferred to a preferential non-monotonic framework in the sense that change operations can be defined on conditional knowledge bases. We take as a point of departure the results presented by Casini and Meyer (2017), and we develop and extend such results with characterisations based on semantics and entrenchment relations, showing how some of the constructions defined for propositional logic can be lifted to our preferential non-monotonic framework.},
  year = {2018},
  journal = {16th International Conference on Principles of Knowledge Representation and Reasoning},
  pages = {220-229},
  month = {27/10-02/11},
  publisher = {AAAI Press},
  address = {United States of America},
  isbn = {978-1-57735-803-9},
  url = {https://dblp.org/db/conf/kr/kr2018.html},
}
van der Merwe B, Berglund M, Bester W. Formalising Boost POSIX Regular Expression Matching. International Colloquium on Theoretical Aspects of Computing. 2018. https://link.springer.com/chapter/10.1007/978-3-030-02508-3_6.

Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.

@proceedings{196,
  author = {Brink van der Merwe and Martin Berglund and Willem Bester},
  title = {Formalising Boost POSIX Regular Expression Matching},
  abstract = {Whereas Perl-compatible regular expression matchers typically exhibit some variation of leftmost-greedy semantics, those conforming to the POSIX standard are prescribed leftmost-longest semantics. However, the POSIX standard leaves some room for interpretation, and Fowler and Kuklewicz have done experimental work to confirm differences between various POSIX matchers. The Boost library has an interesting take on the POSIX standard, where it maximises the leftmost match not with respect to subexpressions of the regular expression pattern, but rather, with respect to capturing groups. In our work, we provide the first formalisation of Boost semantics, and we analyse the complexity of regular expression matching when using Boost semantics.},
  year = {2018},
  journal = {International Colloquium on Theoretical Aspects of Computing},
  pages = {99-115},
  month = {17/02},
  publisher = {Springer},
  isbn = {978-3-030-02508-3},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-02508-3_6},
}
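
The difference between the two matching disciplines can be seen on a one-line example. Python's re module follows Perl-style leftmost-greedy semantics, so the first alternative that succeeds wins, whereas a POSIX leftmost-longest matcher (Boost in POSIX mode, for instance) must report the longer match; only the Python side is executed in this sketch.

# Leftmost-greedy vs leftmost-longest on the pattern a|ab against "ab".
import re

m = re.match(r"a|ab", "ab")
print(m.group(0))   # prints "a": Perl-style semantics take the first alternative
# A POSIX leftmost-longest matcher is required to return "ab",
# the longest match starting at the leftmost position.
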
Ndaba M, Pillay A, Ezugwu A. An Improved Generalized Regression Neural Network for Type II Diabetes Classification. In: ICCSA 2018, LNCS 10963. Springer International Publishing AG; 2018.

This paper proposes an improved Generalized Regression Neural Network (KGRNN) for the diagnosis of type II diabetes. Diabetes, a widespread chronic disease, is a metabolic disorder that develops when the body does not make enough insulin or is unable to use insulin effectively. Type II diabetes is the most common type and accounts for an estimated 90% of cases. The novel KGRNN technique reported in this study uses an enhanced K-Means clustering technique (CVE-K-Means) to produce cluster centers (centroids) that are used to train the network. The technique was applied to the Pima Indian diabetes dataset, a widely used benchmark dataset for diabetes diagnosis. The technique outperforms the best known GRNN techniques for Type II diabetes diagnosis in terms of classification accuracy and computational time, obtaining a classification accuracy of 86% with 83% sensitivity and 87% specificity. An area under the receiver operating characteristic (ROC) curve of 87% was obtained.

@inbook{195,
  author = {Moeketsi Ndaba and Anban Pillay and Absalom Ezugwu},
  title = {An Improved Generalized Regression Neural Network for Type II Diabetes Classification},
  abstract = {This paper proposes an improved Generalized Regression Neural Network (KGRNN) for the diagnosis of type II diabetes. Diabetes, a widespread chronic disease, is a metabolic disorder that develops when the body does not make enough insulin or is unable to use insulin effectively. Type II diabetes is the most common type and accounts for an estimated 90% of cases. The novel KGRNN technique reported in this study uses an enhanced K-Means clustering technique (CVE-K-Means) to produce cluster centers (centroids) that are used to train the network. The technique was applied to the Pima Indian diabetes dataset, a widely used benchmark dataset for diabetes diagnosis. The technique outperforms the best known GRNN techniques for Type II diabetes diagnosis in terms of classification accuracy and computational time, obtaining a classification accuracy of 86% with 83% sensitivity and 87% specificity. An area under the receiver operating characteristic (ROC) curve of 87% was obtained.},
  year = {2018},
  journal = {ICCSA 2018, LNCS 10963},
  volume = {10963},
  pages = {659-671},
  publisher = {Springer International Publishing AG},
  isbn = {3319951718},
}
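
The GRNN at the core of the KGRNN technique is essentially a kernel-weighted average over stored points (here, cluster centroids). The numpy sketch below illustrates that prediction step with invented two-dimensional data; it is not the authors' CVE-K-Means pipeline or their Pima-dataset evaluation.

# Generalized Regression Neural Network prediction: a kernel-weighted average
# of training targets, computed here over a handful of illustrative "centroid"
# points rather than the full training set.
import numpy as np

def grnn_predict(x, centroids, targets, sigma=0.5):
    d2 = ((centroids - x) ** 2).sum(axis=1)          # squared distances to x
    w = np.exp(-d2 / (2 * sigma ** 2))               # Gaussian kernel weights
    return float(w @ targets / w.sum())              # weighted average target

if __name__ == "__main__":
    centroids = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
    targets = np.array([0.0, 1.0, 1.0])              # e.g. class probabilities
    print(grnn_predict(np.array([0.9, 0.9]), centroids, targets))
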
Jembere E, Rawatlal R, Pillay A. Matrix Factorisation for Predicting Student Performance. 2017 7th World Engineering Education Forum (WEEF). 2018.

Predicting student performance in tertiary institutions has the potential to improve curriculum advice given to students, the planning of interventions for academic support and monitoring, and curriculum design. The student performance prediction problem, as defined in this study, is the prediction of a student’s mark for a module, given the student’s performance in previously attempted modules. The prediction problem is amenable to machine learning techniques, provided that sufficient data is available for analysis. This work reports on a study undertaken at the College of Agriculture, Engineering and Science at the University of KwaZulu-Natal that investigates the efficacy of Matrix Factorization as a technique for solving the prediction problem. The study uses Singular Value Decomposition (SVD), a Matrix Factorization technique that has been successfully used in recommender systems. The performance of the technique was benchmarked against the use of student and course average marks as predictors of performance. The results obtained suggest that Matrix Factorization performs better than both benchmarks.

@proceedings{194,
  author = {Edgar Jembere and Randhir Rawatlal and Anban Pillay},
  title = {Matrix Factorisation for Predicting Student Performance},
  abstract = {Predicting student performance in tertiary institutions has the potential to improve curriculum advice given to students, the planning of interventions for academic support and monitoring, and curriculum design. The student performance prediction problem, as defined in this study, is the prediction of a student’s mark for a module, given the student’s performance in previously attempted modules. The prediction problem is amenable to machine learning techniques, provided that sufficient data is available for analysis. This work reports on a study undertaken at the College of Agriculture, Engineering and Science at the University of KwaZulu-Natal that investigates the efficacy of Matrix Factorization as a technique for solving the prediction problem. The study uses Singular Value Decomposition (SVD), a Matrix Factorization technique that has been successfully used in recommender systems. The performance of the technique was benchmarked against the use of student and course average marks as predictors of performance. The results obtained suggest that Matrix Factorization performs better than both benchmarks.},
  year = {2018},
  journal = {2017 7th World Engineering Education Forum (WEEF)},
  pages = {513-518},
  month = {13/11-16/11},
  publisher = {IEEE},
  isbn = {978-1-5386-1523-2},
}
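
A minimal numpy sketch of the matrix-factorisation idea described above: a student-by-module mark matrix with missing entries is approximated by a truncated SVD after mean-filling, and the reconstruction supplies predictions for the missing marks. The tiny matrix and the chosen rank are purely illustrative, not the study's data or protocol.

# Low-rank prediction of missing marks: fill gaps with module means,
# take a truncated SVD, and read predictions off the reconstruction.
import numpy as np

marks = np.array([[65., 70., np.nan, 55.],
                  [80., np.nan, 75., 60.],
                  [50., 55., 48., np.nan],
                  [np.nan, 85., 90., 72.]])   # students x modules, NaN = not taken

col_means = np.nanmean(marks, axis=0)
filled = np.where(np.isnan(marks), col_means, marks)

U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2                                          # keep the top-k singular values
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

predicted = np.where(np.isnan(marks), approx, marks)   # only fill the gaps
print(np.round(predicted, 1))
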
Ndaba M, Pillay A, Ezugwu A. A Comparative Study of Machine Learning Techniques for Classifying Type II Diabetes Mellitus. 2018;MSc.

Diabetes is a metabolic disorder that develops when the body does not make enough insulin or is not able to use insulin effectively. Accurate and early detection of diabetes can aid in effective management of the disease. Several machine learning techniques have shown promise as cost-effective ways for early diagnosis of the disease to reduce the occurrence of health complications arising due to delayed diagnosis. This study compares the efficacy of three broad machine learning approaches, viz. Artificial Neural Networks (ANNs), instance-based classification, and statistical regression, to diagnose type II diabetes. For each approach, this study proposes novel techniques that extend the state of the art. The new techniques include Artificial Neural Networks hybridized with an improved K-Means clustering and a boosting technique, and improved variants of Logistic Regression (LR), the K-Nearest Neighbours algorithm (KNN), and K-Means clustering. The techniques were evaluated on the Pima Indian diabetes dataset and the results were compared to recent results reported in the literature. The highest classification accuracy of 100% with 100% sensitivity and 100% specificity was achieved using an ensemble of the boosting technique, the enhanced K-Means clustering algorithm (CVE-K-Means) and the Generalized Regression Neural Network (GRNN): B-KGRNN. A hybrid of the CVE-K-Means algorithm and GRNN (KGRNN) achieved the best accuracy of 86% with 83% sensitivity. The improved LR model (LR-n) achieved the highest classification accuracy of 84% with 72% sensitivity. The new multi-layer perceptron (MLP-BPX) achieved the best accuracy of 82% and 72% sensitivity. A hybrid of KNN and CVE-K-Means (CKNN) achieved the best accuracy of 81% and 89% sensitivity. The CVE-K-Means technique achieved the best accuracy of 80% and 61% sensitivity. The B-KGRNN, KGRNN, LR-n, and CVE-K-Means techniques outperformed similar techniques in the literature in terms of classification accuracy by 15%, 1%, 2%, and 3% respectively. The CKNN and KGRNN techniques proved to have lower computational complexity than the standard KNN and GRNN algorithms. Employing data pre-processing techniques such as feature extraction and missing value removal improved the classification accuracy of machine learning techniques by more than 11% in most instances.

@phdthesis{192,
  author = {Moeketsi Ndaba and Anban Pillay and Absalom Ezugwu},
  title = {A Comparative Study of Machine Learning Techniques for Classifying Type II Diabetes Mellitus},
  abstract = {Diabetes is a metabolic disorder that develops when the body does not make enough insulin or is not able to use insulin effectively. Accurate and early detection of diabetes can aid in effective management of the disease. Several machine learning techniques have shown promise as cost-effective ways for early diagnosis of the disease to reduce the occurrence of health complications arising due to delayed diagnosis. This study compares the efficacy of three broad machine learning approaches, viz. Artificial Neural Networks (ANNs), instance-based classification, and statistical regression, to diagnose type II diabetes. For each approach, this study proposes novel techniques that extend the state of the art. The new techniques include Artificial Neural Networks hybridized with an improved K-Means clustering and a boosting technique, and improved variants of Logistic Regression (LR), the K-Nearest Neighbours algorithm (KNN), and K-Means clustering. The techniques were evaluated on the Pima Indian diabetes dataset and the results were compared to recent results reported in the literature. The highest classification accuracy of 100% with 100% sensitivity and 100% specificity was achieved using an ensemble of the boosting technique, the enhanced K-Means clustering algorithm (CVE-K-Means) and the Generalized Regression Neural Network (GRNN): B-KGRNN. A hybrid of the CVE-K-Means algorithm and GRNN (KGRNN) achieved the best accuracy of 86% with 83% sensitivity. The improved LR model (LR-n) achieved the highest classification accuracy of 84% with 72% sensitivity. The new multi-layer perceptron (MLP-BPX) achieved the best accuracy of 82% and 72% sensitivity. A hybrid of KNN and CVE-K-Means (CKNN) achieved the best accuracy of 81% and 89% sensitivity. The CVE-K-Means technique achieved the best accuracy of 80% and 61% sensitivity. The B-KGRNN, KGRNN, LR-n, and CVE-K-Means techniques outperformed similar techniques in the literature in terms of classification accuracy by 15%, 1%, 2%, and 3% respectively. The CKNN and KGRNN techniques proved to have lower computational complexity than the standard KNN and GRNN algorithms. Employing data pre-processing techniques such as feature extraction and missing value removal improved the classification accuracy of machine learning techniques by more than 11% in most instances.},
  year = {2018},
  volume = {MSc},
}
Waltham M, Moodley D, Pillay A. Q-Cog: A Q-Learning Based Cognitive Agent Architecture for Complex 3D Virtual Worlds. 2018;MSc.

Intelligent cognitive agents requiring a high level of adaptability should contain minimal initial data and be able to autonomously gather new knowledge from their own experiences. 3D virtual worlds provide complex environments in which autonomous software agents may learn and interact. In many applications within this domain, such as video games and virtual reality, the environment is partially observable and agents must make decisions and react in real time. Due to the dynamic nature of virtual worlds, adaptability is of great importance for virtual agents. The Reinforcement Learning paradigm provides a mechanism for unsupervised learning that allows agents to learn from their own experiences in the environment. In particular, the Q-Learning algorithm allows agents to develop an optimal action-selection policy based on their environment experiences. This research explores the potential of cognitive architectures utilizing Reinforcement Learning whereby agents may contain a library of action-selection policies within virtual environments. The proposed cognitive architecture, Q-Cog, utilizes a policy selection mechanism to develop adaptable 3D virtual agents. Results from experimentation indicate that Q-Cog provides an effective basis for developing adaptive self-learning agents for 3D virtual worlds.

@phdthesis{190,
  author = {Michael Waltham and Deshen Moodley and Anban Pillay},
  title = {Q-Cog: A Q-Learning Based Cognitive Agent Architecture for Complex 3D Virtual Worlds},
  abstract = {Intelligent cognitive agents requiring a high level of adaptability should contain minimal initial data and be able to autonomously gather new knowledge from their own experiences. 3D virtual worlds provide complex environments in which autonomous software agents may learn and interact. In many applications within this domain, such as video games and virtual reality, the environment is partially observable and agents must make decisions and react in real time. Due to the dynamic nature of virtual worlds, adaptability is of great importance for virtual agents. The Reinforcement Learning paradigm provides a mechanism for unsupervised learning that allows agents to learn from their own experiences in the environment. In particular, the Q-Learning algorithm allows agents to develop an optimal action-selection policy based on their environment experiences. This research explores the potential of cognitive architectures utilizing Reinforcement Learning whereby agents may contain a library of action-selection policies within virtual environments. The proposed cognitive architecture, Q-Cog, utilizes a policy selection mechanism to develop adaptable 3D virtual agents. Results from experimentation indicate that Q-Cog provides an effective basis for developing adaptive self-learning agents for 3D virtual worlds.},
  year = {2018},
  volume = {MSc},
  publisher = {Durban University},
}
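
The tabular Q-learning update underlying Q-Cog's action-selection policies is compact enough to show directly. The toy two-state environment, reward scheme and parameter values below are invented for illustration and are unrelated to the 3D virtual worlds used in the dissertation.

# Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
# run on a trivial two-state, two-action toy environment.
import random

random.seed(0)
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def step(s, a):
    # Toy dynamics: action 1 moves towards state 1, which pays reward 1.
    s_next = 1 if a == 1 else 0
    reward = 1.0 if s_next == 1 else 0.0
    return s_next, reward

s = 0
for _ in range(5000):
    if random.random() < epsilon:                     # epsilon-greedy exploration
        a = random.choice((0, 1))
    else:
        a = max((0, 1), key=lambda act: Q[(s, act)])
    s_next, r = step(s, a)
    best_next = max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    s = s_next

print({k: round(v, 2) for k, v in Q.items()})         # action 1 dominates in both states
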
Dzitiro J, Jembere E, Pillay A. A DeepQA Based Real-Time Document Recommender System. Southern Africa Telecommunication Networks and Applications Conference (SATNAC) 2018. 2018.

Recommending relevant documents to users in real-time as they compose their own documents differs from the traditional task of recommending products to users. Variation in the users’ interests as they work on their documents can undermine the effectiveness of classical recommender system techniques that depend heavily on off-line data. This necessitates the use of real-time data gathered as the user is composing a document to determine which documents the user will most likely be interested in. Classical methodologies for evaluating recommender systems are not appropriate for this problem. This paper proposed a methodology for evaluating real-time document recommender system solutions. The proposed methodology was then used to show that a solution that anticipates a user’s interest and makes only high confidence recommendations performs better than a classical content-based filtering solution. The results obtained using the proposed methodology confirmed that there is a need for a new breed of recommender systems algorithms for real-time document recommender systems that can anticipate the user’s interest and make only high confidence recommendations.

@proceedings{189,
  author = {Joshua Dzitiro and Edgar Jembere and Anban Pillay},
  title = {A DeepQA Based Real-Time Document Recommender System},
  abstract = {Recommending relevant documents to users in real-time as they compose their own documents differs from the traditional task of recommending products to users. Variation in the users’ interests as they work on their documents can undermine the effectiveness of classical recommender system techniques that depend heavily on off-line data. This necessitates the use of real-time data gathered as the user is composing a document to determine which documents the user will most likely be interested in. Classical methodologies for evaluating recommender systems are not appropriate for this problem. This paper proposed a methodology for evaluating real-time document recommender system solutions. The proposed methodology was then used to show that a solution that anticipates a user’s interest and makes only high confidence recommendations performs better than a classical content-based filtering solution. The results obtained using the proposed methodology confirmed that there is a need for a new breed of recommender systems algorithms for real-time document recommender systems that can anticipate the user’s interest and make only high confidence recommendations.},
  year = {2018},
  journal = {Southern Africa Telecommunication Networks and Applications Conference (SATNAC) 2018},
  pages = {304-309},
  month = {02/09-05/09},
  publisher = {SATNAC},
  address = {South Africa},
}
Harmse H, Britz K, Gerber A. Informative Armstrong RDF datasets for n-ary relations. Formal Ontology in Information Systems: 10th International Conference, Cape Town, South Africa. 2018.

The W3C standardized Semantic Web languages enable users to capture data without a schema in a manner which is intuitive to them. The challenge is that for the data to be useful, it should be possible to query the data and to query it efficiently, which necessitates a schema. Understanding the structure of data is thus important to both users and storage implementers: the structure of the data gives insight to users in how to query the data while storage implementers can use the structure to optimize queries. In this paper we propose that data mining routines can be used to infer candidate n-ary relations with related uniqueness- and null-free constraints, which can be used to construct an informative Armstrong RDF dataset. The benefit of an informative Armstrong RDF dataset is that it provides example data based on the original data which is a fraction of the size of the original data, while capturing the constraints of the original data faithfully. A case study on a DBPedia person dataset showed that the associated informative Armstrong RDF dataset contained 0.00003% of the statements of the original DBPedia dataset.

@proceedings{188,
  author = {Henriette Harmse and Katarina Britz and Aurona Gerber},
  title = {Informative Armstrong RDF datasets for n-ary relations},
  abstract = {The W3C standardized Semantic Web languages enable users to capture data without a schema in a manner which is intuitive to them. The challenge is that for the data to be useful, it should be possible to query the data and to query it efficiently, which necessitates a schema. Understanding the structure of data is thus important to both users and storage implementers: the structure of the data gives insight to users in how to query the data while storage implementers can use the structure to optimize queries. In this paper we propose that data mining routines can be used to infer candidate n-ary relations with related uniqueness- and null-free constraints, which can be used to construct an informative Armstrong RDF dataset. The benefit of an informative Armstrong RDF dataset is that it provides example data based on the original data which is a fraction of the size of the original data, while capturing the constraints of the original data faithfully. A case study on a DBPedia person dataset showed that the associated informative Armstrong RDF dataset contained 0.00003% of the statements of the original DBPedia dataset.},
  year = {2018},
  journal = {Formal Ontology in Information Systems: 10th International Conference, Cape Town, South Africa},
  pages = {187-198},
  month = {17/09-21/09},
  publisher = {IOS Press},
}
Britz K, Varzinczak I. Context and rationality in defeasible subsumption. Foundations of Information and Knowledge Systems: 10th International Symposium FoIKS 2018, Budapest, Hungary. 2018.

Description logics have been extended in a number of ways to support defeasible reasoning in the KLM tradition. Such features include preferential or rational defeasible concept subsumption, and defeasible roles in complex concept descriptions. Semantically, defeasible subsumption is obtained by means of a preference order on objects, while defeasible roles are obtained by adding a preference order to role interpretations. In this paper, we address an important limitation in defeasible extensions of description logics, namely the restriction in the semantics of defeasible concept subsumption to a single preference order on objects. We do this by inducing a modular preference order on objects from each preference order on roles, and use these to relativise defeasible subsumption. This yields a notion of contextualised rational defeasible subsumption, with contexts described by roles. We also provide a semantic construction for and a method for the computation of contextual rational closure, and present a correspondence result between the two.

@proceedings{187,
  author = {Katarina Britz and Ivan Varzinczak},
  title = {Context and rationality in defeasible subsumption},
  abstract = {Description logics have been extended in a number of ways to support defeasible reasoning in the KLM tradition. Such features include preferential or rational defeasible concept subsumption, and defeasible roles in complex concept descriptions. Semantically, defeasible subsumption is obtained by means of a preference order on objects, while defeasible roles are obtained by adding a preference order to role interpretations. In this paper, we address an important limitation in defeasible extensions of description logics, namely the restriction in the semantics of defeasible concept subsumption to a single preference order on objects. We do this by inducing a modular preference order on objects from each preference order on roles, and use these to relativise defeasible subsumption. This yields a notion of contextualised rational defeasible subsumption, with contexts described by roles. We also provide a semantic construction for and a method for the computation of contextual rational closure, and present a correspondence result between the two.},
  year = {2018},
  journal = {Foundations of Information and Knowledge Systems: 10th International Symposium FoIKS 2018, Budapest, Hungary},
  pages = {114-132},
  month = {14/05-18/05},
  publisher = {Springer},
}
Harmse H, Britz K, Gerber A. Generating Armstrong ABoxes for ALC TBoxes. Theoretical Aspects of Computing: 15th International Colloquium, Stellenbosch, South Africa. 2018.

A challenge in ontology engineering is the mismatch in expertise between the ontology engineer and domain expert, which often leads to important constraints not being specified. Domain experts often only focus on specifying constraints that should hold and not on specifying constraints that could possibly be violated. In an attempt to bridge this gap we propose the use of “perfect test data”. The generated test data is perfect in that it satisfies all the constraints of an application domain that are required, including ensuring that the test data violates constraints that can be violated. In the context of Description Logic ontologies we call this test data an “Armstrong ABox”, a notion derived from Armstrong relations in relational database theory. In this paper we detail the theoretical development of Armstrong ABoxes for ALC TBoxes as well as an algorithm for generating such Armstrong ABoxes. The proposed algorithm is based, via the ontology completion algorithm of Baader et al., on attribute exploration in formal concept analysis.

@proceedings{186,
  author = {Henriette Harmse and Katarina Britz and Aurona Gerber},
  title = {Generating Armstrong ABoxes for ALC TBoxes},
  abstract = {A challenge in ontology engineering is the mismatch in expertise between the ontology engineer and domain expert, which often leads to important constraints not being specified. Domain experts often only focus on specifying constraints that should hold and not on specifying constraints that could possibly be violated. In an attempt to bridge this gap we propose the use of “perfect test data”. The generated test data is perfect in that it satisfies all the constraints of an application domain that are required, including ensuring that the test data violates constraints that can be violated. In the context of Description Logic ontologies we call this test data an “Armstrong ABox”, a notion derived from Armstrong relations in relational database theory. In this paper we detail the theoretical development of Armstrong ABoxes for ALC TBoxes as well as an algorithm for generating such Armstrong ABoxes. The proposed algorithm is based, via the ontology completion algorithm of Baader et al., on attribute exploration in formal concept analysis.},
  year = {2018},
  journal = {Theoretical Aspects of Computing: 15th International Colloquium, Stellenbosch, South Africa},
  pages = {211-230},
  month = {16/10-19/10},
  publisher = {Springer},
}