Statistics@CAIR-UP Research Publications

2019

1.
Wolpe Z, de Waal A. Autoencoding variational Bayes for latent Dirichlet allocation. In: Proceedings of the South African Forum for Artificial Intelligence Research. CEUR Workshop Proceedings; 2019. http://ceur-ws.org/Vol-2540/FAIR2019_paper_33.pdf.

Many posterior distributions take intractable forms and thus require variational inference where analytical solutions cannot be found. Variational inference (VI) and Markov chain Monte Carlo (MCMC) are established mechanisms to approximate these intractable values. An alternative approach to sampling and optimisation for approximation is a direct mapping between the data and posterior distribution. This is made possible by recent advances in deep learning methods. Latent Dirichlet Allocation (LDA) is a model which offers an intractable posterior of this nature. In LDA, latent topics are learnt over unlabelled documents to soft cluster the documents. This paper assesses the viability of learning latent topics leveraging an autoencoder (in the form of autoencoding variational Bayes, AEVB) and compares the mimicked posterior distributions to those achieved by VI. After conducting various experiments, the proposed AEVB delivers inadequate performance. Comparable conclusions are achieved only under utopian conditions, which are generally unattainable. Further, model specification becomes increasingly complex and deeply circumstantially dependent, which is in itself not a deterrent but does warrant consideration. In a recent study, these concerns were highlighted and discussed theoretically. We confirm the argument empirically by dissecting the autoencoder’s iterative process. In investigating the autoencoder, we see performance degrade as models grow in dimensionality. Visualization of the autoencoder reveals a bias towards the initial randomised topics.

@inproceedings{254,
  author = {Zach Wolpe and Alta de Waal},
  title = {Autoencoding variational Bayes for latent Dirichlet allocation},
  abstract = {Many posterior distributions take intractable forms and thus require variational inference where analytical solutions cannot be found. Variational inference (VI) and Markov chain Monte Carlo (MCMC) are established mechanisms to approximate these intractable values. An alternative approach to sampling and optimisation for approximation is a direct mapping between the data and posterior distribution. This is made possible by recent advances in deep learning methods. Latent Dirichlet Allocation (LDA) is a model which offers an intractable posterior of this nature. In LDA, latent topics are learnt over unlabelled documents to soft cluster the documents. This paper assesses the viability of learning latent topics leveraging an autoencoder (in the form of autoencoding variational Bayes, AEVB) and compares the mimicked posterior distributions to those achieved by VI. After conducting various experiments, the proposed AEVB delivers inadequate performance. Comparable conclusions are achieved only under utopian conditions, which are generally unattainable. Further, model specification becomes increasingly complex and deeply circumstantially dependent, which is in itself not a deterrent but does warrant consideration. In a recent study, these concerns were highlighted and discussed theoretically. We confirm the argument empirically by dissecting the autoencoder’s iterative process. In investigating the autoencoder, we see performance degrade as models grow in dimensionality. Visualization of the autoencoder reveals a bias towards the initial randomised topics.},
  year = {2019},
  booktitle = {Proceedings of the South African Forum for Artificial Intelligence Research},
  pages = {25-36},
  month = {12/09},
  publisher = {CEUR Workshop Proceedings},
  issn = {1613-0073},
  url = {http://ceur-ws.org/Vol-2540/FAIR2019_paper_33.pdf},
}

2.
Weyer VD, de Waal A, Lechner AM, et al. Quantifying rehabilitation risks for surface‐strip coal mines using a soil compaction Bayesian network in South Africa and Australia: To demonstrate the R2AIN Framework. Integrated Environmental Assessment and Management. 2019;15(2). doi:10.1002/ieam.4128.

Environmental information is acquired and assessed during the environmental impact assessment process for surface‐strip coal mine approval. However, integrating these data and quantifying rehabilitation risk using a holistic multidisciplinary approach is seldom undertaken. We present a rehabilitation risk assessment integrated network (R2AIN™) framework that can be applied using Bayesian networks (BNs) to integrate and quantify such rehabilitation risks. Our framework has 7 steps, including key integration of rehabilitation risk sources and the quantification of undesired rehabilitation risk events to the final application of mitigation. We demonstrate the framework using a soil compaction BN case study in the Witbank Coalfield, South Africa and the Bowen Basin, Australia. Our approach allows for a probabilistic assessment of rehabilitation risk associated with multidisciplines to be integrated and quantified. Using this method, a site's rehabilitation risk profile can be determined before mining activities commence and the effects of manipulating management actions during later mine phases to reduce risk can be gauged, to aid decision making.

@article{253,
  author = {Vanessa Weyer and Alta de Waal and Alex Lechner and Corinne Unger and Tim O'Connor and Thomas Baumgartl and Roland Schulze and Wayne Truter},
  title = {Quantifying rehabilitation risks for surface‐strip coal mines using a soil compaction Bayesian network in South Africa and Australia: To demonstrate the R2AIN Framework},
  abstract = {Environmental information is acquired and assessed during the environmental impact assessment process for surface‐strip coal mine approval. However, integrating these data and quantifying rehabilitation risk using a holistic multidisciplinary approach is seldom undertaken. We present a rehabilitation risk assessment integrated network (R2AIN™) framework that can be applied using Bayesian networks (BNs) to integrate and quantify such rehabilitation risks. Our framework has 7 steps, including key integration of rehabilitation risk sources and the quantification of undesired rehabilitation risk events to the final application of mitigation. We demonstrate the framework using a soil compaction BN case study in the Witbank Coalfield, South Africa and the Bowen Basin, Australia. Our approach allows for a probabilistic assessment of rehabilitation risk associated with multidisciplines to be integrated and quantified. Using this method, a site's rehabilitation risk profile can be determined before mining activities commence and the effects of manipulating management actions during later mine phases to reduce risk can be gauged, to aid decision making.},
  year = {2019},
  journal = {Integrated Environmental Assessment and Management},
  volume = {15},
  pages = {190-208},
  number = {2},
  publisher = {Wiley Online},
  doi = {10.1002/ieam.4128},
}

2018

1.
de Waal A, Yoo K. Latent Variable Bayesian Networks Constructed Using Structural Equation Modelling. In: 2018 21st International Conference on Information Fusion (FUSION). IEEE; 2018. https://ieeexplore.ieee.org/abstract/document/8455240.

Bayesian networks in fusion systems often contain latent variables. They play an important role in fusion systems as they provide context which leads to better choices of data sources to fuse. Latent variables in Bayesian networks are mostly constructed by means of expert knowledge modelling. We propose using theory-driven structural equation modelling (SEM) to identify and structure latent variables in a Bayesian network. The linking of SEM and Bayesian networks is motivated by the fact that both methods can be shown to be causal models. We compare this approach to a data-driven approach where latent factors are induced by means of unsupervised learning. We identify appropriate metrics for URREF ontology criteria for both approaches.

@inproceedings{204,
  author = {Alta de Waal and Keunyoung Yoo},
  title = {Latent Variable Bayesian Networks Constructed Using Structural Equation Modelling},
  abstract = {Bayesian networks in fusion systems often contain latent variables. They play an important role in fusion systems as they provide context which leads to better choices of data sources to fuse. Latent variables in Bayesian networks are mostly constructed by means of expert knowledge modelling. We propose using theory-driven structural equation modelling (SEM) to identify and structure latent variables in a Bayesian network. The linking of SEM and Bayesian networks is motivated by the fact that both methods can be shown to be causal models. We compare this approach to a data-driven approach where latent factors are induced by means of unsupervised learning. We identify appropriate metrics for URREF ontology criteria for both approaches.},
  year = {2018},
  booktitle = {2018 21st International Conference on Information Fusion (FUSION)},
  pages = {688-695},
  month = {10/07-13/07},
  publisher = {IEEE},
  isbn = {978-0-9964527-6-2},
  url = {https://ieeexplore.ieee.org/abstract/document/8455240},
}

2017

1.
de Waal A, Koen H, de Villiers J, Roodt H. An expert-driven causal model of the rhino poaching problem. Ecological Modelling. 2017;347. https://www.sciencedirect.com/science/article/pii/S0304380016307621.

A significant challenge in ecological modelling is the lack of complete sets of high-quality data. This is especially true in the rhino poaching problem where data is incomplete. Although there are many poaching attacks, they can be spread over a vast surface area such as in the case of the Kruger National Park in South Africa, which is roughly the size of Israel. Bayesian networks are useful reasoning tools and can utilise expert knowledge when data is insufficient or sparse. Bayesian networks allow the modeller to incorporate data, expert knowledge, or any combination of the two. This flexibility of Bayesian networks makes them ideal for modelling complex ecological problems. In this paper an expert-driven model of the rhino poaching problem is presented. The development as well as the evaluation of the model is performed from an expert perspective. Independent expert evaluation is performed in the form of queries that test different scenarios. Structuring the rhino poaching problem as a causal network yields a framework that can be used to reason about the problem, as well as inform the modeller of the type of data that has to be gathered.

@article{191,
  author = {Alta de Waal and Hildegarde Koen and J.P. de Villiers and Henk Roodt},
  title = {An expert-driven causal model of the rhino poaching problem},
  abstract = {A significant challenge in ecological modelling is the lack of complete sets of high-quality data. This is especially true in the rhino poaching problem where data is incomplete. Although there are many poaching attacks, they can be spread over a vast surface area such as in the case of the Kruger National Park in South Africa, which is roughly the size of Israel. Bayesian networks are useful reasoning tools and can utilise expert knowledge when data is insufficient or sparse. Bayesian networks allow the modeller to incorporate data, expert knowledge, or any combination of the two. This flexibility of Bayesian networks makes them ideal for modelling complex ecological problems. In this paper an expert-driven model of the rhino poaching problem is presented. The development as well as the evaluation of the model is performed from an expert perspective. Independent expert evaluation is performed in the form of queries that test different scenarios. Structuring the rhino poaching problem as a causal network yields a framework that can be used to reason about the problem, as well as inform the modeller of the type of data that has to be gathered.},
  year = {2017},
  journal = {Ecological Modelling},
  volume = {347},
  pages = {29-39},
  publisher = {Elsevier},
  issn = {0304-3800},
  url = {https://www.sciencedirect.com/science/article/pii/S0304380016307621},
}