Research Publications

2024

Brooks, W., Davel, M. H., & Mouton, C. (2024). Does Simple Trump Complex? Comparing Strategies for Adversarial Robustness in DNNs. Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science, vol 2326. https://doi.org/10.1007/978-3-031-78255-8_15

@article{520,
  author = {William Brooks and Marelie Davel and Coenraad Mouton},
  title = {Does Simple Trump Complex? Comparing Strategies for Adversarial Robustness in DNNs},
  abstract = {},
  year = {2024},
  journal = {Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science},
  volume = {2326},
  pages = {253--269},
  month = {12/2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  doi = {10.1007/978-3-031-78255-8_15},
}
Potgieter, H. L., Mouton, C., & Davel, M. H. (2024). Impact of Batch Normalization on Convolutional Network Representations. Artificial Intelligence Research (SACAIR 2024), vol 2326. https://doi.org/10.1007/978-3-031-78255-8_14

Batch normalization (BatchNorm) is a popular layer normalization technique used when training deep neural networks. It has been shown to enhance the training speed and accuracy of deep learning models. However, the mechanics by which BatchNorm achieves these benefits is an active area of research, and different perspectives have been proposed. In this paper, we investigate the effect of BatchNorm on the resulting hidden representations, that is, the vectors of activation values formed as samples are processed at each hidden layer. Specifically, we consider the sparsity of these representations, as well as their implicit clustering – the creation of groups of representations that are similar to some extent. We contrast image classification models trained with and without batch normalization and highlight consistent differences observed. These findings highlight that BatchNorm’s effect on representational sparsity is not a significant factor affecting generalization, while the representations of models trained with BatchNorm tend to show more advantageous clustering characteristics.
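
Since the paper's analysis turns on what batch normalization does to hidden activations, a minimal plain-Python sketch may help fix the mechanics; this is illustrative only (the function names and the post-ReLU sparsity proxy below are our own, not the authors' code):

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature over the batch dimension (training-mode
    BatchNorm, without running statistics)."""
    n_feat = len(batch[0])
    out = [[0.0] * n_feat for _ in batch]
    for j in range(n_feat):
        col = [row[j] for row in batch]
        mean = sum(col) / len(col)
        var = sum((x - mean) ** 2 for x in col) / len(col)
        for i, x in enumerate(col):
            out[i][j] = gamma * (x - mean) / math.sqrt(var + eps) + beta
    return out

def relu_sparsity(batch):
    """Fraction of activations a ReLU would zero out: a simple sparsity proxy."""
    vals = [x for row in batch for x in row]
    return sum(1 for x in vals if x <= 0) / len(vals)
```

After normalisation each feature has near-zero mean over the batch, so before the learned affine shift roughly half the activations sit below zero; this is one reason the interaction between BatchNorm and representational sparsity is worth probing.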

@article{518,
  author = {Hermanus Potgieter and Coenraad Mouton and Marelie Davel},
  title = {Impact of Batch Normalization on Convolutional Network Representations},
  abstract = {Batch normalization (BatchNorm) is a popular layer normalization technique used when training deep neural networks. It has been shown to enhance the training speed and accuracy of deep learning models. However, the mechanics by which BatchNorm achieves these benefits is an active area of research, and different perspectives have been proposed. In this paper, we investigate the effect of BatchNorm on the resulting hidden representations, that is, the vectors of activation values formed as samples are processed at each hidden layer. Specifically, we consider the sparsity of these representations, as well as their implicit clustering – the creation of groups of representations that are similar to some extent. We contrast image classification models trained with and without batch normalization and highlight consistent differences observed. These findings highlight that BatchNorm’s effect on representational sparsity is not a significant factor affecting generalization, while the representations of models trained with BatchNorm tend to show more advantageous clustering characteristics.},
  year = {2024},
  journal = {Artificial Intelligence Research (SACAIR 2024)},
  volume = {2326},
  pages = {235--252},
  month = {12/2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  doi = {10.1007/978-3-031-78255-8_14},
}
Ramalepe, S. P., Modipa, T. I., & Davel, M. H. (2024). Pre-training a Transformer-Based Generative Model Using a Small Sepedi Dataset. Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science, vol 2326. https://doi.org/10.1007/978-3-031-78255-8_19

Due to the scarcity of data in low-resourced languages, the development of language models for these languages has been very slow. Currently, pre-trained language models have gained popularity in natural language processing, especially, in developing domain-specific models for low-resourced languages. In this study, we experiment with the impact of using occlusion-based techniques when training a language model for a text generation task. We curate 2 new datasets, the Sepedi monolingual (SepMono) dataset from several South African resources and the Sepedi radio news (SepNews) dataset from the radio news domain. We use the SepMono dataset to pre-train transformer-based models using the occlusion and non-occlusion pre-training techniques and compare performance. The SepNews dataset is specifically used for fine-tuning. Our results show that the non-occlusion models perform better compared to the occlusion-based models when measuring validation loss and perplexity. However, analysis of the generated text using the BLEU score metric, which measures the quality of the generated text, shows a slightly higher BLEU score for the occlusion-based models compared to the non-occlusion models.
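
For readers unfamiliar with the evaluation metrics: perplexity is simply the exponential of the mean negative log-likelihood per token, so lower validation loss and lower perplexity move together. A minimal sketch (our own illustration, not the authors' evaluation code):

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the model's per-token probabilities:
    the exponential of the mean negative log-likelihood."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

A model that assigns uniform probability 1/4 to every token in a sequence has perplexity 4, matching the intuition that perplexity measures the effective branching factor of the model's predictions.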

@article{517,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {Pre-training a Transformer-Based Generative Model Using a Small Sepedi Dataset},
  abstract = {Due to the scarcity of data in low-resourced languages, the development of language models for these languages has been very slow. Currently, pre-trained language models have gained popularity in natural language processing, especially, in developing domain-specific models for low-resourced languages. In this study, we experiment with the impact of using occlusion-based techniques when training a language model for a text generation task. We curate 2 new datasets, the Sepedi monolingual (SepMono) dataset from several South African resources and the Sepedi radio news (SepNews) dataset from the radio news domain. We use the SepMono dataset to pre-train transformer-based models using the occlusion and non-occlusion pre-training techniques and compare performance. The SepNews dataset is specifically used for fine-tuning. Our results show that the non-occlusion models perform better compared to the occlusion-based models when measuring validation loss and perplexity. However, analysis of the generated text using the BLEU score metric, which measures the quality of the generated text, shows a slightly higher BLEU score for the occlusion-based models compared to the non-occlusion models.},
  year = {2024},
  journal = {Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science},
  volume = {2326},
  pages = {319--333},
  month = {12/2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  doi = {10.1007/978-3-031-78255-8_19},
}
Ngorima, S. A., Helberg, A. S. J., & Davel, M. H. (2024). Neural Network-Based Vehicular Channel Estimation Performance: Effect of Noise in the Training Set. Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science, vol 2326. https://doi.org/10.1007/978-3-031-78255-8_12

Vehicular communication systems face significant challenges due to high mobility and rapidly changing environments, which affect the channel over which the signals travel. To address these challenges, neural network (NN)-based channel estimation methods have been suggested. These methods are primarily trained on high signal-to-noise ratio (SNR) with the assumption that training a NN in less noisy conditions can result in good generalisation. This study examines the effectiveness of training NN-based channel estimators on mixed SNR datasets compared to training solely on high SNR datasets, as seen in several related works. Estimators evaluated in this work include an architecture that uses convolutional layers and self-attention mechanisms; a method that employs temporal convolutional networks and data pilot-aided estimation; two methods that combine classical methods with multilayer perceptrons; and the current state-of-the-art model that combines Long-Short-Term Memory networks with data pilot-aided and temporal averaging methods as post processing. Our results indicate that using only high SNR data for training is not always optimal, and the SNR range in the training dataset should be treated as a hyperparameter that can be adjusted for better performance. This is illustrated by the better performance of some models in low SNR conditions when trained on the mixed SNR dataset, as opposed to when trained exclusively on high SNR data.
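
The mixed-SNR training sets discussed above are built by corrupting clean signals with additive white Gaussian noise at varying signal-to-noise ratios. A rough plain-Python sketch, with function names of our own choosing:

```python
import math
import random

def awgn(signal, snr_db, rng=random):
    """Corrupt a real-valued signal with additive white Gaussian noise
    scaled to hit the requested SNR (in dB)."""
    p_sig = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(p_sig / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, sigma) for x in signal]

def mixed_snr_dataset(signals, snr_range=(0.0, 30.0), rng=random):
    """Corrupt each training signal at a random SNR drawn from snr_range;
    the range itself is the hyperparameter the paper suggests tuning."""
    return [awgn(s, rng.uniform(*snr_range), rng) for s in signals]
```

Training only on high-SNR data corresponds to collapsing `snr_range` to its upper end; widening the range exposes the estimator to noisier channel realisations during training.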

@article{516,
  author = {Simbarashe Ngorima and Albert Helberg and Marelie Davel},
  title = {Neural Network-Based Vehicular Channel Estimation Performance: Effect of Noise in the Training Set},
  abstract = {Vehicular communication systems face significant challenges due to high mobility and rapidly changing environments, which affect the channel over which the signals travel. To address these challenges, neural network (NN)-based channel estimation methods have been suggested. These methods are primarily trained on high signal-to-noise ratio (SNR) with the assumption that training a NN in less noisy conditions can result in good generalisation. This study examines the effectiveness of training NN-based channel estimators on mixed SNR datasets compared to training solely on high SNR datasets, as seen in several related works. Estimators evaluated in this work include an architecture that uses convolutional layers and self-attention mechanisms; a method that employs temporal convolutional networks and data pilot-aided estimation; two methods that combine classical methods with multilayer perceptrons; and the current state-of-the-art model that combines Long-Short-Term Memory networks with data pilot-aided and temporal averaging methods as post processing. Our results indicate that using only high SNR data for training is not always optimal, and the SNR range in the training dataset should be treated as a hyperparameter that can be adjusted for better performance. This is illustrated by the better performance of some models in low SNR conditions when trained on the mixed SNR dataset, as opposed to when trained exclusively on high SNR data.},
  year = {2024},
  journal = {Artificial Intelligence Research. SACAIR 2024. Communications in Computer and Information Science},
  volume = {2326},
  pages = {192--206},
  month = {12/2024},
  publisher = {Springer Nature Switzerland},
  address = {Cham},
  isbn = {978-3-031-78255-8},
  doi = {10.1007/978-3-031-78255-8_12},
}
Ngorima, S. A., Helberg, A. S. J., & Davel, M. H. (2024). A Data Pilot-Aided Temporal Convolutional Network for Channel Estimation in IEEE 802.11p Vehicle-to-Vehicle Communications. Southern Africa Telecommunication Networks and Applications Conference (SATNAC).

In modern communication systems, having an accurate channel estimator is crucial. However, when there is mobility, it becomes difficult to estimate the channel and the pilot signals, which are used for channel estimation, become insufficient. In this paper, we introduce the use of Temporal Convolutional Networks (TCNs) with data pilot-aided (DPA) channel estimation and temporal averaging (TA) to estimate vehicle-to-vehicle same direction with Wall (VTV-SDWW) channels. The TCN-DPA-TA estimator showed an improvement in Bit Error Rate (BER) performance of up to 1 order of magnitude. Furthermore, the BER performance of the TCN-DPA without TA also improved by up to 0.7 magnitude compared to the best classical estimator.

@article{515,
  author = {Simbarashe Ngorima and Albert Helberg and Marelie Davel},
  title = {A Data Pilot-Aided Temporal Convolutional Network for Channel Estimation in IEEE 802.11p Vehicle-to-Vehicle Communications},
  abstract = {In modern communication systems, having an accurate channel estimator is crucial. However, when there is mobility, it becomes difficult to estimate the channel and the pilot signals, which are used for channel estimation, become insufficient. In this paper, we introduce the use of Temporal Convolutional Networks (TCNs) with data pilot-aided (DPA) channel estimation and temporal averaging (TA) to estimate vehicle-to-vehicle same direction with Wall (VTV-SDWW) channels. The TCN-DPA-TA estimator showed an improvement in Bit Error Rate (BER) performance of up to 1 order of magnitude. Furthermore, the BER performance of the TCN-DPA without TA also improved by up to 0.7 magnitude compared to the best classical estimator.},
  year = {2024},
  journal = {Southern Africa Telecommunication Networks and Applications Conference (SATNAC)},
  pages = {356--361},
}
Mouton, C., Rabe, R., Haasbroek, D. G., Theunissen, M. W., Potgieter, H. L., & Davel, M. H. (2024). Is network fragmentation a useful complexity measure? NeurIPS 2024 Workshop SciForDL.

It has been observed that the input space of deep neural network classifiers can exhibit ‘fragmentation’, where the model function rapidly changes class as the input space is traversed. The severity of this fragmentation tends to follow the double descent curve, achieving a maximum at the interpolation regime. We study this phenomenon in the context of image classification and ask whether fragmentation could be predictive of generalization performance. Using a fragmentation-based complexity measure, we show this to be possible by achieving good performance on the PGDL (Predicting Generalization in Deep Learning) benchmark. In addition, we report on new observations related to fragmentation, namely (i) fragmentation is not limited to the input space but occurs in the hidden representations as well, (ii) fragmentation follows the trends in the validation error throughout training, and (iii) fragmentation is not a direct result of increased weight norms. Together, this indicates that fragmentation is a phenomenon worth investigating further when studying the generalization ability of deep neural networks.
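
As a toy illustration of the idea (not the paper's actual complexity measure), fragmentation along a path through input space can be probed by counting how often a classifier's prediction changes at points interpolated between two inputs:

```python
def fragmentation_score(classify, x_a, x_b, steps=100):
    """Count how often the predicted class changes along the straight line
    from x_a to x_b -- a crude proxy for input-space fragmentation."""
    prev, changes = None, 0
    for k in range(steps + 1):
        t = k / steps
        x = [(1 - t) * a + t * b for a, b in zip(x_a, x_b)]
        label = classify(x)
        if prev is not None and label != prev:
            changes += 1
        prev = label
    return changes
```

A smooth decision function crosses the boundary once along such a path, while a fragmented one flips class many times; aggregating such counts over many sample pairs gives a fragmentation-style statistic.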

@misc{514,
  author = {Coenraad Mouton and Randle Rabe and Daniël Haasbroek and Marthinus Theunissen and Hermanus Potgieter and Marelie Davel},
  title = {Is network fragmentation a useful complexity measure?},
  abstract = {It has been observed that the input space of deep neural network classifiers can exhibit ‘fragmentation’, where the model function rapidly changes class as the input space is traversed. The severity of this fragmentation tends to follow the double descent curve, achieving a maximum at the interpolation regime. We study this phenomenon in the context of image classification and ask whether fragmentation could be predictive of generalization performance. Using a fragmentation-based complexity measure, we show this to be possible by achieving good performance on the PGDL (Predicting Generalization in Deep Learning) benchmark. In addition, we report on new observations related to fragmentation, namely (i) fragmentation is not limited to the input space but occurs in the hidden representations as well, (ii) fragmentation follows the trends in the validation error throughout training, and (iii) fragmentation is not a direct result of increased weight norms. Together, this indicates that fragmentation is a phenomenon worth investigating further when studying the generalization ability of deep neural networks.},
  year = {2024},
  journal = {NeurIPS 2024 Workshop SciForDL},
  month = {12/2024},
}
le Roux, V., Davel, M. H., & Bosman, J. (2024). Parsimonious airfoil Parameterisation: A deep learning framework with Bidirectional LSTM and Gaussian Mixture models. Expert Systems With Applications, 255. https://doi.org/10.1016/j.eswa.2024.124726

The choice of airfoil parameterisation method significantly influences the overall wing optimisation performance by affecting the flexibility and computational efficiency of the process. Ideally, one should be able to intuitively constrain airfoil shape and structural characteristics as input to the optimisation process. Current parameterisation techniques lack the flexibility to generate airfoils efficiently by specifying parsimonious shape and structural features. To address this limitation, a deep learning framework is proposed, enabling conditional airfoil generation from an airfoil’s shape and structural feature definition. Specifically, we demonstrate the application of Bidirectional Long Short Term Memory models and Bayesian Gaussian Mixture models to derive airfoil coordinates from a compact set of shape and structural characteristics that we define. The proposed framework is shown to achieve favorable airfoil performance optimisation due to improved exploration and exploitation of the design space, compared to traditional approaches. Overall, the proposed optimisation framework is able to realise a 9.04% performance improvement over an airfoil design optimised with traditional parameterisation techniques.

@article{513,
  author = {Vincent le Roux and Marelie Davel and Johan Bosman},
  title = {Parsimonious airfoil Parameterisation: A deep learning framework with Bidirectional LSTM and Gaussian Mixture models},
  abstract = {The choice of airfoil parameterisation method significantly influences the overall wing optimisation performance by affecting the flexibility and computational efficiency of the process. Ideally, one should be able to intuitively constrain airfoil shape and structural characteristics as input to the optimisation process. Current parameterisation techniques lack the flexibility to generate airfoils efficiently by specifying parsimonious shape and structural features. To address this limitation, a deep learning framework is proposed, enabling conditional airfoil generation from an airfoil’s shape and structural feature definition. Specifically, we demonstrate the application of Bidirectional Long Short Term Memory models and Bayesian Gaussian Mixture models to derive airfoil coordinates from a compact set of shape and structural characteristics that we define. The proposed framework is shown to achieve favorable airfoil performance optimisation due to improved exploration and exploitation of the design space, compared to traditional approaches. Overall, the proposed optimisation framework is able to realise a 9.04% performance improvement over an airfoil design optimised with traditional parameterisation techniques.},
  year = {2024},
  journal = {Expert Systems With Applications},
  volume = {255},
  month = {10 July 2024},
  doi = {10.1016/j.eswa.2024.124726},
}
Mouton, C., Theunissen, M. W., & Davel, M. H. (2024). Input margins can predict generalization too. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence (AAAI'24/IAAI'24/EAAI'24). AAAI Conference on Artificial Intelligence (AAAI).

Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or its representation internal to the network. While margins have been shown to be correlated with the generalization ability of a model when measured at its hidden representations (hidden margins), no such link between large margins and generalization has been established for input margins. We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained. We develop such a measure based on input margins, which we refer to as ‘constrained margins’. The predictive power of this new measure is demonstrated on the ‘Predicting Generalization in Deep Learning’ (PGDL) dataset and contrasted with hidden representation margins. We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general. This provides a novel insight on the relationship between generalization and classification margins, and highlights the importance of considering the data manifold for investigations of generalization in DNNs
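
For intuition, the input margin of a linear classifier has a closed form: the distance from a sample x to the hyperplane w·x + b = 0 is |w·x + b| / ‖w‖. A small sketch (illustrative only; for the deep networks studied in the paper, margins must instead be approximated by searching for the nearest decision boundary):

```python
import math

def input_margin(w, b, x):
    """Distance from x to the decision boundary w·x + b = 0
    of a linear classifier."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))
```

The 'constrained' variant in the paper restricts where on the boundary this nearest point may lie, which is what recovers the correlation with generalization.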

@inbook{512,
  author = {Coenraad Mouton and Marthinus Theunissen and Marelie Davel},
  title = {Input margins can predict generalization too},
  abstract = {Understanding generalization in deep neural networks is an active area of research. A promising avenue of exploration has been that of margin measurements: the shortest distance to the decision boundary for a given sample or its representation internal to the network. While margins have been shown to be correlated with the generalization ability of a model when measured at its hidden representations (hidden margins), no such link between large margins and generalization has been established for input margins. We show that while input margins are not generally predictive of generalization, they can be if the search space is appropriately constrained. We develop such a measure based on input margins, which we refer to as ‘constrained margins’. The predictive power of this new measure is demonstrated on the ‘Predicting Generalization in Deep Learning’ (PGDL) dataset and contrasted with hidden representation margins. We find that constrained margins achieve highly competitive scores and outperform other margin measurements in general. This provides a novel insight on the relationship between generalization and classification margins, and highlights the importance of considering the data manifold for investigations of generalization in DNNs},
  year = {2024},
  journal = {Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence (AAAI'24/IAAI'24/EAAI'24)},
  pages = {14379--14387},
  month = {20 February 2024},
  publisher = {AAAI Conference on Artificial Intelligence (AAAI)},
}

2023

Davel, M. H., Lotz, S., Theunissen, M. W., de Villiers, A., Grant, C., Rabe, R., Schoombie, S., & Conacher, C. (2023). Knowledge Discovery in Time Series Data. In Deep Learning Indaba 2023.

• Complex time series data often encountered in scientific and engineering domains.
• Deep learning (DL) is particularly successful here:
  – large data sets, multivariate input and/or output,
  – highly complex sequences of interactions.
• Model interpretability:
  – Ability to understand a model’s decisions in a given context [1].
  – Techniques typically not originally developed for time series data.
  – Time series interpretations themselves become uninterpretable.
• Knowledge Discovery:
  – DL has potential to reveal interesting patterns in large data sets.
  – Potential to produce novel insights about the task itself [2, 3].
• ‘know-it’: Collaborative project that studies knowledge discovery in time series data.

@misc{507,
  author = {Marelie Davel and Stefan Lotz and Marthinus Theunissen and Almaro de Villiers and Chara Grant and Randle Rabe and Stefan Schoombie and Cleo Conacher},
  title = {Knowledge Discovery in Time Series Data},
  abstract = {• Complex time series data often encountered in scientific and engineering domains.
• Deep learning (DL) is particularly successful here:
– large data sets, multivariate input and/or output,
– highly complex sequences of interactions.
• Model interpretability:
– Ability to understand a model’s decisions in a given context [1].
– Techniques typically not originally developed for time series data.
– Time series interpretations themselves become uninterpretable.
• Knowledge Discovery:
– DL has potential to reveal interesting patterns in large data sets.
– Potential to produce novel insights about the task itself [2, 3].
• ‘know-it’: Collaborative project that studies knowledge discovery in time series data.},
  year = {2023},
  journal = {Deep Learning Indaba 2023},
  month = {September 2023},
}
Olivier, J. C., & Barnard, E. (2023). Minimum phase finite impulse response filter design. The Institution of Engineering and Technology, 17(7). https://doi.org/10.1049/sil2.12166

The design of minimum phase finite impulse response (FIR) filters is considered. The study demonstrates that the residual errors achieved by current state-of-the-art design methods are nowhere near the smallest error possible on a finite resolution digital computer. This is shown to be due to conceptual errors in the literature pertaining to what constitutes a factorable linear phase filter. This study shows that factorisation is possible with a zero residual error (in the absence of machine finite resolution error) if the linear operator or matrix representing the linear phase filter is positive definite. Methodology is proposed able to design a minimum phase filter that is optimal—in the sense that the residual error is limited only by the finite precision of the digital computer, with no systematic error. The study presents practical application of the proposed methodology by designing two minimum phase Chebyshev FIR filters. Results are compared to state-of-the-art methods from the literature, and it is shown that the proposed methodology is able to reduce currently achievable residual errors by several orders of magnitude.
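
The factorisation at issue can be seen on a toy example: a linear-phase filter built from a reciprocal zero pair factors exactly, and reflecting the zero outside the unit circle to its reciprocal (with a compensating gain) yields a minimum-phase filter with an identical magnitude response. A plain-Python sketch of this textbook construction, not the paper's method:

```python
import cmath

def convolve(a, b):
    """Polynomial multiplication, i.e. cascading two FIR filters."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def mag_response(h, w):
    """|H(e^{jw})| for FIR taps h at angular frequency w."""
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h)))

# A linear-phase filter built from the reciprocal zero pair {-0.5, -2}:
h_lin = convolve([1.0, 0.5], [1.0, 2.0])                     # taps [1, 2.5, 1]
# Minimum-phase version: reflect the zero at -2 to -0.5 and scale by |z| = 2,
# which leaves the magnitude response unchanged:
h_min = [2.0 * c for c in convolve([1.0, 0.5], [1.0, 0.5])]  # taps [2, 2, 0.5]
```

The paper's contribution concerns doing such a factorisation for high-order designs on a finite-precision machine, where naive root-finding accumulates exactly the residual errors discussed above.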

@article{506,
  author = {Jan Olivier and Etienne Barnard},
  title = {Minimum phase finite impulse response filter design},
  abstract = {The design of minimum phase finite impulse response (FIR) filters is considered. The study demonstrates that the residual errors achieved by current state-of-the-art design methods are nowhere near the smallest error possible on a finite resolution digital computer. This is shown to be due to conceptual errors in the literature pertaining to what constitutes a factorable linear phase filter. This study shows that factorisation is possible with a zero residual error (in the absence of machine finite resolution error) if the linear operator or matrix representing the linear phase filter is positive definite. Methodology is proposed able to design a minimum phase filter that is optimal—in the sense that the residual error is limited only by the finite precision of the digital computer, with no systematic error. The study presents practical application of the proposed methodology by designing two minimum phase Chebyshev FIR filters. Results are compared to state-of-the-art methods from the literature, and it is shown that the proposed methodology is able to reduce currently achievable residual errors by several orders of magnitude.},
  year = {2023},
  journal = {The Institution of Engineering and Technology},
  volume = {17},
  edition = {7},
  month = {July 2023},
  doi = {10.1049/sil2.12166},
}
Ngorima, S. A., Helberg, A. S. J., & Davel, M. H. (2023). Sequence Based Deep Neural Networks for Channel Estimation in Vehicular Communication Systems. In Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science (Vol. 1976). Springer, Cham. https://doi.org/10.1007/978-3-031-49002-6_12

Channel estimation is a critical component of vehicular communications systems, especially in high-mobility scenarios. The IEEE 802.11p standard uses preamble-based channel estimation, which is not sufficient in these situations. Recent work has proposed using deep neural networks for channel estimation in IEEE 802.11p. While these methods improved on earlier baselines they still can perform poorly, especially in very high mobility scenarios. This study proposes a novel approach that uses two independent LSTM cells in parallel and averages their outputs to update cell states. The proposed approach improves normalised mean square error, surpassing existing deep learning approaches in very high mobility scenarios.

@inbook{504,
  author = {Simbarashe Ngorima and Albert Helberg and Marelie Davel},
  title = {Sequence Based Deep Neural Networks for Channel Estimation in Vehicular Communication Systems},
  abstract = {Channel estimation is a critical component of vehicular communications systems, especially in high-mobility scenarios. The IEEE 802.11p standard uses preamble-based channel estimation, which is not sufficient in these situations. Recent work has proposed using deep neural networks for channel estimation in IEEE 802.11p. While these methods improved on earlier baselines they still can perform poorly, especially in very high mobility scenarios. This study proposes a novel approach that uses two independent LSTM cells in parallel and averages their outputs to update cell states. The proposed approach improves normalised mean square error, surpassing existing deep learning approaches in very high mobility scenarios.},
  year = {2023},
  journal = {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
  volume = {1976},
  pages = {176--186},
  month = {29 November 2023},
  publisher = {Springer, Cham},
  isbn = {978-3-031-49001-9},
  doi = {10.1007/978-3-031-49002-6_12},
}
Lotz, S., Nel, A., Wicks, R., Roberts, O., Engelbrecht, N., Strauss, R., Botha, G., Kontar, E., Pitňa, A., & Bale, S. (2023). The Radial Variation of the Solar Wind Turbulence Spectra near the Kinetic Break Scale from Parker Solar Probe Measurements. The Astrophysical Journal, 942(2). The American Astronomical Society. https://doi.org/10.3847/1538-4357/aca903

In this study we examine the radial dependence of the inertial and dissipation range indices, as well as the spectral break separating the inertial and dissipation range in power density spectra of interplanetary magnetic field fluctuations using Parker Solar Probe data from the fifth solar encounter between ∼0.1 and ∼0.7 au. The derived break wavenumber compares reasonably well with previous estimates at larger radial distances and is consistent with gyro-resonant damping of Alfvénic fluctuations by thermal protons. We find that the inertial scale power-law index varies between approximately −1.65 and −1.45. This is consistent with either the Kolmogorov (−5/3) or Iroshnikov–Kraichnan (−3/2) values, and has a very weak radial dependence with a possible hint that the spectrum becomes steeper closer to the Sun. The dissipation range power-law index, however, has a clear dependence on radial distance (and turbulence age), decreasing from −3 near 0.7 au (4 days) to −4 [±0.3] at 0.1 au (0.75 days) closer to the Sun.
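
The power-law indices reported above are slopes of the spectrum on log-log axes; a minimal least-squares sketch (our own illustration, not the authors' pipeline):

```python
import math

def spectral_index(freqs, psd):
    """Least-squares slope of log10(PSD) against log10(frequency),
    i.e. the power-law index of the spectrum."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in psd]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Applied separately below and above the break wavenumber, such a fit yields the inertial-range index (near −5/3 or −3/2) and the steeper dissipation-range index discussed in the abstract.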

@article{503,
  author = {Stefan Lotz and Amore Nel and Robert Wicks and Owen Roberts and Nicholas Engelbrecht and Roelf Strauss and Gert Botha and Eduard Kontar and Alexander Pitňa and Stuart Bale},
  title = {The Radial Variation of the Solar Wind Turbulence Spectra near the Kinetic Break Scale from Parker Solar Probe Measurements},
  abstract = {In this study we examine the radial dependence of the inertial and dissipation range indices, as well as the spectral break separating the inertial and dissipation range in power density spectra of interplanetary magnetic field fluctuations using Parker Solar Probe data from the fifth solar encounter between ∼0.1 and ∼0.7 au. The derived break wavenumber compares reasonably well with previous estimates at larger radial distances and is consistent with gyro-resonant damping of Alfvénic fluctuations by thermal protons. We find that the inertial scale power-law index varies between approximately −1.65 and −1.45. This is consistent with either the Kolmogorov (−5/3) or Iroshnikov–Kraichnan (−3/2) values, and has a very weak radial dependence with a possible hint that the spectrum becomes steeper closer to the Sun. The dissipation range power-law index, however, has a clear dependence on radial distance (and turbulence age), decreasing from −3 near 0.7 au (4 days) to −4 [±0.3] at 0.1 au (0.75 days) closer to the Sun.},
  year = {2023},
  journal = {The Astrophysical Journal},
  volume = {942},
  edition = {2},
  month = {01/2023},
  publisher = {The American Astronomical Society},
  doi = {10.3847/1538-4357/aca903},
}
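The spectral analysis described above amounts to fitting separate power laws to the inertial and dissipation ranges on either side of a break wavenumber. As an illustrative sketch (not the authors' pipeline, which also searches for the break location), a two-segment log-log least-squares fit on a synthetic spectrum:

```python
import numpy as np

def fit_broken_power_law(k, psd, k_break):
    """Fit separate power-law indices below/above an assumed break wavenumber.

    Returns (inertial_index, dissipation_index): the log-log slopes of the
    two spectral ranges. Illustrative only; real analyses also fit k_break
    rather than fixing it."""
    logk, logp = np.log10(k), np.log10(psd)
    lo, hi = k < k_break, k >= k_break
    s_inertial = np.polyfit(logk[lo], logp[lo], 1)[0]
    s_dissip = np.polyfit(logk[hi], logp[hi], 1)[0]
    return s_inertial, s_dissip

# Synthetic spectrum: a -5/3 inertial range breaking to -3.5 at k = 1.0
k = np.logspace(-2, 2, 400)
psd = np.where(k < 1.0, k**(-5/3), k**(-3.5))
print(fit_broken_power_law(k, psd, 1.0))  # ≈ (-1.667, -3.5)
```

On noisy spacecraft data the slopes are of course recovered only approximately, which is why the paper quotes uncertainty ranges such as −4 [±0.3].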
Ramalepe, S., Modipa, T. I., & Davel, M. H. (2023). The Analysis of the Sepedi-English Code-switched Radio News Corpus. Journal of the Digital Humanities Association of Southern Africa, 4(1) (Proceedings of the 3rd workshop on Resources for African Indigenous Languages (RAIL)). https://doi.org/10.55492/dhasa.v4i01.4444

Code-switching is a phenomenon that occurs mostly in multilingual countries, where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for the generation of code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news and the frequency of code-switching in read news were considered and analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low, at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained an optimal loss of 2.361 with an accuracy of 66%.

@article{502,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {The Analysis of the Sepedi-English Code-switched Radio News Corpus},
  abstract = {Code-switching is a phenomenon that occurs mostly in multilingual countries, where multilingual speakers often switch between languages in their conversations. The unavailability of large-scale code-switched corpora hampers the development and training of language models for the generation of code-switched text. In this study, we explore the initial phase of collecting and creating a Sepedi-English code-switched corpus for generating synthetic news. Radio news and the frequency of code-switching in read news were considered and analysed. We developed and trained a Transformer-based language model using the collected code-switched dataset. We observed that the frequency of code-switched data in the dataset was very low, at 1.1%. We complemented our dataset with the news headlines dataset to create a new dataset. Although the frequency was still low, the model obtained an optimal loss of 2.361 with an accuracy of 66%.},
  year = {2023},
  journal = {Journal of the Digital Humanities Association of Southern Africa},
  volume = {4},
  edition = {1},
  month = {2023-01-25},
  issue = {Vol. 4 No. 01 (2022): Proceedings of the 3rd workshop on Resources for African Indigenous Languages (RAIL)},
  doi = {10.55492/dhasa.v4i01.4444},
}
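The corpus statistic reported above, the fraction of code-switched material, can be sketched as a simple ratio of code-switched sentences to total sentences. A minimal illustration (a hypothetical lexicon-lookup helper, not the authors' tooling; real pipelines use language-identification models):

```python
def code_switch_rate(sentences, english_lexicon):
    """Percentage of sentences containing at least one English token.

    Illustrative sketch: membership in a small English lexicon stands in
    for proper token-level language identification."""
    switched = sum(
        1 for s in sentences
        if any(tok.lower() in english_lexicon for tok in s.split())
    )
    return 100.0 * switched / len(sentences)

corpus = [
    "Ditaba tsa gompieno",                # monolingual Sepedi sentence
    "Mopresidente o boletse ka economy",  # sentence with an embedded English token
]
lexicon = {"economy", "government"}
print(f"{code_switch_rate(corpus, lexicon):.1f}%")  # → 50.0%
```

Applied to the radio-news corpus this kind of count is what yields the 1.1% figure quoted in the abstract.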
Ramalepe, S., Modipa, T. I., & Davel, M. H. (2023). Transformer-based text generation for code-switched Sepedi-English news. In Southern African Conference for Artificial Intelligence Research (SACAIR).

Code-switched data is rarely available in written form, and this makes the development of the large datasets required to train code-switched language models difficult. Currently available Sepedi-English code-switched corpora are not large enough to train a Transformer-based model for this language pair. In prior work, larger synthetic datasets have been constructed using a combination of a monolingual and a parallel corpus to approximate authentic code-switched text. In this study, we develop and analyse a new Sepedi-English news dataset (SepEnews). We collect and curate data from local radio news bulletins and use this to augment two existing sources collected from Sepedi newspapers and news headlines, respectively. We then develop and train a Transformer-based model for generating historic code-switched news, and demonstrate and analyse the system’s performance.

@inproceedings{501,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {Transformer-based text generation for code-switched Sepedi-English news},
  abstract = {Code-switched data is rarely available in written form, and this makes the development of the large datasets required to train code-switched language models difficult. Currently available Sepedi-English code-switched corpora are not large enough to train a Transformer-based model for this language pair. In prior work, larger synthetic datasets have been constructed using a combination of a monolingual and a parallel corpus to approximate authentic code-switched text. In this study, we develop and analyse a new Sepedi-English news dataset (SepEnews). We collect and curate data from local radio news bulletins and use this to augment two existing sources collected from Sepedi newspapers and news headlines, respectively. We then develop and train a Transformer-based model for generating historic code-switched news, and demonstrate and analyse the system’s performance.},
  year = {2023},
  journal = {Southern African Conference for Artificial Intelligence Research (SACAIR)},
  pages = {84 - 97},
  month = {December 2023},
}
Middel, C., & Davel, M. H. (2023). Comparing Transformer-based and GBDT models on tabular data: A Rossmann Store Sales case study. In Southern African Conference for Artificial Intelligence Research (SACAIR).

Heterogeneous tabular data is a common and important data format. This empirical study investigates how the performance of deep transformer models compares against benchmark gradient boosting decision tree (GBDT) methods, the more typical modelling approach. All models are optimised using a Bayesian hyperparameter optimisation protocol, which provides a stronger comparison than the random grid search hyperparameter optimisation utilized in earlier work. Since feature skewness is typically handled differently for GBDT and transformer-based models, we investigate the effect of a pre-processing step that normalises feature distribution on the model comparison process. Our analysis is based on the Rossmann Store Sales dataset, a widely recognized benchmark for regression tasks.

@inproceedings{500,
  author = {Coenraad Middel and Marelie Davel},
  title = {Comparing Transformer-based and GBDT models on tabular data: A Rossmann Store Sales case study},
  abstract = {Heterogeneous tabular data is a common and important data format. This empirical study investigates how the performance of deep transformer models compares against benchmark gradient boosting decision tree (GBDT) methods, the more typical modelling approach. All models are optimised using a Bayesian hyperparameter optimisation protocol, which provides a stronger comparison than the random grid search hyperparameter optimisation utilized in earlier work. Since feature skewness is typically handled differently for GBDT and transformer-based models, we investigate the effect of a pre-processing step that normalises feature distribution on the model comparison process. Our analysis is based on the Rossmann Store Sales dataset, a widely recognized benchmark for regression tasks.},
  year = {2023},
  journal = {Southern African Conference for Artificial Intelligence Research (SACAIR)},
  pages = {115 - 129},
  month = {December 2023},
}
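The pre-processing step mentioned above, normalising skewed feature distributions before modelling, can be sketched with a simple log transform. This is an illustrative stand-in with synthetic Rossmann-like sales figures, not the paper's actual protocol:

```python
import numpy as np

def skewness(x):
    """Sample skewness (Fisher-Pearson): the third standardised moment."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z**3)

def reduce_skew(x):
    """log1p transform for right-skewed, non-negative features: a common
    normalisation before feeding tabular data to distribution-sensitive
    models such as transformers."""
    return np.log1p(x)

rng = np.random.default_rng(0)
sales = rng.lognormal(mean=8.0, sigma=1.0, size=10_000)  # heavily right-skewed
print(round(skewness(sales), 2), round(skewness(reduce_skew(sales)), 2))
```

GBDT methods split on feature order and are largely insensitive to such monotone transforms, which is why the comparison protocol has to decide whether both model families see the normalised features.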
Olaifa, M., van Vuuren, J. J., du Plessis, D., & Leenen, L. (2023). Security Issues in Cyber Threat Intelligence Exchange: A Review. In Computing Conference (Vol. Lecture Notes in Networks and Systems 739).

The cost and time required by individual organizations to build an effective cyber defence can become overwhelming with the growing number of cyber attacks. Hence, the introduction of platforms that encourage a collaborative effort in the fight against cyber attacks is considered advantageous. However, the acceptability and efficiency of CTI exchange platforms is massively challenged by a lack of trust caused by security issues encountered in such communities. This review examines the security and participation cost issues revolving around the willingness of participants to either join or actively participate in CTI exchange communities, as well as proposed solutions to the security issues from a research perspective.

@inproceedings{499,
  author = {Moses Olaifa and Joey van Vuuren and Deon Plessis and Louise Leenen},
  title = {Security Issues in Cyber Threat Intelligence Exchange: A Review},
  abstract = {The cost and time required by individual organizations to build an effective cyber defence can become overwhelming with the growing number of cyber attacks. Hence, the introduction of platforms that encourage a collaborative effort in the fight against cyber attacks is considered advantageous. However, the acceptability and efficiency of CTI exchange platforms is massively challenged by a lack of trust caused by security issues encountered in such communities. This review examines the security and participation cost issues revolving around the willingness of participants to either join or actively participate in CTI exchange communities, as well as proposed solutions to the security issues from a research perspective.},
  year = {2023},
  journal = {Computing Conference},
  volume = {Lecture Notes in Networks and Systems 739},
  month = {20-21 October 2023},
}
Botha, J., Pederson, T., & Leenen, L. (2023). An Analysis of the MTI Crypto Investment Scam: User Case. In Proceedings of the 22nd European Conference on Cyber Warfare and Security (ECCWS).

Since the start of the Covid-19 pandemic, blockchain and cryptocurrency adoption has increased significantly. The adoption rate of blockchain-based technologies has surpassed the Internet adoption rate in the 90s and early 2000s. As this industry has grown significantly, so too have the instances of crypto scams. Numerous cryptocurrency scams exist to exploit users. The generally limited understanding of how cryptocurrencies operate has increased the possible number of scams, relying on people’s misplaced sense of trust and desire to make money quickly and easily. As such, investment scams have also been growing in popularity. Mirror Trading International (MTI) was named South Africa’s biggest crypto scam in 2020, resulting in losses of $1.7 billion. It is also one of the largest reported international crypto investment scams. This paper focuses on a specific aspect of the MTI scam: an analysis of the fund movements on the blockchain from the perpetrators and the members who benefited the most from the scam. The authors used various Open-Source Intelligence (OSINT) tools, alongside QLUE, as well as news articles and blockchain explorers. These tools and techniques are used to follow the money-trail on the blockchain, in search of possible mistakes made by the perpetrator. This could include instances where some personal information might have been leaked. With such disclosed personal information, OSINT tools and investigative techniques can be used to identify the criminals. Because the CEO of MTI has been arrested, and the case is currently before a court of law in South Africa, this paper also presents investigative processes that could be followed. Thus, the focus of this paper is to follow the money and consequently propose a process for investigating crypto crimes and scams on the blockchain.
As the adoption of blockchain technologies continues to increase at unprecedented rates, it is imperative to produce investigative toolkits and use cases to help reduce the time spent trying to catch bad actors within the generally anonymous realm of cryptocurrencies.

@inproceedings{498,
  author = {Johnny Botha and Thor Pederson and Louise Leenen},
  title = {An Analysis of the MTI Crypto Investment Scam: User Case},
  abstract = {Since the start of the Covid-19 pandemic, blockchain and cryptocurrency adoption has increased significantly. The adoption rate of blockchain-based technologies has surpassed the Internet adoption rate in the 90s and early 2000s. As this industry has grown significantly, so too have the instances of crypto scams. Numerous cryptocurrency scams exist to exploit users. The generally limited understanding of how cryptocurrencies operate has increased the possible number of scams, relying on people’s misplaced sense of trust and desire to make money quickly and easily. As such, investment scams have also been growing in popularity. Mirror Trading International (MTI) was named South Africa’s biggest crypto scam in 2020, resulting in losses of $1.7 billion. It is also one of the largest reported international crypto investment scams. This paper focuses on a specific aspect of the MTI scam: an analysis of the fund movements on the blockchain from the perpetrators and the members who benefited the most from the scam. The authors used various Open-Source Intelligence (OSINT) tools, alongside QLUE, as well as news articles and blockchain explorers. These tools and techniques are used to follow the money-trail on the blockchain, in search of possible mistakes made by the perpetrator. This could include instances where some personal information might have been leaked. With such disclosed personal information, OSINT tools and investigative techniques can be used to identify the criminals. Because the CEO of MTI has been arrested, and the case is currently before a court of law in South Africa, this paper also presents investigative processes that could be followed. Thus, the focus of this paper is to follow the money and consequently propose a process for investigating crypto crimes and scams on the blockchain. As the adoption of blockchain technologies continues to increase at unprecedented rates, it is imperative to produce investigative toolkits and use cases to help reduce the time spent trying to catch bad actors within the generally anonymous realm of cryptocurrencies.},
  year = {2023},
  journal = {Proceedings of the 22nd European Conference on Cyber Warfare and Security (ECCWS)},
  month = {June 2023},
}
Vorster, J., & Leenen, L. (2023). Consensus Simulator for Organisational Structures. In the 13th International Conference on Simulation and Modelling Methodologies, Technologies and Applications (SimulTech). Rome, Italy.

In this paper we present a new simulator to investigate consensus within organisations, based on organisational structure, team dynamics, and artefacts. We model agents who can interact with each other and with artefacts, as well as the mathematical models that govern agent behaviour. We show that for a fixed problem size, there is a maximum time within which all agents will reach consensus, independent of the number of agents. We present the results from simulating wide ranges of problem sizes and agent group sizes and report on two significant statistics: the time to reach consensus and the effort to reach consensus. The time to reach consensus has implications for project delivery timelines, and the effort relates to project economics.

@inproceedings{497,
  author = {Johannes Vorster and Louise Leenen},
  title = {Consensus Simulator for Organisational Structures},
  abstract = {In this paper we present a new simulator to investigate consensus within organisations, based on organisational structure, team dynamics, and artefacts. We model agents who can interact with each other and with artefacts, as well as the mathematical models that govern agent behaviour. We show that for a fixed problem size, there is a maximum time within which all agents will reach consensus, independent of the number of agents. We present the results from simulating wide ranges of problem sizes and agent group sizes and report on two significant statistics: the time to reach consensus and the effort to reach consensus. The time to reach consensus has implications for project delivery timelines, and the effort relates to project economics.},
  year = {2023},
  journal = {13th International Conference on Simulation and Modelling Methodologies, Technologies and Applications (SimulTech)},
  month = {12-14 July 2023},
  address = {Rome, Italy},
}
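The two statistics reported above, time to consensus and effort to consensus, can be illustrated with a toy agent-based model: agents hold opinions, random pairwise interactions pull opinions together, and we log both the number of rounds and the number of interactions until everyone agrees. A simplified sketch, not the authors' simulator:

```python
import random

def simulate_consensus(n_agents, n_options, seed=0):
    """Toy consensus model: each interaction, a random agent adopts the
    opinion of another random agent. Returns (rounds, interactions)
    until all agents agree. Illustrative only: the paper's simulator
    also models organisational structure and artefacts."""
    rng = random.Random(seed)
    opinions = [rng.randrange(n_options) for _ in range(n_agents)]
    rounds = interactions = 0
    while len(set(opinions)) > 1:
        rounds += 1
        for _ in range(n_agents):          # one "round" = n pairwise interactions
            a, b = rng.sample(range(n_agents), 2)
            opinions[a] = opinions[b]      # agent a is persuaded by agent b
            interactions += 1
    return rounds, interactions

rounds, effort = simulate_consensus(n_agents=20, n_options=5)
print(rounds, effort)
```

Here "rounds" stands in for the time-to-consensus statistic (project timeline) and "interactions" for the effort statistic (project economics).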
Vorster, J., & Leenen, L. (2023). Exploring the Effects of Subversive Agents on Consensus-Seeking Processes Using a Multi-Agent Simulator. In Proceedings of the 13th International Conference on Simulation and Modelling Methodologies, Technologies and Applications (SimulTech 2023). Portugal: SCITEPRESS - Science and Technology Publications, Lda.

In this paper we explore the effects of subversive agents on the effectiveness of consensus-seeking processes. A subversive agent may attempt industrial espionage, or could be a disgruntled employee. The ability of an organisation to effectively execute projects, especially projects within large and complex organisations such as those found in large corporates, governments and military institutions, depends on team members reaching consensus on everything from the project vision through various design phases and, eventually, project implementation and realisation. What could the effect be of agents trying to subvert such a process in a way that does not raise suspicion? Such an agent cannot openly sabotage the project, but rather tries to influence others in a way that increases the time it takes to reach consensus, thus delaying projects in subtle ways. Here we explore the effect such agents could have on the time and effort to reach consensus through the use of a stochastic Multi-Agent Simulation (MAS).

@inbook{495,
  author = {Johannes Vorster and Louise Leenen},
  title = {Exploring the Effects of Subversive Agents on Consensus-Seeking Processes Using a Multi-Agent Simulator},
  abstract = {In this paper we explore the effects of subversive agents on the effectiveness of consensus-seeking processes. A subversive agent may attempt industrial espionage, or could be a disgruntled employee. The ability of an organisation to effectively execute projects, especially projects within large and complex organisations such as those found in large corporates, governments and military institutions, depends on team members reaching consensus on everything from the project vision through various design phases and, eventually, project implementation and realisation. What could the effect be of agents trying to subvert such a process in a way that does not raise suspicion? Such an agent cannot openly sabotage the project, but rather tries to influence others in a way that increases the time it takes to reach consensus, thus delaying projects in subtle ways. Here we explore the effect such agents could have on the time and effort to reach consensus through the use of a stochastic Multi-Agent Simulation (MAS).},
  year = {2023},
  journal = {Proceedings of the 13th International Conference on Simulation and Modelling Methodologies, Technologies and Applications (SimulTech 2023)},
  month = {07/2023},
  publisher = {SCITEPRESS - Science and Technology Publications, Lda},
  address = {Portugal},
}
Botha, J., Botha, D. P., & Leenen, L. (2023). An Analysis of Crypto Scams during the Covid-19 Pandemic: 2020-2022. In Proceedings of the 18th International Conference on Cyber Warfare and Security (ICCWS), Maryland USA, 9-10 March 2023. Academic Publishers.

Blockchain and cryptocurrency adoption has increased significantly since the start of the Covid-19 pandemic. This adoption rate has overtaken the Internet adoption rate in the 90s and early 2000s, but as a result, the instances of crypto scams have also increased. The types of crypto scams reported are typically giveaway scams, rug pulls, phishing scams, impersonation scams, Ponzi schemes, as well as pump and dumps. The US Federal Trade Commission (FTC) reported that in May 2021 the number of crypto scams was twelve times higher than in 2020, and the total loss increased by almost 1000%. The FTC also reported that Americans lost more than $80 million to cryptocurrency investment scams from October 2019 to October 2020, with victims between the ages of 20 and 39 representing 44% of the reported cases. Social media has become the go-to place for scammers, where attackers hack pre-existing profiles and ask targets’ contacts for payments in cryptocurrency. In 2020, both Joe Biden and Bill Gates’ Twitter accounts were hacked, and the hacker posted tweets promising that for all payments sent to a specified address, double the amount would be returned; this case of fraud was responsible for $100,000 in losses. A similar scheme using Elon Musk’s Twitter account resulted in losses of nearly $2 million. This paper analyses the most significant blockchain and cryptocurrency scams since the start of the Covid-19 pandemic, with the aim of raising awareness and contributing to protection against attacks. Even though the blockchain is a revolutionary technology with numerous benefits, it also poses an international crisis that cannot be ignored.

@inbook{494,
  author = {Johnny Botha and D.P. Botha and Louise Leenen},
  title = {An Analysis of Crypto Scams during the Covid-19 Pandemic: 2020-2022},
  abstract = {Blockchain and cryptocurrency adoption has increased significantly since the start of the Covid-19 pandemic. This adoption rate has overtaken the Internet adoption rate in the 90s and early 2000s, but as a result, the instances of crypto scams have also increased. The types of crypto scams reported are typically giveaway scams, rug pulls, phishing scams, impersonation scams, Ponzi schemes, as well as pump and dumps. The US Federal Trade Commission (FTC) reported that in May 2021 the number of crypto scams was twelve times higher than in 2020, and the total loss increased by almost 1000%. The FTC also reported that Americans lost more than $80 million to cryptocurrency investment scams from October 2019 to October 2020, with victims between the ages of 20 and 39 representing 44% of the reported cases. Social media has become the go-to place for scammers, where attackers hack pre-existing profiles and ask targets’ contacts for payments in cryptocurrency. In 2020, both Joe Biden and Bill Gates’ Twitter accounts were hacked, and the hacker posted tweets promising that for all payments sent to a specified address, double the amount would be returned; this case of fraud was responsible for $100,000 in losses. A similar scheme using Elon Musk’s Twitter account resulted in losses of nearly $2 million. This paper analyses the most significant blockchain and cryptocurrency scams since the start of the Covid-19 pandemic, with the aim of raising awareness and contributing to protection against attacks. Even though the blockchain is a revolutionary technology with numerous benefits, it also poses an international crisis that cannot be ignored.},
  year = {2023},
  journal = {Proceedings of the 18th International Conference on Cyber Warfare and Security (ICCWS). Maryland USA, 9-10 March 2023},
  month = {2023},
  publisher = {Academic Publishers},
}
Jafta, Y., Leenen, L., & Meyer, T. (2023). Investigating Ontology-based Data Access with GitHub. In Lecture Notes in Computer Science 13870 (Proceedings of the 20th Extended Semantic Web Conference) (Vol. 13870). Springer.

Data analysis-based decision-making is performed daily by domain experts. As data grows, getting access to relevant data becomes a challenge. In an approach known as Ontology-based data access (OBDA), ontologies are advocated as a suitable formal tool to address complex data access. This technique combines a domain ontology with a data source by using a declarative mapping specification to enable data access using a domain vocabulary. We investigate this approach by studying the theoretical background; conducting a literature review on the implementation of OBDA in production systems; implementing OBDA on a relational dataset using an OBDA tool; and providing results and analysis of query answering. We selected Ontop (https://ontop-vkg.org) to illustrate how this technique enhances the data usage of the GitHub community. Ontop is an open-source OBDA tool applied in the domain of relational databases. The implementation consists of the GHTorrent dataset and an extended SemanGit ontology. We perform a set of queries to highlight a subset of the features of this data access approach. The results look positive and can assist various use cases related to GitHub data with a semantic approach. OBDA does provide benefits in practice, such as querying in the domain vocabulary and making use of reasoning over the axioms in the ontology. However, the practical impediments we observe are in the “manual” development of a domain ontology and the creation of a mapping specification, which requires deep knowledge of a domain and the data. Also, implementing OBDA within the practical context of an information system requires careful consideration of a suitable user interface to facilitate query construction from the ontology vocabulary. Finally, we conclude with a summary of the paper and directions for future research.

@inbook{493,
  author = {Yahlieel Jafta and Louise Leenen and Thomas Meyer},
  title = {Investigating Ontology-based Data Access with GitHub},
  abstract = {Data analysis-based decision-making is performed daily by domain experts. As data grows, getting access to relevant data becomes a challenge. In an approach known as Ontology-based data access (OBDA), ontologies are advocated as a suitable formal tool to address complex data access. This technique combines a domain ontology with a data source by using a declarative mapping specification to enable data access using a domain vocabulary. We investigate this approach by studying the theoretical background; conducting a literature review on the implementation of OBDA in production systems; implementing OBDA on a relational dataset using an OBDA tool; and providing results and analysis of query answering. We selected Ontop (https://ontop-vkg.org) to illustrate how this technique enhances the data usage of the GitHub community. Ontop is an open-source OBDA tool applied in the domain of relational databases. The implementation consists of the GHTorrent dataset and an extended SemanGit ontology. We perform a set of queries to highlight a subset of the features of this data access approach. The results look positive and can assist various use cases related to GitHub data with a semantic approach. OBDA does provide benefits in practice, such as querying in the domain vocabulary and making use of reasoning over the axioms in the ontology. However, the practical impediments we observe are in the “manual” development of a domain ontology and the creation of a mapping specification, which requires deep knowledge of a domain and the data. Also, implementing OBDA within the practical context of an information system requires careful consideration of a suitable user interface to facilitate query construction from the ontology vocabulary. Finally, we conclude with a summary of the paper and directions for future research.},
  year = {2023},
  journal = {Lecture Notes in Computer Science 13870 (Proceedings of the 20th Extended Semantic Web Conference)},
  volume = {13870},
  month = {2023},
  publisher = {Springer},
}
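The declarative mapping idea at the heart of OBDA, relational rows translated into triples over a domain vocabulary, can be illustrated without any OBDA tooling. A toy sketch with a hypothetical repository table and vocabulary (not the SemanGit ontology or Ontop's mapping language):

```python
# Toy OBDA-style mapping: relational rows -> RDF-like triples,
# then a query phrased in the domain vocabulary rather than SQL.
rows = [
    {"repo_id": 1, "name": "demo-repo", "owner": "alice"},
    {"repo_id": 2, "name": "web-app", "owner": "bob"},
]

# Declarative mapping: column values fill subject/object templates of
# domain-vocabulary predicates (a simplified stand-in for R2RML/Ontop mappings).
mapping = [
    ("repo:{repo_id}", "gh:hasName", "{name}"),
    ("repo:{repo_id}", "gh:ownedBy", "user:{owner}"),
]

def materialise(rows, mapping):
    """Apply every mapping template to every row, yielding triples."""
    return [
        (s.format(**row), p, o.format(**row))
        for row in rows
        for s, p, o in mapping
    ]

triples = materialise(rows, mapping)
# "Who owns repo 1?" asked in the domain vocabulary:
owners = [o for s, p, o in triples if s == "repo:1" and p == "gh:ownedBy"]
print(owners)  # → ['user:alice']
```

Real OBDA systems such as Ontop avoid materialising the triples: they rewrite SPARQL queries over the ontology into SQL over the source database, but the mapping plays the same role as above.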

2022

Ramalepe, S. P., Modipa, T. I., & Davel, M. H. (2022). The development of a Sepedi text generation model using transformers. In Southern Africa Telecommunication Networks and Applications Conference (SATNAC).

Text generation is one of the important sub-tasks of natural language generation (NLG), and aims to produce humanly readable text given some input text. Deep learning approaches based on neural networks have been proposed to solve text generation tasks. Although these models can generate text, they do not necessarily capture long-term dependencies accurately, making it difficult to coherently generate longer sentences. Transformer-based models have shown significant improvement in text generation. However, these models are computationally expensive and data hungry. In this study, we develop a Sepedi text generation model using a Transformer-based approach and explore its performance. The developed model has one Transformer block with causal masking on the attention layers and two separate embedding layers. To train the model, we use the National Centre for Human Language Technology (NCHLT) Sepedi text corpus. Our experimental setup varied the model embedding size, batch size and the sequence length. The final model was able to reconstruct unseen test data with 75% accuracy: the highest accuracy achieved to date using a Sepedi corpus.

@inproceedings{511,
  author = {Simon Ramalepe and Thipe Modipa and Marelie Davel},
  title = {The development of a Sepedi text generation model using transformers},
  abstract = {Text generation is one of the important sub-tasks of natural language generation (NLG), and aims to produce humanly readable text given some input text. Deep learning approaches based on neural networks have been proposed to solve text generation tasks. Although these models can generate text, they do not necessarily capture long-term dependencies accurately, making it difficult to coherently generate longer sentences. Transformer-based models have shown significant improvement in text generation. However, these models are computationally expensive and data hungry. In this study, we develop a Sepedi text generation model using a Transformer-based approach and explore its performance. The developed model has one Transformer block with causal masking on the attention layers and two separate embedding layers. To train the model, we use the National Centre for Human Language Technology (NCHLT) Sepedi text corpus. Our experimental setup varied the model embedding size, batch size and the sequence length. The final model was able to reconstruct unseen test data with 75% accuracy: the highest accuracy achieved to date using a Sepedi corpus.},
  year = {2022},
  journal = {Southern Africa Telecommunication Networks and Applications Conference (SATNAC)},
  pages = {51 - 56},
  month = {August 2022},
}
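The causal masking mentioned above is what makes the Transformer block autoregressive: each position may only attend to itself and earlier positions. A minimal numpy sketch of single-head masked self-attention (an illustrative fragment, not the paper's model, which also applies learned query/key/value projections):

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x: (seq_len, d_model) activations. For clarity, queries, keys and
    values are x itself; a real Transformer block uses learned projections."""
    seq_len, d = x.shape
    scores = x @ x.T / np.sqrt(d)                    # (seq_len, seq_len)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf                           # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ x, weights

rng = np.random.default_rng(0)
out, attn = causal_self_attention(rng.normal(size=(5, 8)))
print(np.allclose(np.triu(attn, k=1), 0.0))  # → True: no attention to future tokens
```

During training, this masking lets the model predict every next token in a sequence in parallel without seeing the answer.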
Oosthuizen, M., Hoffman, A., & Davel, M. H. (2022). A Comparative Study of Graph Neural Network Speed Prediction during Periods of Congestion. In Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - NCTA (Vol. 1). https://doi.org/10.5220/0011374100003332

Traffic speed prediction using deep learning has been the topic of many studies. In this paper, we analyse the performance of Graph Neural Network-based techniques during periods of traffic congestion. We first compare a selection of recently proposed techniques that claim to achieve good results using the METR-LA and PeMS-BAY data sets. We then investigate the performance of three of these approaches – Graph WaveNet, Spacetime Neural Network (STNN) and Spatio-Temporal Attention Wavenet (STAWnet) – during congested periods, using recurrent congestion patterns to set a threshold for general congestion through the entire traffic network. Our results show that performance deteriorates significantly during congested time periods, which is concerning, as traffic speed prediction is usually of most value during times of congestion. We also found that, while the above approaches perform almost equally in the absence of congestion, there are much bigger differences in performance during periods of congestion.

@article{510,
  author = {Marko Oosthuizen and Alwyn Hoffman and Marelie Davel},
  title = {A Comparative Study of Graph Neural Network Speed Prediction during Periods of Congestion},
  abstract = {Traffic speed prediction using deep learning has been the topic of many studies. In this paper, we analyse the performance of Graph Neural Network-based techniques during periods of traffic congestion. We first compare a selection of recently proposed techniques that claim to achieve good results using the METR-LA and PeMS-BAY data sets. We then investigate the performance of three of these approaches – Graph WaveNet, Spacetime Neural Network (STNN) and Spatio-Temporal Attention Wavenet (STAWnet) – during congested periods, using recurrent congestion patterns to set a threshold for general congestion through the entire traffic network. Our results show that performance deteriorates significantly during congested time periods, which is concerning, as traffic speed prediction is usually of most value during times of congestion. We also found that, while the above approaches perform almost equally in the absence of congestion, there are much bigger differences in performance during periods of congestion.},
  year = {2022},
  journal = {Proceedings of the 14th International Joint Conference on Computational Intelligence (IJCCI 2022) - NCTA},
  volume = {1},
  pages = {331 - 338},
  month = {October 2022},
  isbn = {978-989-758-611-8},
  doi = {10.5220/0011374100003332},
}
Oosthuizen, A. J., Helberg, A. S. J., & Davel, M. H. (2022). Adversarial training for channel state information estimation in LTE multi-antenna systems. In Southern African Conference for Artificial Intelligence Research (Vol. 1734). Springer, Cham. https://doi.org/10.1007/978-3-031-22321-1_1

Deep neural networks can be utilised for channel state information (CSI) estimation in wireless communications. We aim to decrease the bit error rate of such networks without increasing their complexity, since the wireless environment requires solutions with high performance while constraining implementation cost. For this reason, we investigate the use of adversarial training, which has been successfully applied to image super-resolution tasks that share similarities with CSI estimation tasks. CSI estimators are usually trained in a Single-In Single-Out (SISO) configuration to estimate the channel between two specific antennas and then applied to multi-antenna configurations. We show that the performance of neural networks in the SISO training environment is not necessarily indicative of their performance in multi-antenna systems. The analysis shows that adversarial training does not provide advantages in the SISO environment; however, adversarially trained models can outperform non-adversarially trained models when applying antenna diversity to Long-Term Evolution systems. The use of a feature extractor network is also investigated in this study and is found to have the potential to enhance the performance of Multiple-In Multiple-Out antenna configurations at higher SNRs. This study emphasises the importance of testing neural networks in the context of use while also showing possible advantages of adversarial training in multi-antenna systems without necessarily increasing network complexity.

@article{509,
  author = {Andrew Oosthuizen and Albert Helberg and Marelie Davel},
  title = {Adversarial training for channel state information estimation in LTE multi-antenna systems},
  abstract = {Deep neural networks can be utilised for channel state information (CSI) estimation in wireless communications. We aim to decrease the bit error rate of such networks without increasing their complexity, since the wireless environment requires solutions with high performance while constraining implementation cost. For this reason, we investigate the use of adversarial training, which has been successfully applied to image super-resolution tasks that share similarities with CSI estimation tasks. CSI estimators are usually trained in a Single-In Single-Out (SISO) configuration to estimate the channel between two specific antennas and then applied to multi-antenna configurations. We show that the performance of neural networks in the SISO training environment is not necessarily indicative of their performance in multi-antenna systems. The analysis shows that adversarial training does not provide advantages in the SISO environment; however, adversarially trained models can outperform non-adversarially trained models when applying antenna diversity to Long-Term Evolution systems. The use of a feature extractor network is also investigated in this study and is found to have the potential to enhance the performance of Multiple-In Multiple-Out antenna configurations at higher SNRs. This study emphasises the importance of testing neural networks in the context of use while also showing possible advantages of adversarial training in multi-antenna systems without necessarily increasing network complexity.},
  year = {2022},
  journal = {Southern African Conference for Artificial Intelligence Research},
  volume = {1734},
  pages = {3 - 17},
  month = {November 2022},
  publisher = {Springer, Cham},
  isbn = {978-3-031-22320-4},
  doi = {https://doi.org/10.1007/978-3-031-22321-1_1},
}