Portfolio

How Transformers Speak: An Interacting Multi-Particle System Perspective

The Transformer neural network architecture is a cornerstone of many modern state-of-the-art AI systems, from large language models for text generation to image segmentation for autonomous vehicles. Still, little is known about the inner working principles of Transformers and how to interpret them. In this report, we take one step towards opening the black box with a series of empirical evaluations. First, we demonstrate that tokens cluster over time in the latent space of a Transformer model, indicating a kind of consensus dynamics. Second, we draw a connection to clustered federated consensus-based optimisation, which affords the interpretation of tokens cooperating in groups to evolve towards a consensus point that is most relevant to the group. Our work provides stepping stones for further discoveries that benefit the explainability and trustworthiness of Transformer-based AI applications.
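As a concrete illustration of the first observation, the following minimal sketch tracks the mean pairwise cosine similarity of token representations after each layer; if tokens cluster, the similarity should rise with depth. It assumes the Hugging Face transformers library and the public GPT-2 checkpoint, which are stand-ins rather than the report's actual model and evaluation pipeline.

```python
# Sketch: measure token clustering across Transformer layers via the mean
# pairwise cosine similarity of hidden states. Assumes the Hugging Face
# `transformers` library and the GPT-2 checkpoint; any model exposing
# per-layer hidden states would work the same way.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

text = "Transformers can be viewed as interacting particle systems."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # embeddings + one tensor per layer

for layer, h in enumerate(hidden_states):
    tokens = torch.nn.functional.normalize(h[0], dim=-1)  # (seq_len, dim), unit norm
    sim = tokens @ tokens.T                               # pairwise cosine similarities
    off_diag = sim[~torch.eye(sim.size(0), dtype=torch.bool)]
    print(f"layer {layer:2d}: mean pairwise cosine similarity = {off_diag.mean():.3f}")
```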

Factory Manipulation with Cooperative Multi-Agent Reinforcement Learning

Efficient factory automation is crucial for modern manufacturing, but traditional pre-programmed approaches often fall short in handling dynamic and complex tasks. This report investigates the application of cooperative multi-agent reinforcement learning to address these challenges in a simulated factory environment. By training multiple robot arms to collaborate in transporting and sorting objects, we aim to optimise efficiency and adaptability in factory manipulation. Our results show that training successfully cooperating agents is hard and time-consuming and has, so far, only been achieved for a simplified problem setting.
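To illustrate why learning cooperation is hard, the following toy sketch reduces the coordination problem to a two-agent matrix game with a shared reward, learned by independent tabular Q-learning. The environment, actions, and hyperparameters are made up for illustration and do not reproduce the report's factory setup.

```python
# Sketch: the coordination difficulty in cooperative MARL, reduced to a toy
# matrix game. Two "robot arms" each choose an action; the shared team
# reward is 1 only if both lift simultaneously. Each agent learns
# independently, ignoring the other, which makes the problem non-stationary.
import random

ACTIONS = ["lift", "wait"]
q = [{a: 0.0 for a in ACTIONS} for _ in range(2)]  # one Q-table per agent
alpha, epsilon = 0.1, 0.2

def choose(agent):
    if random.random() < epsilon:                  # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(q[agent], key=q[agent].get)

for step in range(5000):
    a0, a1 = choose(0), choose(1)
    reward = 1.0 if (a0 == "lift" and a1 == "lift") else 0.0  # shared reward
    q[0][a0] += alpha * (reward - q[0][a0])        # independent updates: each
    q[1][a1] += alpha * (reward - q[1][a1])        # agent sees only its own action

print(q)  # whether "lift" wins depends on both agents exploring it together
```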

Flood Forecasting with Graph Neural Networks

Due to climate change, riverine floods have become increasingly common. Forecasting them requires accurate discharge predictions. In this regard, deep learning methods recently started outperforming classical hydrological modeling techniques based on differential equations. Current state-of-the-art approaches treat forecasting at spatially distributed gauge stations as isolated problems. However, incorporating the known river network topology into the model has the potential to leverage the physical relationships between stations. Thus, we propose modeling river discharge for a network of gauging stations with a Graph Neural Network (GNN). To assess the benefit of relating stations to each other, we compare the forecasting performance achieved by different adjacency definitions: no adjacency at all, which is equivalent to existing approaches; binary adjacency of nearest up-/downstream stations; weighted adjacency according to physical relationships like stream length between stations; and learned adjacency via joint parameterization. Our results show that the model does not benefit from the river network topology information, regardless of the number of layers. The learned edge weights correlate with neither of the static definitions and exhibit no regular pattern. Furthermore, a worst-case analysis shows that the GNN struggles to predict sudden discharge spikes. Employing the Gradient Flow Framework (GRAFF), we find that parameter sharing across layers does not hurt model performance and that a mixture of attractive and repulsive forces acts on vertex representations in the latent space of the GNN.
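The four adjacency definitions can be sketched as follows for a hypothetical four-station river network; the station IDs, edges, and stream lengths are invented examples, not data from the study.

```python
# Sketch of the four adjacency definitions compared in the report, for a
# toy river network of four gauging stations. All values are made up.
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3)]          # nearest up-/downstream pairs
stream_km = {(0, 1): 12.0, (1, 2): 30.0, (2, 3): 7.5}

# 1) No adjacency: stations are isolated problems (existing approaches).
A_none = np.zeros((n, n))

# 2) Binary adjacency of nearest up-/downstream stations.
A_binary = np.zeros((n, n))
for i, j in edges:
    A_binary[i, j] = A_binary[j, i] = 1.0

# 3) Weighted adjacency from a physical relationship, e.g. inverse stream length.
A_weighted = np.zeros((n, n))
for (i, j), length in stream_km.items():
    A_weighted[i, j] = A_weighted[j, i] = 1.0 / length

# 4) Learned adjacency: edge weights are free parameters, trained jointly
#    with the GNN (here just randomly initialised for illustration).
A_learned = np.zeros((n, n))
for i, j in edges:
    A_learned[i, j] = A_learned[j, i] = np.random.rand()

print(A_binary, A_weighted, sep="\n\n")
```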

Attraction-Repulsion Dynamics versus Homophily in GNNs

Most established Graph Neural Network architectures rely on the input graphs to be homophilic. On heterophilic graphs, they often perform worse than structure-agnostic models. The recently proposed Gradient Flow Framework (GraFF) generalises many architectures as discretisations of gradient flow differential equations. Governed by the spectrum of one of its channel-mixing matrices, GraFF allows for both attractive and repulsive interactions between node representations in the latent space of a GNN. In this work, we examine the natural relationship between this attraction/repulsion behaviour and the homophily ratio of the input graph. Our findings suggest that this relationship is much weaker than expected.
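Both ingredients can be sketched concretely: the edge homophily ratio of a small labelled graph, and a GraFF-style discretised gradient-flow update in which the eigenvalues of a symmetric channel-mixing matrix determine whether node representations attract or repel. The toy graph, labels, and step size below are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: edge homophily ratio, plus a GraFF-style gradient-flow update in
# which the spectrum of a symmetric channel-mixing matrix W determines
# attraction (positive eigenvalues) or repulsion (negative eigenvalues).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
labels = np.array([0, 0, 1, 1])

# Edge homophily ratio: fraction of edges joining same-label endpoints.
homophily = np.mean([labels[i] == labels[j] for i, j in edges])
print(f"edge homophily ratio: {homophily:.2f}")

n, d = 4, 2
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(1)
A_norm = A / np.sqrt(np.outer(deg, deg))      # symmetric normalisation

X = np.random.randn(n, d)                     # node representations
W = np.array([[1.0, 0.0],                     # symmetric channel mixing:
              [0.0, -1.0]])                   # eigenvalues +1 (attract), -1 (repel)

tau = 0.1
for _ in range(20):                           # discretised gradient flow;
    X = X + tau * (A_norm - np.eye(n)) @ X @ W  # (A_norm - I) is -L_sym

# Channel 0 (positive eigenvalue) is smoothed towards neighbours,
# channel 1 (negative eigenvalue) is driven apart from them.
print(X)
```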

Noise Schedules for Diffusion Probabilistic Models

Diffusion Probabilistic Models (DPMs) gained attention recently due to their mathematical properties and competitive performance on image generation tasks. A major ingredient in DPMs is the noise schedule, which dictates the noising process that a denoising model must learn to undo. We explore and compare different choices for the noise schedule in a common framework on a non-standard dataset.
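For concreteness, the following sketch contrasts two widely used schedules, the standard DDPM linear schedule and the cosine schedule of Nichol & Dhariwal, via the signal level ᾱ_t each induces; the schedules actually compared in the report may differ.

```python
# Sketch: two common DPM noise schedules and the signal level alpha_bar_t
# they induce. Formulas follow the standard DDPM linear schedule and the
# Nichol & Dhariwal cosine schedule; T and endpoints are the usual defaults.
import numpy as np

T = 1000

# Linear schedule: betas increase linearly from 1e-4 to 0.02.
betas_linear = np.linspace(1e-4, 0.02, T)
alpha_bar_linear = np.cumprod(1.0 - betas_linear)

# Cosine schedule: define alpha_bar directly, then derive betas from it.
s = 0.008
t = np.arange(T + 1) / T
f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
alpha_bar_cosine = f[1:] / f[0]
betas_cosine = np.clip(1 - f[1:] / f[:-1], 0, 0.999)

# alpha_bar_t is the fraction of signal remaining at step t:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
for step in (0, 250, 500, 750, 999):
    print(f"t={step:4d}  linear={alpha_bar_linear[step]:.4f}  "
          f"cosine={alpha_bar_cosine[step]:.4f}")
```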

Deep Active Learning with Artificial Neural Networks for Automatic Detection of Mercury’s Bow Shock and Magnetopause Crossing Signatures in NASA’s MESSENGER Magnetometer Observations

Between 2011 and 2015, NASA’s MESSENGER spacecraft orbited Mercury, where it collected magnetic measurements through the onboard magnetometer. Based on these, several studies attempted to geometrically model the exact shape of the planet’s bow shock and magnetopause boundaries. However, despite their complexity, these static models struggle to adapt to changing environments. In cases like this, where it is necessary to capture fine structures in discrete signals, deep learning has been able to outperform traditional modeling in various applications over the last decade. Hence, we devise a deep neural network that identifies the spacecraft’s bow shock and magnetopause crossings within a local time window of measurements and predicts upcoming crossings beyond that window. The model achieves an overall macro F1 of 0.82 and accuracies of 80% and 88% on the bow shock and magnetopause crossings, respectively. Furthermore, we employ an active learning paradigm to determine how many Mercury years’ worth of observations are required for a representative model. We find that two Mercury years’ worth of measurements are sufficient for satisfactory performance, which is only about a tenth of the entire dataset. This work may be relevant to future research concerning the BepiColombo mission by ESA and JAXA, whose space probes will enter orbit around the planet in December 2025.
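The active learning paradigm can be sketched as pool-based uncertainty sampling. The classifier, features, and labels below are placeholders (a scikit-learn logistic regression on synthetic data), not the deep model or the MESSENGER observations from the report.

```python
# Sketch: pool-based active learning with uncertainty sampling, the general
# paradigm behind asking how much labelled data suffices. All data here is
# synthetic stand-in material, not actual magnetometer measurements.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(2000, 8))                           # stand-in features
y_pool = (X_pool[:, 0] + 0.5 * X_pool[:, 1] > 0).astype(int)  # stand-in labels

labelled = list(rng.choice(len(X_pool), size=20, replace=False))
unlabelled = [i for i in range(len(X_pool)) if i not in labelled]

for round_ in range(10):
    model = LogisticRegression().fit(X_pool[labelled], y_pool[labelled])
    proba = model.predict_proba(X_pool[unlabelled])[:, 1]
    uncertainty = -np.abs(proba - 0.5)                 # closest to decision boundary
    query = [unlabelled[i] for i in np.argsort(uncertainty)[-20:]]
    labelled += query                                  # "annotate" the queried windows
    unlabelled = [i for i in unlabelled if i not in query]
    acc = model.score(X_pool, y_pool)
    print(f"round {round_}: {len(labelled)} labels, pool accuracy {acc:.3f}")
```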

Finding Single-Source Shortest Paths on Planar Graphs with Nonnegative Edge Weights in Linear Time

Finding shortest paths in a graph from a fixed source vertex to all other vertices is one of the most fundamental problems in graph theory. For general graphs, the standard approach is Dijkstra's algorithm, which on planar graphs takes linearithmic time in the number of vertices. However, we present an algorithm published by Henzinger et al. in 1997 that accomplishes the task in linear time on planar graphs.
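For contrast with the linear-time result, the following sketch shows the linearithmic baseline: Dijkstra's algorithm with a binary heap runs in O((V + E) log V), which is O(V log V) on planar graphs since E = O(V). The Henzinger et al. algorithm itself builds on recursive graph decompositions and is too involved to sketch here.

```python
# Sketch: the linearithmic baseline the linear-time algorithm improves on.
# Dijkstra with a binary heap for nonnegative edge weights.
import heapq

def dijkstra(adj, source):
    """adj: {vertex: [(neighbour, nonnegative_weight), ...]}"""
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry, skip it
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Example on a small planar graph:
adj = {0: [(1, 2), (2, 5)], 1: [(0, 2), (2, 1)], 2: [(0, 5), (1, 1)]}
print(dijkstra(adj, 0))  # {0: 0, 1: 2, 2: 3}
```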