Research

What follows here are the (semi-technical) chronicles of the research I've done so far, intermingled with where I would like my work to go.

Machine Learning

My work on the theory of Deep Learning has so far mostly focused on the question of generalization and the apparent absence of "over-fitting" in Deep Neural Networks: the expected error does not get worse as the number of neurons or of iterations of gradient descent increases. The results of the work I'm part of suggest that stochastic gradient descent algorithms generically converge to global minima, and that simple variations of GD converge to the minimum-norm, maximum-margin solution, thus providing an implicit mechanism for complexity control. More recently, I have been working on deriving the empirically observed phenomenon of Neural Collapse, which suggests that in the late stages of training the dynamics of learning in deep networks greatly simplify, with the classifiers and the last-layer features taking on a simple self-dual geometrical structure.
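The geometrical structure in question can be stated concretely. As background (this is the standard formulation of Neural Collapse, not a result specific to my papers), with last-layer class-mean features \(\mu_c\), global mean \(\mu_G\), and classifier rows \(w_c\) for \(C\) classes:

```latex
% Simplex-ETF structure of Neural Collapse (standard formulation)
\tilde\mu_c = \frac{\mu_c - \mu_G}{\lVert \mu_c - \mu_G \rVert}, \qquad
\langle \tilde\mu_c, \tilde\mu_{c'} \rangle = -\frac{1}{C-1} \quad (c \neq c'), \qquad
\frac{w_c}{\lVert w_c \rVert} \;\to\; \tilde\mu_c .
```

The renormalized class means form a simplex equiangular tight frame, and the classifiers converge to the same frame - the self-duality referred to above.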

A second major direction of my research has been the problem of adversarial attacks on neural networks. Being surrounded by neuroscientists at MIT, I have been working on incorporating ideas from biology as potential defenses against adversaries. CNNs are currently the best models of primate vision, yet they are significantly more susceptible to small perturbations than humans - why is that? Is something missing from these models, or are they wrong altogether? So far, we have managed to show that including simple mechanisms of non-uniform sampling in the retina and eccentricity dependence in the cortex does lead to significant improvements in robustness, but a big gap between humans and these systems remains.
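A minimal sketch of the first of these mechanisms (illustrative only, not our exact model): approximate non-uniform retinal sampling by blurring an image more and more with eccentricity, i.e. with distance from the fixation point.

```python
# Illustrative foveation sketch: blend a sharp and a blurred copy of an
# image, weighting the blurred copy by eccentricity, so the image stays
# sharp at fixation and gets progressively smoother in the periphery.
import numpy as np

def smooth(img, passes):
    """Crude isotropic blur: repeated 4-neighbour averaging
    (periodic boundaries via np.roll, good enough for a demo)."""
    out = img.astype(float)
    for _ in range(passes):
        out = 0.25 * (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
                      np.roll(out, 1, 1) + np.roll(out, -1, 1))
    return out

def foveate(img, fixation, blur_passes=8):
    """Eccentricity-weighted blend: sharp at fixation, smooth far away."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    ecc = ecc / ecc.max()                       # normalise to [0, 1]
    return (1.0 - ecc) * img + ecc * smooth(img, blur_passes)

img = np.random.rand(64, 64)
fov = foveate(img, fixation=(32, 32))
```

Because the blend weight vanishes at the fixation point, the fixated pixel is untouched while peripheral detail - including peripheral adversarial perturbations - is washed out.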

I have also been working with my students on combining the neural network approach with work from the field of program synthesis to attempt tackling more abstract, rule-based symbolic reasoning. Such a combination is crucial, as standard neural networks seem to fail spectacularly at learning exact rules. The introduction of the Abstraction and Reasoning Corpus (ARC) by Chollet has provided a perfect testbed for some of our ideas.
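A toy sketch of the symbolic half of this combination: enumerate compositions of a tiny grid DSL and keep any program consistent with all training pairs. The primitives below are hypothetical stand-ins; a real ARC solver needs a far richer DSL, and in practice a neural model can guide or prune this combinatorial search.

```python
# Brute-force program search over a toy grid DSL, shortest programs first.
import itertools
import numpy as np

PRIMITIVES = {
    "rot90":  np.rot90,    # rotate grid 90 degrees counter-clockwise
    "flip_h": np.fliplr,   # mirror horizontally
    "flip_v": np.flipud,   # mirror vertically
}

def search(pairs, max_depth=3):
    """Return the first sequence of primitives mapping every training
    input grid to its output grid, searched in order of program length."""
    for depth in range(1, max_depth + 1):
        for names in itertools.product(PRIMITIVES, repeat=depth):
            def run(grid, names=names):
                for name in names:
                    grid = PRIMITIVES[name](grid)
                return grid
            if all(np.array_equal(run(i), o) for i, o in pairs):
                return names
    return None

inp = np.array([[1, 2], [3, 4]])
pairs = [(inp, np.rot90(inp, 2))]   # hidden rule: rotate 180 degrees
prog = search(pairs)                # -> ("rot90", "rot90")
```

Exhaustive search like this blows up exponentially in program length, which is exactly where a learned prior over likely programs earns its keep.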

 

Theoretical Physics

I started my PhD research by investigating Relative Locality - a phenomenological model of quantum-gravity-induced modifications to the relativistic dynamics of point particles. One of the motivations behind it that I found very enticing is that when we perform experiments, we never actually measure distances; rather, we infer them from the energies and momenta of the quantum particles we emit and absorb, and from the timing of these events. This, together with the lesson from lower-dimensional models that gravitational effects modify the standard relativistic dispersion relations, motivates the study of particle dynamics on momentum spaces with non-trivial geometry. In my first work on Relative Locality, I showed that allowing such curvature can break global momentum conservation even for a single propagating particle. I followed this with a construction of a curved momentum space that preserves full Lorentz symmetry; the non-commutative structure of the result turned out to be related to the discrete Snyder spacetime.
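Schematically, and hedged (conventions and signs vary between formulations; this is only a leading-order illustration with \(m_{\rm P}\) the Planck mass): the mass-shell condition becomes the geodesic distance of \(p\) from the origin of momentum space, and momentum composition acquires corrections from the momentum-space connection \(\Gamma\):

```latex
% Relative Locality, schematic leading-order structure
\mathcal{C}(p) \equiv D(p)^2 = m^2, \qquad
(p \oplus q)_\mu \approx p_\mu + q_\mu
  - \frac{1}{m_{\rm P}}\, \Gamma_\mu^{\;\nu\rho}\, p_\nu q_\rho + \dots
```

Non-flatness of momentum space makes \(\oplus\) non-linear, and it is precisely this deformation of the composition rule that can spoil the usual global conservation laws.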

Following my work in phenomenology, I decided to shift my focus to Spin Foam models - a proposal for a path-integral quantization of General Relativity. The central insight of Spin Foams is that gravity can be described by imposing so-called simplicity constraints on a topological field theory (BF theory). Using a newly developed spinor representation of spin networks, my collaborators and I discovered a simpler way of imposing these constraints. This allowed us to obtain the first analytical results on the behavior of transition amplitudes in 4D quantum gravity under changes of triangulation (more technically, we evaluated the 4-dimensional Pachner moves). This work gives hope for an analytic study of the non-perturbative renormalization of 4D quantum gravity, which is one of the projects I am currently working on.
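At the classical level the central insight is standard background (schematic here, suppressing the Barbero-Immirzi term): the topological BF action reduces to the Palatini action for General Relativity once the B field is constrained to be simple, i.e. built from a tetrad e:

```latex
% BF theory plus simplicity constraints (classical, schematic)
S_{\rm BF}[B,\omega] = \int_{\mathcal M} \mathrm{Tr}\!\left( B \wedge F[\omega] \right),
\qquad
B^{IJ} = \tfrac{1}{2}\,\epsilon^{IJ}{}_{KL}\, e^K \wedge e^L .
```

Spin Foam models impose a quantum version of this constraint on the state sum of the topological theory; the spinorial representation mentioned above provides a simpler way of doing exactly that.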

Another direction I am currently investigating concerns the nature of time in quantum-gravitational scenarios. The question I am curious about is how one would operationally define time intervals in regions of extreme gravity. Our current relational definitions, the ones used in atomic clocks, are valid only thanks to the existence of bound states - these, however, are not stable in such extreme situations.

 

Mixed Reality

I have spent two summers working at Microsoft Research under Jaron Lanier on mixed-reality applications to physics and mathematics research. The first time, I was part of the COMRADRE (Center of Mixed Reality Advanced Development and Research) lab in Redmond, WA. There, we worked on novel applications of mixed reality using hacked-together headsets called "Reality Mashers". Apart from smaller applications, like visualizing a hypercube superimposed onto a physical cube that a user could rotate, I prototyped an environment for visualizing mathematical equations and performing operations on them using gestures and voice input.

My second internship took place in Mountain View, CA, where I pushed the idea from my previous work further. This time I switched to the HoloLens platform and created a multi-user collaborative environment for mathematical research. I explored new ways of interacting with mathematical objects. For example, transformations were represented by virtual magnifying lenses: if a user looks through a lens at a data set, the rendered image is modified by the applied transformation. The interesting aspect is that several lenses can be put in sequence to obtain compositions, or flipped to get the inverse operation. This allows multiple users to explore a meta-structure of related mathematical expressions, a task that quickly gets confusing in a 2D medium.
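The lens metaphor is just invertible function composition. A toy sketch (illustrative names, not the app's actual code): each lens pairs a transformation with its inverse, lenses in sequence compose, and flipping a lens applies the inverse.

```python
# Composable, invertible "lenses": look through one to apply a transform,
# chain them to compose, flip one to undo.
class Lens:
    def __init__(self, fwd, inv):
        self.fwd, self.inv = fwd, inv

    def __call__(self, x):
        return self.fwd(x)

    def flipped(self):
        """Looking through the lens from the other side: the inverse."""
        return Lens(self.inv, self.fwd)

    def then(self, other):
        """Lenses held in sequence: look through self, then other."""
        return Lens(lambda x: other.fwd(self.fwd(x)),
                    lambda x: self.inv(other.inv(x)))

scale2 = Lens(lambda x: 2 * x, lambda x: x / 2)
shift3 = Lens(lambda x: x + 3, lambda x: x - 3)

pipeline = scale2.then(shift3)     # x -> 2x + 3
```

Note that flipping a composite lens automatically reverses both the order and the direction of its parts, which is what makes chains of lenses safe to explore interactively.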

Publications

  • J. Gant, A. Banburski, A. Deza, T. Poggio, "Evaluating the Adversarial Robustness of a Foveated Texture Transform Module in a CNN," submitted to the Shared Visual Representations in Human & Machine Intelligence (SVRHM) workshop at NeurIPS 2021
  • A. Rangamani, A. Banburski, "Neural Collapse in Deep Homogeneous Classifiers and the role of Weight Decay," submitted to ICASSP 2022
  • S. Alford, A. Gandhi, A. Rangamani, A. Banburski, T. Wang, S. Dandekar, J. Chin, T. Poggio, P. Chin, "Two Learning Approaches for Abstraction and Reasoning," Complex Systems 2021 Conference
  • A. Rangamani, M. Xu, A. Banburski, Q. Liao, T. Poggio, "Dynamics and Neural Collapse in Deep Classifiers with the Square Loss," CBMM Memo 117
  • A. Banburski, F. De La Torre, N. Pant, I. Shastri, T. Poggio, "Margin Distribution: Are All Data Equal?," TOPML 2021 Workshop
  • A. Banburski, F. De La Torre, N. Pant, I. Shastri, T. Poggio, "Distribution of Classification Margins: Are All Data Equal?," CBMM Memo 115, submitted to AAAI 2021
  • A. Banburski, A. Gandhi, S. Alford, S. Dandekar, P. Chin, T. Poggio, "Dreaming with ARC," Learning Meets Combinatorial Algorithms Workshop at NeurIPS 2020
  • A. Deza, Q. Liao, A. Banburski, T. Poggio, "Hierarchically Local Tasks and Deep Convolutional Networks," CBMM Memo 109
  • M. Reddy, A. Banburski, N. Pant, T. Poggio, "Biologically Inspired Mechanisms for Adversarial Robustness," NeurIPS 2020
  • T. Poggio, Q. Liao, A. Banburski, "Complexity Control by Gradient Descent in Deep Networks," Nature Communications 11, 1027 (2020)
  • T. Poggio, A. Banburski, "An overview of some issues in the theory of deep networks," IEEJ Trans Elec Electron Eng, 15: 1560-1571
  • T. Poggio, G. Kur, A. Banburski, "Double descent in the condition number," CBMM Memo 102, arXiv:1912.06190
  • T. Poggio, A. Banburski, Q. Liao, "Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization," PNAS June 2020, 201907369
  • A. Banburski, Q. Liao, B. Miranda, L. Rosasco, J. Hidary, T. Poggio, "Weight and Batch Normalization implement Classical Generalization Bounds," ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning
  • A. Banburski, Q. Liao, B. Miranda, F. De La Torre, L. Rosasco, J. Hidary, T. Poggio, "Theory III: Dynamics and Generalization in Deep Networks: a simple solution," arXiv:1903.04991
  • Q. Liao, B. Miranda, A. Banburski, J. Hidary, T. Poggio, "A Surprising Linear Relationship Predicts Test Performance in Deep Networks," arXiv:1807.09659
  • T. Poggio, Q. Liao, B. Miranda, A. Banburski, X. Boix, J. Hidary, "Theory IIIb: Generalization in Deep Networks," arXiv:1806.11379
  • A. Banburski, "Towards vertex renormalization in 4d Spin Foams," PhD thesis
  • J. Lanier, V. Mateevitsi, K. Rathinavel, L. Shapira, J. Menke, P. Therien, J. Hudman, G. Speiginer, A. S. Won, A. Banburski, X. B. Palos, J. A. Fernandez, J. P. Lurashi, W. Chang, "The RealityMashers: Augmented Reality Wide Field-of-View Optical See-Through Head Mounted Displays," ISMAR 2016
  • A. Banburski, L.Q. Chen, "A simpler way of imposing simplicity constraints," arXiv:1512.05331 [gr-qc], accepted by Physical Review D for publication
  • A. Banburski, L.Q. Chen, L. Freidel, J. Hnybida, "Pachner moves in a 4d Riemannian holomorphic Spin Foam model," Physical Review D 92, 124014 (2015), arXiv:1412.8247 [gr-qc]
  • A. Banburski, L. Freidel, "Snyder Momentum Space in Relative Locality," Physical Review D 90, 076010 (2014), arXiv:1308.0300 [gr-qc]
  • A. Banburski, "Twisting loops and global momentum non-conservation in Relative Locality," Physical Review D 88, 076012 (2013), arXiv:1305.7289 [gr-qc]
  • A. Banburski, P. Schuster, "The Production and Discovery of True Muonium in Fixed-Target Experiments," Physical Review D 86, 093007 (2012), arXiv:1206.3961 [hep-ph]