Projects
Our projects focus on the structural analysis of modern learning methods and the principled development of optimization techniques that underpin practical machine learning.
Foundations of Modern Architectures
We analyze modern learning architectures to identify structural biases, implicit constraints, and essential mechanisms that govern model behavior.
Structural Position Bias in Transformers
We provide a formal characterization of structural priors inherent in transformer architectures. Our results prove that the interaction of attention mechanisms, residual connections, and causal masking induces an intrinsic position bias that arises independently of training objectives and data.
In particular, we show that the architecture systematically assigns greater influence to early and late sequence positions, providing a principled explanation for phenomena such as the “lost-in-the-middle” effect.
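The early-position part of this bias can be illustrated with a toy computation (this is an illustration, not the paper's formal argument): under causal masking, position i can only attend to positions 0 through i, so even with perfectly uniform attention scores, early positions accumulate disproportionate attention mass.

```python
import numpy as np

# Toy illustration of causal-mask position bias (sizes are illustrative):
# row i of the mask lets position i attend only to positions 0..i.
n = 8
mask = np.tril(np.ones((n, n)))             # causal (lower-triangular) mask
scores = np.where(mask > 0, 0.0, -np.inf)   # uniform scores before softmax
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)     # row-wise softmax
received = attn.sum(axis=0)                 # total attention each position receives
print(received)  # decreasing: position 0 gets sum_{i=0..7} 1/(i+1), position 7 gets 1/8
```

Even this trivial setting shows attention mass concentrating on the earliest positions; the full result additionally accounts for residual connections and the late-position bias.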
→ Related publications: [arXiv 2026]
Diffusion Models for Protein Design
We develop principled diffusion-based generative models for the joint design of protein structures and surfaces. Our framework integrates geometric receptor representations with denoising diffusion bridge models to generate ligand surfaces and corresponding backbone structures in a coherent pipeline.
By explicitly linking surface geometry, structural alignment, and diffusion dynamics, the model enforces complementarity and physical plausibility throughout generation. Extensive validation demonstrates that the approach produces structurally viable proteins across diverse design scenarios.
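The stochastic backbone of a denoising diffusion bridge can be sketched as a Brownian bridge between a source and a target configuration, with noise that vanishes at both endpoints. This is a generic toy sketch, not the paper's model; shapes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def bridge_sample(x0, x1, t, sigma=0.5):
    """Sample x_t from a Brownian bridge pinned at x0 (t=0) and x1 (t=1)."""
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))    # noise vanishes at both endpoints
    return mean + std * rng.standard_normal(x0.shape)

x0 = rng.standard_normal((16, 3))   # illustrative: source coordinates
x1 = rng.standard_normal((16, 3))   # illustrative: target coordinates
xt = bridge_sample(x0, x1, 0.5)     # noisy intermediate configuration
```

Pinning both endpoints is what distinguishes bridge models from standard diffusion, which is anchored only at the data side.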
→ Related publication: [NeurIPS 2025]
Reliability Limits of Explanation Methods in Transformer Models
We analyze popular explanation methods that attempt to interpret word embeddings by mapping them to human-interpretable semantic features. These approaches assume that if such features can be predicted accurately from the embeddings, then the embeddings must encode the corresponding knowledge.
We show that this assumption does not hold. The same methods can successfully predict random or unrelated features, demonstrating that prediction accuracy alone does not establish genuine semantic interpretability. This work clarifies structural limitations of embedding-based explanation techniques and challenges widely used evaluation practices in explainable AI.
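One facet of this failure mode is easy to demonstrate numerically (a minimal sketch with illustrative sizes, not the paper's experiments): whenever the embedding dimension exceeds the number of examples, a linear probe can fit completely random "semantic features" perfectly, so probe accuracy alone cannot certify encoded knowledge.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 300                           # fewer examples than dimensions
E = rng.standard_normal((n, d))           # "word embeddings" (pure noise)
y = rng.integers(0, 2, size=n)            # random binary "semantic feature"

# Least-squares linear probe on +/-1 targets; the system is underdetermined,
# so an exact interpolating solution exists almost surely.
w, *_ = np.linalg.lstsq(E, 2.0 * y - 1.0, rcond=None)
pred = (E @ w > 0).astype(int)
train_acc = (pred == y).mean()
print(train_acc)  # 1.0 — the probe fits random labels perfectly
```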
→ Related publication: [ECML 2025]
Capsule Neural Networks
We performed a structural analysis of capsule neural networks and their dynamic routing mechanism. Our results show that the implicit parse-tree assumption underlying the architecture does not hold in general. We provide both theoretical arguments and empirical evidence demonstrating that this limitation is intrinsic to the architectural mechanism. As a result, the model cannot deliver the representational advantages it was originally designed to achieve.
This work provides a principled explanation for the persistent performance gap between capsule networks and standard convolutional architectures.
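For reference, the routing-by-agreement mechanism under analysis can be sketched as follows (after Sabour et al.'s formulation; shapes and iteration count are illustrative): routing logits are repeatedly updated by the agreement between lower-level prediction vectors and squashed output capsules.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Shrink vectors to norm < 1 while preserving direction."""
    n2 = np.sum(s * s, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def route(u_hat, iters=3):
    """Dynamic routing. u_hat: (n_in, n_out, dim) prediction vectors."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                            # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = np.einsum('ij,ijd->jd', c, u_hat)              # weighted sum per output capsule
        v = squash(s)                                      # output capsule vectors
        b += np.einsum('ijd,jd->ij', u_hat, v)             # agreement update
    return v

rng = np.random.default_rng(0)
v = route(rng.standard_normal((8, 4, 16)))
print(v.shape)  # (4, 16)
```

The implicit parse-tree assumption is that these coupling coefficients converge to a sparse part-whole assignment, which is precisely what the analysis shows does not hold in general.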
→ Related publications: [AAAI 2023]
Optimization and Training Dynamics
We study optimization principles governing the training of classical and deep learning models, connecting theoretical guarantees with empirical performance.
Optimization Methods for Machine Learning
We develop optimization methods grounded in structural principles rather than incremental heuristics. By reducing a broad class of optimization strategies to a small number of fundamental mechanisms, we derive unified frameworks that provide theoretical guarantees while remaining practically efficient in both classical and deep learning settings.
Our optimization framework is designed to be lightweight and transparent: it depends only on NumPy, supports GPU execution, and requires no external solver libraries. In extensive empirical evaluations, it is consistently and often substantially faster than specialized solvers, while maintaining robustness and reliability across problem classes.
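The NumPy-only design point can be illustrated with a minimal solver in the same spirit (this is an illustrative sketch, not the GENO/GenoSolver API): projected gradient descent for box-constrained least squares, min_x ||Ax - b||^2 subject to 0 <= x <= 1.

```python
import numpy as np

def pgd_box_lsq(A, b, lo=0.0, hi=1.0, steps=500):
    """Projected gradient descent with a fixed 1/L step size."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])                 # feasible starting point
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        x = np.clip(x - grad / L, lo, hi)    # gradient step + box projection
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = A @ np.clip(rng.standard_normal(10), 0.0, 1.0)   # feasible target
x = pgd_box_lsq(A, b)
print(np.linalg.norm(A @ x - b))  # small residual
```

A handful of NumPy operations suffices here; the appeal of reducing solvers to a few fundamental mechanisms is that such simplicity can be retained across a much broader problem class.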
→ Related publications: [ICML 2012], [NeurIPS 2019], [AAAI 2022], [NeurIPS 2022]
→ Software and source code: [GitHub - GenoSolver]
→ Project website: [GENO]
Matrix and Tensor Calculus for Learning Algorithms
Differentiation is at the core of modern machine learning. We developed general algorithms for computing derivatives of matrix and tensor expressions, including higher-order derivatives such as Hessians and multi-output functions. Unlike standard automatic differentiation frameworks (e.g., PyTorch, TensorFlow, JAX), which are optimized for scalar-valued outputs, our approach directly supports general matrix- and tensor-valued expressions.
For scalar-output problems, performance is comparable to reverse-mode automatic differentiation. In non-scalar and structured higher-order settings, our method achieves substantial computational advantages. In extensive benchmarks, we observe speed-ups of up to two orders of magnitude on CPUs and three orders of magnitude on GPUs.
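The kind of closed-form matrix derivative produced symbolically can be checked numerically. As a small example (illustrative sizes, not from the papers): for f(x) = x'Ax the gradient is (A + A')x, which a central finite-difference check confirms.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

grad_closed = (A + A.T) @ x               # closed-form matrix derivative

eps = 1e-6                                # central finite differences
grad_fd = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(n)
])
print(np.max(np.abs(grad_closed - grad_fd)))  # close to zero: formulas agree
```

Producing such closed forms symbolically, rather than tracing scalar operations, is what enables the speed-ups on non-scalar and higher-order problems.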
The methods are available through MatrixCalculus.org, which serves more than 100,000 users annually.
→ Related publications: [NeurIPS 2018], [AAAI 2020]
→ Online service: [MatrixCalculus.org]