Duke University
Abstract:
In this talk I will present mathematical and numerical analysis, as well as experiments, to understand a few basic computational issues in using neural networks as a particular form of nonlinear representation, and show how the network structure, activation function, and parameter initialization affect its approximation properties and the learning process. In particular, we propose a structured and balanced approximation using a multi-component and multi-layer neural network (MMNN) structure. Using sine as the activation function together with an initial scaling strategy, we show that scaled Fourier MMNNs (SFMMNNs) have a distinct adaptive property as a nonlinear approximation. Computational examples will be presented to verify our analysis and demonstrate the efficacy of our method.
At the end, I will raise a few issues and challenges when using neural networks in scientific computing.
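The effect of an initial scaling strategy on sine activations can be illustrated with a small random-feature experiment (a minimal sketch, not the MMNN or SFMMNN construction from the talk; the target function, width, and scale values are illustrative assumptions): scaling the random weights of a sine layer shifts the frequencies available at initialization, which determines how well an oscillatory target can be captured.

```python
import numpy as np

def sine_features(x, n_units, scale, seed=0):
    # Random sine features; a larger `scale` multiplies the random weights,
    # injecting higher frequencies into the layer at initialization.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(1, n_units)) * scale
    b = rng.uniform(-np.pi, np.pi, size=n_units)
    return np.sin(x @ W + b)

# Fit a highly oscillatory target by least squares on the random features.
x = np.linspace(0.0, 1.0, 400).reshape(-1, 1)
y = np.sin(40 * np.pi * x).ravel()

errs = {}
for scale in (1.0, 50.0):
    F = sine_features(x, 200, scale)
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    errs[scale] = np.linalg.norm(F @ coef - y) / np.linalg.norm(y)
    print(f"scale={scale:5.1f}  relative L2 error={errs[scale]:.3f}")
```

Solving the least-squares problem over fixed random features isolates the effect of the initialization scale from the training dynamics: the unscaled layer simply has no high-frequency content to combine.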
The Hong Kong Polytechnic University
Abstract:
Neural network (NN) solvers for partial differential equations (PDEs) have been widely used to simulate complex systems in various scientific and engineering fields. However, most existing NN solvers mainly focus on satisfying the given PDEs, without explicitly considering intrinsic physical properties such as mass conservation or energy dissipation. This limitation can result in unstable or nonphysical solutions, particularly in long-term simulations. To address this issue, we propose Sidecar, a novel framework that enhances the accuracy and physical consistency of existing NN solvers by incorporating structure-preserving knowledge. The framework builds on our previously proposed TDSR-ETD method for solving gradient flow problems, which satisfies discrete analogues of the energy-dissipation laws by introducing a time-dependent spectral renormalization (TDSR) factor. Inspired by this approach, Sidecar parameterizes the TDSR factor with a small copilot network, which is trained to guide the existing NN solver toward preserving physical structure. This design allows flexible integration of structure-preserving knowledge into various NN solvers and extends easily to different types of PDEs. Experimental results on a set of benchmark PDEs demonstrate that Sidecar improves existing NN solvers in both accuracy and consistency with structure-preserving properties.
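As a toy illustration of the renormalization idea (this is not the TDSR-ETD method or the Sidecar copilot network; the PDE, the scheme, and all constants below are assumptions made up for this sketch), one can rescale a drifting finite-difference solution at every step so that a discrete invariant, here the total mass, is preserved exactly:

```python
import numpy as np

# Heat equation u_t = u_xx on a periodic grid; explicit Euler, with a small
# artificial loss factor to mimic a solver that drifts away from conservation.
n, dx, dt = 128, 1.0 / 128, 1e-5
x = np.arange(n) * dx
u = np.exp(-100 * (x - 0.5) ** 2)
mass0 = u.sum() * dx                       # the discrete invariant to preserve

def step(u):
    lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx ** 2
    return (1 - 1e-3) * (u + dt * lap)     # the (1 - 1e-3) factor injects drift

u_plain = u.copy()
u_renorm = u.copy()
for _ in range(200):
    u_plain = step(u_plain)
    u_renorm = step(u_renorm)
    # Time-dependent renormalization factor: rescale so the discrete mass
    # matches its initial value at every step.
    r = mass0 / (u_renorm.sum() * dx)
    u_renorm *= r

print(abs(u_plain.sum() * dx - mass0), abs(u_renorm.sum() * dx - mass0))
```

The renormalized run keeps the invariant to machine precision while the plain run loses a visible fraction of the mass; Sidecar's contribution, per the abstract, is to learn such a factor with a copilot network rather than compute it from a hand-chosen rule.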
The Chinese University of Hong Kong, Shenzhen
Abstract:
Audition and vision complement each other in perception, and adding visual cues to audio-based speech separation can improve separation performance. This presentation introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet extends the TF-CrossNet architecture, a recently proposed deep neural network (DNN) that performs complex spectral mapping for speech separation by leveraging global attention and positional encoding. Complex spectral mapping trains a DNN to directly estimate the real and imaginary spectrograms of the target signal from those of a noisy mixture. To utilize visual cues effectively, the proposed system incorporates pre-extracted visual embeddings and employs a visual encoder comprising temporal convolutional layers. Audio and visual features are fused in an early fusion layer before being fed to AV-CrossNet blocks. We evaluate AV-CrossNet on multiple open datasets, including LRS, VoxCeleb, TCD-TIMIT, and the COG-MHEAR challenge corpus. Evaluation results demonstrate that AV-CrossNet advances the state of the art in all audiovisual tasks, even on untrained and mismatched datasets.
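A minimal sketch of what complex spectral mapping means as a training target (the STFT helpers and all parameter values are illustrative assumptions, not AV-CrossNet code): the network input is the stacked real and imaginary STFT of the mixture, the target is the stacked real and imaginary STFT of the clean signal, and an oracle estimate of both parts reconstructs the clean waveform exactly, which a magnitude-only estimate reusing the mixture phase cannot do.

```python
import numpy as np

def stft(x, win=64, hop=32):
    w = np.hanning(win + 1)[:-1]           # periodic Hann: 50% overlap sums to 1
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(X, win=64, hop=32, n=None):
    frames = np.fft.irfft(X, n=win, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + win)
    for k, f in enumerate(frames):
        out[k * hop:k * hop + win] += f    # overlap-add reconstruction
    return out[:n]

rng = np.random.default_rng(0)
t = np.arange(8000) / 8000
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * rng.standard_normal(len(t))

S_noisy, S_clean = stft(noisy), stft(clean)
inputs = np.stack([S_noisy.real, S_noisy.imag])   # what the DNN would see
target = np.stack([S_clean.real, S_clean.imag])   # what it is trained to output

# Oracle check: perfect real+imag estimation recovers the clean waveform;
# a magnitude-only estimate with the noisy phase does not.
oracle = istft(S_clean, n=len(clean))
mag_only = istft(np.abs(S_clean) * np.exp(1j * np.angle(S_noisy)), n=len(clean))

interior = slice(64, len(clean) - 64)    # skip edge frames without full overlap
err_oracle = np.linalg.norm(oracle[interior] - clean[interior])
err_mag = np.linalg.norm(mag_only[interior] - clean[interior])
print(f"oracle err={err_oracle:.2e}, magnitude-only err={err_mag:.2e}")
```

The gap between the two errors is the motivation for estimating real and imaginary spectrograms jointly rather than magnitudes alone.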
Lehigh University
Abstract:
Quantum Interior Point Methods (QIPMs) for linear and semidefinite optimization (LO and SDO) problems build on classic polynomial-time IPMs. Quantum Computing (QC) has inspired the design of Inexact Infeasible, Inexact Feasible Primal-Dual, and Inexact Dual IPM variants, which are novel algorithms in both the QC and classical computing environments. Enhancing QIPMs with Iterative Refinement (IR) leads to exponential improvements in the worst-case overall running time compared to the previous best-performing QIPMs. We also discuss how the proposed IR scheme can be used in classical inexact IPMs with conjugate gradient methods. Further, the proposed IR scheme exhibits quadratic convergence toward an optimal solution for LO and SDO without any assumption on problem characteristics. On the practical side, IR can be used to find precise solutions while relying on inexact LO and SDO solvers.
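The iterative-refinement principle behind these results can be sketched on a plain linear system (an illustration of IR only, not the proposed QIPM scheme; the simulated inexact solver and all tolerances are assumptions): repeatedly solving against the residual with an inexact solver drives the error down geometrically.

```python
import numpy as np

rng = np.random.default_rng(1)

def inexact_solve(A, b, rel_err=1e-3):
    # Stand-in for an inexact solver (e.g. low-precision or sampling-based):
    # it returns the true solution plus `rel_err` relative noise.
    x = np.linalg.solve(A, b)
    noise = rng.standard_normal(len(x))
    return x + rel_err * np.linalg.norm(x) * noise / np.linalg.norm(noise)

def iterative_refinement(A, b, tol=1e-12, max_iter=20):
    x = inexact_solve(A, b)
    for _ in range(max_iter):
        r = b - A @ x                    # residual, computed in full precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        x = x + inexact_solve(A, r)      # inexact correction on the residual
    return x

A = rng.standard_normal((50, 50)) + 50 * np.eye(50)  # well-conditioned test matrix
x_true = rng.standard_normal(50)
b = A @ x_true

x0 = iterative_refinement(A, b, max_iter=0)  # single inexact solve, no refinement
x = iterative_refinement(A, b)               # refined solution
print(np.linalg.norm(x0 - x_true), np.linalg.norm(x - x_true))
```

Each refinement step multiplies the error by roughly the solver's relative accuracy, so a fixed-accuracy inexact solver still yields a high-precision answer after a few residual corrections.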
National University of Singapore
Abstract:
We investigate the convergence properties of a general class of Adam-family methods for minimizing quadratically regularized nonsmooth nonconvex optimization problems, especially in the context of training nonsmooth neural networks with weight decay. Motivated by AdamW, we propose a novel framework for Adam-family methods with decoupled weight decay: within our framework, the estimators for the first-order and second-order moments of the stochastic subgradients are updated independently of the weight-decay term. Under mild assumptions and with non-diminishing stepsizes for updating the primary optimization variables, we establish the convergence properties of the proposed framework. In addition, we show that the framework encompasses a wide variety of well-known Adam-family methods, hence offering convergence guarantees for these methods in the training of nonsmooth neural networks. As a practical application of the framework, we propose a method named AdamD (Adam with Decoupled Weight Decay). Numerical experiments demonstrate that AdamD outperforms Adam and is comparable to the popular AdamW in terms of both generalization performance and efficiency.
[Based on joint work with Kuangyu Ding and Nachuan Xiao]
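The decoupling at the heart of this framework, keeping the weight-decay term out of the moment estimates, can be sketched in a few lines (an AdamW-style toy update, not the paper's AdamD or its general framework; all hyperparameters are illustrative):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              wd=0.0, decoupled=False):
    # One Adam update. With decoupled=False, weight decay is folded into the
    # gradient, so the moment estimates (and the adaptive rescaling) see it.
    # With decoupled=True, the moments see only the stochastic gradient and
    # the decay is applied directly to the weights (AdamW-style).
    if not decoupled:
        g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w
    return w, m, v

# With a zero stochastic gradient, only the decay acts: decoupled decay
# shrinks both weights by the same multiplicative factor, while coupled
# decay is rescaled by the moments and erodes the small weight much faster.
w_c, m_c, v_c = np.array([10.0, 0.1]), np.zeros(2), np.zeros(2)
w_d, m_d, v_d = np.array([10.0, 0.1]), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    w_c, m_c, v_c = adam_step(w_c, np.zeros(2), m_c, v_c, t, wd=0.1)
    w_d, m_d, v_d = adam_step(w_d, np.zeros(2), m_d, v_d, t, wd=0.1,
                              decoupled=True)
print("coupled:  ", w_c)
print("decoupled:", w_d)
```

Folding the decay into the gradient lets the adaptive rescaling make the effective decay nearly independent of a weight's magnitude; decoupling restores a uniform multiplicative shrinkage, which is the behavior the convergence analysis above is built around.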
Berlin University of Technology
Abstract:
The price of anarchy (PoA) is a standard measure of the inefficiency of equilibria in congestion games. It is particularly important in transportation networks, where the PoA indicates how well the available network capacity is used. We investigate the PoA under varying transportation demands in arbitrary (atomic and non-atomic) congestion games and show that, for a large class of cost functions, it converges to 1 very quickly as the total demand grows, regardless of whether users choose pure or mixed strategies. This implies that the selfish choice of routes is essentially the best one can do in highly congested transportation networks.
The lecture will be elementary and explain all concepts.
The material presented is based on joint work with Zijun Wu, Yanyan Chen, Dachuan Xu, and Chunying Ren.
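The convergence of the PoA to 1 under growing demand can be checked by hand on the classic two-link Pigou network (an illustrative example chosen here, not one taken from the work above):

```python
def pigou_poa(d):
    # Two parallel links serving demand d, with per-unit costs
    # c1(x) = 1 (constant) and c2(x) = x (linear, congestion-dependent).
    # Wardrop equilibrium: traffic uses link 2 until its cost reaches 1.
    x2_eq = min(d, 1.0)
    eq_cost = x2_eq * x2_eq + (d - x2_eq) * 1.0
    # Social optimum: minimize x^2 + (d - x) over x in [0, d]  =>  x = min(d, 1/2).
    x2_opt = min(d, 0.5)
    opt_cost = x2_opt * x2_opt + (d - x2_opt) * 1.0
    return eq_cost / opt_cost

for d in (1.0, 2.0, 10.0, 100.0):
    print(f"demand {d:6.1f}  PoA {pigou_poa(d):.4f}")
```

For this linear instance the PoA peaks at 4/3 (the worst case for affine costs, attained at d = 1) and decays toward 1 as the demand grows, matching the qualitative behavior stated in the abstract.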