University of Chinese Academy of Sciences
Abstract:
Spanning tree problems with specialized constraints can be difficult to solve in real-world scenarios, often requiring intricate algorithmic design and exponential time. Recently, there has been growing interest in end-to-end deep neural networks for solving routing problems. However, such methods typically produce sequences of vertices, which makes it difficult to apply them to general combinatorial optimization problems where the solution set consists of edges, as in various spanning tree problems. In this talk, we will introduce NeuroPrim, a novel framework for solving various spanning tree problems by defining a Markov Decision Process (MDP) for general combinatorial optimization problems on graphs. Our approach reduces the action and state spaces using Prim's algorithm and trains the resulting model with REINFORCE. We apply the framework to three difficult problems in Euclidean space: the Degree-constrained Minimum Spanning Tree (DCMST) problem, the Minimum Routing Cost Spanning Tree (MRCST) problem, and the Steiner Tree Problem in graphs (STP). Experimental results on literature instances demonstrate that our model outperforms strong heuristics and achieves small optimality gaps on instances with up to 250 vertices. Additionally, the model generalizes well, with no significant degradation observed on problem instances with as many as 1000 vertices. These results suggest that our framework can be effective for a wide range of combinatorial optimization problems beyond spanning trees.
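To make the Prim-style reduction concrete, here is a minimal sketch (our illustration, not the authors' code) of one rollout of such an MDP: the state is the current partial tree and, as in Prim's algorithm, actions are restricted to edges crossing from the tree to the remaining vertices. The policy `score` is a hypothetical stand-in for NeuroPrim's learned network.

```python
import random

def prim_mdp_rollout(n, weight, score, root=0):
    """Grow a spanning tree by sampling one frontier edge per step.

    n      -- number of vertices (complete graph assumed)
    weight -- dict mapping an ordered pair (u, v) to its edge weight
    score  -- policy giving an unnormalized probability to an edge;
              a constant function recovers uniformly random growth
    """
    in_tree = {root}
    tree_edges = []
    while len(in_tree) < n:
        # Action space reduced as in Prim: only tree-to-outside edges.
        frontier = [(u, v) for u in in_tree for v in range(n) if v not in in_tree]
        probs = [score(u, v) for u, v in frontier]
        u, v = random.choices(frontier, weights=probs, k=1)[0]
        tree_edges.append((u, v))
        in_tree.add(v)
    return tree_edges, sum(weight[(u, v)] for u, v in tree_edges)
```

Under this MDP, REINFORCE would update the policy using the negative tree cost as the episodic reward; side constraints such as degree bounds can be enforced by filtering the frontier list.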
Fudan University
Abstract: Using typical mental disorders and/or Alzheimer's disease (AD) as examples, we explore the etiology of disease with genomic, proteomic, and other types of omics data together with novel AI algorithms. These discoveries will subsequently enable us to subtype various diseases and then develop different treatment strategies in TMS and drug therapy. Further developments aiming to integrate micro-, meso-, and macroscopic data and results are also discussed.
Xi’an Jiaotong University
National Engineering Laboratory for Algorithm and Analysis Technology on Big Data
Abstract:
The effectiveness of current deep learning methods heavily depends on the high quality of training datasets. However, when training datasets exhibit data biases, such as complex label noise or class imbalance, their effectiveness often cannot be ensured. This challenge is known as the robust learning problem in deep learning. It has become a significant bottleneck that severely restricts the practical application of deep learning in real-world scenarios and represents a critical issue that the field urgently needs to address. This report focuses on sample reweighting as a representative methodology for addressing data bias. It introduces how traditional manual weighting methods, designed for a limited range of data bias types, have evolved into automated weighting approaches capable of addressing more diverse data biases in the context of large models. In particular, the report delves into the meta-learning essence underlying this methodology, uncovering its theoretical foundations and revealing its potential generalizability for addressing complex robust deep learning problems in real-world scenarios. Furthermore, it explores how the foundational ideas of this methodology carry over to problem-adaptive learning rate scheduling within optimization algorithms.
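The meta-learning view admits a compact schematic form; the following bilevel program (our notation, not taken from the talk) shows how per-sample weights v are tuned so that the model trained under them performs well on a small unbiased meta set:

```latex
\begin{aligned}
w^{*}(v) &= \arg\min_{w}\ \frac{1}{N}\sum_{i=1}^{N} v_i\,\ell_i^{\mathrm{train}}(w)
  && \text{(weighted training on biased data)}\\
v^{*} &= \arg\min_{v \ge 0}\ \frac{1}{M}\sum_{j=1}^{M} \ell_j^{\mathrm{meta}}\!\bigl(w^{*}(v)\bigr)
  && \text{(weights tuned on a clean meta set)}
\end{aligned}
```

Manual weighting schemes fix v by hand-crafted rules; automated approaches learn v (or a weighting network producing it) through this outer problem.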
Huawei Technologies Co., Ltd.
Abstract:
In recent years, AI technologies, represented by large language models (LLMs), have developed rapidly. It is necessary to break through classical Shannon information theory and explore a Level-2 information theory that focuses on semantics. This talk will first review the problem of reliable communication and the Shannon capacity. It will then discuss mathematical models for semantic problems in the era of intelligence. The HGR (Hirschfeld–Gebelein–Rényi) maximal correlation problem, the generalized Gardner capacity, and the derivation of scaling laws will be introduced. We hope this talk will inspire researchers to pay more attention to this novel semantic theory in the era of large models and to contribute original work.
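For reference, the HGR maximal correlation of random variables X and Y is standardly defined as (our addition, standard in the literature):

```latex
\rho_{\mathrm{HGR}}(X;Y) \;=\; \sup_{f,\,g}\ \mathbb{E}\bigl[f(X)\,g(Y)\bigr],
\qquad
\mathbb{E}[f(X)] = \mathbb{E}[g(Y)] = 0,\quad
\mathbb{E}[f^{2}(X)] = \mathbb{E}[g^{2}(Y)] = 1,
```

where the optimizing feature pair (f, g) can be computed from a spectral (SVD-type) decomposition of the joint distribution; such learned feature pairs are one bridge between semantic relevance and representation learning.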
Shanghai Jiao Tong University
Abstract:
Quantum computers have the potential to achieve algebraic and even exponential speedups over their classical counterparts, and could lead to a technology revolution in the 21st century. Since quantum computers are designed on the principles of quantum mechanics, they are most suitable for solving the Schrödinger equation, as well as linear PDEs (and ODEs) evolved by unitary operators. The most efficient quantum PDE solver is quantum simulation based on solving the Schrödinger equation. It is therefore interesting to explore which other problems in scientific computing, such as ODEs, PDEs, and linear algebra problems arising in both classical and quantum systems, can be handled by quantum simulation. We will present a systematic way to develop quantum simulation algorithms for general differential equations. Our basic framework is dimension lifting, which transforms nonlinear PDEs into linear ones, and linear ones into Schrödinger-type PDEs. For non-autonomous PDEs and ODEs, or Hamiltonian systems with time-dependent Hamiltonians, we also add an extra dimension to transform them into autonomous PDEs with time-independent coefficients, so that quantum simulation can be carried out without the cumbersome Dyson series and time-ordering operators. Our formulation allows both qubit and qumode (continuous-variable) formulations, as well as their hybridizations, and provides a foundation for analog quantum computing.
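As a sketch of the extra-dimension idea for non-autonomous problems (our reconstruction, with simplified notation): to solve du/dt = A(t)u, introduce an auxiliary variable s and the transport-type equation

```latex
\partial_t v(t,s) \;+\; \partial_s v(t,s) \;=\; A(s)\,v(t,s),
\qquad v(0,s) = u_0,
```

whose generator, the time-independent operator -∂_s + A(s), no longer depends on t. Integrating along the characteristic s = t gives u(t) = v(t, t), so the original non-autonomous dynamics are recovered without Dyson series or time-ordering operators.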
Beijing International Center for Mathematical Research, Peking University
Abstract:
Nature is composed of various complex systems. One characteristic of complex systems is multistability, or multiple equilibria. A long-standing problem in computational science is how to efficiently find all possible stationary states of a complex system without relying on random guesses. We introduce the novel concept of the “Solution Landscape”, a pathway map consisting of all stationary points and their connections. We develop a series of generic saddle dynamics methods to construct solution landscapes for different types of complex systems; these methods not only identify all possible minima but also advance our understanding of how a complex system navigates its landscape. We apply the solution landscape approach to several challenging problems, including defect configurations of nematic liquid crystals, nucleation of quasicrystals, and excited states of rotational Bose–Einstein condensates.
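One representative form of index-k saddle dynamics (our summary of the standard formulation, not verbatim from the talk) evolves a state x together with k unstable directions v_i, up to relaxation constants:

```latex
\begin{aligned}
\dot{x} &= -\Bigl(I - 2\sum_{i=1}^{k} v_i v_i^{\top}\Bigr)\nabla E(x),\\
\dot{v}_i &= -\Bigl(I - v_i v_i^{\top} - 2\sum_{j<i} v_j v_j^{\top}\Bigr)\nabla^{2} E(x)\,v_i,
\qquad i = 1,\dots,k,
\end{aligned}
```

which converges to index-k saddle points of the energy E; sweeping the index downward and following unstable directions connects saddles to minima and traces out the pathway map.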
Beijing International Center for Mathematical Research, Peking University
Abstract: This talk will explore new paradigms for integrating data, models, algorithms, and theories in mathematical optimization. First, we seek to understand acceleration methods through ordinary differential equations (ODEs). Under convergence and stability conditions, we formulate a learning optimization problem that minimizes the stopping time; this involves transforming the rapid convergence observed in continuous-time models into data-driven discrete-time iterative methods. Next, we introduce a Monte Carlo policy optimization algorithm for solving integer programming problems. This approach constructs probabilistic models that learn parameterized policy distributions from data, enabling the sampling of integer solutions. Lastly, we discuss the vision of advancing automated theorem proving through AI-assisted formalization.
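A canonical instance of the ODE view of acceleration is the continuous-time limit of Nesterov's method derived by Su, Boyd, and Candès:

```latex
\ddot{X}(t) \;+\; \frac{3}{t}\,\dot{X}(t) \;+\; \nabla f\bigl(X(t)\bigr) \;=\; 0,
\qquad
f\bigl(X(t)\bigr) - f^{\star} \;=\; O\!\left(\frac{1}{t^{2}}\right)
\;\;\text{for convex } f,
```

and learning a stable discretization of such dynamics that reaches a target accuracy in minimal (stopping) time is one way to read the data-driven program described above.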
University of Michigan
Abstract:
Recent empirical studies have shown that diffusion models possess a unique reproducibility property, transitioning from memorization to generalization as the number of training samples increases. This demonstrates that diffusion models can effectively learn image distributions and generate new samples. Remarkably, these models achieve this even with a small number of training samples, despite the challenge of large image dimensions, effectively circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. With these setups, we rigorously demonstrate that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem across the training samples. This insight has practical implications for training and controlling diffusion models. Specifically, it enables us to precisely characterize the minimal number of samples necessary for accurately learning the low-rank data support, shedding light on the phase transition from memorization to generalization. Additionally, we empirically establish a correspondence between the subspaces and the semantic representations of image data, which enables one-step, transferable, efficient image editing. Moreover, our results have profound practical implications for training efficiency and model safety, and they open up numerous intriguing theoretical questions for future research.
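For orientation, the training loss in question is the standard denoising objective (our notation): the network is trained to recover a clean sample from its noised version,

```latex
\min_{\theta}\ \mathbb{E}_{x_0,\,\epsilon,\,t}
\Bigl\|\, x_{\theta}\bigl(\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\bigr) - x_0 \,\Bigr\|_2^2,
```

and the result described above identifies its minimizers, under a union-of-low-dimensional-subspaces data model, with solutions of subspace clustering on the training set.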
Shenzhen Research Institute of Big Data
Abstract:
Solving large-scale linear programming (LP) problems is an important task in areas such as communication networks, power systems, finance, and logistics. Recently, graph neural networks (GNNs) have emerged as powerful tools for tackling complex optimization problems, with theoretical results demonstrating their capacity to approximate LP solution mappings. However, these theoretical results typically require GNNs to have large parameter sizes. In our recent work, we propose a new type of GNN, PDHG-Net, together with a two-stage learning-to-optimize (L2O) method for solving large-scale LP problems. The PDHG-Net architecture is designed by unrolling the recently emerged primal-dual hybrid gradient (PDHG) method and combining it with channel-expansion techniques. Experiments show that our approach with a small PDHG-Net (only four layers) can significantly accelerate LP solving, achieving up to a 3× speedup over PDHG on large-scale LP problems. We further provide a theoretical foundation for the effectiveness of small-sized GNNs: we prove that polylogarithmic-depth, constant-width GNNs suffice to solve packing and covering LPs, two foundational LP classes. Our results open new avenues for leveraging compact GNNs in optimization tasks.
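In one standard form (our summary), the iteration that PDHG-Net unrolls solves the LP min{cᵀx : Ax = b, x ≥ 0} through its saddle-point formulation:

```latex
\begin{aligned}
x^{k+1} &= \Pi_{\ge 0}\bigl(x^{k} - \tau\,(c - A^{\top} y^{k})\bigr),\\
y^{k+1} &= y^{k} + \sigma\,\bigl(b - A\,(2x^{k+1} - x^{k})\bigr),
\end{aligned}
```

with step sizes τ, σ > 0 and Π_{≥0} the projection onto the nonnegative orthant; the network replaces these fixed linear maps with learned, channel-expanded layers.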
Shenzhen Research Institute of Big Data
Abstract:
This talk explores how machine learning techniques can enhance the efficiency of classical optimization algorithms. First, we address the computational bottleneck in interior-point methods (IPMs), specifically the solution of linear systems via matrix factorization. We propose leveraging Long Short-Term Memory (LSTM) neural networks to approximate these solutions, integrating the learned approximations into the IPM framework to reduce computational overhead. Next, we turn to the Max-k-Cut problem, a fundamental discrete optimization challenge. We present a learning-based relax-optimize-and-sample approach that efficiently identifies high-quality solutions for large-scale instances. The proposed method leverages machine learning to complement traditional optimization paradigms, achieving notable improvements in performance. Comprehensive computational experiments demonstrate the effectiveness and potential of these techniques.
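As an illustration of the relax-optimize-and-sample pattern for Max-k-Cut, here is a minimal sketch under our own simplifications (not the talk's method): relax each vertex's color to a probability vector, maximize the expected cut weight, then sample integer colorings from the optimized distribution.

```python
import torch

def relax_optimize_sample(edges, n, k, steps=500, lr=0.1, draws=32):
    """edges: list of (u, v, w) with weight w; returns (coloring, cut value)."""
    w = torch.tensor([float(wt) for _, _, wt in edges])
    logits = (0.1 * torch.randn(n, k)).requires_grad_()   # relaxed assignments
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        p = torch.softmax(logits, dim=1)
        # An edge is cut unless both endpoints draw the same color.
        same = torch.stack([(p[u] * p[v]).sum() for u, v, _ in edges])
        loss = -(w * (1.0 - same)).sum()                  # maximize expected cut
        opt.zero_grad(); loss.backward(); opt.step()
    # Sample candidate colorings and keep the best one found.
    p = torch.softmax(logits, dim=1).detach()
    best, best_val = None, float("-inf")
    for _ in range(draws):
        colors = torch.multinomial(p, 1).squeeze(1)
        val = sum(wt for u, v, wt in edges if colors[u] != colors[v])
        if val > best_val:
            best, best_val = colors, val
    return best, best_val
```

The learning-based method described above additionally uses a trained model to guide the relaxation and sampling; this sketch only shows the bare optimize-then-sample skeleton.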
Singapore University of Technology and Design
Abstract: Machine learning is playing an increasingly crucial role in wireless communications due to its efficiency in handling complex computations and models. Federated learning is a machine learning technique that allows multiple entities to train a model together while keeping their data decentralized: each device shares its local model parameters instead of the dataset used to train them. In this talk, we provide comprehensive coverage of federated learning, examining it from a practical perspective with a focus on personalized services and heterogeneous deployment. Lastly, we will introduce Singapore’s national Future Communications Research and Development Programme (FCP), which is hosted at the Singapore University of Technology and Design.
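The parameter-exchange principle fits in a few lines; the following FedAvg-style round is a generic sketch (ours, not specific to the talk), with `local_train` standing in for any client-side training routine:

```python
def federated_round(global_params, client_datasets, local_train):
    """One round: broadcast the model, train locally, average parameters."""
    updates = []
    for data in client_datasets:
        local = dict(global_params)               # start from the global model
        updates.append(local_train(local, data))  # raw data never leaves the device
    # The server aggregates parameters, not data (uniform average here;
    # weighting by local dataset size is the common refinement).
    return {name: sum(u[name] for u in updates) / len(updates)
            for name in global_params}
```

Personalization and heterogeneous deployment, the talk's focus, modify exactly this loop, e.g., by keeping some parameters client-local or by letting clients run different model sizes.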
Southeast University
Abstract:
This keynote will provide a comprehensive study of digital twin online channel modeling (DTOCM), a visual platform designed for online/real-time simulation of channel characteristics for 6G and beyond. It will start with several 6G/B6G challenges, such as wireless network optimization and propagation scenario classification/identification. Then DTOCM, a potential solution to network optimization, will be explained in detail in terms of its principles, construction steps, and applications to typical 6G scenarios. Subsequently, the capabilities of the proposed DTOCM for real-time simulation and visualization of channel characteristics will be illustrated. Finally, some future 6G/B6G research directions will be discussed.
Beijing Institute of Technology
Abstract: Matrix monotonicity optimization theory aims to exploit the positive semidefinite monotonicity of matrix variables to solve complex matrix optimization problems in array signal processing efficiently. Matrix-variable optimization plays an increasingly important role in communication array signal processing. As communication systems evolve, communication array architectures, functions, and performance metrics are diversifying, which brings new challenges to the solution of matrix-variable optimization problems. Specifically, moving from traditional MIMO architectures to new array structures such as intelligent metasurface-assisted and hybrid digital-analog architectures, the corresponding matrix variables often carry nonconvex constant-modulus constraints. Meanwhile, the working mode of communication systems has evolved from traditional data transmission alone to the multi-functional integration of transmission, sensing, and covertness, and the corresponding matrix optimization problems likewise involve more complex constraints and objective functions. Matrix monotonicity optimization uses monotonicity to reveal the optimal structure of matrix variables under multiple objective functions, multiple constraints, and even multiple communication qualities in signal processing, thereby greatly reducing the dimension of the optimization variables and enabling the efficient solution of complex problems.
Shenzhen Research Institute of Big Data
Abstract:
Federated fine-tuning aims to fine-tune a pre-trained model with private data from distributed clients by exchanging models rather than data under the orchestration of a parameter server (PS). However, as large models excel at almost every machine learning task, the communication overhead and memory demand are surging accordingly, hindering practical deployment on consumer devices. To overcome the bottleneck forged by the growing communication overhead of federated learning and to lower the high memory demand of large-model fine-tuning, we propose FeedSign, a federated fine-tuning algorithm in which a client uploads its local update and downloads the global update of a model of any size using exactly 1 bit per step, while the memory demand is squeezed to the amount needed for inference. This is realized by employing zeroth-order (ZO) optimizers on large models and pseudo-random number generators (PRNGs) shared across devices, splitting each client's gradient estimate into 1) a direction corresponding to a designated random seed and 2) a binary vote indicating whether the seed-corresponding direction yields a local loss descent; the vote is the only information a client needs to convey to the PS. Our theoretical analysis shows that FeedSign converges at an exponential rate O(e^{-t}), where t is the number of elapsed steps, the same rate that first-order (FO) methods attain in big-O notation. We also find that FeedSign enjoys good robustness against data heterogeneity and Byzantine attacks. In extensive experiments on models of different structures and sizes (11M to 13B parameters), the proposed method performs better than or close to its ZO and FO counterparts, depending on the scenario, despite an orders-of-magnitude lower communication overhead. We also discuss some interesting advantages that follow as byproducts of FeedSign's minimalistic design.
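The seed-plus-vote mechanism can be sketched as follows (our reconstruction from the description above, not the authors' code); `local_loss` is a hypothetical per-client loss evaluator, and both sides regenerate the same direction from the shared seed so only one bit travels each way:

```python
import numpy as np

def client_vote(w, local_loss, seed, eps=1e-3):
    """Two forward passes (inference-level memory) -> one bit."""
    z = np.random.default_rng(seed).standard_normal(w.shape)  # shared direction
    return 1 if local_loss(w + eps * z) < local_loss(w - eps * z) else -1

def server_step(w, votes, seed, lr=1e-4):
    z = np.random.default_rng(seed).standard_normal(w.shape)  # regenerated, never transmitted
    step = np.sign(sum(votes))       # aggregate the 1-bit votes across clients
    return w + lr * step * z         # move along z only if the vote says it descends
```

Because each vote is a single sign, a bounded fraction of corrupted (Byzantine) votes can only flip the aggregate on near-tied steps, which is one intuition for the robustness mentioned above.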
Fudan University
Abstract:
Conventional signal sampling is constrained by the Nyquist theorem, which imposes significant computational and hardware costs on sensing, recognizing, and decoding wideband signals at high sampling rates. We propose an end-to-end processing framework that combines deep learning with sub-Nyquist sampling to address this challenge. The framework enables spectrum sensing, signal recognition, and demodulation for wide multiband signal processing. By decomposing wideband signal recovery into two consecutive stages, spectrum sensing and non-blind recovery, we overcome the sampling bottleneck imposed by the twofold Landau rate and eliminate the reliance on signal sparsity assumed by traditional methods. Consequently, we achieve real-time sensing, recognition, and demodulation of non-sparse signals, offering a novel theoretical perspective and an efficient, practical solution for wideband signal processing.
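For context, the sampling-rate gap that the two-stage design exploits is the standard one from multiband sampling theory (our summary): for a signal occupying N bands of width B each,

```latex
f_{\text{blind}} \;\ge\; \min\{\,2NB,\ f_{\mathrm{NYQ}}\,\},
\qquad
f_{\text{non-blind}} \;\ge\; NB \quad (\text{the Landau rate}),
```

so resolving the spectral support first (spectrum sensing) lets the recovery stage operate near the Landau rate instead of twice it.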
China Mobile Communications Group Co., Ltd.
Abstract:
6G targets the social vision of digital twins and ubiquitous intelligence, supporting more diverse scenarios and applications. Compared with 5G networks, which focus on communications only, 6G is required to natively support new capabilities such as sensing, computing, AI, big data, and security, while facilitating Everything as a Service (XaaS). However, directly adding new functions to the legacy 5G network architecture complicates the network and leads to low service efficiency and high operation and maintenance costs. Since 5G deployment practice has shown that network automation and intelligence are promising for simplifying network operation and maintenance, 6G will adopt these proven technologies to achieve high-level network autonomy. This talk proposes a technology framework for a 6G autonomous radio access network (RAN) that embraces the design of native cloud, native AI, and network digital twins.
UC Berkeley
Abstract:
In this talk, we will discuss algorithms for minimizing two classes of nonconvex functions with downward cusps. The first class consists of optimal value functions of problems with perturbed objectives and constraints, which in particular appear as the recourse functions of two-stage stochastic programs. The second class consists of compositions of a convex function with a nonsmooth map that may be merely lower semicontinuous rather than continuously differentiable, covering the composite value functions of nonlinear programs and value-at-risk constraints. We have explored the variational properties of these functions and designed numerical algorithms to solve these challenging problems. The newly derived results expand on the study of so-called amenable functions, introduced by Poliquin and Rockafellar in 1992 as compositions of convex functions with smooth maps, and on the prox-linear methods for their minimization.
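For reference, in the classical amenable setting (h convex, c smooth) the prox-linear method iterates

```latex
x^{k+1} \;=\; \arg\min_{x}\ h\bigl(c(x^{k}) + \nabla c(x^{k})(x - x^{k})\bigr)
\;+\; \frac{1}{2t}\,\bigl\|x - x^{k}\bigr\|^{2},
```

with step parameter t > 0; the talk's algorithms extend this template to maps c that are merely lower semicontinuous, where the linearization above is no longer available.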
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract:
In this talk, we address the activity detection problem in single-cell and multi-cell massive multiple-input multiple-output (MIMO) systems. In these systems, active devices transmit their signature sequences to one or more base stations (BSs), and the BSs then detect the active devices based on the received signals. We will present some recent results of the covariance-based approach for activity detection in massive MIMO systems, focusing on scaling law results and efficient coordinate descent (CD) algorithms.
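In one standard form (our notation), the covariance-based approach estimates the activity pattern by maximum likelihood: with signature matrix S, activity/large-scale-fading variables γ, noise power σ², and sample covariance Σ̂ of the received signal,

```latex
\min_{\gamma \,\ge\, 0}\ \log\det \Sigma(\gamma) \;+\; \operatorname{tr}\!\bigl(\Sigma(\gamma)^{-1}\,\hat{\Sigma}\bigr),
\qquad
\Sigma(\gamma) \;=\; S\,\operatorname{diag}(\gamma)\,S^{H} \;+\; \sigma^{2} I,
```

and the CD algorithms mentioned above update one coordinate γ_n at a time, with each one-dimensional minimization admitting a closed-form solution.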
Shenzhen Research Institute of Big Data
Abstract:
This talk considers low-rank canonical polyadic decomposition (CPD) under a class of non-Euclidean loss functions that frequently arise in statistical machine learning and signal processing. These loss functions are often used for certain types of tensor data, e.g., count and binary tensors, for which the least squares loss is considered unnatural. However, pressing computational and theoretical challenges, such as scalability and convergence issues, remain. This work offers a unified stochastic algorithmic framework for large-scale CPD under a variety of non-Euclidean loss functions. Our key contribution is a flexible stochastic mirror descent framework based on a tensor-fiber sampling strategy. Leveraging the sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm ensures global convergence to a stationary point under reasonable conditions. Numerical results show that our framework attains promising non-Euclidean CPD performance. The proposed framework also exhibits substantial computational savings compared to state-of-the-art methods.
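The core update behind such a framework is the generic stochastic mirror descent step (standard form, our notation): with a stochastic gradient g^k assembled from sampled tensor fibers and a Bregman divergence D_φ matched to the non-Euclidean loss,

```latex
w^{k+1} \;=\; \arg\min_{w}\ \bigl\langle g^{k},\, w \bigr\rangle
\;+\; \frac{1}{\alpha_k}\, D_{\varphi}\bigl(w,\, w^{k}\bigr),
```

which reduces to projected stochastic gradient descent when φ is the squared Euclidean norm and yields multiplicative-style updates under entropy-type mirror maps suited to count and binary data.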