August 4th: Xiaodong He, Microsoft Research Asia, Beijing

August 5th: Deliang Wang, Ohio State Univ., USA

August 6th: Guangbin Huang, NTU, Singapore

Course Schedule

Session 1: 8:30 to 10:00
Session 2: 10:15 to 11:45
Session 3: 13:30 to 15:00
Session 4: 15:15 to 16:45

Date: August 4th
Instructor: Xiaodong He
Title: Deep Learning and Natural Language Understanding
Deep learning techniques have achieved tremendous success in the speech and language processing community in recent years, establishing new state-of-the-art performance in speech recognition and language modeling, and they have shown great potential for many other natural language processing tasks. This tutorial provides an extensive overview of recently developed deep learning techniques and continuous-space representations for natural language problems, with particular emphasis on important real-world applications including language understanding, machine translation, semantic representation modeling, question answering, and semantic parsing.
In this tutorial, we will first survey the latest deep learning technology, presenting both theoretical and practical perspectives most relevant to our topic. We plan to cover common deep neural network methods as well as more advanced recurrent, recursive, stacking, and convolutional networks. In addition, we will introduce recently proposed continuous-space representations for both semantic word embedding and knowledge base embedding, modeled by either matrix/tensor decomposition or neural networks. Next, we will review general problems and tasks in text and language processing, and underline the distinct properties that differentiate language processing from other tasks such as speech and image recognition. More importantly, we will highlight the general issues of natural language processing and elaborate on how new deep learning technologies address these issues at a fundamental level. We will then place particular emphasis on several important applications: (1) machine translation and response generation, (2) semantic information retrieval, (3) semantic parsing and question answering, (4) reinforcement learning for NLP, and (5) vision-and-language multimodal learning. For each task, we will discuss which deep learning architectures are suitable given the nature of the task, and how learning can be performed efficiently and effectively using end-to-end optimization strategies.

Part I. Background of neural network learning architectures

• Background: A review of deep learning theory and applications in relevant fields
• Advanced architectures for modeling language structure
• Common problems and concepts in language processing:
• Why deep learning is needed
• Concept of embedding
• Classification/prediction vs. representation/similarity
• Learning techniques: regularization, optimization, GPU, etc.

Part II. Machine translation

• Overview of Machine Translation
• Deep learning translation models for SMT
• Recurrent neural network language models for SMT
• Sequence-to-sequence neural machine translation
• Response generation as (neural) machine translation
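As a rough illustration of the sequence-to-sequence idea listed above, the following sketch runs an RNN encoder over source token ids to produce a summary vector, then decodes target tokens greedily from it. All weights are random and the vocabulary and dimensions are invented for illustration; a real NMT system trains these parameters end to end on parallel text, usually with gated units and attention.

```python
import numpy as np

rng = np.random.default_rng(3)

V, H = 10, 12                               # toy vocabulary and hidden sizes
E = rng.normal(scale=0.1, size=(V, H))      # shared token embeddings
W_enc = rng.normal(scale=0.1, size=(H, H))  # encoder recurrence
W_dec = rng.normal(scale=0.1, size=(H, H))  # decoder recurrence
W_out = rng.normal(scale=0.1, size=(H, V))  # hidden state -> vocabulary scores

def encode(src_ids):
    """Fold the source sentence into a single summary vector."""
    h = np.zeros(H)
    for t in src_ids:
        h = np.tanh(E[t] + h @ W_enc)
    return h

def decode(h, max_len=5):
    """Greedily emit target tokens, conditioned on the encoder state."""
    out, tok = [], 0                        # token 0 acts as a start symbol
    for _ in range(max_len):
        h = np.tanh(E[tok] + h @ W_dec)
        tok = int(np.argmax(h @ W_out))     # greedy choice of the next token
        out.append(tok)
    return out

translation = decode(encode([1, 4, 2]))
```

With trained parameters, the same encode/decode loop produces actual translations; here it only demonstrates the information flow from source sequence to target sequence.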

Part III. Learning semantic embedding

• Semantic embedding: from words to sentences
• The Deep Structured Semantic Model/Deep Semantic Similarity Model (DSSM)
• DSSM in practice: Information Retrieval, Recommendation
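To make the DSSM idea concrete, here is a minimal sketch of its scoring structure: two feedforward "towers" map a query and a document into a shared semantic space, and relevance is measured by cosine similarity. The weights are random and all sizes are invented for illustration; in the actual model the towers are trained so that clicked documents score higher than unclicked ones under a softmax.

```python
import numpy as np

rng = np.random.default_rng(1)

def tower(x, weights):
    """Feedforward tower mapping a term vector to a semantic vector."""
    h = x
    for W in weights:
        h = np.tanh(h @ W)
    return h

def cosine(a, b):
    """Cosine similarity between two semantic vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, hidden, sem = 30, 16, 8              # toy dimensions
Wq = [rng.normal(size=(vocab, hidden)), rng.normal(size=(hidden, sem))]
Wd = [rng.normal(size=(vocab, hidden)), rng.normal(size=(hidden, sem))]

query = rng.random(vocab)                   # stand-in for a query term vector
doc = rng.random(vocab)                     # stand-in for a document term vector
score = cosine(tower(query, Wq), tower(doc, Wd))
```

The two-tower design is what lets the model precompute document vectors offline and score queries against them cheaply at retrieval time.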

Part IV. Natural language understanding

• Continuous Word Representations & Lexical Semantics
• Semantic Parsing & Question Answering
• Knowledge Base Embedding
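As one concrete flavor of knowledge base embedding, the sketch below scores a triple in the translation-based (TransE) style, where a relation acts as a vector offset between entity embeddings. The entity and relation names and the random vectors are invented for illustration; trained embeddings would make true triples score higher (less negative) than corrupted ones.

```python
import numpy as np

rng = np.random.default_rng(4)

dim = 8
# Toy embedding tables; in practice these are learned from the knowledge base.
entities = {name: rng.normal(size=dim)
            for name in ["paris", "france", "tokyo", "japan"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(h, r, t):
    """TransE-style plausibility: -||head + relation - tail||."""
    return -float(np.linalg.norm(entities[h] + relations[r] - entities[t]))

s = score("paris", "capital_of", "france")
```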

Part V. Reinforcement learning in language understanding

• Reinforcement learning with action space defined by natural language
• Reinforcement learning with combinatorial action space

Part VI. Vision and language multimodal understanding

• Image captioning
• Visual Question Answering

Part VII. Conclusion

Xiaodong He is a Senior Researcher in the Deep Learning Technology Center of Microsoft Research, Redmond, WA, USA. He is also an Affiliate Professor in the Department of Electrical Engineering at the University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in artificial intelligence, including deep learning, natural language, vision, speech, information retrieval, and knowledge representation. He has published in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS, ICLR, ICASSP, Proc. IEEE, IEEE TASLP, and IEEE SPM. He has received several awards, including the Outstanding Paper Award at ACL 2015. He and his colleagues won first place in the 2008 NIST Machine Translation Evaluation and in the 2011 IWSLT Evaluation (Chinese-to-English). More recently, he and colleagues developed the MSR image captioning system that won first prize at the MS COCO Captioning Challenge 2015 (tied with Google). His work on image captioning was reported in Communications of the ACM. He leads the image captioning effort, which is now part of Microsoft Cognitive Services, and the development of CaptionBot, which enables Seeing AI and many other applications.
He has held editorial positions on several IEEE journals, including T-ASLP, J-STSP, SPL, and SPM, served as an area chair for NAACL-HLT 2015, and served on the organizing and program committees of major speech and language processing conferences. He is an elected member of the IEEE SLTC for the 2015-2017 term. He is a senior member of the IEEE and a member of the ACL.
He received the BS degree from Tsinghua University (Beijing) in 1996, the MS degree from the Chinese Academy of Sciences (Beijing) in 1999, and the PhD degree from the University of Missouri - Columbia in 2003.

Date: August 5th
Instructor: Deliang Wang
Title: Supervised Speech Separation

Abstract:
The acoustic environment typically contains multiple simultaneous sound sources, and the target speech usually occurs together with other interfering sounds. This creates the problem of speech separation, popularly known as the cocktail party problem. Speech separation has a wide range of important applications, including robust automatic speech and speaker recognition, hearing prosthesis, and audio information retrieval (or audio data mining). As a result, a large number of studies in speech and audio processing have been devoted to speech separation, which has become even more important in recent years with the widespread adoption of mobile communication devices such as smartphones.
Traditional approaches to speech separation include speech enhancement based on analyzing signal statistics, beamforming or spatial filtering, and computational auditory scene analysis. An emerging trend in speech separation is the introduction of supervised learning. So-called supervised speech separation is a data-driven approach that trains a learning machine to perform speech separation. In particular, deep neural networks (DNNs) have been increasingly used for supervised speech separation in recent years. Among the major successes of supervised speech separation is the demonstration of substantial speech intelligibility improvements for hearing-impaired listeners in some noisy environments, an accomplishment that had eluded the signal processing field for decades.
This tutorial is designed to introduce the latest developments in supervised speech segregation, with emphasis on DNN-based separation methods. It will systematically introduce the fundamentals of supervised speech separation, including learning machines, features, and training targets, and will cover the separation of speech both from nonspeech noises and from competing talkers. We will also treat and compare masking-based and mapping-based techniques for supervised speech separation.
The tutorial intends to provide participants a solid understanding of supervised speech separation, with the following foci: first, explaining how to formulate the speech separation problem in the supervised learning framework; second, describing the foundations behind representative algorithms, in conjunction with real-world applications; and third, discussing both monaural (one-microphone) and binaural (two-microphone) speech separation and how to combine them.
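To make the masking idea concrete, the sketch below computes one widely used training target, the ideal ratio mask (IRM), on synthetic magnitude spectrograms; in supervised separation, a DNN is trained to predict such a mask from features of the noisy mixture. The array shapes, the random spectra, and the additive mixing of magnitudes are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

frames, bins = 100, 64
speech = rng.random((frames, bins))   # clean-speech magnitude spectrogram
noise = rng.random((frames, bins))    # noise magnitude spectrogram

# Ideal ratio mask: speech energy over total energy in each time-frequency
# unit (small epsilon guards against division by zero).
irm = speech**2 / (speech**2 + noise**2 + 1e-12)

# Applying the mask to the mixture attenuates noise-dominated units.
mixture = speech + noise              # simplified additive mixing of magnitudes
enhanced = irm * mixture
```

Each mask value lies in [0, 1], which is one reason ratio masks are convenient regression targets for a sigmoid output layer.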

DeLiang Wang received the B.S. degree in 1983 and the M.S. degree in 1986 from Peking (Beijing) University, Beijing, China, and the Ph.D. degree in 1991 from the University of Southern California, Los Angeles, CA, all in computer science. From July 1986 to December 1987 he was with the Institute of Computing Technology, Academia Sinica, Beijing. Since 1991, he has been with the Department of Computer Science & Engineering and the Center for Cognitive and Brain Sciences at Ohio State University, Columbus, OH, where he is currently a Professor. He also holds a visiting appointment at the Center of Intelligent Acoustics and Immersive Communications, Northwestern Polytechnical University, Xi'an, China. He has been a visiting scholar at Harvard University, Oticon A/S (Denmark), and Starkey Hearing Technologies. Wang's research interests include machine perception and neurodynamics. Among his recognitions are the Office of Naval Research Young Investigator Award in 1996, the 2005 Outstanding Paper Award from IEEE Transactions on Neural Networks, and the 2008 Helmholtz Award from the International Neural Network Society. In 2014, he was named a University Distinguished Scholar by Ohio State University. He serves as Co-Editor-in-Chief of Neural Networks, and on the editorial boards of several journals including IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is an IEEE Fellow.

Date: August 6th
Instructor: Guangbin Huang
Title: Extreme Learning Machines (ELMs): Enabling Pervasive Learning and Pervasive Intelligence
Neural networks (NNs) and support vector machines (SVMs) have played key roles in machine learning and data analysis over the past three decades. However, these popular learning techniques face challenging issues such as intensive human intervention, slow learning speed, and poor learning scalability. The objective of this lecture is twofold: 1) to introduce a new generation of learning theory and the resultant biologically inspired learning technique, referred to as the Extreme Learning Machine (ELM); 2) to show the potential trend of combining ELM and deep learning (DL), which not only expedites learning (up to thousands of times faster) and reduces learning complexity, but also improves learning accuracy in benchmark applications such as OCR, traffic sign recognition, hand gesture recognition, object tracking, 3D graphics, etc. ELM theories can give theoretical support to the local receptive fields and pooling strategies popularly used in deep learning, and may explain why the brain is globally ordered yet can be locally random. This lecture shares with the audience several trends in machine learning: 1) the turning point from machine learning engineering to machine learning science; 2) the convergence of machine learning and biological learning; 3) the move from human and (living) thing intelligence to machine intelligence; 4) the move from the Internet of Things (IoT) to the Internet of Intelligent Things and the Society of Intelligent Things.
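A minimal sketch of the basic ELM idea described above: hidden-layer weights are assigned randomly and never trained, and only the output weights are solved in closed form by least squares, which is what makes training fast. The toy regression data and all sizes here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, T, n_hidden=50):
    """Random hidden layer; output weights solved by least squares."""
    n_features = X.shape[1]
    W = rng.normal(size=(n_features, n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ T                  # Moore-Penrose solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy task: fit a noisy sine curve.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
T = np.sin(X) + 0.05 * rng.normal(size=X.shape)
W, b, beta = elm_fit(X, T)
pred = elm_predict(X, W, b, beta)
```

The absence of iterative backpropagation (one pseudoinverse instead of many gradient steps) is the source of the speed claims made in the abstract.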

Part I - ELM Philosophy and Basic ELMs:

1) Neural networks and machine learning history
2) Rethink machine learning and artificial intelligence
3) Philosophy and belief of Extreme Learning Machines (ELM)
• Do we really need so many different types of learning algorithms for so many types of networks (various types of SLFNs, regular and irregular multi-layer networks, various types of neurons)?
• Can the gap between machine learning and biological learning be filled?
• Should learning be transparent or a black box?
• SVM provides suboptimal solutions.
4) Machine learning and Internet of Things
5) Pervasive learning and pervasive intelligence
6) Machine intelligence and human intelligence

Part II – Hierarchical ELM

1) Unsupervised/semi-supervised ELM
2) Feature learning
3) Hierarchical ELM
4) ELM and Deep Learning
5) ELM + (other algorithms)

Part III – ELM Theories and Open Problems

1) ELM theories:
• Universal approximation capability
• Classification capability
2) Incremental learning
3) Online sequential learning
4) Open problems


Prof. Guang-Bin Huang is a professor in the School of Electrical & Electronic Engineering, College of Engineering, at Nanyang Technological University, Singapore. He serves as an Associate Editor of Neurocomputing, Cognitive Computation, Neural Networks, and IEEE Transactions on Cybernetics. He was named a "Highly Cited Researcher" by Thomson Reuters and listed among "The World's Most Influential Scientific Minds" in both 2014 and 2015. He was nominated and shortlisted for the 2016 President's Science Award. His current research interests include big data analytics, human-computer interfaces, brain-computer interfaces, image processing/understanding, machine learning theories and algorithms, extreme learning machines, and pattern recognition.

He is Principal Investigator of BMW-NTU Joint Future Mobility Lab on Human Machine Interface and Assisted Driving, Principal Investigator (data and video analytics) of Delta – NTU Joint Lab, Principal Investigator (Scene Understanding) of ST Engineering – NTU Corporate Lab, and Principal Investigator (Marine Data Analysis and Prediction) of Rolls Royce – NTU Corporate Lab. He has led/implemented several key industrial projects (e.g., Chief architect/designer and technical leader of Singapore Changi Airport Cargo Terminal 5 Inventory Control System (T5 ICS) Upgrading Project, etc.).

One of his main contributions is a new machine learning theory and set of learning techniques called Extreme Learning Machines (ELM), which fill the gap between traditional feedforward neural networks, support vector machines, clustering, and feature learning techniques. ELM theories have recently been directly supported by biological learning evidence, narrowing the gap between machine learning and biological learning. ELM theories have also addressed "Father of Computers" J. von Neumann's concern about why "an imperfect neural network, containing many random connections, can be made to perform reliably those functions which might be represented by idealized wiring diagrams."