Date: 16 May 2016
Time: 12:00 - 13:00
Location: Wolfson Lecture Theatre, Computing, Queen Mother Building
Host: Dr Jianguo Zhang
Title: A Deep Learning Framework for Human Action Parsing
Abstract: Automatically recognizing objects, scenes and actions is a core component of an artificial intelligence system. In this talk, I will cover two of my main research areas – human action recognition and deep learning. Action recognition has been an active research topic in computer vision due to its various applications in human-machine interaction, robotics, video surveillance and visual big data search. I will first review some related work on handcrafted features, feature/deep learning and attribute learning. Then I will introduce our recent multi-task system that can jointly solve three main problems: (1) Where in the video do the actions occur? (2) What categories do the actions belong to? and (3) How are these actions performed? This multi-task learning framework is built on a state-of-the-art 3D deep convolutional neural network (3D-CNN). Specifically, in the training phase, action localization, classification and attribute learning are jointly optimized via the proposed deep architecture. Once model training is completed, given a new test video, we can describe each individual action in the video simultaneously in terms of where the action occurs, what the action is and how the action is performed. To train the deep network, we also introduce a new large-scale aligned action dataset, NASA, with 200K well-labelled video clips. Finally, I will present the results of detailed action parsing on challenging, realistic datasets that are either collected by us or publicly available. I will also discuss some initial results on zero-shot learning via the learned action attributes.
Bio: Ling Shao is Professor of Computer Vision and Machine Intelligence and Head of the Computer Vision and Artificial Intelligence Group with the Department of Computer Science and Digital Technologies at Northumbria University, Newcastle upon Tyne, and an Advanced Visiting Fellow with the Department of Electronic and Electrical Engineering at the University of Sheffield. He received the B.Eng. degree in Electronic and Information Engineering from the University of Science and Technology of China (USTC), the M.Sc. degree in Medical Image Analysis and the Ph.D. (D.Phil.) degree in Computer Vision at the Robotics Research Group from the University of Oxford. Previously, he was a Senior Lecturer (2009-2014) with the Department of Electronic and Electrical Engineering at the University of Sheffield and a Senior Scientist (2005-2009) with Philips Research, The Netherlands. His research interests include Computer Vision, Image/Video Processing, Pattern Recognition and Machine Learning. He has authored/co-authored over 200 papers in refereed journals/conferences such as IEEE TPAMI, TIP, TNNLS, IJCV, ICCV, CVPR, IJCAI and ACM MM, and holds over 10 EU/US patents. Ling Shao is an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Image Processing, IEEE Transactions on Circuits and Systems for Video Technology, and several other journals. He has edited three books and several special issues for journals such as TNNLS and PR. He has organized a number of international workshops with top conferences including ICCV, ECCV and ACM Multimedia. He is/was an Area Chair for BMVC'14/15/16, ICPR'16, WACV'14 and ICME'15 and has been serving as a Program Committee member for many international conferences, including ICCV, CVPR, ECCV, BMVC, and ACM MM, and as a reviewer for many leading journals. He is a Fellow of the British Computer Society, a Fellow of the IET, a Senior Member of the IEEE, and a Life Member of the ACM.