Biography
Education
Jul. 2001 – Jun. 2005, Ph.D. in Computer Science and Technology, Tsinghua University
Jul. 1999 – Jul. 2001, Master's in Computer Science and Technology, Tsinghua University
Jul. 1995 – Jul. 1999, Bachelor's in Computer Science and Technology, Tsinghua University
Professional Experience
Dec. 2024 – present, Full Professor, Tsinghua Shenzhen International Graduate School
May 2008 – present, Honorary Research Associate, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
Mar. 2019 – Dec. 2024, Research Associate, Shenzhen International Graduate School, Tsinghua University
Dec. 2008 – Mar. 2019, Research Associate, Graduate School at Shenzhen, Tsinghua University
Sep. 2007 – Dec. 2008, Lecturer, Graduate School at Shenzhen, Tsinghua University
May 2005 – Sep. 2007, Postdoctoral Fellow, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
Additional Positions
Committee Member, Speech Dialogue and Auditory Processing Technical Committee, China Computer Federation (CCF TFSDAP)
Member, China Computer Federation (CCF)
Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, International Speech Communication Association (ISCA)
Reviewer, IEEE/ACM Transactions on Speech and Audio Processing, Speech Communication, Multimedia Tools and Applications
Reviewer, INTERSPEECH, ICASSP, ISCSLP, NCMMSC, ACL, IJCNLP, NeurIPS, AAAI, IJCAI
Current Courses
• Digital Processing of Speech Signal
• Big Data Analytics (B)
Research Interests
1. Speech signal processing
2. Audio-visual speech processing
3. Expressive text-to-audio-visual speech synthesis
4. Natural language understanding and generation
5. Multimedia applications
6. Affective computing
7. Machine learning
Projects
1. National Natural Science Foundation of China (62076144): Paralinguistic Speech Attributes Disentangled Representation Learning and Controllable Speech Synthesis for Intelligent Speech Interaction
2. National Natural Science Foundation of China – Research Grants Council (Hong Kong) Joint Research Scheme (61531166002, N_CUHK404/15): Interactive Attribute Mining and Animation Speech Synthesis for Web-based Spoken Dialog Interactions
3. National Natural Science Foundation of China Key Project (61433018): Psychological Mechanism and Computational Modeling for Internet Discourse Understanding
4. National Natural Science Foundation of China (61375027): Perception and Generation of Deep Information for Natural Spoken Dialog Interaction
Research Output
Selected Publications
1. Shun Lei*, Yixuan Zhou*, Boshi Tang*, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu#, Helen Meng, SongCreator: Lyrics-based Universal Song Generation, [in] Proc. Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1-34. Vancouver, Canada. December 10-15, 2024.
2. Yixuan Zhou*, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou*, Shun Lei*, Songtao Zhou, Zhiyong Wu#, Jia Jia#, VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 554-563. Melbourne, Australia, October 28-November 1, 2024.
3. Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li*, Shuoyi Zhou*, Songtao Zhou, Xiaoyu Qin#, Zhiyong Wu#, SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 1255-1264. Melbourne, Australia, October 28-November 1, 2024.
4. Xu He*, Qiaochu Huang*, Zhensong Zhang, Zhiwei Lin*, Zhiyong Wu#, Sicheng Yang*, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu, Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model, [in] Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2263-2273. Seattle, USA, June 16-22, 2024.
5. Yaoxun Xu*, Hangting Chen, Jianwei Yu#, Qiaochu Huang*, Zhiyong Wu#, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu, SECap: Speech Emotion Captioning with Large Language Model, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 19323-19331. Vancouver, Canada, February 20-27, 2024.
6. Zilin Wang*, Haolin Zhuang*, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen*, Yu Yang, Boshi Tang*, Zhiyong Wu#, Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 301-309. Vancouver, Canada, February 20-27, 2024.
7. Boshi Tang*, Zhiyong Wu, Xixin Wu#, Qiaochu Huang*, Jun Chen*, Shun Lei*, Helen Meng, SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 15267-15275. Vancouver, Canada, February 20-27, 2024.
8. Jingbei Li*, Sipan Li*, Ping Chen*, Luwen Zhang*, Yi Meng*, Zhiyong Wu#, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang, Joint Multiscale Cross-Lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 32, pp. 517-528. IEEE, November 10, 2023.
9. Xixin Wu, Hui Lu, Kun Li*, Zhiyong Wu#, Xunying Liu, Helen Meng, Hiformer: Sequence Modeling Networks with Hierarchical Attention Mechanisms, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3993-4003. IEEE, September 8, 2023.
10. Shun Lei*, Yixuan Zhou*, Liyang Chen*, Zhiyong Wu#, Xixin Wu, Shiyin Kang, Helen Meng, MSStyleTTS: Multi-scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3290-3303. IEEE, August 2, 2023.
11. Hui Lu*, Xixin Wu#, Zhiyong Wu, Helen Meng, SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 2829-2837. Ottawa, Canada, October 29-November 3, 2023.
12. Sicheng Yang*, Zilin Wang*, Zhiyong Wu#, Minglei Li#, Zhensong Zhang, Qiaochu Huang*, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai, UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 1033-1044. Ottawa, Canada, October 29-November 3, 2023.
13. Sicheng Yang*, Zhiyong Wu#, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao*, Ming Cheng*, Long Xiao*, DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models, [in] Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 5860-5868. Macao, China, August 19-25, 2023.
14. Sicheng Yang*, Zhiyong Wu#, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao*, Haolin Zhuang*, QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation, [in] Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2321-2330. Vancouver, Canada, June 18-22, 2023.
15. Zhihan Yang*, Zhiyong Wu#, Ying Shan, Jia Jia#, What Does Your Face Sound Like? 3D Face Shape Towards Voice, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 13905-13913. Washington DC, USA, February 7-14, 2023.
16. Haibin Wu, Xu Li, Andy T Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee#, Improving the Adversarial Robustness for Speaker Verification by Self-supervised Learning, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 30, pp. 202-217. IEEE, January 8, 2022.
17. Jingbei Li*, Yi Meng*, Xixin Wu#, Zhiyong Wu#, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang, Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 5811-5820. Lisboa, Portugal, October 10-14, 2022.
18. Xixin Wu, Yuewen Cao, Hui Lu*, Songxiang Liu, Disong Wang, Zhiyong Wu#, Xunying Liu, Helen Meng, Speech Emotion Recognition Using Sequential Capsule Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 3280-3291. IEEE, October 15, 2021.
19. Xixin Wu, Yuewen Cao, Hui Lu*, Songxiang Liu, Shiyin Kang, Zhiyong Wu#, Xunying Liu, Helen Meng, Exemplar-Based Emotive Speech Synthesis, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 874-886. IEEE, January 18, 2021.
20. Suping Zhou, Jia Jia#, Zhiyong Wu, Zhihan Yang*, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang, Inferring Emotion from Large-Scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach, [in] Proc. the 35th AAAI Conference on Artificial Intelligence (AAAI), pp. 6039-6047. Virtual, Online, February 2-9, 2021.
21. Yingmei Guo*, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang#, Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding, [in] Proc. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3226-3237. Punta Cana, Dominican Republic, November 7-11, 2021.
22. Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia#, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu, PTeacher: A Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback, [in] Proc. 2021 CHI Conference on Human Factors in Computing Systems (CHI), pp. 1-14. Yokohama, Japan, May 8-13, 2021.
Awards and Honors
1. (2024) First Prize in the Higher Education Category of the 5th Shenzhen Education and Teaching Achievement Award for IMDT Cross-Innovation to Create a New Model for Cultivating 'New Engineering' Digital and Intelligent Talent
2. (2023) Shenzhen Municipal Outstanding Scientific Research Output Award in Technological Advancements for “Research and Development of Key Technologies for Intelligent Speech Perception and Interaction, and Their Industrialization”
3. (2021) Beijing Municipal Outstanding Scientific Research Output Award in Technological Advancements for “Key Technologies for Personalized and Emotional Human Computer Speech Interaction and Their Industrialization”
4. (2016) Ministry of Education (MoE) Higher Education Outstanding Scientific Research Output Award in Technological Advancements for “Chinese Speech Perception and Interaction Modeling and Applications”
5. (2009) Ministry of Education (MoE) Higher Education Outstanding Scientific Research Output Award in Technological Advancements for “Research and Applications of Multimodal Multilingual Speech and Language Interaction”
6. (2023) Annual Teaching Excellence Award of Tsinghua University
7. (2020) Annual Teaching Excellence Award of Tsinghua University
8. (2022) “Good Teacher, Beneficial Friend” Award of Tsinghua University