Tsinghua Shenzhen International Graduate School

Selected Publications

1. Shun Lei*, Yixuan Zhou*, Boshi Tang*, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu#, Helen Meng, SongCreator: Lyrics-based Universal Song Generation, [in] Proc. Annual Conference on Neural Information Processing Systems (NeurIPS), pp. 1-34. Vancouver, Canada, December 10-15, 2024.

2. Yixuan Zhou*, Xiaoyu Qin, Zeyu Jin, Shuoyi Zhou*, Shun Lei*, Songtao Zhou, Zhiyong Wu#, Jia Jia#, VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 554-563. Melbourne, Australia, October 28-November 1, 2024.

3. Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li*, Shuoyi Zhou*, Songtao Zhou, Xiaoyu Qin#, Zhiyong Wu#, SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 1255-1264. Melbourne, Australia, October 28-November 1, 2024.

4. Xu He*, Qiaochu Huang*, Zhensong Zhang, Zhiwei Lin*, Zhiyong Wu#, Sicheng Yang*, Minglei Li, Zhiyi Chen, Songcen Xu, Xiaofei Wu, Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model, [in] Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2263-2273. Seattle, USA, June 16-22, 2024.

5. Yaoxun Xu*, Hangting Chen, Jianwei Yu#, Qiaochu Huang*, Zhiyong Wu#, Shixiong Zhang, Guangzhi Li, Yi Luo, Rongzhi Gu, SECap: Speech Emotion Captioning with Large Language Model, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 19323-19331. Vancouver, Canada, February 20-27, 2024.

6. Zilin Wang*, Haolin Zhuang*, Lu Li, Yinmin Zhang, Junjie Zhong, Jun Chen*, Yu Yang, Boshi Tang*, Zhiyong Wu#, Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 301-309. Vancouver, Canada, February 20-27, 2024.

7. Boshi Tang*, Zhiyong Wu, Xixin Wu#, Qiaochu Huang*, Jun Chen*, Shun Lei*, Helen Meng, SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 15267-15275. Vancouver, Canada, February 20-27, 2024.

8. Jingbei Li*, Sipan Li*, Ping Chen*, Luwen Zhang*, Yi Meng*, Zhiyong Wu#, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang, Joint Multiscale Cross-Lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 32, pp. 517-528. IEEE, November 10, 2023.

9. Xixin Wu, Hui Lu, Kun Li*, Zhiyong Wu#, Xunying Liu, Helen Meng, Hiformer: Sequence Modeling Networks with Hierarchical Attention Mechanisms, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3993-4003. IEEE, September 8, 2023.

10. Shun Lei*, Yixuan Zhou*, Liyang Chen*, Zhiyong Wu#, Xixin Wu, Shiyin Kang, Helen Meng, MSStyleTTS: Multi-scale Style Modeling with Hierarchical Context Information for Expressive Speech Synthesis, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 31, pp. 3290-3303. IEEE, August 2, 2023.

11. Hui Lu*, Xixin Wu#, Zhiyong Wu, Helen Meng, SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 2829-2837. Ottawa, Canada, October 29-November 3, 2023.

12. Sicheng Yang*, Zilin Wang*, Zhiyong Wu#, Minglei Li#, Zhensong Zhang, Qiaochu Huang*, Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai, UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 1033-1044. Ottawa, Canada, October 29-November 3, 2023.

13. Sicheng Yang*, Zhiyong Wu#, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao*, Ming Cheng*, Long Xiao*, DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models, [in] Proc. International Joint Conference on Artificial Intelligence (IJCAI), pp. 5860-5868. Macao, China, August 19-25, 2023.

14. Sicheng Yang*, Zhiyong Wu#, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao*, Haolin Zhuang*, QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation, [in] Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2321-2330. Vancouver, Canada, June 18-22, 2023.

15. Zhihan Yang*, Zhiyong Wu#, Ying Shan, Jia Jia#, What Does Your Face Sound Like? 3D Face Shape Towards Voice, [in] Proc. AAAI Conference on Artificial Intelligence (AAAI), pp. 13905-13913. Washington DC, USA, February 7-14, 2023.

16. Haibin Wu, Xu Li, Andy T Liu, Zhiyong Wu, Helen Meng, Hung-Yi Lee#, Improving the Adversarial Robustness for Speaker Verification by Self-supervised Learning, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 30, pp. 202-217. IEEE, January 8, 2022.

17. Jingbei Li*, Yi Meng*, Xixin Wu#, Zhiyong Wu#, Jia Jia, Helen Meng, Qiao Tian, Yuping Wang, Yuxuan Wang, Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks, [in] Proc. ACM International Conference on Multimedia (ACM MM), pp. 5811-5820. Lisboa, Portugal, October 10-14, 2022.

18. Xixin Wu, Yuewen Cao, Hui Lu*, Songxiang Liu, Disong Wang, Zhiyong Wu#, Xunying Liu, Helen Meng, Speech Emotion Recognition Using Sequential Capsule Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 3280-3291. IEEE, October 15, 2021.

19. Xixin Wu, Yuewen Cao, Hui Lu*, Songxiang Liu, Shiyin Kang, Zhiyong Wu#, Xunying Liu, Helen Meng, Exemplar-Based Emotive Speech Synthesis, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 29, pp. 874-886. IEEE, January 18, 2021.

20. Suping Zhou, Jia Jia#, Zhiyong Wu, Zhihan Yang*, Yanfeng Wang, Wei Chen, Fanbo Meng, Shuo Huang, Jialie Shen, Xiaochuan Wang, Inferring Emotion from Large-Scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach, [in] Proc. 35th AAAI Conference on Artificial Intelligence (AAAI), pp. 6039-6047. Virtual, Online, February 2-9, 2021.

21. Yingmei Guo*, Linjun Shou, Jian Pei, Ming Gong, Mingxing Xu, Zhiyong Wu, Daxin Jiang#, Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding, [in] Proc. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3226-3237. Punta Cana, Dominican Republic, November 7-11, 2021.

22. Yaohua Bu, Tianyi Ma, Weijun Li, Hang Zhou, Jia Jia#, Shengqi Chen, Kaiyuan Xu, Dachuan Shi, Haozhe Wu, Zhihan Yang, Kun Li, Zhiyong Wu, Yuanchun Shi, Xiaobo Lu, Ziwei Liu, PTeacher: A Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback, [in] Proc. 2021 CHI Conference on Human Factors in Computing Systems (CHI), pp. 1-14. Yokohama, Japan, May 8-13, 2021.

Wu Zhiyong
