We present Follow-Your-Emoji, a diffusion-based framework for portrait animation that animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equips the powerful Stable Diffusion model with two carefully designed techniques.
Specifically, we first adopt a new explicit motion signal, the expression-aware landmark, to guide the animation process. We find that this landmark not only ensures accurate motion alignment between the reference portrait and the target motion during inference, but also improves the portrayal of exaggerated expressions (e.g., large pupil movements) and avoids identity leakage.
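As a rough illustration of how such an explicit signal can condition a diffusion model, the sketch below rasterizes per-frame 2D landmarks (including pupil points) into guidance images that could be injected alongside the noisy latents; the function names and drawing scheme are our own assumptions, not the paper's implementation.

```python
# A minimal sketch (not the paper's code) of turning per-frame landmarks into
# an explicit conditioning signal for a diffusion U-Net.
import numpy as np
import cv2

def draw_landmark_frame(landmarks_2d, size=(512, 512), radius=2):
    """Rasterize 2D facial landmarks (incl. pupil points) for one frame.

    landmarks_2d: (N, 2) array of pixel coordinates.
    Returns an HxWx3 uint8 canvas that can be encoded and fed to the
    denoising network together with the noisy latents.
    """
    canvas = np.zeros((*size, 3), dtype=np.uint8)
    for x, y in landmarks_2d:
        cv2.circle(canvas, (int(x), int(y)), radius, (255, 255, 255), -1)
    return canvas

# Dummy driving sequence: 16 frames of 68 landmarks each (illustrative only).
landmark_sequence = np.random.rand(16, 68, 2) * 512
guidance = [draw_landmark_frame(lm) for lm in landmark_sequence]
```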
Then, we propose a facial fine-grained loss that uses both expression and facial masks to improve the model's ability to perceive subtle expressions and to reconstruct the appearance of the reference portrait.
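To make the idea concrete, here is a minimal sketch of a mask-weighted diffusion loss in this spirit; the weighting scheme and mask handling are assumptions on our part rather than the authors' exact formulation.

```python
# Hedged sketch: re-weight the standard noise-prediction MSE so facial and
# expression-critical regions (e.g., eyes, mouth) contribute more.
import torch
import torch.nn.functional as F

def facial_fine_grained_loss(noise_pred, noise_gt, face_mask, expr_mask,
                             w_face=1.0, w_expr=2.0):
    """noise_pred, noise_gt: (B, C, H, W) predicted / target noise.
    face_mask, expr_mask: (B, 1, H, W) binary masks at latent resolution.
    w_face, w_expr: illustrative weights, not values from the paper.
    """
    base = F.mse_loss(noise_pred, noise_gt, reduction="none")
    weights = 1.0 + w_face * face_mask + w_expr * expr_mask  # broadcasts over C
    return (weights * base).mean()

# Toy usage with random tensors.
B, C, H, W = 2, 4, 64, 64
pred, gt = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
face = torch.randint(0, 2, (B, 1, H, W)).float()
expr = torch.randint(0, 2, (B, 1, H, W)).float()
loss = facial_fine_grained_loss(pred, gt, face, expr)
```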
Accordingly, our method demonstrates strong performance in controlling the expressions of freestyle portraits, including real humans, cartoons, sculptures, and even animals. By leveraging a simple yet effective progressive generation strategy, we extend our model to stable long-term animation, thereby increasing its potential application value.
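One common way to realize such a progressive strategy is to generate overlapping fixed-length clips and seed each new clip with the tail of the previous one; the sketch below shows this pattern, with the window and overlap mechanics assumed by us rather than specified in the abstract.

```python
# Sketch of one plausible progressive scheme; the paper's exact strategy
# may differ. `generate_clip` is a hypothetical callable standing in for
# the trained animation model.
def animate_long(generate_clip, landmarks, clip_len=16, overlap=4):
    """Progressively animate an arbitrarily long landmark sequence.

    generate_clip(window, prefix_frames) -> list of frames, one per landmark
    in `window`, optionally conditioned on `prefix_frames` for continuity.
    """
    frames, start, prefix = [], 0, None
    while len(frames) < len(landmarks):
        window = landmarks[start:start + clip_len]
        clip = generate_clip(window, prefix_frames=prefix)
        # The first `overlap` frames re-render already-emitted content and
        # only anchor the new clip, so drop them after the first pass.
        frames.extend(clip if prefix is None else clip[overlap:])
        prefix = clip[-overlap:]          # tail frames seed the next window
        start += clip_len - overlap
    return frames
```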
To address the lack of a benchmark in this field, we introduce EmojiBench, a comprehensive benchmark comprising diverse portrait images, driving videos, and landmarks. Extensive evaluations on EmojiBench demonstrate the superiority of Follow-Your-Emoji.