HeadArtist: Text-conditioned 3D Head Generation with
Self Score Distillation


Abstract

This work presents HeadArtist for 3D head generation from text descriptions. With a landmark-guided ControlNet serving as the generative prior, we devise an efficient pipeline that optimizes a parameterized 3D head model under the supervision of the prior distillation itself. We call this process self score distillation (SSD). In detail, given a sampled camera pose, we first render an image and its corresponding landmarks from the head model, and add noise at a certain level to the image. The noisy image, landmarks, and text condition are then fed into the frozen ControlNet twice for noise prediction. A different classifier-free guidance (CFG) weight is applied in each of the two predictions, and the difference between the predictions indicates the direction in which the rendered image should change to better match the text of interest. Experimental results suggest that our approach delivers high-quality 3D head sculptures with adequate geometry and photo-realistic appearance, significantly outperforming state-of-the-art methods. We also show that the same pipeline supports editing the generated heads, including both geometry deformation and appearance change.
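The SSD step described in the abstract can be illustrated with a minimal, hedged sketch. It is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for the frozen landmark-guided ControlNet, the CFG combination is assumed to take the standard form eps_uncond + w * (eps_cond - eps_uncond), the weights 7.5 and 1.0 are illustrative values only, and the "rendered image" is a plain list of numbers that is updated directly, whereas the actual pipeline presumably propagates the direction through the renderer to the head model parameters.

```python
import math
import random

def toy_denoiser(noisy, target, cfg_weight):
    """Hypothetical stand-in for the frozen ControlNet (assumption).

    Returns a CFG-combined noise prediction, assuming the standard form
    eps_uncond + w * (eps_cond - eps_uncond). The 'text condition' is
    modelled as a target vector the conditional branch points away from.
    """
    eps_uncond = [0.0 for _ in noisy]
    eps_cond = [x - t for x, t in zip(noisy, target)]
    return [u + cfg_weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def ssd_direction(rendered, target, sigma, w_high, w_low, rng):
    """One SSD-style step: add noise, predict twice, take the difference."""
    noise = [rng.gauss(0.0, 1.0) for _ in rendered]
    noisy = [x + sigma * n for x, n in zip(rendered, noise)]
    # Two passes through the same frozen model with different CFG weights.
    eps_high = toy_denoiser(noisy, target, w_high)
    eps_low = toy_denoiser(noisy, target, w_low)
    return [h - l for h, l in zip(eps_high, eps_low)]

def optimize(rendered, target, steps=100, lr=0.05, sigma=0.1, seed=0):
    """Toy optimization loop that follows the SSD direction."""
    rng = random.Random(seed)
    x = list(rendered)
    for _ in range(steps):
        direction = ssd_direction(x, target, sigma, 7.5, 1.0, rng)
        x = [xi - lr * di for xi, di in zip(x, direction)]
    return x

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

Under the assumed CFG form, the difference of the two predictions reduces to (w_high - w_low) * (eps_cond - eps_uncond), so in this toy setting the update simply pulls the rendered values toward the target that represents the text condition.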

[Figure: architecture of the HeadArtist pipeline]

Example Generation Results

HeadArtist generates reasonable geometry and high-fidelity texture.

† denotes "a head of". ‡ denotes "a DSLR portrait of". We adopt negative prompts "worst quality, low quality, overexposed, underexposed, semi-realistic, over saturation" for all results.
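The marker convention described above can be made concrete with a small sketch. The helper `expand_prompt` and the constant names are hypothetical and only illustrate how a caption marker maps to its prompt prefix; how the prompts are actually assembled in the HeadArtist code is not stated here.

```python
# Prefixes and negative prompt are taken from the legend above;
# the names PREFIXES, NEGATIVE_PROMPT, and expand_prompt are hypothetical.
PREFIXES = {"†": "a head of", "‡": "a DSLR portrait of"}
NEGATIVE_PROMPT = (
    "worst quality, low quality, overexposed, underexposed, "
    "semi-realistic, over saturation"
)

def expand_prompt(caption):
    """Replace a leading † or ‡ marker with the prefix it stands for."""
    caption = caption.strip()
    for marker, prefix in PREFIXES.items():
        if caption.startswith(marker):
            return f"{prefix} {caption[len(marker):].strip()}"
    # Captions without a marker are used as written.
    return caption
```

For example, `expand_prompt("† T800 in Terminator")` yields "a head of T800 in Terminator", while a caption with no marker is returned unchanged.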

‡ a young man with curly hair wearing glasses
‡ a middle-aged Asian woman with short hair, angry expression
‡ young Asian lady with ponytail hairstyle
† T800 in Terminator
† Illidan Stormrage
a head sculpture of Mario

Example Editing Results

HeadArtist can edit geometry and texture.

† denotes "a head of". ‡ denotes "a DSLR portrait of". Text in orange denotes the editing instruction. We adopt negative prompts "worst quality, low quality, overexposed, underexposed, semi-realistic, over saturation" for all results.

‡ young man with a muscular jawline, stubble beard

‡ young man with a muscular jawline, stubble beard, he has a happy expression

older man with a muscular jawline, stubble beard

‡ Obama with a baseball cap

Skull of Obama with a baseball cap

Pixar style Obama with a baseball cap

Comparison Results

We adopt 10 prompts from HeadSculpt to make fair comparisons. For HeadSculpt, we directly use the results from its webpage. For the other methods, we obtain the results using threestudio with the default configuration.

Methods compared for each prompt: DreamFusion, ProlificDreamer, Latent-NeRF, Fantasia3D, HeadSculpt, Ours.

Prompts shown:

a DSLR portrait of a female soldier, wearing a helmet

a DSLR portrait of a boy with facial painting

a DSLR portrait of Lionel Messi

Citation

@article{liu2023HeadArtist,
  author = {Hongyu Liu and Xuan Wang and Ziyu Wan and Yujun Shen and Yibing Song and Jing Liao and Qifeng Chen},
  title = {HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation},
  journal = {arXiv:2312.07539},
  year = {2023},
}