FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

School of Cyberspace Security, Sun Yat-Sen University
Institute of Software Research, Chinese Academy of Sciences
University of Chinese Academy of Sciences
School of Computer Science, Soochow University
Harbin Institute of Technology

Abstract

The rapid development of generative diffusion models has significantly advanced the field of style transfer. However, most current style transfer methods based on diffusion models involve a slow iterative optimization process, e.g., model fine-tuning or textual inversion of the style concept. In this paper, we introduce FreeStyle, an innovative style transfer method built upon a pre-trained large diffusion model that requires no further optimization. Moreover, our method performs style transfer using only a text description of the desired style, eliminating the need for style images. Specifically, we propose a dual-stream encoder and single-stream decoder architecture that replaces the conventional U-Net in diffusion models. In the dual-stream encoder, two distinct branches take the content image and the style text prompt as inputs, decoupling content from style. In the decoder, we further modulate the features from the two streams according to the given content image and the corresponding style text prompt, enabling precise style transfer.

FreeStyle Framework

FreeStyle replaces the conventional U-Net with a dual-stream encoder and a single-stream decoder: one encoder branch processes the content image while the other is conditioned on the style text prompt, and the decoder modulates and fuses the features from both streams to produce the stylized output, requiring no further optimization of the pre-trained diffusion model. A toy sketch of this dual-stream design follows.
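The sketch below is only an illustrative toy example of the dual-stream encoder / single-stream decoder idea, not the released implementation: ToyEncoder, ToyDecoder, the additive fusion, and the scale factors b and s are assumptions introduced for exposition, whereas FreeStyle applies its feature modulation inside a pre-trained diffusion U-Net.

# Toy PyTorch sketch of the dual-stream encoder / single-stream decoder idea.
# All module names, shapes, and the scale factors `b` and `s` are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for the U-Net encoder; returns per-level feature maps."""
    def __init__(self, channels=(32, 64, 128)):
        super().__init__()
        self.blocks = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            self.blocks.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.SiLU(),
            ))
            in_ch = out_ch

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = block(x)
            feats.append(x)
        return feats  # shallow -> deep

class ToyDecoder(nn.Module):
    """Single-stream decoder that fuses modulated content/style features."""
    def __init__(self, channels=(128, 64, 32)):
        super().__init__()
        self.blocks = nn.ModuleList()
        for i, ch in enumerate(channels):
            out_ch = channels[i + 1] if i + 1 < len(channels) else 3
            self.blocks.append(nn.Sequential(
                nn.ConvTranspose2d(ch, out_ch, 4, stride=2, padding=1),
                nn.SiLU() if out_ch != 3 else nn.Identity(),
            ))

    def forward(self, content_feats, style_feats, b=2.0, s=1.0):
        # Start from the deepest level; scale and fuse the two streams.
        x = b * style_feats[-1] + s * content_feats[-1]
        for i, block in enumerate(self.blocks):
            x = block(x)
            # Add the next-shallower fused skip connection, if any.
            level = len(content_feats) - 2 - i
            if level >= 0:
                x = x + (b * style_feats[level] + s * content_feats[level])
        return x

encoder, decoder = ToyEncoder(), ToyDecoder()
content_image = torch.randn(1, 3, 64, 64)      # content input branch
style_branch = torch.randn(1, 3, 64, 64)       # placeholder for the style-prompt branch
out = decoder(encoder(content_image), encoder(style_branch))
print(out.shape)  # torch.Size([1, 3, 64, 64])

In the actual method, the two encoder passes share the same pre-trained U-Net weights and differ only in their conditioning, so no additional training is needed; the toy modules above merely make the feature-fusion step concrete.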

Stylization examples (gallery): Pixel, Origami Art, Embroidery Art, Studio Ghibli, Chinese Ink, Oil Painting, JOJO, and Children Crayon Painting styles.

BibTeX


@article{he2024freestyle,
  title={FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models},
  author={He, Feihong and Li, Gang and Zhang, Mengyuan and Yan, Leilei and Si, Lingyu and Li, Fanzhang},
  journal={arXiv preprint arXiv:2401.15636},
  year={2024}
}