Image Processing

Title: Deep Preset: Blending and Retouching Photos with Color Style Transfer

Abstract: End-users, without knowledge in photography, desire to beautify their photos to have a similar color style as a well-retouched reference. However, the definition of style in recent image style transfer works is inappropriate. They usually synthesize undesirable results due to transferring exact colors to the wrong destination. It becomes even worse in sensitive cases such as portraits. In this work, we concentrate on learning low-level image transformation, especially color-shifting methods, rather than mixing contextual features, then present a novel scheme to train color style transfer with ground-truth. Furthermore, we propose a color style transfer named Deep Preset. It is designed to 1) generalize the features representing the color transformation from content with natural colors to retouched reference, then blend it into the contextual features of content, 2) predict hyper-parameters (settings or preset) of the applied low-level color transformation methods, 3) stylize content to have a similar color style as reference. We script Lightroom, a powerful tool in editing photos, to generate 600,000 training samples using 1,200 images from the Flick2K dataset and 500 user-generated presets with 69 settings. Experimental results show that our Deep Preset outperforms the previous works in color style transfer quantitatively and qualitatively.

Published: 2021 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2113-2121, Jan. 2021 (acceptance rate 35.4%). [paper][code]

Title: DrawGAN: Text to Image Synthesis with Drawing Generative Adversarial Networks

Abstract: In this paper, we propose a novel drawing generative adversarial networks (DrawGAN) for text-to-image synthesis. The whole model divides the image synthesis into three stages by imitating the process of drawing. The first stage synthesizes the simple contour image based on the text description, the second stage generates the foreground image with detailed information, and the third stage synthesizes the final result. Through the step-by-step synthesis process from simple to complex and easy to difficult, the model can draw the corresponding results step by step and finally achieve the higher-quality image synthesis effect. Our method is validated on the Caltech-UCSD Birds 200 (CUB) dataset and the Microsoft Common Objects in Context (MS COCO) dataset. The experimental results demonstrate the effectiveness and superiority of our method. In terms of both subjective and objective evaluation, our method’s results surpass the existing state-of-the-art methods

Published: IEEE International Conference on Acoustics, Speach and Signal Processing (ICASSP) Jun. 2021.