Neural Style Transfer Guide

Mike Wang
Mar 5, 2022


Neural Style Transfer is a technique that applies the style of one image to the content of another. It’s a generative algorithm, meaning it produces an image as its output. So how does it work? In this post, we’ll explain how the vanilla Neural Style Transfer algorithm adds different styles to an image and what makes the algorithm unique and interesting.

Table of Contents

  1. Background
  2. Algorithm Overview
  3. Comparison to Image Classification
  4. Content Loss
  5. Style Loss
  6. Training Loop and Optimization
  7. Conclusion

1. Background

If you’re reading this tutorial, you’re probably already familiar with some common Deep Learning algorithms such as GANs (Generative Adversarial Networks). Style Transfer and traditional GANs are similar in that both generate images as output. So what makes Style Transfer distinct from a traditional GAN? One of the main differences is how the image is generated.

In Style Transfer, we take two images as input (the Content Image and the Style Image) and output the Stylized Content Image.

Content Image + Style Image ⇒ Stylized Content Image

2. Algorithm Overview

Given the Content Image and the Style Image as inputs, how do we generate a Stylized Content Image? Importantly, the vanilla Style Transfer algorithm uses an existing pretrained network whose weights are frozen throughout the training process. Because we’re dealing with image data, Convolutional Neural Networks such as VGG or ResNet are common choices for the pretrained network. Note that we’ll be using the pretrained network’s frozen weights as a building block of the Style Transfer algorithm.

We start with a pretrained network such as VGG. The VGG weights stay frozen for the entire training process. The only “weights” that change are those of the input image itself — its pixel values.
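The setup above can be sketched in PyTorch. A tiny convolutional stack stands in below for the pretrained VGG (an assumption purely to keep the example self-contained); the key point is which tensors require gradients:

```python
import torch

# Stand-in for a pretrained CNN such as VGG. In practice you would load a
# real pretrained model here (e.g. via torchvision); this tiny net is an
# assumption used only for illustration.
cnn = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(8, 8, kernel_size=3, padding=1),
)

# Freeze the network's weights: they never receive gradient updates.
for p in cnn.parameters():
    p.requires_grad_(False)

# The generated image itself is the only thing being optimized.
# It is commonly initialized from the content image (or random noise).
content_image = torch.rand(1, 3, 64, 64)
generated = content_image.clone().requires_grad_(True)

# The optimizer is handed the image, not the network weights.
optimizer = torch.optim.Adam([generated], lr=0.02)
```

Note that the optimizer receives `[generated]` — a list containing the image tensor — rather than `cnn.parameters()`, which is the reverse of an ordinary training setup.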

3. Comparison to Image Classification

The vanilla Style Transfer algorithm shares some similarities with a traditional Image Classification Neural Network, but inverts what gets trained.

  • In Style Transfer, the input image’s weights are updated with Gradient Descent while the weights of the pretrained Convolutional Neural Network are frozen.
  • In Image Classification, the input image is kept fixed while the weights of the Convolutional Neural Network are updated using Gradient Descent.
Figure 1: Comparing the Style Transfer algorithm to a traditional Convolutional Neural Network (Image by Author)

From the comparison above, we can see that in most traditional Neural Network applications, the input is fixed and the network’s weights are trained; in Style Transfer, the roles are reversed. Updating the input of a Neural Network with Gradient Descent may come as a surprise if you’ve never seen it done before. Because the image being updated sits at the beginning of the network, it’s important to note that the generated stylized image is not an output of the neural network. The generated image is the input image itself, modified over the course of optimization.

4. Content Loss

The Content Loss tries to keep the content of the original input image intact. For example, if there’s an apple in the original image, the loss encourages an apple to remain in the stylized image rather than morph into an orange. The Content Loss can easily be calculated as the Mean Squared Error between the generated image’s feature maps and the original content image’s feature maps, extracted from selected layers of the frozen network.
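As a sketch, the Content Loss is a one-liner once the two feature maps are in hand (the tensor shapes here are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def content_loss(gen_features: torch.Tensor,
                 content_features: torch.Tensor) -> torch.Tensor:
    """Mean squared error between the generated image's feature maps
    and the content image's feature maps at the same layer."""
    return F.mse_loss(gen_features, content_features)

# Feature maps with identical content give zero loss.
feats = torch.rand(1, 8, 16, 16)
zero = content_loss(feats, feats)
```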

5. Style Loss

The Style Loss tries to transfer features of the Style Image onto the generated output image. The algorithm does so by comparing the Gram Matrices of the two images’ feature maps. The Gram Matrix provides a distance metric: the smaller the distance between two images’ Gram Matrices, the more similar their styles.

6. Training Loop and Optimization

Like many other Deep Learning algorithms, we’ll need a training loop to generate a stylized content image. In this case, the training loop can be a simple loop that iterates for a fixed number of steps (e.g. 5,000). Within the training loop, we calculate the Content Loss and the Style Loss and use the combined loss to optimize the weights of our input image. The combined loss can be minimized using a standard optimization algorithm such as Gradient Descent or Adam.
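Putting the pieces together, the whole loop can be sketched as below. A single random conv layer stands in for the pretrained feature extractor, the step count is shortened, and the loss weights are assumed values you would tune per image — in a real run you would swap in frozen VGG layers and many more steps:

```python
import torch
import torch.nn.functional as F

def gram(f: torch.Tensor) -> torch.Tensor:
    """Normalized Gram matrix of a (b, c, h, w) feature map."""
    b, c, h, w = f.shape
    flat = f.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

torch.manual_seed(0)

# Frozen stand-in for a pretrained feature extractor (assumption).
extract = torch.nn.Conv2d(3, 4, kernel_size=3, padding=1)
for p in extract.parameters():
    p.requires_grad_(False)

content = torch.rand(1, 3, 32, 32)   # placeholder content image
style = torch.rand(1, 3, 32, 32)     # placeholder style image
generated = content.clone().requires_grad_(True)

# Targets are computed once, outside the loop.
with torch.no_grad():
    content_feats = extract(content)
    style_gram = gram(extract(style))

optimizer = torch.optim.Adam([generated], lr=0.05)
content_weight, style_weight = 1.0, 1e3   # assumed weighting

losses = []
for step in range(200):                   # e.g. 5,000 in a real run
    optimizer.zero_grad()
    feats = extract(generated)
    loss = (content_weight * F.mse_loss(feats, content_feats)
            + style_weight * F.mse_loss(gram(feats), style_gram))
    loss.backward()
    optimizer.step()                      # updates the image, not the net
    losses.append(loss.item())
```

After the loop, `generated` itself is the stylized image — there is no forward pass that “outputs” it.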

7. Conclusion

In conclusion, we’ve seen how the vanilla Style Transfer algorithm learns to generate a stylized image from an input content image and an input style image. The algorithm produces the Stylized Content Image by optimizing for both the Content Loss and the Style Loss. One of the most interesting things about this algorithm is that the weights of the input image are optimized while the weights of the Convolutional Neural Network stay frozen. Concepts such as the Style Loss, the Content Loss, and optimizing the input image’s weights are worth keeping in your toolbox as you learn about new Deep Learning topics.
