Image Style Transfer Using Keras and Tensorflow

fast.ai's introduction to Image Style Transfer is the best I have seen. Image Style Transfer essentially generates a new image that stays as close as possible to the content of a given Content Image while taking on the look of a Style Image. For simplicity, we first consider how to generate the Content Image and the Style Image separately. Note that "generating" here does not mean training the parameters of a neural network model. Instead, we start from a randomly generated image and use the L-BFGS iterative optimization algorithm to minimize the MSE between this image's convolutional feature activations and those of the given Content Image, updating the generated image with the gradients at every iteration.

The paper that first proposed Image Style Transfer used a convolutional neural network, so here we pick the familiar VGG16. To better preserve image information, we replace the max pooling layers of the original VGG16 with average pooling layers.

#Max pooling after Convolution
x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

#Average pooling after Convolution
x = AveragePooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

下面我们考察下如何生成Content Image。
#Step 0 — Create a pre-trained VGG16 model with the last 3 FC layers removed
#and max pooling replaced by average pooling; include_top=False drops the 3 fully connected layers
model = VGG16_Avg(include_top=False)
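VGG16_Avg is not part of Keras itself. Below is a minimal sketch of what it could look like, assuming it simply rebuilds the stock VGG16 with every MaxPooling2D swapped for an AveragePooling2D; the actual fast.ai implementation defines the architecture layer by layer, so treat this as illustrative only.

#A minimal sketch (an assumption, not the exact fast.ai implementation):
#reuse the pre-trained VGG16 conv layers and swap each max pooling layer
#for an average pooling layer with the same configuration.
from keras.applications.vgg16 import VGG16
from keras.layers import AveragePooling2D, MaxPooling2D, Input
from keras.models import Model

def VGG16_Avg(include_top=False):
    vgg = VGG16(weights='imagenet', include_top=include_top)
    x = inp = Input(shape=(None, None, 3))  #arbitrary input size, as the FC layers are dropped
    for layer in vgg.layers[1:]:            #skip the original InputLayer
        if isinstance(layer, MaxPooling2D):
            x = AveragePooling2D((2, 2), strides=(2, 2), name=layer.name)(x)
        else:
            x = layer(x)                    #reuses the pre-trained weights
    return Model(inp, x)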

#Step 1 — Load the Content Image and preprocess it. The crop carries no special meaning:
#since we are not using VGG16 for classification, there is no need to stick to its standard 224x224 input size.
img = Image.open('JasonRidge.jpeg')
img = img.crop(box=(30,60,430,460))
img

[Figure: the original Content Image]

#Image Preprocessing — VGG16 expects inputs with the ImageNet mean subtracted
#and the channels converted from RGB to BGR.
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]
deproc = lambda x,s: np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
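Here preproc subtracts the per-channel ImageNet mean and flips the channel order from RGB to BGR (the format VGG16 was trained on), while deproc reverses both and clips to valid pixel values. A quick round trip, purely illustrative, confirms they are inverses for values already in [0, 255]:

#Illustrative only: deproc inverts preproc for pixel values already in [0, 255]
dummy = np.random.randint(0, 256, (1, 4, 4, 3)).astype(np.float32)
assert np.allclose(deproc(preproc(dummy), dummy.shape), dummy)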

img_arr = preproc(np.expand_dims(np.array(img), 0))
shp = img_arr.shape

%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(np.squeeze(img_arr))

[Figure: the preprocessed Content Image]

#Step 2 — Build Neural Network Graph (Model)
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave
from keras import metrics
#We take the first convolution output of the 5th (also the last) convolution block
#as the content target; we can later verify with an earlier layer.
layer = model.get_layer('block5_conv1').output
layer_model = Model(model.input, layer)
targ = K.variable(layer_model.predict(img_arr))

#Step 3 — Build Neural Network Loss and Gradient (Optimization)
#Loss is a scalar value (MSE), while the gradient is an ndarray
loss = metrics.mse(layer, targ)
grads = K.gradients(loss, model.input)
loss_fn = K.function([model.input], [loss])
grad_fn = K.function([model.input], grads)

#Create a container class for both Loss and Gradients.
#The Evaluator class makes it convenient to compute the loss and gradients during optimization.
class Evaluator(object):
    def __init__(self, loss_f, grad_f, shp):
        self.loss_f, self.grad_f, self.shp = loss_f, grad_f, shp

    def loss(self, x):
        loss_ = self.loss_f([x.reshape(self.shp)])
        return np.array(loss_).astype(np.float64)

    def grads(self, x):
        grad_ = self.grad_f([x.reshape(self.shp)])
        return np.array(grad_).flatten().astype(np.float64)

evaluator = Evaluator(loss_fn, grad_fn, shp)

#Step 4 — Time to Optimize
#Define a method which will optimize (minimize) the Loss using the L-BFGS algorithm.
#solve_image saves the updated image after each iteration.
def solve_image(eval_obj, niter, x):
    for i in range(niter):
        x, min_val, info = fmin_l_bfgs_b(eval_obj.loss, x.flatten(),
                                         fprime=eval_obj.grads, maxfun=20)
        x = np.clip(x, -127, 127)
        print('Current loss value:', min_val)
        imsave('results/Content_gen_conv5_1_{i}.png'.format(i=i+1), deproc(x.copy(), shp)[0])
    return x

#Generate a random 'noise' image, which will be the starting point of the optimization iteration.
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100
x = rand_img(shp)
imsave('results/Content_gen_conv5_1_0.png', x[0])

#Start the optimization with 10 iterations
iterations = 10
x = solve_image(evaluator, iterations, x)

('Current loss value:', array([ 92.49624634]))
('Current loss value:', array([ 38.59172058]))
('Current loss value:', array([ 25.30461121]))
('Current loss value:', array([ 19.56064987]))
('Current loss value:', array([ 16.33964348]))
('Current loss value:', array([ 14.33158779]))
('Current loss value:', array([ 12.92133713]))
('Current loss value:', array([ 11.83940792]))
('Current loss value:', array([ 10.99672127]))
('Current loss value:', array([ 10.28565693]))

Step 5 — Visualization

from IPython.display import HTML
from matplotlib import animation, rc

fig, ax = plt.subplots()
def animate(i):
    ax.imshow(Image.open('results/Content_gen_conv5_1_{i}.png'.format(i=i)))

anim = animation.FuncAnimation(fig, animate, frames=iterations+1, interval=500)
HTML(anim.to_html5_video())

It's a pity that WordPress cannot host video; otherwise the transformation from nothing into a finished image would surely impress you.

Image.open('results/Content_gen_conv5_1_10.png')

The left image above is the initial noise image, and the right one is the image generated after 10 optimization iterations.

With the experience of generating the Content Image behind us, let's look at how to do the actual Style Transfer. The core changes are in Step 2 and Step 3; the key points are described below.

#Step 2 — Build Neural Network Graph (Model)
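This time we need two inputs. content_arr and style_arr below are assumed to be the content and style images preprocessed exactly like img_arr earlier; since both pass through the same model, they must share one shape. A minimal sketch with placeholder file names:

#Sketch (placeholder file names): prepare both inputs the same way img_arr was prepared.
content_img = Image.open('content.jpg')
style_img = Image.open('style.jpg').resize(content_img.size)  #both inputs must share one shape
content_arr = preproc(np.expand_dims(np.array(content_img), 0))
style_arr = preproc(np.expand_dims(np.array(style_img), 0))
shp = content_arr.shape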

outputs = {l.name: l.output for l in model.layers}
#The Content Model is exactly the same as above, except that we pick the output of an
#earlier convolution layer, hoping to preserve as much of the image content as possible.
content_name = 'block3_conv2'
content_layer = outputs[content_name]
content_model = Model(model.input, content_layer)
content_targ = K.variable(content_model.predict(content_arr))
#The Style Model is different: it needs the outputs of 5 convolution layers.
style_layers = [outputs['block{}_conv2'.format(o)] for o in range(1,6)]
style_model = Model(model.input, style_layers)
style_targs = [K.variable(o) for o in style_model.predict(style_arr)]

#Step 3 — Build Neural Network Loss and Gradient (Optimization)
#First create a function to calculate the Gram Matrix for the Style Loss.
def gram_matrix(x):
    #We want each row to be a channel, and the columns to be flattened x,y locations
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    #The dot product of this with its transpose shows the correlation
    #between each pair of channels
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()

def style_loss(x, targ): return metrics.mse(gram_matrix(x), gram_matrix(targ))
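To make gram_matrix concrete, here is a NumPy equivalent for a single feature map of shape (height, width, channels); it is purely illustrative and not used by the model:

#Illustrative NumPy equivalent of gram_matrix for one feature map
feat = np.random.rand(4, 4, 3)                 #a hypothetical 4x4 map with 3 channels
flat = feat.transpose(2, 0, 1).reshape(3, -1)  #one row per channel
gram = flat.dot(flat.T) / feat.size            #(3, 3) matrix of channel correlations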

#Style Loss is a scalar
style_wgts = [0.05,0.2,0.2,0.25,0.3]
loss_sty = sum(style_loss(l1[0], l2[0])*w for l1,l2,w in zip(style_layers, style_targs, style_wgts))

#Content Loss is a scalar
loss_con = metrics.mse(content_layer, content_targ)

#Total Loss is the linear sum of Style Loss and Content Loss.
#The denominator 10 below balances whether Style or Content is emphasized:
#the larger the denominator, the smaller the weight of the Content Loss, and
#the more the generated image preserves style rather than content.
loss = loss_sty + loss_con/10

#K.gradients = tf.contrib.keras.backend.gradients
#With the total Loss in hand, the gradients are handled exactly as in the Content Image case.
grads = K.gradients(loss, model.input)

#K.function = tf.contrib.keras.backend.function
#Note that loss_fn will return a scalar Loss while grad_fn a numpy array Gradients
loss_fn = K.function([model.input], [loss])
grad_fn = K.function([model.input], grads)
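From here, Step 4 and Step 5 are unchanged: wrap the new loss and gradient functions in an Evaluator and run solve_image from a random noise image. A sketch reusing the definitions above (note that solve_image will keep saving under the Content_gen_conv5_1_* file names unless you adjust it):

#Sketch: the optimization step is identical to the Content Image case
evaluator = Evaluator(loss_fn, grad_fn, shp)
x = rand_img(shp)
x = solve_image(evaluator, 10, x)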

Alright, let's take a look at the final results.

Let's see how the handsome gentleman comes out when paired with the Hundred Flowers painting and with the cute little girl.

[Figures: the generated style-transfer results]