TensorFlow Function Summary

tf functions

tf.split

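tf.split(): axis specifies which dimension of the input tensor to split along; 0 means the tensor is split along dimension 0. num_or_size_splits specifies how many pieces to produce; with 2, the input is split into 2 sub-tensors, and the function returns them as a list.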

tf.split(
    value,
    num_or_size_splits,
    axis=0,
    num=None,
    name='split'
)

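If num_or_size_splits is an integer, it is the number of sub-tensors the input is split into, and the size of the chosen dimension must be divisible by it. axis selects the dimension to split (counting from 0). For example, if my_tensor has shape 20 × 30 × 40, tf.split(my_tensor, 2, 0) returns two sub-tensors of shape 10 × 30 × 40.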

import tensorflow as tf

A = [[1, 2, 3], [4, 5, 6]]
a0 = tf.split(A, num_or_size_splits=3, axis=1)  # the rank is preserved (compare with tf.unstack below)
a1 = tf.unstack(A, num=3,axis=1)
a2 = tf.split(A, num_or_size_splits=2, axis=0)
a3 = tf.unstack(A, num=2,axis=0)
with tf.Session() as sess:
    print(sess.run(a0))
    print(sess.run(a1))
    print(sess.run(a2))
    print(sess.run(a3))
       
[array([[1],[4]]), array([[2],[5]]), array([[3],[6]])]
[array([1, 4]), array([2, 5]), array([3, 6])] 
[array([[1, 2, 3]]), array([[4, 5, 6]])] 
[array([1, 2, 3]), array([4, 5, 6])]

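If num_or_size_splits is a vector (list), the input is split into as many pieces as the vector has elements, still along the dimension given by axis. For example, tf.split(my_tensor, [10, 5, 25], 2) returns three tensors of shapes 20 × 30 × 10, 20 × 30 × 5, and 20 × 30 × 25. The elements of the vector must sum to the size of the original tensor along axis (10 + 5 + 25 = 40).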
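The two shape rules can be checked without TensorFlow. The following is a minimal pure-Python sketch; split_shapes is a hypothetical helper written for these notes, not a TensorFlow function, and it only computes the shapes that the rules described here imply.

```python
def split_shapes(shape, num_or_size_splits, axis=0):
    """Return the list of shapes implied by the tf.split rules described above."""
    dim = shape[axis]
    if isinstance(num_or_size_splits, int):
        # Integer: the dimension must divide evenly into that many pieces.
        if dim % num_or_size_splits != 0:
            raise ValueError("dimension %d is not divisible by %d"
                             % (dim, num_or_size_splits))
        sizes = [dim // num_or_size_splits] * num_or_size_splits
    else:
        # List: the sizes must add up to the dimension being split.
        sizes = list(num_or_size_splits)
        if sum(sizes) != dim:
            raise ValueError("sizes %r do not sum to %d" % (sizes, dim))
    return [tuple(shape[:axis]) + (s,) + tuple(shape[axis + 1:]) for s in sizes]


print(split_shapes((20, 30, 40), 2, axis=0))
# [(10, 30, 40), (10, 30, 40)]
print(split_shapes((20, 30, 40), [10, 5, 25], axis=2))
# [(20, 30, 10), (20, 30, 5), (20, 30, 25)]
```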

tf.concat

Concatenates two or more tensors (e.g. channels or matrices) along one dimension.

tf.concat(
    values,
    axis,
    name='concat'
)

# axis: 0 concatenates along rows (dimension 0), 1 along columns (dimension 1)

>>> t1 = [[1, 2, 3], [4, 5, 6]]
>>> t2 = [[7, 8, 9], [10, 11, 12]]

>>> print(sess.run(tf.concat([t1, t2], 0)))
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

>>> print(sess.run(tf.concat([t1, t2], 1)))
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
 
# tensor t3 with shape [2, 3]
# tensor t4 with shape [2, 3]
tf.shape(tf.concat([t3, t4], 0))  # [4, 3]
tf.shape(tf.concat([t3, t4], 1))  # [2, 6] 


>>> t1 = [[[1, 1, 1],[2, 2, 2]],[[3, 3, 3],[4, 4, 4]]]
>>> t2 = [[[5, 5, 5],[6, 6, 6]],[[7, 7, 7],[8, 8, 8]]]

>>> print(sess.run(tf.concat([t1, t2], 1)))
[[[1 1 1]
  [2 2 2]
  [5 5 5]
  [6 6 6]]

 [[3 3 3]
  [4 4 4]
  [7 7 7]
  [8 8 8]]]

>>> print(sess.run(tf.concat([t1, t2], 0)))
[[[1 1 1]
  [2 2 2]]

 [[3 3 3]
  [4 4 4]]

 [[5 5 5]
  [6 6 6]]

 [[7 7 7]
  [8 8 8]]]
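To see why axis=1 interleaves the inner blocks while axis=0 simply stacks the inputs, here is a small pure-Python analogy on nested lists. The concat function below is an illustrative stand-in written for these notes, not TensorFlow's implementation.

```python
def concat(values, axis):
    """Concatenate equally nested lists along `axis` (an analogy of tf.concat)."""
    if axis == 0:
        # Along the outermost dimension, chain the inputs one after another.
        return [item for value in values for item in value]
    # Otherwise go one level down, pairing the i-th sub-lists of every input.
    return [concat(list(parts), axis - 1) for parts in zip(*values)]


t1 = [[1, 2, 3], [4, 5, 6]]
t2 = [[7, 8, 9], [10, 11, 12]]
print(concat([t1, t2], 0))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(concat([t1, t2], 1))  # [[1, 2, 3, 7, 8, 9], [4, 5, 6, 10, 11, 12]]
```

In both cases only the size of the chosen dimension changes (it becomes the sum of the inputs' sizes); all other dimensions must already match.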

tf.cond

tf.cond(
    pred,
    true_fn=None,
    false_fn=None,
    strict=False,
    name=None,
    fn1=None,
    fn2=None
)

If the predicate pred is true, tf.cond returns true_fn(); otherwise it returns false_fn().

Example

import tensorflow as tf
a = tf.constant(2)
b = tf.constant(3)
x = tf.constant(4)
y = tf.constant(5)
z = tf.multiply(a, b)
result = tf.cond(x < y, lambda: tf.add(x, z), lambda: tf.square(y))
with tf.Session() as session:
    print(result.eval())
    print(z.eval())

>>>10
>>>6

Use case

Deciding inside a dropout layer whether the model is currently training:

self.is_training = tf.placeholder(tf.bool)

...
...
...

def dropout_with_keep():
    return tf.nn.dropout(conv_a, dropout_keep_prob)

def dropout_no_keep():
    return tf.nn.dropout(conv_a, 1.0)

if dropout_keep_prob != -1:
    conv_o_dr = tf.cond(is_training, dropout_with_keep, dropout_no_keep)
else:
    conv_o_dr = conv_a

Or:

with tf.variable_scope('control'):
    # it controls dropout and batch_norm layers
    is_training = tf.placeholder_with_default(True, [], 'is_training')

...
...
...

def _dropout(X, is_training, rate=0.5):
    keep_prob = tf.constant(
        1.0 - rate, tf.float32,
        [], 'keep_prob'
    )
    result = tf.cond(
        is_training,
        lambda: tf.nn.dropout(X, keep_prob),
        lambda: tf.identity(X),
        name='dropout'
    )
    return result
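What the two branches of such a tf.cond compute can be sketched in plain Python. tf.nn.dropout keeps each element with probability keep_prob and scales the kept elements by 1 / keep_prob; the function below mimics that on a flat list. The name dropout and the rng argument are assumptions made for this illustration.

```python
import random


def dropout(values, keep_prob, is_training, rng=None):
    """Pure-Python sketch of train/inference dropout on a flat list of floats."""
    if not is_training:
        # Inference branch: pass the input through unchanged (like tf.identity).
        return list(values)
    rng = rng or random.Random()
    # Training branch: keep each element with probability keep_prob and
    # rescale it by 1 / keep_prob so the expected value stays the same.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


x = [1.0, 2.0, 3.0, 4.0]
print(dropout(x, keep_prob=0.5, is_training=False))  # [1.0, 2.0, 3.0, 4.0]
print(dropout(x, keep_prob=0.5, is_training=True, rng=random.Random(0)))
```

The second call prints a list in which every entry is either 0.0 or twice the original value; which entries are dropped depends on the random seed.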

tf.tile

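tf.tile() is used when a tensor needs to be expanded by repetition. For example:
Given a tensor of shape [width, height], you may need a tensor of shape [batch_size, width, height] in which every batch entry is identical to the original. Because tf.tile() repeats data along existing dimensions and does not change the rank, first add a leading dimension (e.g. with tf.expand_dims) and then tile with multiples [batch_size, 1, 1].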

tile(
    input,
    multiples,
    name=None
)

Example

import tensorflow as tf

temp = tf.tile([1, 2, 3], [2])
temp2 = tf.tile([[1, 2], [3, 4], [5, 6]], [2, 3])
with tf.Session() as sess:
    print(sess.run(temp))
    # [1 2 3 1 2 3]

temp = tf.tile([[1, 2, 3], [1, 2, 3]], [1, 1])
temp2 = tf.tile([[1, 2, 3], [1, 2, 3]], [2, 1])
temp3 = tf.tile([[1, 2, 3], [1, 2, 3]], [2, 2])
with tf.Session() as sess:
    print(sess.run(temp))
    # [[1 2 3]
    #  [1 2 3]]

    print(sess.run(temp2))
    # [[1 2 3]
    #  [1 2 3]
    #  [1 2 3]
    #  [1 2 3]]

    print(sess.run(temp3))
    # [[1 2 3 1 2 3]
    #  [1 2 3 1 2 3]
    #  [1 2 3 1 2 3]
    #  [1 2 3 1 2 3]]
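The meaning of multiples (one repeat count per dimension) can be reproduced on nested lists. This is a pure-Python analogy written for these notes, not TensorFlow code; wrapping the matrix in an extra list plays the role of adding a leading batch dimension.

```python
def tile(value, multiples):
    """Repeat a nested list multiples[i] times along dimension i."""
    if not multiples:
        return value
    # Tile the inner dimensions of every item, then repeat the whole sequence
    # multiples[0] times along the current dimension.
    return [tile(item, multiples[1:])
            for _ in range(multiples[0])
            for item in value]


print(tile([1, 2, 3], [2]))                     # [1, 2, 3, 1, 2, 3]
print(tile([[1, 2, 3], [1, 2, 3]], [2, 2]))     # 4 rows of [1, 2, 3, 1, 2, 3]

# [width, height] -> [batch_size, width, height]: add a leading dimension first.
matrix = [[1, 2], [3, 4]]
batch = tile([matrix], [3, 1, 1])
print(batch)  # [[[1, 2], [3, 4]], [[1, 2], [3, 4]], [[1, 2], [3, 4]]]
```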

tf.contrib.layers.flatten

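tf.contrib.layers.flatten(P) keeps the first dimension of P and flattens each sub-tensor along that dimension into a row vector. The result is a 2-D tensor of shape (batch_size, k), where k is the product of the remaining dimensions. It is typically used as preprocessing before the fully connected layers of a convolutional neural network.

For example, if the conv layer of a CNN outputs a tensor of shape [batch_size, height, width, channel], the flattened result has shape [batch_size, height × width × channel].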
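As a quick arithmetic check of the flattened shape, the sketch below computes it for a concrete case. The helper flattened_shape and the example sizes (32, 7, 7, 64) are made up for illustration.

```python
import math


def flattened_shape(shape):
    """Keep the first (batch) dimension and multiply the remaining ones together."""
    batch_size, rest = shape[0], shape[1:]
    return (batch_size, math.prod(rest))


print(flattened_shape((32, 7, 7, 64)))  # (32, 3136), since 7 * 7 * 64 = 3136
```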

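tf.contrib.layers.fully_connected

tf.contrib.layers.fully_connected(F, num_outputs, activation_fn) is a fully connected layer. F is the input, num_outputs is the number of units in the next layer, and activation_fn is the activation function, which defaults to ReLU.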

tf.Variable and tf.get_variable()

tf.Variable(initial_value=None, trainable=True, \
	collections=None, validate_shape=True, \
	caching_device=None, name=None, \
	variable_def=None, dtype=None, \
	expected_shape=None, import_scope=None)

tf.get_variable(name, shape=None, dtype=None, \
	initializer=None, regularizer=None, trainable=True, \
	collections=None, caching_device=None, \
	partitioner=None, validate_shape=True, custom_getter=None)

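Differences

With tf.Variable, a naming conflict is resolved automatically by the system. With tf.get_variable(), a conflict is not resolved and an error is raised instead.

Because of this, use tf.get_variable() when variables need to be shared. In other cases the two are used in largely the same way.
For tf.Variable, the variable name is an optional argument, given as name="v". For tf.get_variable, the name is required, and the function creates or fetches the variable according to that name.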

import tensorflow as tf
w_1 = tf.Variable(3,name="w_1")
w_2 = tf.Variable(1,name="w_1")
print(w_1.name)
print(w_2.name)

# Output
# w_1:0
# w_1_1:0

import tensorflow as tf

w_1 = tf.get_variable(name="w_1",initializer=1)
w_2 = tf.get_variable(name="w_1",initializer=2)

# Error message
#ValueError: Variable w_1 already exists, disallowed. Did
#you mean to set reuse=True in VarScope?
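The contrast between the two behaviours can be modelled with a toy registry. The class below is purely illustrative: ToyVariableStore and its methods are invented for these notes and are not how TensorFlow implements variables; it only mimics "rename on clash" versus "raise unless reuse is requested".

```python
class ToyVariableStore:
    """A toy model of the two naming behaviours; not TensorFlow's implementation."""

    def __init__(self):
        self.shared = {}       # variables created through get_variable, by name
        self.name_counts = {}  # how often each name was requested via variable_name

    def variable_name(self, name):
        # Like tf.Variable: always create, renaming on a clash (w_1, w_1_1, ...).
        count = self.name_counts.get(name, 0)
        self.name_counts[name] = count + 1
        return name if count == 0 else "{}_{}".format(name, count)

    def get_variable(self, name, value=None, reuse=False):
        # Like tf.get_variable: fetch when reuse=True, otherwise create exactly once.
        if reuse:
            if name not in self.shared:
                raise ValueError("Variable {} does not exist".format(name))
            return self.shared[name]
        if name in self.shared:
            raise ValueError("Variable {} already exists".format(name))
        self.shared[name] = value
        return value


store = ToyVariableStore()
print(store.variable_name("w_1"), store.variable_name("w_1"))  # w_1 w_1_1
store.get_variable("w_1", value=1)
try:
    store.get_variable("w_1", value=2)
except ValueError as err:
    print(err)                                 # Variable w_1 already exists
print(store.get_variable("w_1", reuse=True))   # 1
```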


# These two definitions are essentially equivalent
v = tf.get_variable("v", shape=[1], initializer=tf.constant_initializer(1.0))
v = tf.Variable(tf.constant(1.0,shape=[1]),name="v")

tf.name_scope() / tf.variable_scope()

The main purpose is to make parameter naming easier to manage.

tf.name_scope is mainly used together with tf.Variable() to organize parameter names.

import tensorflow as tf

with tf.name_scope('conv1') as scope:
    weights1 = tf.Variable([1.0, 2.0], name='weights')
    bias1 = tf.Variable([0.3], name='bias')

# The variables below are defined in a different name scope
with tf.name_scope('conv2') as scope:
    weights2 = tf.Variable([4.0, 2.0], name='weights')
    bias2 = tf.Variable([0.33], name='bias')

# So weights1 and weights2 live in different name scopes and do not conflict
print(weights1.name)
print(weights2.name)

>>>conv1/weights:0
>>>conv2/weights:0

tf.variable_scope() is mainly used together with tf.get_variable() to share variables.

import tensorflow as tf
# Note how bias1 would be defined (commented out below)
with tf.variable_scope('v_scope') as scope1:
    Weights1 = tf.get_variable('Weights', shape=[2, 3])
#     bias1 = tf.Variable([0.52], name='bias')

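# Now share the variable defined above
# note: a variable fetched with get_variable() in the scope below must already have been defined before reuse=True is set; otherwise an error is raised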
with tf.variable_scope('v_scope', reuse=True) as scope2:
    Weights2 = tf.get_variable('Weights')
    bias2 = tf.Variable([0.52], name='bias')

print(Weights1.name)
print(Weights2.name)
print(bias2.name)

>>>v_scope/Weights:0
>>>v_scope/Weights:0
>>>v_scope_1/bias:0

tf.control_dependencies()

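tf.control_dependencies(control_inputs)

The argument control_inputs is a list of Operation or Tensor objects. The function returns a context manager that controls the dependencies of the operations defined inside it: operations defined in the context depend on the operations in control_inputs, so they run only after the operations in control_inputs have run.

Example

To make sure we read the parameters after they have been updated, the code can be organized like this.

# any concrete optimizer works here; gradient descent is used as an example
opt = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

with tf.control_dependencies([opt]):  # run opt first
  updated_weight = tf.identity(weight)  # then run this op

with tf.Session() as sess:
  tf.global_variables_initializer().run()
  sess.run(updated_weight, feed_dict={...})  # each run returns the weight after the update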

tf.placeholder_with_default()

placeholder_with_default(
    input,
    shape,
    name=None
)

The function returns a tensor with the same type as input: a placeholder that takes the value of input by default when nothing is fed to it.

with tf.variable_scope('inputs'):
    X = tf.placeholder_with_default(
        data['x_batch'], [None, IMAGE_SIZE, IMAGE_SIZE, 3], 'X')
    Y = tf.placeholder_with_default(
        data['y_batch'], [None, NUM_CLASSES], 'Y')

tf.device()

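To run computation on the CPU, call tf.device(device_name), where device_name has the form /cpu:0. The 0 is the device index; TF does not distinguish between CPU device indices, so use 0. GPUs are distinguished by index: /gpu:0 and /gpu:1 refer to two different graphics cards.

In some cases, even when the model runs on the GPU, some tensors are kept in host memory because they are too large for GPU memory, and host memory is usually much larger. In that case the CPU device is specified explicitly. This pattern shows up in some code, for example:

with tf.device('/cpu:0'):
    build_CNN()  # the tensors of this CNN are now stored in host memory, not GPU memory

Example

with tf.device('/cpu:0'), tf.variable_scope('input_pipeline'):
    data = _get_data(NUM_CLASSES, IMAGE_SIZE)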

tf.Graph()

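A TensorFlow computation is represented as a dataflow graph.
A graph contains Operation objects, which are the computation nodes. The Tensor objects mentioned earlier represent the data flowing between operations.

Once you start your task, a default graph has already been created, and you can access it by calling tf.get_default_graph().
To add an operation to the default graph, simply call a function that defines a new operation, as in the following example: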

import tensorflow as tf
import numpy as np

c=tf.constant(value=1)
assert c.graph is tf.get_default_graph()
print(c.graph)
print(tf.get_default_graph())

# Output
<tensorflow.python.framework.ops.Graph object at 0x107324cc0>
<tensorflow.python.framework.ops.Graph object at 0x107324cc0>

Another typical usage is the Graph.as_default() context manager, which overrides the default graph within its context, as in the following example:

import tensorflow as tf
import numpy as np

c = tf.constant(value=1)
print(c.graph)
print(tf.get_default_graph())
print()

g = tf.Graph()
print("g:", g)
with g.as_default():
    d = tf.constant(value=2)
    print(d.graph)
    print()

g2 = tf.Graph()
print("g2:", g2)
with g2.as_default():
    e = tf.constant(value=15)
    print(e.graph)
    print()

f = tf.constant(value=1)
print(f.graph)


# Output
<tensorflow.python.framework.ops.Graph object at 0x104845da0>
<tensorflow.python.framework.ops.Graph object at 0x104845da0>

g: <tensorflow.python.framework.ops.Graph object at 0x1815af77f0>
<tensorflow.python.framework.ops.Graph object at 0x1815af77f0>

g2: <tensorflow.python.framework.ops.Graph object at 0x1815af7748>
<tensorflow.python.framework.ops.Graph object at 0x1815af7748>

<tensorflow.python.framework.ops.Graph object at 0x104845da0>

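As the output shows, outside the with blocks operations are added to the system default graph, not to the graph that was made default inside a with block.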

Basic CNN functions

def _batch_norm(X, is_training):
    return tf.layers.batch_normalization(
        X, scale=False, center=True,
        momentum=BATCH_NORM_MOMENTUM,
        training=is_training, fused=True
    )


def _global_average_pooling(X):
    return tf.reduce_mean(
        X, axis=[1, 2],
        name='global_average_pooling'
    )


def _max_pooling(X):
    return tf.nn.max_pool(
        X, [1, 3, 3, 1], [1, 2, 2, 1], 'SAME',
        name='max_pooling'
    )


def _avg_pooling(X):
    return tf.nn.avg_pool(
        X, [1, 3, 3, 1], [1, 2, 2, 1], 'SAME',
        name='avg_pooling'
    )

def _nonlinearity(X):
    return tf.nn.relu(X, name='ReLU')


def _dropout(X, is_training, rate=0.5):
    keep_prob = tf.constant(
        1.0 - rate, tf.float32,
        [], 'keep_prob'
    )
    result = tf.cond(
        is_training,
        lambda: tf.nn.dropout(X, keep_prob),
        lambda: tf.identity(X),
        name='dropout'
    )
    return result



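# Reading the CIFAR dataset
def unpickle(file):
    import pickle  # Python 3 replacement for cPickle
    with open(file, 'rb') as f:
        # the CIFAR batches were pickled with Python 2, hence encoding='bytes'
        data = pickle.load(f, encoding='bytes')
    return data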

tf.one_hot()

import tensorflow as tf


CLASS = 8
label1 = tf.constant([0, 1, 2, 3, 4, 5, 6, 7])
sess1 = tf.Session()
print('label1:', sess1.run(label1))
b = tf.one_hot(label1, CLASS, 1, 0)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(b)
    print('after one_hot', sess.run(b))
    
# Output
label1: [0 1 2 3 4 5 6 7]
after one_hot [[1 0 0 0 0 0 0 0]
 [0 1 0 0 0 0 0 0]
 [0 0 1 0 0 0 0 0]
 [0 0 0 1 0 0 0 0]
 [0 0 0 0 1 0 0 0]
 [0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 1]]

tf gradients

apply_gradients()

apply_gradients(grads_and_vars,global_step=None,name=None)

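Purpose:
Applies the gradients to the variables, i.e. updates the variables according to the optimizer's update rule (for gradient descent, a step against the gradient). This is the second step of minimize(). It returns an operation that applies the gradients.

Arguments:
grads_and_vars: a list of (gradient, variable) pairs as returned by compute_gradients()
global_step: optional Variable to increment by one after the variables have been updated
name: optional name for the returned operation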
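The two-step structure of minimize() (first compute the gradients, then apply them) can be sketched in plain Python for a tiny linear model. Everything below is an assumption made for illustration: the functions only borrow the names compute_gradients and apply_gradients, use a mean squared error loss, and implement plain gradient descent rather than any TensorFlow optimizer.

```python
def compute_gradients(params, xs, ys):
    """Return (gradient, name) pairs for the mean squared error of y = w * x + b."""
    n = len(xs)
    errors = [(params["w"] * x + params["b"]) - y for x, y in zip(xs, ys)]
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
    grad_b = sum(2 * e for e in errors) / n
    return [(grad_w, "w"), (grad_b, "b")]


def apply_gradients(params, grads_and_vars, learning_rate):
    """Gradient descent update: var <- var - learning_rate * gradient."""
    for grad, name in grads_and_vars:
        params[name] -= learning_rate * grad


params = {"w": 2.0, "b": 1.0}
xs = [0.0, 1.0, 2.0, 3.0]
ys = [3 * x + 2 for x in xs]          # noise-free targets from y = 3x + 2
for _ in range(2000):
    grads_and_vars = compute_gradients(params, xs, ys)   # step 1
    apply_gradients(params, grads_and_vars, 0.05)        # step 2
print(params)  # w approaches 3 and b approaches 2
```

Keeping the two steps separate is what makes it possible to inspect or modify the gradients (for example, clipping them) before they are applied.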

tf.gradients()

tf.gradients(ys, xs, 
             grad_ys=None, 
             name='gradients',
             colocate_gradients_with_ops=False,
             gate_gradients=False,
             aggregation_method=None,
             stop_gradients=None)

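The main job of a differentiation function is to compute $\frac{\partial y}{\partial x}$. In TensorFlow, both $y$ and $x$ are tensors.

Furthermore, the ys and xs accepted by tf.gradients() can be not only tensors but also lists of the form [tensor1, tensor2, …, tensorn]. When ys and xs are both lists, the derivatives are related as follows:

Suppose the return value is $[grad1, grad2, grad3]$ with $ys=[y1, y2]$ and $xs=[x1, x2, x3]$. The actual computation is then:

$$grad1=\frac{\partial y1}{\partial x1}+\frac{\partial y2}{\partial x1}$$
$$grad2=\frac{\partial y1}{\partial x2}+\frac{\partial y2}{\partial x2}$$
$$grad3=\frac{\partial y1}{\partial x3}+\frac{\partial y2}{\partial x3}$$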
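The summation rule can be checked numerically without TensorFlow. The sketch below uses two made-up functions, y1 = x1 * x2 and y2 = x1 + x3², and central finite differences; the function names and the evaluation point are chosen only for this illustration.

```python
def y1(x1, x2, x3):
    return x1 * x2


def y2(x1, x2, x3):
    return x1 + x3 ** 2


def partial(f, point, i, eps=1e-6):
    """Central finite-difference estimate of df/dx_i at `point`."""
    up = list(point)
    down = list(point)
    up[i] += eps
    down[i] -= eps
    return (f(*up) - f(*down)) / (2 * eps)


point = [2.0, 3.0, 4.0]
# One entry per x: the partial derivatives of all ys are summed.
grads = [partial(y1, point, i) + partial(y2, point, i) for i in range(3)]
print(grads)  # approximately [4.0, 2.0, 8.0], i.e. [x2 + 1, x1, 2 * x3]
```

Note that the result has one entry per element of xs, not one per (y, x) pair.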

Example

Linear regression is used here to try out the basic functionality of tf.gradients(). The regression target is $y = 3x + 2$.

import numpy as np
import tensorflow as tf


sess = tf.Session()

x_input = tf.placeholder(tf.float32, name='x_input')
y_input = tf.placeholder(tf.float32, name='y_input')
w = tf.Variable(2.0, name='weight')
b = tf.Variable(1.0, name='biases')
y = tf.add(tf.multiply(x_input, w), b)
loss_op = tf.reduce_sum(tf.pow(y_input - y, 2)) / (2 * 32)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss_op)

gradients_node = tf.gradients(loss_op, w)
print(gradients_node)


init = tf.global_variables_initializer()
sess.run(init)

# Build the dataset
x_pure = np.random.randint(-10, 100, 32)
x_train = x_pure + np.random.randn(32) / 10  # add noise to x
y_train = 3 * x_pure + 2 + np.random.randn(32) / 10  # add noise to y

for i in range(20):
    _, gradients, loss = sess.run([train_op, gradients_node, loss_op],
                                  feed_dict={x_input: x_train[i], y_input: y_train[i]})
    print("epoch: {} \t loss: {} \t gradients: {}".format(i, loss, gradients))

sess.close()


# Output
[<tf.Tensor 'gradients_1/Mul_grad/Reshape_1:0' shape=() dtype=float32>]
epoch: 0 	 loss: 0.06110262870788574 	 gradients: [-0.064300433]
epoch: 1 	 loss: 108.1637191772461 	 gradients: [-212.91476]
epoch: 2 	 loss: 34.10615921020508 	 gradients: [61.373066]
epoch: 3 	 loss: 16.358537673950195 	 gradients: [64.797104]

...
epoch: 18 	 loss: 0.0004277984262444079 	 gradients: [0.28949624]
epoch: 19 	 loss: 0.00298550957813859 	 gradients: [1.0919241]

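Apart from fluctuations between individual samples, the loss and the gradient magnitude shrink over the run, which suggests that the model is converging.

tf.stop_gradient() and the stop_gradients argument

The stop_gradients argument of tf.gradients() is also a list, whose elements are tensors in the TensorFlow graph. Tensors in this list are treated as constants during differentiation: more importantly, backpropagation does not continue through them to the ops they depend on.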

a = tf.constant(0.)
b = 2 * a
c = a + b
g = tf.gradients(c, [a, b])

# Output

This gives g = [3.0, 1.0], because ∂c/∂a = ∂a/∂a + ∂b/∂a = 1.0 + 2.0 = 3.0.

If instead gradient computation is stopped at a and b:

a = tf.constant(0.)
b = 2 * a
g = tf.gradients(a + b, [a, b], stop_gradients=[a, b])

# Output
This gives g = [1.0, 1.0].

The code above is also equivalent to:

a = tf.stop_gradient(tf.constant(0.))
b = tf.stop_gradient(2 * a)
g = tf.gradients(a + b, [a, b])
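The difference between the two results for ∂c/∂a (3.0 versus 1.0) can be reproduced numerically. This is a plain-Python sketch using finite differences, not TensorFlow code; "stopping the gradient" at b is modelled by freezing b at its current value so that it no longer varies with a.

```python
def derivative(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


a0 = 0.0

# Normal case: b = 2 * a depends on a, so c = a + b = 3 * a.
full = derivative(lambda a: a + 2 * a, a0)

# Gradient stopped at b: b keeps its value but is treated as a constant.
b_const = 2 * a0
stopped = derivative(lambda a: a + b_const, a0)

print(round(full, 6), round(stopped, 6))
```

The first value corresponds to the 3.0 obtained without stop_gradients, the second to the 1.0 obtained with it.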

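Processing gradients

Compute all gradients

gradient_all = optimizer.compute_gradients(loss)

Get the variables for which a gradient can be computed

grads_vars = [v for (g,v) in gradient_all if g is not None]

Get the required gradients

gradient = optimizer.compute_gradients(loss, grads_vars)

Create placeholders for the gradients

grads_holder = [(tf.placeholder(tf.float32, shape=g.get_shape()), v) for (g,v) in gradient]

Continue with backpropagation by applying the gradients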

train_op = optimizer.apply_gradients(grads_holder)

tf.contrib.slim

tf.contrib.slim.conv2d

convolution(inputs,
          num_outputs,
          kernel_size,
          stride=1,
          padding='SAME',
          data_format=None,
          rate=1,
          activation_fn=nn.relu,
          normalizer_fn=None,
          normalizer_params=None,
          weights_initializer=initializers.xavier_initializer(),
          weights_regularizer=None,
          biases_initializer=init_ops.zeros_initializer(),
          biases_regularizer=None,
          reuse=None,
          variables_collections=None,
          outputs_collections=None,
          trainable=True,
          scope=None)

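inputs: the input to convolve (e.g. a batch of images)
num_outputs: the number of convolution kernels (filters)
kernel_size: the spatial size of the kernel, [kernel_height, kernel_width]; a single integer applies to both
stride: the stride along each spatial dimension of the input
padding: the padding mode, VALID or SAME
data_format: the layout of the input (e.g. NHWC or NCHW)
rate: the dilation rate for atrous convolution
activation_fn: the activation function; defaults to ReLU
normalizer_fn: the normalization function (e.g. batch normalization), used in place of biases
normalizer_params: the parameters of the normalization function
weights_initializer: the initializer for the weights
weights_regularizer: optional regularizer for the weights
biases_initializer: the initializer for the biases
biases_regularizer: optional regularizer for the biases
reuse: whether the layer and its variables should be reused
variables_collections: optional list or dictionary of collections for all the variables
outputs_collections: the collection to which the outputs are added
trainable: whether the layer's parameters can be trained
scope: optional scope for variable_scope, used for sharing variables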
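How kernel_size, stride, and padding together determine the spatial output size can be worked out with the usual TensorFlow rules: ceil(in / stride) for SAME and ceil((in - kernel + 1) / stride) for VALID. The helper below is a hypothetical function for these notes and assumes rate=1 (no dilation).

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension (rate=1 assumed)."""
    if padding == 'SAME':
        # SAME pads the input so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        # VALID uses no padding, so the kernel must fit entirely inside the input.
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


print(conv_output_size(224, 3, 2, 'SAME'))   # 112
print(conv_output_size(224, 3, 2, 'VALID'))  # 111
print(conv_output_size(32, 5, 1, 'VALID'))   # 28
```

The same rule applies to pooling windows, so a 3 × 3 max pool with stride 2 and SAME padding halves the spatial size (rounding up).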

1

# fc1
layer = tf.layers.dense(
    inputs=self.tf_obs,
    units=10,   # number of outputs
    activation=tf.nn.tanh,  # activation function
    kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.3),
    bias_initializer=tf.constant_initializer(0.1),
    name='fc1'
)
# fc2
all_act = tf.layers.dense(
    inputs=layer,
    units=self.n_actions,   # number of outputs
    activation=None,    # softmax is applied afterwards
    kernel_initializer=tf.random_normal_initializer(mean=0, stddev=0.3),
    bias_initializer=tf.constant_initializer(0.1),
    name='fc2'
)

self.all_act_prob = tf.nn.softmax(all_act, name='act_prob')  # softmax turns the outputs into probabilities

2

Updating the fixed network's parameters

with tf.variable_scope('update_oldpi'):
    self.update_oldpi_op = [oldp.assign(p) for p, oldp in zip(pi_params, oldpi_params)]

To be documented

https://github.com/balancap/SSD-Tensorflow/blob/master/datasets/pascalvoc_to_tfrecords.py