batchnorm – Batch Normalization

aesara.tensor.nnet.batchnorm.batch_normalization_train(inputs, gamma, beta, axes='per-activation', epsilon=0.0001, running_average_factor=0.1, running_mean=None, running_var=None)
Performs batch normalization of the given inputs, using the mean and variance of the inputs.
Parameters:  axes ('per-activation', 'spatial' or a tuple of ints) – The axes along which the input should be normalized. 'per-activation' normalizes per activation and is equal to axes=(0,). 'spatial' shares normalization factors across spatial dimensions (i.e., all dimensions past the second), which for 4D inputs would be equal to axes=(0, 2, 3).
 gamma (tensor) – Learnable scale factors. The shape must match the shape of inputs, except for the axes in axes. These axes should be set to 1 or be skipped altogether (such that gamma.ndim == inputs.ndim - len(axes)).
 beta (tensor) – Learnable biases. Must match the tensor layout of gamma.
 epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
 running_average_factor (float) – Factor for updating the values of running_mean and running_var. If the factor is close to one, the running averages will update quickly; if the factor is close to zero, they will update slowly.
 running_mean (tensor or None) – Previous value of the running mean. If this is given, the new value running_mean * (1 - r_a_factor) + batch mean * r_a_factor will be returned as one of the outputs of this function. running_mean and running_var should either both be given or both be None. The shape should match that of gamma and beta.
 running_var (tensor or None) – Previous value of the running variance. If this is given, the new value running_var * (1 - r_a_factor) + (m / (m - 1)) * batch var * r_a_factor will be returned as one of the outputs of this function, where m is the product of lengths of the averaged-over dimensions. running_mean and running_var should either both be given or both be None. The shape should match that of gamma and beta.
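The two running-average update rules can be illustrated in plain Python. This is only a sketch of the arithmetic on scalars: update_running_stats is a hypothetical helper, not part of the library, and the numbers are made up for illustration.

```python
def update_running_stats(running_mean, running_var, batch_mean, batch_var, m, factor=0.1):
    """Blend batch statistics into the running statistics.

    `m` is the number of values averaged over for each statistic. The factor
    m / (m - 1) rescales the biased batch variance to an unbiased estimate.
    """
    new_mean = running_mean * (1 - factor) + batch_mean * factor
    new_var = running_var * (1 - factor) + (m / (m - 1)) * batch_var * factor
    return new_mean, new_var


# Hypothetical data: a single activation observed over a batch of 4 samples.
batch = [1.0, 2.0, 3.0, 4.0]
m = len(batch)
batch_mean = sum(batch) / m
batch_var = sum((x - batch_mean) ** 2 for x in batch) / m  # biased variance
new_mean, new_var = update_running_stats(0.0, 1.0, batch_mean, batch_var, m)
```

With the default factor of 0.1, each call moves the running values one tenth of the way toward the current batch statistics.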
Returns:  out (tensor) – Batch-normalized inputs.
 mean (tensor) – Means of inputs across the normalization axes.
 invstd (tensor) – Inverse standard deviations of inputs across the normalization axes.
 new_running_mean (tensor) – New value of the running mean (only if both running_mean and running_var were given).
 new_running_var (tensor) – New value of the running variance (only if both running_var and running_mean were given).
Notes
If per-activation or spatial normalization is selected, this operation will use the cuDNN implementation. (This requires cuDNN 5 or newer.)
The returned values are equivalent to:
    import aesara.tensor as at

    # for per-activation normalization
    axes = (0,)
    # for spatial normalization
    axes = (0,) + tuple(range(2, inputs.ndim))
    mean = inputs.mean(axes, keepdims=True)
    var = inputs.var(axes, keepdims=True)
    invstd = at.reciprocal(at.sqrt(var + epsilon))
    out = (inputs - mean) * gamma * invstd + beta

    # m is the number of values averaged over for each statistic
    m = at.cast(at.prod(inputs.shape) / at.prod(mean.shape), 'float32')
    running_mean = running_mean * (1 - running_average_factor) + \
                   mean * running_average_factor
    running_var = running_var * (1 - running_average_factor) + \
                  (m / (m - 1)) * var * running_average_factor
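For checking results numerically, the same training-mode formula can be written in NumPy. This is a minimal sketch rather than the library's implementation: resolve_axes and batch_norm_train_reference are hypothetical names, running averages are left out, and gamma and beta are assumed to keep length-1 entries on the normalized axes so that ordinary broadcasting applies.

```python
import numpy as np


def resolve_axes(axes, ndim):
    """Turn the 'per-activation' / 'spatial' shortcuts into a tuple of ints."""
    if axes == "per-activation":
        return (0,)
    if axes == "spatial":
        return (0,) + tuple(range(2, ndim))
    return tuple(axes)


def batch_norm_train_reference(inputs, gamma, beta, axes="per-activation", epsilon=1e-4):
    """NumPy reference for the training-mode formula (no running averages)."""
    axes = resolve_axes(axes, inputs.ndim)
    mean = inputs.mean(axis=axes, keepdims=True)
    var = inputs.var(axis=axes, keepdims=True)  # biased variance (ddof=0)
    invstd = 1.0 / np.sqrt(var + epsilon)
    out = (inputs - mean) * gamma * invstd + beta
    return out, mean, invstd


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 3, 4, 4))  # (batch, channels, height, width)
gamma = np.ones((1, 3, 1, 1))   # normalized axes kept with length 1
beta = np.zeros((1, 3, 1, 1))
out, mean, invstd = batch_norm_train_reference(x, gamma, beta, axes="spatial")
```

With gamma set to one and beta to zero, each channel of out has approximately zero mean and unit variance over the batch and spatial axes.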

aesara.tensor.nnet.batchnorm.batch_normalization_test(inputs, gamma, beta, mean, var, axes='per-activation', epsilon=0.0001)
Performs batch normalization of the given inputs, using the given mean and variance.
Parameters:  axes ('per-activation', 'spatial' or a tuple of ints) – The axes along which the input should be normalized. 'per-activation' normalizes per activation and is equal to axes=(0,). 'spatial' shares normalization factors across spatial dimensions (i.e., all dimensions past the second), which for 4D inputs would be equal to axes=(0, 2, 3).
 gamma (tensor) – Scale factors. The shape must match the shape of inputs, except for the axes in axes. These axes should be set to 1 or be skipped altogether (such that gamma.ndim == inputs.ndim - len(axes)).
 beta (tensor) – Biases. Must match the tensor layout of gamma.
 mean (tensor) – Means. Usually these are running averages computed during training. Must match the tensor layout of gamma.
 var (tensor) – Variances. Usually these are running averages computed during training. Must match the tensor layout of gamma.
 epsilon (float) – Epsilon value used in the batch normalization formula. Minimum allowed value is 1e-5 (imposed by cuDNN).
Returns: out – Batch-normalized inputs.
Return type: tensor
Notes
If per-activation or spatial normalization is selected, this operation will use the cuDNN implementation. (This requires cuDNN 5 or newer.)
The returned value is equivalent to:
    # for per-activation normalization
    axes = (0,)
    # for spatial normalization
    axes = (0,) + tuple(range(2, inputs.ndim))
    gamma, beta, mean, var = (at.addbroadcast(t, *axes)
                              for t in (gamma, beta, mean, var))
    out = (inputs - mean) * gamma / at.sqrt(var + epsilon) + beta
See also
cuDNN batch normalization: aesara.gpuarray.dnn.dnn_batch_normalization_train, aesara.gpuarray.dnn.dnn_batch_normalization_test.
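The inference-mode formula for batch_normalization_test can be sketched in NumPy in the same way. The function name batch_norm_test_reference and the statistics below are hypothetical. The sketch assumes that gamma, beta, mean and var already have length-1 entries on the normalized axes, so no explicit broadcast step is needed.

```python
import numpy as np


def batch_norm_test_reference(inputs, gamma, beta, mean, var, epsilon=1e-4):
    """NumPy reference for the inference-mode formula.

    Assumes gamma, beta, mean and var broadcast against inputs as given.
    """
    return (inputs - mean) * gamma / np.sqrt(var + epsilon) + beta


x = np.array([[1.0, 10.0],
              [3.0, 30.0]])              # (batch, features)
running_mean = np.array([[2.0, 20.0]])   # hypothetical statistics from training
running_var = np.array([[1.0, 100.0]])
gamma = np.array([[1.0, 2.0]])
beta = np.array([[0.0, 5.0]])
out = batch_norm_test_reference(x, gamma, beta, running_mean, running_var)
```

Unlike the training-mode function, nothing here depends on the statistics of the current batch, so a single sample is normalized the same way as a full batch.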

aesara.tensor.nnet.batchnorm.batch_normalization(inputs, gamma, beta, mean, std, mode='low_mem')
This function will build the symbolic graph for applying batch normalization to a set of activations.
New in version 0.7.1.
Parameters:  inputs (symbolic tensor) – Minibatch of activations
 gamma (symbolic tensor) – BN scale parameter, must be of same dimensionality as inputs and broadcastable against it
 beta (symbolic tensor) – BN shift parameter, must be of same dimensionality as inputs and broadcastable against it
 mean (symbolic tensor) – inputs means, must be of same dimensionality as inputs and broadcastable against it
 std (symbolic tensor) – inputs standard deviation, must be of same dimensionality as inputs and broadcastable against it
 mode ('low_mem' or 'high_mem') – Specifies which batch_normalization implementation is used. Because no intermediate representations are stored for backpropagation, the 'low_mem' implementation lowers memory usage; however, it is 5-10% slower than the 'high_mem' implementation. Note that the 5-10% difference in computation time applies to the batch_normalization operation alone; the difference between implementations is likely to matter less over the fprop/bprop of a full model.
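The parameter descriptions above do not spell out the formula. The NumPy sketch below assumes the usual form, (inputs - mean) * (gamma / std) + beta. Note that this function takes a precomputed standard deviation and has no epsilon argument, so guarding against a zero std is left to the caller. batch_normalization_reference is a hypothetical name used only for illustration.

```python
import numpy as np


def batch_normalization_reference(inputs, gamma, beta, mean, std):
    """Assumed formula: normalize with a precomputed std, then scale and shift."""
    return (inputs - mean) * (gamma / std) + beta


x = np.array([[1.0, 2.0],
              [3.0, 6.0]])
mean = x.mean(axis=0, keepdims=True)        # shape (1, 2), broadcastable against x
std = x.std(axis=0, keepdims=True) + 1e-6   # small constant guards against division by zero
gamma = np.ones((1, 2))
beta = np.zeros((1, 2))
out = batch_normalization_reference(x, gamma, beta, mean, std)
```

All four parameter arrays have the same number of dimensions as the input, with length 1 on the batch axis, which matches the "same dimensionality and broadcastable" requirement stated for gamma, beta, mean and std.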