gradient – Symbolic Differentiation
Symbolic gradients are usually computed with gradient.grad(), which offers a more convenient syntax for the common case of wanting the gradient of some scalar cost with respect to some input expressions. The grad_sources_inputs() function does the underlying work and is more flexible, but it is also more awkward to use when gradient.grad() can do the job.
Gradient related functions¶
Driver for gradient calculations.

class aesara.gradient.ConsiderConstant

grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable.
Each returned Variable represents the gradient with respect to that input, computed from the symbolic gradients with respect to each output. If the output is not differentiable with respect to an input, then this method should return an instance of type NullType for that input.
Using the reverse-mode AD characterization given in [1], for a $C = f(A, B)$ representing the function implemented by the Op and its two arguments $A$ and $B$, given by the Variables in inputs, the values returned by Op.grad represent the quantities $\bar{A} \equiv \partial S_O / \partial A$ and $\bar{B}$, for some scalar output term $S_O$ of $C$ in
$$\operatorname{Tr}\left(\bar{C}^\top dC\right) = \operatorname{Tr}\left(\bar{A}^\top dA\right) + \operatorname{Tr}\left(\bar{B}^\top dB\right)$$
Parameters:
 inputs – The input variables.
 output_grads – The gradients of the output variables.
Returns: grads – The gradients with respect to each Variable in inputs.
[1] Giles, Mike. 2008. “An Extended Collection of Matrix Derivative Results for Forward and Reverse Mode Automatic Differentiation.”


class aesara.gradient.DisconnectedGrad

R_op(inputs, eval_points)
Construct a graph for the R-operator.
This method is primarily used by Rop.
Parameters:
 inputs – The Op inputs.
 eval_points – A Variable or list of Variables with the same length as inputs. Each element of eval_points specifies the value of the corresponding input at the point where the R-operator is to be evaluated.
Return type: rval[i] should be Rop(f=f_i(inputs), wrt=inputs, eval_points=eval_points).

grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable. The description, parameters, and return value are identical to those of ConsiderConstant.grad above.


exception aesara.gradient.DisconnectedInputError
Raised when grad is asked to compute the gradient with respect to a disconnected input and disconnected_inputs='raise'.

class aesara.gradient.DisconnectedType(*args, **kwds)
A type indicating that a variable is the result of taking the gradient of c with respect to x when c is not a function of x.
It serves as a symbolic placeholder for 0, but conveys the extra information that this gradient is 0 because it is disconnected.

filter(data, strict=False, allow_downcast=None)
Return data or an appropriately wrapped/converted data.
Subclass implementations should raise a TypeError exception if the data is not of an acceptable type.
Parameters:
 data (array-like) – The data to be filtered/converted.
 strict (bool (optional)) – If True, the data returned must be the same as the data passed as an argument.
 allow_downcast (bool (optional)) – If strict is False and allow_downcast is True, the data may be cast to an appropriate type. If allow_downcast is False, it may only be upcast and not lose precision. If allow_downcast is None (default), the behaviour can be type-dependent, but for now it means only Python floats can be downcast, and only to floatX scalars.


class
aesara.gradient.
GradClip
(clip_lower_bound, clip_upper_bound)[source]¶ 
grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable. The description, parameters, and return value are identical to those of ConsiderConstant.grad above.


class
aesara.gradient.
GradScale
(multiplier)[source]¶ 
grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable. The description, parameters, and return value are identical to those of ConsiderConstant.grad above.


exception
aesara.gradient.
GradientError
(arg, err_pos, shape, val1, val2, abs_err, rel_err, abs_tol, rel_tol)[source]¶ This error is raised when a gradient is incorrectly calculated.

aesara.gradient.Lop(f: Union[Variable, Sequence[Variable]], wrt: Union[Variable, Sequence[Variable]], eval_points: Union[Variable, Sequence[Variable]], consider_constant: Optional[Sequence[Variable]] = None, disconnected_inputs: Literal['ignore', 'warn', 'raise'] = 'raise') -> Union[Variable, None, Sequence[Optional[Variable]]]
Computes the L-operator applied to f with respect to wrt at eval_points.
Mathematically this stands for the Jacobian of f with respect to wrt left multiplied by the eval_points.
Returns: A symbolic expression satisfying L_op[j] = sum_i (d f[i] / d wrt[j]) eval_point[i], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor elements. If f is a list/tuple, then return a list/tuple with the results.

aesara.gradient.Rop(f: Union[Variable, Sequence[Variable]], wrt: Union[Variable, Sequence[Variable]], eval_points: Union[Variable, Sequence[Variable]], disconnected_outputs: Literal['ignore', 'warn', 'raise'] = 'raise', return_disconnected: Literal['none', 'zero', 'disconnected'] = 'zero') -> Union[Variable, None, Sequence[Optional[Variable]]]
Computes the R-operator applied to f with respect to wrt at eval_points.
Mathematically this stands for the Jacobian of f right multiplied by the eval_points.
Parameters:
 f – The outputs of the computational graph to which the R-operator is applied.
 wrt – Variables for which the R-operator of f is computed.
 eval_points – Points at which to evaluate each of the variables in wrt.
 disconnected_outputs – Defines the behaviour if some of the variables in f have no dependency on any of the variables in wrt (or if all links are non-differentiable). The possible values are:
  'ignore': consider the gradient on these parameters to be zero.
  'warn': consider the gradient zero, and print a warning.
  'raise': raise DisconnectedInputError.
 return_disconnected –
  'zero': If wrt[i] is disconnected, return value i will be wrt[i].zeros_like().
  'none': If wrt[i] is disconnected, return value i will be None.
  'disconnected': return variables of type DisconnectedType.
Returns: A symbolic expression obeying R_op[i] = sum_j (d f[i] / d wrt[j]) eval_point[j], where the indices in that expression are magic multidimensional indices that specify both the position within a list and all coordinates of the tensor elements. If wrt is a list/tuple, then return a list/tuple with the results.
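The index formulas given for Rop and Lop can be checked on a small numeric Jacobian. The following is a minimal pure-Python sketch, not Aesara code: the matrix J stands in for d f[i] / d wrt[j] of a hypothetical function with two outputs and three inputs, and the helper names r_op and l_op are made up for illustration.

```python
# Illustration of the R-operator and L-operator index formulas on plain lists.
# J[i][j] plays the role of d f[i] / d wrt[j]; all names here are hypothetical.

def r_op(jacobian, eval_points):
    """R_op[i] = sum_j J[i][j] * eval_points[j] (Jacobian right multiplied by a vector)."""
    return [sum(row[j] * eval_points[j] for j in range(len(row))) for row in jacobian]


def l_op(jacobian, eval_points):
    """L_op[j] = sum_i J[i][j] * eval_points[i] (Jacobian left multiplied by a vector)."""
    n_rows, n_cols = len(jacobian), len(jacobian[0])
    return [sum(jacobian[i][j] * eval_points[i] for i in range(n_rows)) for j in range(n_cols)]


J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

r_result = r_op(J, [1.0, 0.0, -1.0])  # one entry per output of f
l_result = l_op(J, [1.0, 2.0])        # one entry per element of wrt
print(r_result)  # [-2.0, -2.0]
print(l_result)  # [9.0, 12.0, 15.0]
```

Note that the R-operator takes one evaluation point per input and returns one value per output, while the L-operator takes one per output and returns one per input.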

class
aesara.gradient.
UndefinedGrad
[source]¶ 
R_op(inputs, eval_points)
Construct a graph for the R-operator. The description, parameters, and return value are identical to those of DisconnectedGrad.R_op above.

grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable. The description, parameters, and return value are identical to those of ConsiderConstant.grad above.


class
aesara.gradient.
ZeroGrad
[source]¶ 
R_op(inputs, eval_points)
Construct a graph for the R-operator. The description, parameters, and return value are identical to those of DisconnectedGrad.R_op above.

grad(args, g_outs)
Construct a graph for the gradient with respect to each input variable. The description, parameters, and return value are identical to those of ConsiderConstant.grad above.


aesara.gradient.as_list_or_tuple(use_list: bool, use_tuple: bool, outputs: Union[V, Sequence[V]]) -> Union[V, List[V], Tuple[V, ...]]
Return either a single object or a list/tuple of objects.
If use_list is True, outputs is returned as a list (if outputs is not a list or a tuple then it is converted into a one-element list). If use_tuple is True, outputs is returned as a tuple (if outputs is not a list or a tuple then it is converted into a one-element tuple). Otherwise (if both flags are false), outputs is returned.

aesara.gradient.consider_constant(x)
Consider an expression constant when computing gradients.
DEPRECATED: use zero_grad or disconnected_grad instead.
The expression itself is unaffected, but when its gradient is computed, or the gradient of another expression that this expression is a subexpression of, it will not be backpropagated through. In other words, the gradient of the expression is truncated to 0.
Parameters: x – An Aesara expression whose gradient should be truncated.
Returns: The expression is returned unmodified, but its gradient is now truncated to 0.
New in version 0.7.

aesara.gradient.disconnected_grad(x)
Consider an expression constant when computing gradients.
Gradients are effectively not backpropagated through it.
The expression itself is unaffected, but when its gradient is computed, or the gradient of another expression that this expression is a subexpression of, it will not be backpropagated through. This is effectively equivalent to truncating the gradient expression to 0, but is executed faster than zero_grad(), which still has to go through the underlying computational graph related to the expression.
Parameters: x (Variable) – An Aesara expression whose gradient should not be backpropagated through.
Returns: An expression equivalent to x, with its gradient now effectively truncated to 0.
Return type: Variable

aesara.gradient.grad(cost: Optional[Variable], wrt: Union[Variable, Sequence[Variable]], consider_constant: Optional[Sequence[Variable]] = None, disconnected_inputs: Literal['ignore', 'warn', 'raise'] = 'raise', add_names: bool = True, known_grads: Optional[Mapping[Variable, Variable]] = None, return_disconnected: Literal['none', 'zero', 'disconnected'] = 'zero', null_gradients: Literal['raise', 'return'] = 'raise') -> Union[Variable, None, Sequence[Optional[Variable]]]
Return symbolic gradients of one cost with respect to one or more variables.
For more information about how automatic differentiation works in Aesara, see gradient. For information on how to implement the gradient of a certain Op, see grad().
Parameters:
 cost – Value that we are differentiating (i.e. for which we want the gradient). May be None if known_grads is provided.
 wrt – The term(s) with respect to which we want gradients.
 consider_constant – Expressions not to backpropagate through.
 disconnected_inputs ({'ignore', 'warn', 'raise'}) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
  'ignore': consider the gradient on these parameters to be zero.
  'warn': consider the gradient zero, and print a warning.
  'raise': raise DisconnectedInputError.
 add_names – If True, variables generated by grad will be named (d<cost.name>/d<wrt.name>) provided that both cost and wrt have names.
 known_grads – An ordered dictionary mapping variables to their gradients. This is useful in the case where you know the gradients of some variables but do not know the original cost.
 return_disconnected –
  'zero': If wrt[i] is disconnected, return value i will be wrt[i].zeros_like().
  'none': If wrt[i] is disconnected, return value i will be None.
  'disconnected': return variables of type DisconnectedType.
 null_gradients – Defines the behaviour when some of the variables in wrt have a null gradient. The possible values are:
  'raise': raise a NullTypeGradError exception.
  'return': return the null gradients.
Returns: A symbolic expression for the gradient of cost with respect to each of the wrt terms. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned.

aesara.gradient.grad_clip(x, lower_bound, upper_bound)
This op acts as a view (identity) in the forward pass, but clips the gradient in the backward pass.
This is an elementwise operation.
Parameters:
 x – The variable whose incoming gradient should be clipped.
 lower_bound – The lower bound of the gradient value.
 upper_bound – The upper bound of the gradient value.
Examples
>>> x = aesara.tensor.type.scalar()
>>> z = aesara.gradient.grad(grad_clip(x, -1, 1)**2, x)
>>> z2 = aesara.gradient.grad(x**2, x)
>>> f = aesara.function([x], outputs = [z, z2])
>>> print(f(2.0))
[array(1.0), array(4.0)]
Notes
An optimization registered in tensor/opt.py removes the GradClip op, so it has zero cost in the forward pass and only does work in the gradient.
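The behaviour described for grad_clip (identity in the forward pass, clipping of the incoming gradient in the backward pass) can be traced by hand with the chain rule. The following is a minimal pure-Python sketch that mirrors the documented example for the expression grad_clip(x, -1, 1)**2 at x = 2.0; the function names are hypothetical and nothing here calls Aesara.

```python
def clip(value, lower, upper):
    """Clamp value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def square_with_clipped_grad(x, lower, upper):
    """Return (value, gradient) for the hypothetical expression grad_clip(x, lower, upper) ** 2."""
    y = x                       # forward pass: grad_clip behaves like the identity
    value = y ** 2
    grad_wrt_y = 2.0 * y        # gradient arriving at the output of grad_clip
    grad_wrt_x = clip(grad_wrt_y, lower, upper)  # backward pass: clip that incoming gradient
    return value, grad_wrt_x


value, clipped = square_with_clipped_grad(2.0, -1.0, 1.0)
unclipped = 2.0 * 2.0           # plain derivative of x ** 2 at x = 2.0
print(value, clipped, unclipped)  # 4.0 1.0 4.0
```

The forward value is unchanged (4.0), while the gradient drops from 4.0 to the upper bound 1.0, which matches the pair of values shown in the documented example.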

aesara.gradient.grad_not_implemented(op, x_pos, x, comment='')
Return an uncomputable symbolic variable of type x.type.
If any call to grad results in an expression containing this uncomputable variable, an exception (e.g. NotImplementedError) will be raised indicating that the gradient on the x_pos'th input of op has not been implemented. Likewise if any call to aesara.function involves this variable.
Optionally adds a comment to the exception explaining why this gradient is not implemented.

aesara.gradient.grad_scale(x, multiplier)
This op scales or inverts the gradient in the backpropagation.
Parameters:
 x – The variable whose incoming gradient should be scaled.
 multiplier – Scale of the gradient.
Examples
>>> x = aesara.tensor.fscalar()
>>> fx = aesara.tensor.sin(x)
>>> fp = aesara.grad(fx, wrt=x)
>>> fprime = aesara.function([x], fp)
>>> print(fprime(2))
-0.416...
>>> f_inverse=grad_scale(fx, -1.)
>>> fpp = aesara.grad(f_inverse, wrt=x)
>>> fpprime = aesara.function([x], fpp)
>>> print(fpprime(2))
0.416...
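The sign flip in the grad_scale example follows directly from the chain rule: the forward value of sin(x) is untouched and only the gradient flowing back is multiplied. A minimal pure-Python sketch under that reading, with a hypothetical helper name and no Aesara calls:

```python
import math


def sin_with_scaled_grad(x, multiplier):
    """Return (value, gradient) for the hypothetical expression grad_scale(sin(x), multiplier)."""
    value = math.sin(x)                       # forward pass: value passes through unchanged
    upstream = 1.0                            # gradient of the output with respect to itself
    scaled_upstream = multiplier * upstream   # backward pass: scale the incoming gradient
    grad_wrt_x = scaled_upstream * math.cos(x)
    return value, grad_wrt_x


_, plain = sin_with_scaled_grad(2.0, 1.0)
_, flipped = sin_with_scaled_grad(2.0, -1.0)
print(round(plain, 3), round(flipped, 3))  # -0.416 0.416
```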

aesara.gradient.grad_undefined(op, x_pos, x, comment='')
Return an uncomputable symbolic variable of type x.type.
If any call to grad results in an expression containing this uncomputable variable, an exception (e.g. GradUndefinedError) will be raised indicating that the gradient on the x_pos'th input of op is mathematically undefined. Likewise if any call to aesara.function involves this variable.
Optionally adds a comment to the exception explaining why this gradient is not defined.

aesara.gradient.hessian(cost, wrt, consider_constant=None, disconnected_inputs='raise')
Parameters:
 cost – Scalar (0-dimensional) variable.
 wrt – Vector (1-dimensional tensor) Variable or list of vector (1-dimensional tensor) Variables.
 consider_constant – A list of expressions not to backpropagate through.
 disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing cost (or if all links are non-differentiable). The possible values are:
  'ignore': consider the gradient on these parameters to be zero.
  'warn': consider the gradient zero, and print a warning.
  'raise': raise an exception.
Returns: The Hessian of the cost with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is of the same type as wrt: a list/tuple or TensorVariable in all cases.
Return type: Variable or list/tuple of Variables

aesara.gradient.jacobian(expression, wrt, consider_constant=None, disconnected_inputs='raise')
Compute the full Jacobian, row by row.
Parameters:
 expression (Vector (1-dimensional) Variable) – Values that we are differentiating (that we want the Jacobian of).
 wrt (Variable or list of Variables) – Term[s] with respect to which we compute the Jacobian.
 consider_constant (list of variables) – Expressions not to backpropagate through.
 disconnected_inputs (string) – Defines the behaviour if some of the variables in wrt are not part of the computational graph computing expression (or if all links are non-differentiable). The possible values are:
  'ignore': consider the gradient on these parameters to be zero.
  'warn': consider the gradient zero, and print a warning.
  'raise': raise an exception.
Returns: The Jacobian of expression with respect to (elements of) wrt. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned. The return value is of the same type as wrt: a list/tuple or TensorVariable in all cases.
Return type: Variable or list/tuple of Variables (depending upon wrt)
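"Row by row" means that each row of the Jacobian is the gradient of one element of expression with respect to wrt. The following is a minimal pure-Python sketch of that layout. It uses central finite differences instead of symbolic differentiation, and the two-output function and every name in it are hypothetical.

```python
def numeric_row(component, point, eps=1e-6):
    """Central-difference gradient of one scalar component at the given point."""
    row = []
    for k in range(len(point)):
        hi = list(point)
        lo = list(point)
        hi[k] += eps
        lo[k] -= eps
        row.append((component(hi) - component(lo)) / (2.0 * eps))
    return row


# Hypothetical vector expression with two outputs of two inputs.
components = [
    lambda x: x[0] * x[1],       # first output element
    lambda x: x[0] ** 2 + x[1],  # second output element
]
point = [3.0, 2.0]

# One gradient per output element gives one Jacobian row per output element.
jac = [numeric_row(component, point) for component in components]
for row in jac:
    print([round(entry, 4) for entry in row])
```

At the point (3, 2) the exact Jacobian is [[2, 3], [6, 1]], which the finite-difference rows approximate.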

class aesara.gradient.numeric_grad(f, pt, eps=None, out_type=None)
Compute the numeric derivative of a scalar-valued function at a particular point.

static abs_rel_err(a, b)
Return absolute and relative error between a and b.
The relative error is a small number when a and b are close, relative to how big they are.
Formulas used:
abs_err = abs(a - b)
rel_err = abs_err / max(abs(a) + abs(b), 1e-8)
The denominator is clipped at 1e-8 to avoid dividing by 0 when a and b are both close to 0.
The tuple (abs_err, rel_err) is returned.
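The two formulas can be tried on plain Python floats. The following is a minimal sketch that applies them to scalars only; it is a re-statement of the formulas above, not the library's implementation, which operates on arrays.

```python
def abs_rel_err(a, b):
    """Return (abs_err, rel_err) for two scalars, following the formulas above."""
    abs_err = abs(a - b)
    # The denominator is clipped at 1e-8 so that a == b == 0 does not divide by zero.
    rel_err = abs_err / max(abs(a) + abs(b), 1e-8)
    return abs_err, rel_err


close_pair = abs_rel_err(1.0, 1.5)
zero_pair = abs_rel_err(0.0, 0.0)
print(close_pair)
print(zero_pair)
```

With this definition the relative error never exceeds 1, because abs(a - b) is at most abs(a) + abs(b).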

abs_rel_errors(g_pt)
Return the abs and rel error of gradient estimate g_pt.
g_pt must be a list of ndarrays of the same length as self.gf, otherwise a ValueError is raised.
Corresponding ndarrays in g_pt and self.gf must have the same shape or ValueError is raised.

max_err(g_pt, abs_tol, rel_tol)
Find the biggest error between g_pt and self.gf.
What is measured is the violation of the relative and absolute errors with respect to the provided tolerances (abs_tol, rel_tol). A value > 1 means both tolerances are exceeded.
Return the argmax of min(abs_err / abs_tol, rel_err / rel_tol) over g_pt, as well as abs_err and rel_err at this point.
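The min(abs_err / abs_tol, rel_err / rel_tol) criterion exceeds 1 only when both tolerances are violated at the same element. The following is a minimal pure-Python sketch of that selection rule over flat lists of precomputed errors. The function name and the sample numbers are hypothetical, and unlike the method above it also returns the score itself.

```python
def max_violation(abs_errs, rel_errs, abs_tol, rel_tol):
    """Return (index, abs_err, rel_err, score) at the element with the largest score."""
    scores = [min(a / abs_tol, r / rel_tol) for a, r in zip(abs_errs, rel_errs)]
    worst = max(range(len(scores)), key=scores.__getitem__)
    return worst, abs_errs[worst], rel_errs[worst], scores[worst]


abs_errs = [1e-6, 5e-3, 2e-4]
rel_errs = [1e-2, 1e-5, 3e-3]

# Element 0 only violates rel_tol, element 1 only violates abs_tol,
# and element 2 violates both, so it is the only one with a score above 1.
worst, worst_abs, worst_rel, score = max_violation(abs_errs, rel_errs, abs_tol=1e-4, rel_tol=1e-3)
print(worst, worst_abs, worst_rel, score > 1.0)
```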

aesara.gradient.subgraph_grad(wrt, end, start=None, cost=None, details=False)
With respect to wrt, computes gradients of cost and/or from existing start gradients, up to the end variables of a symbolic digraph. In other words, computes gradients for a subgraph of the symbolic Aesara function. Ignores all disconnected inputs.
This can be useful when one needs to perform the gradient descent iteratively (e.g. one layer at a time in an MLP), or when a particular operation is not differentiable in Aesara (e.g. stochastic sampling from a multinomial). In the latter case, the gradient of the non-differentiable process could be approximated by a user-defined formula, which could be calculated using the gradients of a cost with respect to samples (0s and 1s). These gradients are obtained by performing a subgraph_grad from the cost or previously known gradients (start) up to the outputs of the stochastic process (end). A dictionary mapping gradients obtained from the user-defined differentiation of the process, to variables, could then be fed into another subgraph_grad as start with any other cost (e.g. weight decay).
In an MLP, we could use subgraph_grad to iteratively backpropagate:

    import numpy as np
    import aesara

    x, t = aesara.tensor.fvector('x'), aesara.tensor.fvector('t')
    w1 = aesara.shared(np.random.standard_normal((3,4)))
    w2 = aesara.shared(np.random.standard_normal((4,2)))
    a1 = aesara.tensor.tanh(aesara.tensor.dot(x,w1))
    a2 = aesara.tensor.tanh(aesara.tensor.dot(a1,w2))
    cost2 = aesara.tensor.sqr(a2 - t).sum()
    cost2 += aesara.tensor.sqr(w2.sum())
    cost1 = aesara.tensor.sqr(w1.sum())

    params = [[w2],[w1]]
    costs = [cost2,cost1]
    grad_ends = [[a1], [x]]

    next_grad = None
    param_grads = []
    for i in range(2):
        param_grad, next_grad = aesara.subgraph_grad(
            wrt=params[i], end=grad_ends[i],
            start=next_grad, cost=costs[i]
        )
        next_grad = dict(zip(grad_ends[i], next_grad))
        param_grads.extend(param_grad)

Parameters:
 wrt (list of variables) – Gradients are computed with respect to wrt.
 end (list of variables) – Aesara variables at which to end gradient descent (they are considered constant in aesara.grad). For convenience, the gradients with respect to these variables are also returned.
 start (dictionary of variables) – If not None, a dictionary mapping variables to their gradients. This is useful when the gradients on some variables are known. These are used to compute the gradients backwards up to the variables in end (they are used as known_grad in aesara.grad).
 cost (Variable scalar (0-dimensional) variable) – Additional costs for which to compute the gradients. For example, these could be weight decay, an l1 constraint, MSE, NLL, etc. May optionally be None if start is provided.
  Warning
  If the gradient of cost with respect to any of the start variables is already part of the start dictionary, then it may be counted twice with respect to wrt and end.
 details (bool) – When True, additionally returns the list of gradients from start and of cost, respectively, with respect to wrt (not end).
Returns: Lists of gradients with respect to wrt and end, respectively.
Return type: Tuple of 2 or 4 Lists of Variables
New in version 0.7.

aesara.gradient.undefined_grad(x)
Consider the gradient of this variable undefined.
This will generate an error message if its gradient is taken.
The expression itself is unaffected, but when its gradient is computed, or the gradient of another expression that this expression is a subexpression of, an error message will be generated specifying that such a gradient is not defined.
Parameters: x (Variable) – An Aesara expression whose gradient should be undefined.
Returns: An expression equivalent to x, with its gradient undefined.
Return type: Variable

aesara.gradient.verify_grad(fun: Callable, pt: List[ndarray], n_tests: int = 2, rng: Optional[Union[Generator, RandomState]] = None, eps: Optional[float] = None, out_type: Optional[str] = None, abs_tol: Optional[float] = None, rel_tol: Optional[float] = None, mode: Optional[Union[Mode, str]] = None, cast_to_output_type: bool = False, no_debug_ref: bool = True)
Test a gradient by Finite Difference Method. Raise error on failure.
Raises an Exception if the difference between the analytic gradient and numerical gradient (computed through the Finite Difference Method) of a random projection of the fun's output to a scalar exceeds the given tolerance.
Examples
>>> verify_grad(aesara.tensor.tanh,
...             (np.asarray([[2, 3, 4], [1, 3.3, 9.9]]),),
...             rng=np.random.default_rng(23098))
Parameters:
 fun – fun takes Aesara variables as inputs, and returns an Aesara variable. For instance, an Op instance with a single output.
 pt – Input values, points where the gradient is estimated. These arrays must be either float16, float32, or float64 arrays.
 n_tests – Number of times to run the test.
 rng – Random number generator used to sample the output random projection u; we test the gradient of sum(u * fun) at pt.
 eps – Step size used in the Finite Difference Method (the default None is type-dependent). Raising the value of eps can raise or lower the absolute and relative errors of the verification depending on the Op. Raising eps does not lower the verification quality for linear operations. It is better to raise eps than to raise abs_tol or rel_tol.
 out_type – Dtype of output, if complex (i.e., 'complex32' or 'complex64').
 abs_tol – Absolute tolerance used as threshold for gradient comparison.
 rel_tol – Relative tolerance used as threshold for gradient comparison.
 cast_to_output_type – If the output is float32 and cast_to_output_type is True, cast the random projection to float32; otherwise, it is float64. float16 is not handled here.
 no_debug_ref – Don't use DebugMode for the numerical gradient function.
Notes
This function does not support multiple outputs. In tests.scan.test_basic there is an experimental verify_grad that covers that case as well by using random projections.
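The check described above (project the output onto a random vector u, then compare the analytic gradient of sum(u * fun) with a finite-difference estimate) can be sketched without Aesara. The following is a minimal pure-Python illustration for an elementwise tanh. It assumes a central difference with a fixed eps and projection weights drawn from [0.5, 1.5); these choices, and every name used, are assumptions for the sketch and are not taken from the library.

```python
import math
import random


def tanh_vec(xs):
    """Elementwise tanh over a list, standing in for the function under test."""
    return [math.tanh(x) for x in xs]


def projected(fun, u, pt):
    """Scalar random projection sum(u * fun(pt))."""
    return sum(ui * yi for ui, yi in zip(u, fun(pt)))


def numeric_gradient(fun, u, pt, eps=1e-6):
    """Central-difference gradient of the projected scalar with respect to pt."""
    grads = []
    for k in range(len(pt)):
        hi = list(pt)
        lo = list(pt)
        hi[k] += eps
        lo[k] -= eps
        grads.append((projected(fun, u, hi) - projected(fun, u, lo)) / (2.0 * eps))
    return grads


rng = random.Random(23098)
pt = [2.0, 3.0, 4.0, 1.0, 3.3, 9.9]
u = [rng.random() + 0.5 for _ in pt]  # random projection weights

# Analytic gradient of sum(u * tanh(x)) is u * (1 - tanh(x) ** 2).
analytic = [ui * (1.0 - math.tanh(x) ** 2) for ui, x in zip(u, pt)]
numeric = numeric_gradient(tanh_vec, u, pt)
max_abs_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_abs_diff)
```

A correct analytic gradient leaves only finite-difference and rounding error in max_abs_diff; a wrong one (for example, dropping the square on tanh) produces a difference many orders of magnitude larger.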

aesara.gradient.zero_grad(x)
Consider an expression constant when computing gradients.
The expression itself is unaffected, but when its gradient is computed, or the gradient of another expression that this expression is a subexpression of, it will be backpropagated through with a value of zero. In other words, the gradient of the expression is truncated to 0.
Parameters: x (Variable) – An Aesara expression whose gradient should be truncated.
Returns: An expression equivalent to x, with its gradient truncated to 0.
Return type: Variable
List of Implemented R op
See the gradient tutorial for the R op documentation.
List of ops that support Rop:
 with test
  SpecifyShape
  MaxAndArgmax
  Subtensor
  IncSubtensor (set_subtensor too)
  Alloc
  Dot
  Elemwise
  Sum
  Softmax
  Shape
  Join
  Rebroadcast
  Reshape
  DimShuffle
  Scan [In tests/scan/test_basic.test_rop]
 without test
  Split
  ARange
  ScalarFromTensor
  AdvancedSubtensor1
  AdvancedIncSubtensor1
  AdvancedIncSubtensor
Partial list of ops without support for Rop:
 All sparse ops
 All linear algebra ops
 PermuteRowElements
 AdvancedSubtensor
 TensorDot
 Outer
 Prod
 MulwithoutZeros
 ProdWithoutZeros
 CAReduce (for max,… done for MaxAndArgmax op)
 MaxAndArgmax (only for matrix on axis 0 or 1)