Gradients

The aesara.gradient module allows one to compute the gradient of an Aesara graph.

aesara.gradient.grad(cost: Variable | None, wrt: Variable | Sequence[Variable], consider_constant: Sequence[Variable] | None = None, disconnected_inputs: Literal['ignore', 'warn', 'raise'] = 'raise', add_names: bool = True, known_grads: Mapping[Variable, Variable] | None = None, return_disconnected: Literal['none', 'zero', 'disconnected'] = 'zero', null_gradients: Literal['raise', 'return'] = 'raise') → Variable | None | Sequence[Variable | None]

Return symbolic gradients of one cost with respect to one or more variables.

For more information about how automatic differentiation works in Aesara, see the gradient documentation. For information on how to implement the gradient of a particular Op, see Op.grad().
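
For example, a minimal sketch of computing and evaluating the gradient of a simple scalar cost (variable names here are illustrative):

>>> import aesara
>>> import aesara.tensor as at
>>> x = at.dscalar("x")
>>> cost = x ** 2
>>> # Build the symbolic gradient d(cost)/dx, i.e. 2*x.
>>> g = aesara.grad(cost, wrt=x)
>>> # Compile it into a callable function and evaluate it.
>>> f = aesara.function([x], g)
>>> print(f(3.0))
6.0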

Parameters:
  • cost – Value that we are differentiating (i.e. for which we want the gradient). May be None if known_grads is provided.

  • wrt – The term(s) with respect to which we want gradients.

  • consider_constant – Expressions not to backpropagate through (demonstrated in the sketch after this parameter list).

  • disconnected_inputs ({'ignore', 'warn', 'raise'}) –

    Defines the behaviour when some of the variables in wrt are not part of the computational graph computing cost (or when all links are non-differentiable); see the sketch after this parameter list. The possible values are:

    • 'ignore': treat the gradient of these variables as zero

    • 'warn': treat the gradient as zero and print a warning

    • 'raise': raise a DisconnectedInputError

  • add_names – If True, variables generated by grad will be named (d<cost.name>/d<wrt.name>) provided that both cost and wrt have names.

  • known_grads – An ordered dictionary mapping variables to their gradients. This is useful in the case where you know the gradients of some variables but do not know the original cost (see the sketch after the Returns section).

  • return_disconnected –

    Defines what grad returns for entries of wrt that are disconnected. The possible values are:

    • 'zero': if wrt[i] is disconnected, return value i will be wrt[i].zeros_like()

    • 'none': if wrt[i] is disconnected, return value i will be None

    • 'disconnected': if wrt[i] is disconnected, return value i will be a variable of type DisconnectedType

  • null_gradients –

    Defines the behaviour when some of the variables in wrt have a null gradient. The possible values are:

    • 'raise': raise a NullTypeGradError

    • 'return': return the null gradients
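
The sketch below, referenced from the consider_constant and disconnected_inputs entries above, illustrates both parameters on a toy graph (all names are illustrative). Note that the zero gradient returned for the disconnected variable reflects the default return_disconnected='zero':

>>> import aesara
>>> import aesara.tensor as at
>>> x = at.dscalar("x")
>>> y = x ** 2
>>> cost = x * y  # cost = x**3, so d(cost)/dx is ordinarily 3*x**2
>>> # Treating y as constant blocks backpropagation through it,
>>> # so the gradient becomes y = x**2 instead of 3*x**2.
>>> g = aesara.grad(cost, wrt=x, consider_constant=[y])
>>> # z never enters cost; 'ignore' yields a zero gradient for it
>>> # instead of raising a DisconnectedInputError.
>>> z = at.dscalar("z")
>>> gx, gz = aesara.grad(cost, wrt=[x, z], disconnected_inputs="ignore")
>>> f = aesara.function([x, z], [g, gx, gz])
>>> print([float(v) for v in f(2.0, 5.0)])
[4.0, 12.0, 0.0]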

Returns:

  A symbolic expression for the gradient of cost with respect to each of the wrt terms. If an element of wrt is not differentiable with respect to the output, then a zero variable is returned.
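
As referenced from the known_grads entry above, a hedged sketch of propagating a known gradient backwards without an explicit cost (names are illustrative):

>>> import aesara
>>> import aesara.tensor as at
>>> x = at.dscalar("x")
>>> y = x ** 2
>>> # Suppose d(cost)/dy is known to be 3.0 for some cost whose graph
>>> # is unavailable; grad chains it back through dy/dx = 2*x.
>>> gx = aesara.grad(cost=None, wrt=x, known_grads={y: at.constant(3.0)})
>>> f = aesara.function([x], gx)
>>> print(f(2.0))
12.0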

This section of the documentation is organized as follows: