
Named tensor notation with funsors (Part 1)

Introduction

The mathematical notation with named axes introduced in Named Tensor Notation (Chiang, Rush, Barak 2021) improves the readability of mathematical formulas involving multidimensional arrays. This includes tensor operations such as elementwise operations, reductions, contractions, renaming, indexing, and broadcasting. In this tutorial we translate examples from Named Tensor Notation into funsors to demonstrate how these operations are implemented in the funsor library and to familiarize readers with funsor syntax. Part 1 covers examples from Section 2 (Informal Overview), Section 3.4.2 (Advanced Indexing), and Section 5 (Formal Definitions) of the paper.

First, let’s import some dependencies.

[ ]:
!pip install funsor[torch]@git+https://github.com/pyro-ppl/funsor
[1]:
from torch import tensor

import funsor
import funsor.ops as ops
from funsor import Number, Tensor, Variable
from funsor.domains import Bint

funsor.set_backend("torch")

Named Tensors

Each tensor axis is given a name:

\[\begin{split}\begin{aligned} A &\in \mathbb{R}^{\mathsf{\vphantom{fg}height}[3] \times \mathsf{\vphantom{fg}width}[3]} = \mathbb{R}^{\mathsf{\vphantom{fg}width}[3] \times \mathsf{\vphantom{fg}height}[3]} \\ A &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{bmatrix}\end{array} = \mathsf{\vphantom{fg}width} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\\begin{bmatrix} 3 & 1 & 2 \\ 1 & 5 & 6 \\ 4 & 9 & 5 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[2]:
A = Tensor(tensor([[3, 1, 4], [1, 5, 9], [2, 6, 5]]))["height", "width"]

Access elements of \(A\) using named indices:

\[A_{\mathsf{\vphantom{fg}height}(1), \mathsf{\vphantom{fg}width}(3)} = A_{\mathsf{\vphantom{fg}width}(3), \mathsf{\vphantom{fg}height}(1)} = 4\]
[3]:
# Note: the math is 1-based while the code is 0-based, so height(1), width(3)
# becomes height=0, width=2. Substitutions are order-independent:
# A(height=0, width=2) =
A(width=2, height=0)
[3]:
Tensor(tensor(4))

Partial indexing:

\[\begin{split}\begin{aligned} A_{\mathsf{\vphantom{fg}height}(1)} &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3 & 1 & 4 \end{bmatrix}\end{array} & A_{\mathsf{\vphantom{fg}width}(3)} &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\ \begin{bmatrix} 4 & 9 & 5 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[4]:
A(height=0)
[4]:
Tensor(tensor([3, 1, 4]), {'width': Bint[3]})
[5]:
A(width=2)
[5]:
Tensor(tensor([4, 9, 5]), {'height': Bint[3]})

Named tensor operations

Elementwise operations and broadcasting

Elementwise operations:

\[\begin{split}\frac1{1+\exp(-A)} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} \frac 1{1+\exp(-3)} & \frac 1{1+\exp(-1)} & \frac 1{1+\exp(-4)} \\[1ex] \frac 1{1+\exp(-1)} & \frac 1{1+\exp(-5)} & \frac 1{1+\exp(-9)} \\[1ex] \frac 1{1+\exp(-2)} & \frac 1{1+\exp(-6)} & \frac 1{1+\exp(-5)} \end{bmatrix}\end{array}.\end{split}\]
[6]:
# A.sigmoid() =
# ops.sigmoid(A) =
# 1 / (1 + ops.exp(-A)) =
1 / (1 + (-A).exp())
[6]:
Tensor(tensor([[0.9526, 0.7311, 0.9820],
               [0.7311, 0.9933, 0.9999],
               [0.8808, 0.9975, 0.9933]]), {'height': Bint[3], 'width': Bint[3]})

Tensors with different shapes are automatically broadcast against each other before an operation is applied. Let

\[\begin{split}\begin{aligned} x &\in \mathbb{R}^{\mathsf{\vphantom{fg}height}[3]} & y &\in \mathbb{R}^{\mathsf{\vphantom{fg}width}[3]} \\ x &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\ \begin{bmatrix} 2 \\ 7 \\ 1 \end{bmatrix}\end{array} & y &= \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 1 & 4 & 1 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[7]:
x = Tensor(tensor([2, 7, 1]))["height"]

y = Tensor(tensor([1, 4, 1]))["width"]

Binary addition operation:

\[\begin{split}\begin{aligned} A + x &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3+2 & 1+2 & 4+2 \\ 1+7 & 5+7 & 9+7 \\ 2+1 & 6+1 & 5+1 \end{bmatrix}\end{array} & A + y &= \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3+1 & 1+4 & 4+1 \\ 1+1 & 5+4 & 9+1 \\ 2+1 & 6+4 & 5+1 \end{bmatrix}\end{array}. \end{aligned}\end{split}\]
[8]:
# ops.add(A, x) =
A + x
[8]:
Tensor(tensor([[ 5,  3,  6],
               [ 8, 12, 16],
               [ 3,  7,  6]]), {'height': Bint[3], 'width': Bint[3]})
[9]:
# ops.add(A, y) =
A + y
[9]:
Tensor(tensor([[ 4,  5,  5],
               [ 2,  9, 10],
               [ 3, 10,  6]]), {'height': Bint[3], 'width': Bint[3]})

Binary multiplication operation:

\[\begin{split}A \odot x = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} 3\cdot2 & 1\cdot2 & 4\cdot2 \\ 1\cdot7 & 5\cdot7 & 9\cdot7 \\ 2\cdot1 & 6\cdot1 & 5\cdot1 \end{bmatrix}\end{array}\end{split}\]
[10]:
# ops.mul(A, x) =
A * x
[10]:
Tensor(tensor([[ 6,  2,  8],
               [ 7, 35, 63],
               [ 2,  6,  5]]), {'height': Bint[3], 'width': Bint[3]})

Binary maximum operation:

\[\begin{split}\max(A, y) = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\\begin{bmatrix} \max(3, 1) & \max(1, 4) & \max(4, 1) \\ \max(1, 1) & \max(5, 4) & \max(9, 1) \\ \max(2, 1) & \max(6, 4) & \max(5, 1) \end{bmatrix}\end{array}.\end{split}\]
[11]:
ops.max(A, y)
[11]:
Tensor(tensor([[3, 4, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height': Bint[3], 'width': Bint[3]})

Reductions

Named axes can be reduced by calling the .reduce method and specifying the reduction operator and the names of the axes to reduce. Note that reduction is defined only for operators that are associative and commutative.

\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \sum_i A_{\mathsf{\vphantom{fg}height}(i)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3+1+2 & 1+5+6 & 4+9+5 \end{bmatrix}\end{array}.\end{split}\]
[12]:
A.reduce(ops.add, "height")
[12]:
Tensor(tensor([ 6, 12, 18]), {'width': Bint[3]})
\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}width}}} A = \sum_j A_{\mathsf{\vphantom{fg}width}(j)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}height}\\ \begin{bmatrix} 3+1+4 & 1+5+9 & 2+6+5 \end{bmatrix}\end{array}.\end{split}\]
[13]:
A.reduce(ops.add, "width")
[13]:
Tensor(tensor([ 8, 15, 13]), {'height': Bint[3]})
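
Since "height" is the first positional dimension of the array backing A, these named reductions can be sanity-checked against plain positional PyTorch reductions (a sketch, assuming the .data attribute that funsor's torch backend exposes on Tensor funsors):

[ ]:
# Reducing the named axis "height" matches a positional sum over dim=0,
# because "height" is the first dimension of the underlying tensor.
assert (A.reduce(ops.add, "height").data == A.data.sum(dim=0)).all()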

Reduction over multiple axes:

\[\begin{split}\sum\limits_{\substack{\mathsf{\vphantom{fg}height}\\ \mathsf{\vphantom{fg}width}}} A = \sum_i \sum_j A_{\mathsf{\vphantom{fg}height}(i),\mathsf{\vphantom{fg}width}(j)} = 3+1+4+1+5+9+2+6+5.\end{split}\]
[14]:
A.reduce(ops.add, {"height", "width"})
[14]:
Tensor(tensor(36))
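
Omitting the axis names reduces over all of them (a sketch, assuming .reduce's default of reducing over all free variables):

[ ]:
# With no axis names given, reduce eliminates every free variable.
A.reduce(ops.add)  # Tensor(tensor(36)), as above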

Multiplication reduction:

\[\begin{split}\prod\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \prod_i A_{\mathsf{\vphantom{fg}height}(i)} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} 3\cdot1\cdot2 & 1\cdot5\cdot6 & 4\cdot9\cdot5 \end{bmatrix}\end{array}.\end{split}\]
[15]:
A.reduce(ops.mul, "height")
[15]:
Tensor(tensor([  6,  30, 180]), {'width': Bint[3]})

Max reduction:

\[\begin{split}\max\limits_{\substack{\mathsf{\vphantom{fg}height}}} A = \max \{A_{\mathsf{\vphantom{fg}height}(i)} \mid 1 \leq i \leq n\} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width}\\ \begin{bmatrix} \max(3, 1, 2) & \max(1, 5, 6) & \max(4, 9, 5) \end{bmatrix}\end{array}.\end{split}\]
[16]:
A.reduce(ops.max, "height")
[16]:
Tensor(tensor([3, 6, 9]), {'width': Bint[3]})

Contraction

The contraction operation can be written as elementwise multiplication followed by summation over an axis:

\[\begin{split}A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} y = \sum_j A_{\mathsf{\vphantom{fg}width}(j)} \, y_{\mathsf{\vphantom{fg}width}(j)} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\\begin{bmatrix} 3\cdot 1 + 1\cdot 4 + 4\cdot 1 \\ 1\cdot 1 + 5\cdot 4 + 9\cdot 1 \\ 2\cdot 1 + 6\cdot 4 + 5\cdot 1 \end{bmatrix}\end{array}.\end{split}\]
[17]:
(A * y).reduce(ops.add, "width")
[17]:
Tensor(tensor([11, 30, 31]), {'height': Bint[3]})

Some other operations from linear algebra:

\[x \mathbin{\underset{\substack{\mathsf{\vphantom{fg}height}}}{\vphantom{fg}\odot}} x = \sum_i x_{\mathsf{\vphantom{fg}height}(i)} \, x_{\mathsf{\vphantom{fg}height}(i)} \qquad \text{inner product}\]
[18]:
(x * x).reduce(ops.add, "height")
[18]:
Tensor(tensor(54))
\[[x \odot y]_{\mathsf{\vphantom{fg}height}(i), \mathsf{\vphantom{fg}width}(j)} = x_{\mathsf{\vphantom{fg}height}(i)} \, y_{\mathsf{\vphantom{fg}width}(j)} \qquad \text{outer product}\]
[19]:
x * y
[19]:
Tensor(tensor([[ 2,  8,  2],
               [ 7, 28,  7],
               [ 1,  4,  1]]), {'height': Bint[3], 'width': Bint[3]})
\[A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} y = \sum_i A_{\mathsf{\vphantom{fg}width}(i)} \, y_{\mathsf{\vphantom{fg}width}(i)} \qquad \text{matrix-vector product}\]
[20]:
(A * y).reduce(ops.add, "width")
[20]:
Tensor(tensor([11, 30, 31]), {'height': Bint[3]})
\[\begin{split}x \mathbin{\underset{\substack{\mathsf{\vphantom{fg}height}}}{\vphantom{fg}\odot}} A = \sum_i x_{\mathsf{\vphantom{fg}height}(i)} \, A_{\mathsf{\vphantom{fg}height}(i)} \qquad \text{vector-matrix product} \\\end{split}\]
[21]:
(x * A).reduce(ops.add, "height")
[21]:
Tensor(tensor([15, 43, 76]), {'width': Bint[3]})
\[A \mathbin{\underset{\substack{\mathsf{\vphantom{fg}width}}}{\vphantom{fg}\odot}} B = \sum_i A_{\mathsf{\vphantom{fg}width}(i)} \odot B_{\mathsf{\vphantom{fg}width}(i)} \qquad \text{matrix-matrix product}~(B \in \mathbb{R}^{\mathsf{\vphantom{fg}width}\times \mathsf{\vphantom{fg}width2}})\]
[22]:
B = Tensor(
    tensor([[3, 2, 5], [5, 4, 0], [8, 3, 6]]),
)["width", "width2"]

(A * B).reduce(ops.add, "width")
[22]:
Tensor(tensor([[ 46,  22,  39],
               [100,  49,  59],
               [ 76,  43,  40]]), {'height': Bint[3], 'width2': Bint[3]})
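
Since "height" and "width" are the positional dimensions of A.data and B.data, this named contraction can be checked against an ordinary positional matrix multiplication (a sketch, again assuming the .data attribute):

[ ]:
# The named contraction over "width" agrees with matmul on the
# underlying positional arrays.
assert ((A * B).reduce(ops.add, "width").data == A.data @ B.data).all()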

Contraction can be generalized to other binary and reduction operations:

\[\begin{split}\max_{\mathsf{\vphantom{fg}width}} (A + y) = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\\\begin{bmatrix} \max(3+1, 1+4, 4+1) \\ \max(1+1, 5+4, 9+1) \\ \max(2+1, 6+4, 5+1) \end{bmatrix}\end{array}.\end{split}\]
[23]:
(A + y).reduce(ops.max, "width")
[23]:
Tensor(tensor([ 5, 10, 10]), {'height': Bint[3]})
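
Another useful instance is the log-sum-exp contraction, central to marginalization in log space. Below is a sketch; the cast to floating point via .data is an assumption needed because torch.logsumexp is not defined for integer tensors:

[ ]:
# Log-sum-exp contraction: elementwise addition followed by a
# logaddexp reduction over "width".
Af = Tensor(A.data.float())["height", "width"]
yf = Tensor(y.data.float())["width"]
(Af + yf).reduce(ops.logaddexp, "width")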

Renaming and reshaping

Renaming an axis of a funsor is a simple name substitution:

\[\begin{split}A_{\mathsf{\vphantom{fg}height}\rightarrow\mathsf{\vphantom{fg}height2}} = \mathsf{\vphantom{fg}height2} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width} \\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \\ \end{bmatrix}\end{array}.\end{split}\]
[24]:
# A(height=Variable("height2", Bint[3]))
A(height="height2")
[24]:
Tensor(tensor([[3, 1, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height2': Bint[3], 'width': Bint[3]})
\[\begin{split}A_{(\mathsf{\vphantom{fg}height},\mathsf{\vphantom{fg}width})\rightarrow\mathsf{\vphantom{fg}layer}} = \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}layer}\\ \begin{bmatrix} 3 & 1 & 4 & 1 & 5 & 9 & 2 & 6 & 5 \end{bmatrix}\end{array}\end{split}\]
[25]:
layer = Variable("layer", Bint[9])

# Number(3, 4) is the constant 3 with dtype 4, i.e. an element of Bint[4].
# Flatten: (height, width) -> layer, where layer = height * 3 + width.
A_layer = A(height=layer // Number(3, 4), width=layer % Number(3, 4))
A_layer
[25]:
Tensor(tensor([3, 1, 4, 1, 5, 9, 2, 6, 5]), {'layer': Bint[9]})
\[\begin{split}A_{\mathsf{\vphantom{fg}layer}\rightarrow(\mathsf{\vphantom{fg}height},\mathsf{\vphantom{fg}width})} = \mathsf{\vphantom{fg}height} \begin{array}[b]{@{}c@{}}\mathsf{\vphantom{fg}width} \\\begin{bmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \\ \end{bmatrix}\end{array}.\end{split}\]
[26]:
height = Variable("height", Bint[3])
width = Variable("width", Bint[3])

# Unflatten: layer -> (height, width), inverting layer = height * 3 + width.
A_layer(layer=height * Number(3, 4) + width % Number(3, 4))
[26]:
Tensor(tensor([[3, 1, 4],
               [1, 5, 9],
               [2, 6, 5]]), {'height': Bint[3], 'width': Bint[3]})
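
As a quick check (a sketch relying on the .data attribute), the flatten/unflatten pair round-trips back to the original A:

[ ]:
# Unflattening the flattened tensor recovers the original values.
A2 = A_layer(layer=height * Number(3, 4) + width % Number(3, 4))
assert (A2.data == A.data).all()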

Advanced indexing

All forms of advanced indexing can be achieved through name substitution in funsors.

\[\begin{split}\mathop{\underset{\substack{\mathsf{\vphantom{fg}ax}}}{\vphantom{fg}\mathrm{index}}} \colon \mathbb{R}^{\mathsf{\vphantom{fg}ax}[n]} \times [n] \rightarrow \mathbb{R}\\ \mathop{\underset{\substack{\mathsf{\vphantom{fg}ax}}}{\vphantom{fg}\mathrm{index}}}(A, i) = A_{\mathsf{\vphantom{fg}ax}(i)}.\end{split}\]
\[\begin{split}\begin{aligned} E &\in \mathbb{R}^{\mathsf{\vphantom{fg}vocab}[n] \times \mathsf{\vphantom{fg}emb}} \\ i &\in [n] \\ I &\in [n]^{\mathsf{\vphantom{fg}seq}} \\ P &\in \mathbb{R}^{\mathsf{\vphantom{fg}seq}\times \mathsf{\vphantom{fg}vocab}[n]} \end{aligned}\end{split}\]

Partial indexing \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(E,i)\):

[27]:
E = Tensor(
    tensor([[2, 1, 5], [3, 4, 2], [1, 3, 7], [1, 4, 3], [5, 9, 2]]),
)["vocab", "emb"]

E(vocab=2)
[27]:
Tensor(tensor([1, 3, 7]), {'emb': Bint[3]})

Integer array indexing \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(E,I)\):

[28]:
I = Tensor(tensor([3, 2, 4, 0]), dtype=5)["seq"]

E(vocab=I)
[28]:
Tensor(tensor([[1, 4, 3],
               [1, 3, 7],
               [5, 9, 2],
               [2, 1, 5]]), {'seq': Bint[4], 'emb': Bint[3]})
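
This coincides with PyTorch's integer array indexing on the underlying positional arrays (a sanity check, assuming the .data attribute):

[ ]:
# Substituting I for "vocab" is positional integer indexing along dim 0.
assert (E(vocab=I).data == E.data[I.data]).all()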

Gather operation \(\mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(P,I)\):

[29]:
P = Tensor(
    tensor([[6, 2, 4, 2], [8, 2, 1, 3], [5, 5, 7, 0], [1, 3, 8, 2], [5, 9, 2, 3]]),
)["vocab", "seq"]

P(vocab=I)
[29]:
Tensor(tensor([1, 5, 2, 2]), {'seq': Bint[4]})
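
In positional terms this is a gather along the vocab dimension; the substitution corresponds to torch.gather on the underlying array (a sketch, assuming .data):

[ ]:
# For each position j along "seq", pick P[I[j], j]: a gather along dim 0.
expected = P.data.gather(0, I.data.unsqueeze(0)).squeeze(0)
assert (P(vocab=I).data == expected).all()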

Indexing with two integer arrays:

\[\begin{split}\begin{aligned} |\mathsf{\vphantom{fg}seq}| &= m \\ I_1 &\in [m]^{\mathsf{\vphantom{fg}subseq}} \\ I_2 &\in [n]^{\mathsf{\vphantom{fg}subseq}} \\ S &= \mathop{\underset{\substack{\mathsf{\vphantom{fg}vocab}}}{\vphantom{fg}\mathrm{index}}}(\mathop{\underset{\substack{\mathsf{\vphantom{fg}seq}}}{\vphantom{fg}\mathrm{index}}}(P, I_1), I_2) \in \mathbb{R}^{\mathsf{\vphantom{fg}subseq}} \\ S_{\mathsf{\vphantom{fg}subseq}(i)} &= P_{\mathsf{\vphantom{fg}seq}({I_1}_{\mathsf{\vphantom{fg}subseq}(i)}), \mathsf{\vphantom{fg}vocab}({I_2}_{\mathsf{\vphantom{fg}subseq}(i)})}. \end{aligned}\end{split}\]
[30]:
I1 = Tensor(tensor([1, 2, 0]), dtype=4)["subseq"]
I2 = Tensor(tensor([3, 0, 4]), dtype=5)["subseq"]

P(seq=I1, vocab=I2)
[30]:
Tensor(tensor([3, 4, 5]), {'subseq': Bint[3]})
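
Equivalently, in positional terms this is pointwise indexing with two index arrays; since P.data is laid out as vocab × seq, the vocab indices come first (a sketch, assuming .data):

[ ]:
# Substituting both names at once pairs I2 (vocab) with I1 (seq).
assert (P(seq=I1, vocab=I2).data == P.data[I2.data, I1.data]).all()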