composer.algorithms.factorize.factorize_core
Functions
factorize_conv2d | Approximates a \(K \times K\) convolution by factorizing it into a \(K \times K\) convolution with fewer channels followed by a \(1 \times 1\) convolution.
factorize_matrix | Approximates a matrix by factorizing it into a product of two smaller matrices.
Classes
LowRankSolution | Bundles tensors used by a factorized linear operator.
Attributes
Optional
Tuple
Union
- class composer.algorithms.factorize.factorize_core.LowRankSolution(Wa=None, Wb=None, bias=None, rank=-1, nmse=0)
Bundles tensors used by a factorized linear operator.
The factorization always splits the operator into two smaller linear operators. The first takes input of the original shape and embeds it in a lower-dimensional space. The second maps this lower-dimensional space to the original output space.
- Parameters
Wa (Tensor, optional) – First linear operation in the factorized approximation. For a factorized linear operation, Wa is a matrix. For a factorized convolution, Wa matches the shape of the convolution's original weight parameter, except along the channel axis.
Wb (Tensor, optional) – Second linear operation in the factorized approximation. Shape is such that composing Wa with Wb yields an output of the same size as the original operation.
bias (Tensor, optional) – Vector added to the output of the second linear operation.
rank (int, optional) – Output dimensionality (channels or features) of the first linear operation, and input dimensionality of the second linear operation. Default: -1.
nmse (float, optional) – Normalized mean squared error obtained during the optimization procedure used to derive Wa, Wb, and bias. This is equal to the raw mean squared error between the factorized approximation's output and the original output, divided by the variance of the original output. A value of 0 means no error was introduced, and a value of 1 corresponds to capturing the output no better than chance. Default: 0.0.
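The nmse definition above can be illustrated with a small standalone sketch. It is not Composer's implementation: the helper name `nmse` is hypothetical, the outputs are flattened into plain Python lists, and the use of the population variance is an assumption, since the page only says "the variance of the original output".

```python
from statistics import fmean, pvariance


def nmse(original, approx):
    """Mean squared error between two outputs, divided by the variance of `original`.

    Hypothetical helper for illustration only. Assumes flattened outputs and
    the population variance.
    """
    if len(original) != len(approx):
        raise ValueError("outputs must have the same length")
    mse = fmean([(o - a) ** 2 for o, a in zip(original, approx)])
    return mse / pvariance(original)


original_output = [1.0, 2.0, 3.0, 4.0]

# A perfect approximation introduces no error.
exact = nmse(original_output, original_output)

# Always predicting the mean of the original output gives an NMSE of 1.
mean_only = nmse(original_output, [fmean(original_output)] * len(original_output))

print(exact, mean_only)
```

Under these assumptions, the two printed values correspond to the two reference points described for nmse: no error, and an approximation that carries no information beyond the mean of the output.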
- composer.algorithms.factorize.factorize_core.factorize_conv2d(X, Wa, Wb=None, rank=0.25, biasA=None, biasB=None, n_iters=3, **conv2d_kwargs)
Approximates a \(K \times K\) convolution by factorizing it into a \(K \times K\) convolution with fewer channels followed by a \(1 \times 1\) convolution.
Given a convolutional weight tensor W for a 2d convolution of shape [out_channels, in_channels, k_h, k_w] and a vector bias of length out_channels, returns a triple (Wa, Wb, new_bias) of tensors with shapes [rank, in_channels, k_h, k_w], [out_channels, rank, 1, 1], and [out_channels], respectively. Wa, Wb, and new_bias are chosen so as to minimize:
\(||\) (W * X + bias) - (Wb * (Wa * X) + new_bias) \(||_F\),
where \(*\) denotes convolution, bias broadcasts along all non-channel dimensions, and \(||\cdot||_F\) denotes the Frobenius norm (the square root of the sum of squared elements).
Similar to factorize_matrix(), this function allows passing in an already-factorized weight tensor in order to enable progressive factorization. In this case, the single tensor W is replaced with a (Wa, Wb) pair of the same form as the output, though not necessarily with the same rank.
- Parameters
X (Tensor) – A tensor of shape [N, in_channels, H, W], for some N, H, and W.
Wa (Tensor) – The first weight tensor to convolve with X. If Wb is not provided, must be of shape [out_channels, in_channels, k_h, k_w]. Otherwise, must be of shape [original_rank, in_channels, k_h, k_w] for some original_rank < min(in_channels, out_channels).
Wb (Tensor, optional) – The second weight tensor to convolve with the input. If provided, must be of shape [out_channels, original_rank, 1, 1].
rank (int | float, optional) – Number of channels in the latent representation of X. Default: 0.25.
biasA (Tensor, optional) – Optional vector of biases. If Wb is None, must have length out_channels. Otherwise, must have length original_rank.
biasB (Tensor, optional) – If provided, must have length out_channels.
n_iters (int, optional) – Number of iterations used in the optimization process. Higher numbers yield lower mean squared error, though there are usually diminishing returns after a handful of iterations. Default: 3.
**conv2d_kwargs – Arguments such as padding, stride, dilation, groups, etc., used in the original convolution. If these are not provided, the factorized tensors might not preserve the function computed by the original weight tensor as well. Note that not all combinations of arguments are supported.
- Returns
LowRankSolution – A solution of rank rank that approximates the original convolution operation.
- Raises
RuntimeError – If biasB is provided but Wb is not.
NotImplementedError – If conv2d_kwargs['dilation'] != 1 or conv2d_kwargs['groups'] != 1.
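The shapes described for factorize_conv2d can be worked through without any tensor library. The sketch below only does the shape and parameter-count arithmetic. The helper name `factorized_conv2d_shapes` is hypothetical, and the treatment of a fractional rank (a value below 1 taken as a fraction of min(in_channels, out_channels)) is an assumption that this page does not spell out.

```python
from math import prod


def factorized_conv2d_shapes(out_channels, in_channels, k_h, k_w, rank=0.25):
    """Return the (Wa, Wb, new_bias) shapes listed in the factorize_conv2d docs.

    Hypothetical helper. Assumption: a rank below 1 is a fraction of
    min(in_channels, out_channels); otherwise it is an absolute channel count.
    """
    if rank < 1:
        latent = max(1, int(rank * min(in_channels, out_channels)))
    else:
        latent = int(rank)
    wa_shape = (latent, in_channels, k_h, k_w)   # K x K convolution, fewer output channels
    wb_shape = (out_channels, latent, 1, 1)      # 1 x 1 convolution back to out_channels
    bias_shape = (out_channels,)
    return wa_shape, wb_shape, bias_shape


# Example layer: 64 input channels, 128 output channels, 3 x 3 kernel.
wa_shape, wb_shape, bias_shape = factorized_conv2d_shapes(128, 64, 3, 3, rank=0.25)

original_params = prod((128, 64, 3, 3))              # 73728 weights
factorized_params = prod(wa_shape) + prod(wb_shape)  # 9216 + 2048 = 11264 weights

print(wa_shape, wb_shape, bias_shape)
print(original_params, factorized_params)
```

For this example layer the factorized pair stores roughly 15% of the original weights. How much of the original function it preserves depends on the optimization and is reported through nmse.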
- composer.algorithms.factorize.factorize_core.factorize_matrix(X, Y, Wa, Wb=None, bias=None, rank=0.25, n_iters=3)
Approximates a matrix by factorizing it into a product of two smaller matrices.
Given a matrix W of shape [D, M], a bias vector of length M, and a target rank rank < D, returns a solution (Wa, Wb, new_bias) of tensors of shapes [D, rank], [rank, M], and [M], respectively. These tensors are chosen so as to minimize:
\(||\) Y - (X @ Wa @ Wb + new_bias) \(||_F\),
where Y = X @ W + bias, @ denotes matrix multiplication, new_bias broadcasts along the row dimension, and \(||\cdot||_F\) denotes the Frobenius norm (the square root of the sum of squared elements). In the case that rows of X correspond to samples from some distribution, this amounts to minimizing the expected mean squared error in the output.
The input matrix can either be a single matrix W or a pair of matrices (Wa, Wb). The latter case corresponds to using a matrix W = Wa @ Wb that has already been factorized and is supported in order to facilitate progressively decreasing the rank of the matrix.
- Parameters
X (Tensor) – Input used to evaluate the quality of the approximation. Shape is [N, D], where N is often the number of input samples and D is the dimensionality of each sample.
Y (Tensor) – Output of applying the original matrix to X. Must have shape [N, M] for some M.
Wa (Tensor) – Either the matrix to be factorized, or the first of the two smaller matrices in the already-factorized representation of this matrix. Must be of shape [D, M] in the former case and shape [D, d] in the latter, for some d < D.
Wb (Tensor, optional) – If present, Wa is interpreted as the first of two smaller matrices, and Wb is taken to be the second. Must be of shape [d, M].
bias (Tensor, optional) – A vector added to the output after performing the matrix product with X.
rank (int | float, optional) – The number of columns in the latent representation of X. Default: 0.25.
n_iters (int, optional) – Number of iterations used in the optimization process. Higher numbers yield lower mean squared error, though there are usually diminishing returns after a handful of iterations. Default: 3.
- Returns
LowRankSolution – A solution of rank rank that approximates the original linear operation.
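To make the objective and the factor shapes concrete, the sketch below evaluates the squared error between Y and X @ Wa @ Wb + new_bias for a tiny hand-built example in pure Python. It does not reproduce the optimization procedure used by factorize_matrix. The helpers `matmul`, `add_bias`, and `squared_error` are hypothetical, and W is chosen to be exactly rank 1 so that a perfect factorization can be written down by hand.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add_bias(Y, bias):
    """Add a bias vector to every row (broadcast along the row dimension)."""
    return [[y + b for y, b in zip(row, bias)] for row in Y]


def squared_error(Y, Y_hat):
    """Sum of squared element-wise differences between two matrices."""
    return sum((y - yh) ** 2 for row, row_hat in zip(Y, Y_hat) for y, yh in zip(row, row_hat))


# X has shape [N, D] = [3, 2]; W has shape [D, M] = [2, 3] and is rank 1 by construction.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
bias = [0.5, 0.5, 0.5]
Y = add_bias(matmul(X, W), bias)        # Y = X @ W + bias, shape [N, M]

# Hand-built rank-1 factors: Wa is [D, rank] = [2, 1], Wb is [rank, M] = [1, 3].
Wa = [[1.0], [2.0]]
Wb = [[1.0, 2.0, 3.0]]
new_bias = bias

Y_hat = add_bias(matmul(matmul(X, Wa), Wb), new_bias)
error = squared_error(Y, Y_hat)

print(error)  # 0.0, because Wa @ Wb reproduces W exactly
```

In practice W is rarely exactly low rank, so the error is nonzero and the returned nmse indicates how much of the original output the factorized pair captures.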