Capsule Networks (CapsNet)

CapsNets are good at scene parsing (disentangling the part-whole hierarchy) and at viewpoint equivariance, unlike CNNs.

Capsules

  • Each capsule is an N-dimensional vector that represents an entity in the input as a "coordinate frame".
  • The orientation of the capsule vector represents the properties of the entity, also called "instantiation parameters". Each basis direction of the capsule vector space represents one factor of this coordinate frame.
  • The length (norm) of the capsule vector represents the existence probability of the entity.
    • But this can be problematic: if the coordinate frame represents scale, a larger entity should also have a larger norm, which conflicts with reading the norm as a probability.
    • Alternatively, existence can be separated out into its own parameter, though this complicates optimization.

Squashed CapsNet with Dynamic Routing

  • Introduced in Dynamic Routing Between Capsules, Sabour, Frosst and Hinton, NeurIPS 2017.
  • Only the transformation matrices are learned.

Agreement

  • Use cosine similarity as the agreement measure.
  • To interpret the norm (i.e. length) as a proper probability, renormalize and squash the vectors so their norms stay below 1.
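The squashing non-linearity from the paper can be sketched in NumPy (a minimal version; the epsilon is added here only for numerical safety):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squash a capsule vector so its norm lies in [0, 1).

    Short vectors shrink toward zero; long vectors approach unit
    length, so the norm can be read as an existence probability.
    """
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)
```

For example, a vector of norm 5 is squashed to norm 25/26 ≈ 0.96, safely below 1.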

Assignment (Dynamic Routing)

  • For each parent capsule:
    • From the set of child capsules, find the capsule with the highest dot product with the parent (equivalent to cosine similarity for normalized vectors).
    • Add this child capsule to the parent capsule (vector sum).
    • Remove this child capsule from the set of child capsules.
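Note the paper's routing-by-agreement is usually written as an iterative softmax over agreement logits rather than a greedy selection; a minimal NumPy sketch of that loop (array shapes are illustrative):

```python
import numpy as np

def squash(s, eps=1e-8):
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Route child predictions u_hat (num_child, num_parent, dim) to parents.

    b holds the routing logits; the agreement (dot product) between a
    child's prediction and the current parent output raises its logit.
    """
    num_child, num_parent, _ = u_hat.shape
    b = np.zeros((num_child, num_parent))
    for _ in range(num_iters):
        # softmax over parents: each child distributes its vote
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum per parent
        v = squash(s)                            # parent outputs, norm < 1
        b = b + (u_hat * v[None]).sum(axis=-1)   # agreement update
    return v
```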

Gaussian CapsNet with EM Routing

  • Introduced in Matrix Capsules with EM Routing, Hinton, Sabour and Frosst, ICLR 2018.
  • Separates the existence probability out of the capsule vector into its own scalar (the activation).
    • Capsules can then be thought of as Gaussian blobs: their position is determined by the pose vectors and their spread by the variances, with existence kept as a separate activation probability.
  • Similar to Gaussian Mixture Model > Expectation Maximization for Gaussian mixtures

Agreement (M step)

  • Use Euclidean distance as the measure of agreement.
  • If the predicted coordinate frames agree, they form a cluster.
  • Fit a Gaussian: the center of each Gaussian is the (responsibility-weighted) average of the points, and the spread is their weighted variance.
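The Gaussian fit in the M step amounts to a responsibility-weighted mean and variance; a minimal sketch for one parent (the function name and the diagonal-covariance assumption are illustrative):

```python
import numpy as np

def m_step(votes, r):
    """Fit one Gaussian to child votes.

    votes: (num_child, dim) predicted poses for this parent.
    r:     (num_child,) responsibilities (assignment weights).
    Returns the weighted mean and per-dimension variance
    (i.e. a diagonal Gaussian).
    """
    r = r / r.sum()
    mu = (r[:, None] * votes).sum(axis=0)
    var = (r[:, None] * (votes - mu) ** 2).sum(axis=0)
    return mu, var
```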

Assignment (E step)

  • Each parent capsule is initialized with the center of its Gaussian from the M step.
  • For each parent capsule:
    • From the set of child capsules, find the capsule with the maximum probability under the parent's Gaussian.
    • Update the parent's Gaussian with the child's vote.
    • Remove this child capsule from the set of child capsules.
  • Gaussians with few children and a large standard deviation get deactivated.
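The assignment step corresponds to computing the responsibility of each parent Gaussian for each child vote, as in EM for a Gaussian mixture; a hedged sketch (names and the diagonal-covariance form are assumptions):

```python
import numpy as np

def e_step(votes, mus, vars_, activations):
    """Responsibility of each parent Gaussian for each child vote.

    votes: (num_child, dim); mus, vars_: (num_parent, dim);
    activations: (num_parent,) existence probabilities used as priors.
    Returns r: (num_child, num_parent), rows summing to 1.
    """
    # log N(vote | mu_j, diag(var_j)) for every (child, parent) pair
    log_p = -0.5 * (((votes[:, None] - mus[None]) ** 2) / vars_[None]
                    + np.log(2 * np.pi * vars_[None])).sum(axis=-1)
    log_p = log_p + np.log(activations)[None]
    log_p = log_p - log_p.max(axis=1, keepdims=True)  # stabilize exp
    r = np.exp(log_p)
    return r / r.sum(axis=1, keepdims=True)
```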

Stacked Capsule Autoencoders

  • Introduced in Stacked Capsule Autoencoders, Kosiorek et al., NeurIPS 2019.
  • In the EM step, a loop figures out the centers of the capsule Gaussians.
  • The parent of each capsule is figured out using an MLP.
  • Optimize the mixture model log-likelihood.
  • Can be trained in an unsupervised way.
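A mixture-model log-likelihood of this kind can be computed stably with the log-sum-exp trick; a generic isotropic-Gaussian sketch (not SCAE's exact likelihood, which is defined over its decoder's templates):

```python
import numpy as np

def mixture_log_likelihood(x, mus, var, weights):
    """Total log-likelihood of points x under an isotropic Gaussian mixture.

    x: (n, d) points; mus: (k, d) component means;
    var: shared scalar variance; weights: (k,) mixing weights.
    """
    d = x.shape[-1]
    # log of each weighted component density, per (point, component)
    log_comp = (np.log(weights)[None]
                - 0.5 * d * np.log(2 * np.pi * var)
                - 0.5 * ((x[:, None] - mus[None]) ** 2).sum(-1) / var)
    # log-sum-exp over components, then sum over points
    m = log_comp.max(axis=1, keepdims=True)
    return (m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1))).sum()
```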

Issues with capsule networks

Insights and Possible improvements

  • The number of iterations in dynamic routing controls the sparsity of the connections from lower-level to higher-level capsules. Zhao et al. introduced a $\lambda$ parameter in the softmax to adjust how sparse the connections are without running a large number of iterations. A large value acts like max pooling, while smaller values act like average pooling. They set it to 5 using cross-validation.
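The effect of such a $\lambda$ can be illustrated with a temperature-scaled softmax (a generic sketch; the exact formulation in Zhao et al. may differ):

```python
import numpy as np

def sparse_softmax(logits, lam):
    """Softmax with inverse-temperature lam (illustrative name).

    Large lam concentrates mass on the biggest logit (max-pooling-like);
    lam near 0 spreads mass uniformly (average-pooling-like).
    """
    z = lam * logits
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

With logits [1, 2, 3], lam = 100 gives nearly all mass to the last entry, while lam = 0.01 gives a nearly uniform distribution.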
