Note: This article has been migrated from my previous website without modification; some images might be missing or disproportionately large.

GSoC ’20 Final Report

Major Pull Requests

  • pymc-devs/pymc4#235: Implemented the basic API structure of Gaussian Processes. Implemented Latent Gaussian Process Model. Created a notebook explaining them.
  • pymc-devs/pymc4#272: Implemented/Refactored the covariance functions API for Gaussian Processes. Introduced multiple new features on top of TensorFlow Probability’s PSD kernels API.
  • pymc-devs/pymc4#285: Implemented 16 new covariance functions. Created a notebook explaining each of them.
  • pymc-devs/pymc4#309: Implemented Marginal Gaussian Process. Created a tutorial notebook for it.

The above PRs are the core of my GSoC project. I started with pymc-devs/pymc4#235 before the official coding period. This PR proposed a basic API for GP modelling in PyMC4, and I implemented a Latent GP model on top of it, closely following PyMC3’s GP API.
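At its heart, a Latent GP model places a multivariate normal prior over the function values, f ~ MVN(0, K(X, X)), with the covariance matrix built by a kernel function. Here is a rough NumPy sketch of that underlying math (illustrative names only, not the actual PyMC4 API):

```python
import numpy as np

def exp_quad(X1, X2, ls=1.0, amp=1.0):
    """Exponentiated-quadratic (RBF) covariance function."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return amp ** 2 * np.exp(-0.5 * d2 / ls ** 2)

rng = np.random.default_rng(42)
X = np.linspace(0.0, 1.0, 50)

# Prior covariance plus a small diagonal "jitter" so the Cholesky factor exists
K = exp_quad(X, X, ls=0.2) + 1e-6 * np.eye(len(X))

# One draw from the latent GP prior: f = L @ z with z ~ N(0, I)
f = np.linalg.cholesky(K) @ rng.standard_normal(len(X))
```

In PyMC4 the same idea is expressed through the GP and kernel classes rather than raw matrices, but every Latent GP reduces to a draw like this.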

During the first few weeks, I worked on refactoring the Covariance/Kernel functions API in pymc-devs/pymc4#272. This PR introduced multiple features on top of TensorFlow Probability’s PSD kernels API.
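One convenience such a wrapper layer typically provides is building composite kernels with `+` and `*` operators. A minimal sketch of that pattern (hypothetical class names, not the actual PyMC4 code):

```python
import numpy as np

class Kernel:
    """Minimal sketch of a composable covariance-function API."""
    def __call__(self, X1, X2):
        raise NotImplementedError
    def __add__(self, other):
        return Combined(self, other, np.add)
    def __mul__(self, other):
        return Combined(self, other, np.multiply)

class Combined(Kernel):
    def __init__(self, k1, k2, op):
        self.k1, self.k2, self.op = k1, k2, op
    def __call__(self, X1, X2):
        # Evaluate both children and combine elementwise
        return self.op(self.k1(X1, X2), self.k2(X1, X2))

class ExpQuad(Kernel):
    def __init__(self, ls=1.0):
        self.ls = ls
    def __call__(self, X1, X2):
        d2 = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * d2 / self.ls ** 2)

class WhiteNoise(Kernel):
    def __init__(self, sigma=0.1):
        self.sigma = sigma
    def __call__(self, X1, X2):
        K = np.zeros((len(X1), len(X2)))
        if len(X1) == len(X2):
            np.fill_diagonal(K, self.sigma ** 2)
        return K

k = ExpQuad(ls=0.5) + WhiteNoise(sigma=0.1)  # additive combination
X = np.linspace(0.0, 1.0, 10)
K = k(X, X)
```

The operator overloads are what make the Additive and Multiplicative kernels listed below compose naturally.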

Around the end of Phase 1 and the start of Phase 2, I proposed pymc-devs/pymc4#285. This PR implemented 8 stationary, 2 periodic, 5 non-stationary, and 2 special kernel functions. With the help of Bill Engels and Alex Andorra, I also wrote a comprehensive notebook explaining each of these kernel functions. It was completed and merged by the end of the second phase.
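As an example from the periodic family, the exponentiated-sine-squared kernel is k(x, x′) = amp² · exp(−2 sin²(π|x − x′|/period)/ls²), which makes points exactly one period apart perfectly correlated. A direct NumPy version (a hypothetical helper, not the PyMC4 implementation):

```python
import numpy as np

def exp_sine_squared(X1, X2, ls=1.0, period=1.0, amp=1.0):
    """Periodic covariance:
    k(x, x') = amp^2 * exp(-2 * sin^2(pi * |x - x'| / period) / ls^2)."""
    d = np.abs(X1[:, None] - X2[None, :])
    return amp ** 2 * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ls ** 2)

X = np.linspace(0.0, 3.0, 7)          # points spaced 0.5 apart
K = exp_sine_squared(X, X, ls=1.0, period=1.0)
```

Points separated by a whole period (here, distance 1.0) get covariance exactly amp², which is what lets this kernel model strictly repeating signals.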

During Phase 3, I started working on the Marginal GP model in pymc-devs/pymc4#309. I also wrote a notebook explaining the Marginal GP model using the Gaussian Process Latent Variable Model (GP-LVM) example, which produced good results with the Variational Inference API.
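The Marginal GP integrates out the latent function analytically under Gaussian noise, so its central quantity is the log marginal likelihood log p(y | X) = −½ yᵀ(K + σ²I)⁻¹y − ½ log|K + σ²I| − (n/2) log 2π. A NumPy sketch of the standard Cholesky-based computation (following Rasmussen & Williams, Algorithm 2.1; not the PyMC4 code):

```python
import numpy as np

def log_marginal_likelihood(K, y, noise=0.1):
    """Log evidence of GP regression with iid Gaussian observation noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise ** 2 * np.eye(n))
    # alpha = (K + noise^2 I)^{-1} y via two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))      # half log-determinant
            - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 20)
d2 = (X[:, None] - X[None, :]) ** 2
K = np.exp(-0.5 * d2 / 0.3 ** 2)              # exponentiated-quadratic kernel
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(20)
lml = log_marginal_likelihood(K, y, noise=0.1)
```

Because this quantity is differentiable in the kernel hyperparameters, it is exactly what Variational Inference and gradient-based samplers work with.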

Other PRs

Blogs Written

Blog Page : https://tirthasheshpatel.github.io/gsoc2020/

Tutorials Written

Some things I noticed

  • GPs work best with Variational Inference.
  • Always use the float64 datatype!
  • Sampling fails on large datasets and large models, probably because TensorFlow Probability doesn’t perform mass matrix adaptation.
  • Marginal GP is hard to infer using sampling…
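On the float64 point: GP covariance matrices are often nearly singular, and the Cholesky factorization that almost every GP computation relies on sits right at the edge of numerical failure. A small NumPy illustration of why float64 plus a diagonal jitter term helps (a sketch of the numerics, not PyMC4 code):

```python
import numpy as np

X = np.linspace(0.0, 1.0, 100)
d2 = (X[:, None] - X[None, :]) ** 2
K = np.exp(-0.5 * d2 / 0.5 ** 2)   # smooth kernel -> nearly singular matrix

# The smallest eigenvalue is tiny (numerically it may even come out slightly
# negative), so a direct Cholesky factorization is on a knife's edge,
# especially in float32
min_eig = np.linalg.eigvalsh(K).min()

# Adding a small jitter to the diagonal restores positive definiteness;
# in float64 the factorization then succeeds reliably
L = np.linalg.cholesky(K.astype(np.float64) + 1e-6 * np.eye(len(X)))
```

In float32 the available jitter headroom is far smaller, which is why float64 is effectively mandatory for GP models.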

Goals accomplished (as per proposal)

What’s implemented

  • Constant Kernel
  • White Noise Kernel
  • Exponential Quadratic Kernel
  • Rational Quadratic Kernel
  • Matern 1/2 Kernel
  • Matern 3/2 Kernel
  • Matern 5/2 Kernel
  • Linear Kernel
  • Polynomial Kernel
  • Exponential Kernel
  • Exponential Sine Squared Kernel
  • Scaled Covariance Kernel
  • Gibbs Kernel
  • Warped Input Kernel
  • Additive Kernels
  • Multiplicative Kernels
  • Docs and Tests for all the Kernels
  • Notebook explaining all the Kernel functions
  • Latent GP Model
  • Latent GP example notebook
  • Marginal GP Model
  • Marginal GP example notebook
  • Docs and Tests for GP Models

What’s left

  • Kronecker Kernels
  • ARD API for Kernel functions
  • Co-region Kernels (for Multi-Output GPs)
  • Student-t Process (WIP)
  • Sparse Marginal GP
  • Kronecker GPs
  • Some more GP examples present in PyMC3

Some Potential Post GSoC Projects

  • Multi-Output GPs
  • Bayesian Optimization Example Notebook
  • Black Box Matrix Multiplication GP