Note: This article has been migrated from my previous website without modification; some images might be missing or disproportionately large.

GSoC Week 1 - Latent GP model and Covariance functions!

Week 1 with PyMC4!

This has been a really good week for me! A major pull request I proposed on March 1, implementing a basic Gaussian Process interface, was merged into master this June. Working on it gave me a feel for the development environment and time to complete my tasks. It also helped me make some design changes for my next PRs.

This week, I fixed some documentation errors introduced in the earlier PRs and refactored the covariance base class to introduce several new features. I summarize them in the sections below.

ENH: Add Basic Gaussian Process Interface

#235: ENH: Add Basic Gaussian Process Interface

It was nice to see this get merged in. It was a very basic proposal, and hence not mature enough to use directly. I dedicated this week to identifying and fixing the bugs and flaws that shipped to master with this PR.

Specifically, I fixed all the documentation errors and rewrote the covariance function base class, making it easier to build on and adding support for arbitrarily shaped tensors. I also added a lot of tests to make sure everything works as expected.

I successfully completed all the goals I listed in the proposal for Week 1!

  • Week 1: June 1 - June 7
  • [x] Create a few mean and covariance functions with their base classes.
  • [x] Implement a fully functional Latent Gaussian Process Model and a base for other models.

ENH: add constant and white noise kernel, fix docs and tests

#272: ENH: add constant and white noise kernel, fix docs and tests

This PR implements all the new features mentioned in the section above. Specifically, I rewrote the Covariance base class and added more covariance functions on top of it. The new features include:

  • Changes in the Covariance base class:
  • add diag parameter: If True, only the diagonal of the full covariance matrix is evaluated (see the sketch after this list).
  • add active_dims parameter in the covariance function base class: A list whose i'th entry specifies which dimensions of the i'th feature_ndims column to operate on, either as a list of dimension indices or as a single integer. If None, all the dimensions of each feature_ndims column are used. If the i'th entry is a single integer n, the leftmost n dimensions are considered for evaluation and the rightmost dimensions are ignored.
  • add scale_diag parameter: A scaling parameter for the length_scale parameter of stationary kernels, used to perform Automatic Relevance Determination (ARD). Ignored if the keyword argument ARD=False.
  • Add Constant kernel: The Constant kernel evaluates to a constant value in every entry of the covariance matrix and every point evaluation, irrespective of the input. It is useful as a lightweight kernel when speed and performance are the primary goal: it doesn't evaluate a complex function, so its gradients are faster and easier to compute.
  • Add WhiteNoise kernel: This kernel adds noise to the covariance matrix and is mostly used to stabilize other PSD kernels. The added diagonal noise makes the covariance matrix non-singular, which makes Cholesky decomposition possible and allows sampling from the MvNormalCholesky distribution with high numerical accuracy (see the Cholesky demonstration after this list). It is recommended to use this kernel in combination with other covariance/kernel functions when working with GPs on large data.
  • Changes in the covariance combination class (_Sum and _Prod):
  • These are all private changes, not visible to the user. Previously, nested function calls were made to evaluate combined kernels, which made combining multiple covariance functions inefficient. I refactored the class to combine covariance functions efficiently (also sketched below).
  • Tests:
  • Add a test suite for mean and covariance functions.
  • Add a test suite for the GP models implemented so far.
  • Benchmarking:
  • Benchmarking is not officially done by the PyMC team, so I made some unofficial benchmarks of my own. They are available on this Colab notebook or this notebook on GitHub.
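To make the new base-class behavior concrete, here is a minimal NumPy sketch of what the diag and active_dims parameters mean, and of how a sum-combination can evaluate each child kernel once instead of recursing through nested calls. The class names and signatures are illustrative only, not PyMC4's actual API, and this toy version ignores feature_ndims and scale_diag entirely.

```python
import numpy as np

class ExpQuad:
    """Toy exponentiated-quadratic kernel (illustrative, not PyMC4's class)."""

    def __init__(self, length_scale=1.0, active_dims=None):
        self.length_scale = length_scale
        self.active_dims = active_dims  # feature columns to operate on

    def __call__(self, X1, X2, diag=False):
        if self.active_dims is not None:
            # Use only the requested feature dimensions.
            X1, X2 = X1[:, self.active_dims], X2[:, self.active_dims]
        if diag:
            # Pointwise evaluations k(x_i, y_i) only.
            sq = np.sum((X1 - X2) ** 2, axis=-1)
        else:
            # Full pairwise squared-distance matrix.
            sq = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * sq / self.length_scale ** 2)

class Sum:
    """Toy sum-combination: evaluate each child kernel once and add the
    results, instead of recursing through nested function calls."""

    def __init__(self, *kernels):
        self.kernels = kernels

    def __call__(self, X1, X2, diag=False):
        return sum(k(X1, X2, diag=diag) for k in self.kernels)

X = np.random.rand(5, 3)
k = Sum(ExpQuad(length_scale=1.0, active_dims=[0, 2]),  # skip the middle column
        ExpQuad(length_scale=0.5))
K_full = k(X, X)             # (5, 5) covariance matrix
K_diag = k(X, X, diag=True)  # (5,) diagonal only
```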
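A separate small demonstration of why the WhiteNoise kernel matters: a covariance matrix built from duplicate inputs is singular, so Cholesky decomposition fails until a little diagonal noise is added. This is plain NumPy, not PyMC4's API.

```python
import numpy as np

# Two identical inputs make the exponentiated-quadratic covariance
# matrix exactly singular, so Cholesky decomposition fails.
X = np.array([[0.0], [0.0], [1.0]])
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)

try:
    np.linalg.cholesky(K)
except np.linalg.LinAlgError:
    print("Cholesky failed: K is singular")

# Adding white noise on the diagonal restores positive definiteness.
noise = 1e-3
L = np.linalg.cholesky(K + noise ** 2 * np.eye(len(X)))
print("Cholesky succeeded with white noise added")
```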

MAINT: add reparameterize to gp and fix mvnormal

#269: MAINT: add reparameterize to gp and fix mvnormal

This is a small PR that adds the functionality to reparameterize the GP prior and conditional distributions for better numerical accuracy. I will work more on Gaussian Processes in Week 3; I want to finish the mean and covariance functions in Week 2 before I proceed to GP models.
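For context, the reparameterization referred to here is, to my understanding, the standard non-centered ("whitened") parameterization of a GP prior. The sketch below shows the idea in plain NumPy; it is not the PR's actual implementation.

```python
import numpy as np

# Non-centered parameterization: instead of sampling f ~ N(0, K) directly,
# sample v ~ N(0, I) and transform it with the Cholesky factor of K.
# Samplers handle the isotropic v much better than a highly correlated f.
rng = np.random.default_rng(0)
n = 50
X = np.linspace(0.0, 1.0, n)[:, None]
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq / 0.1 ** 2)

L = np.linalg.cholesky(K + 1e-6 * np.eye(n))  # jitter keeps K positive definite
v = rng.standard_normal(n)                    # v ~ N(0, I)
f = L @ v                                     # f ~ N(0, K): a GP prior draw
```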

Goals for Week 2

I want to stretch some of my goals for Week 2 and complete work on as many covariance (kernel) functions and mean functions as I can. This way, I will be able to start working on more interesting GP models by Week 3, or at least I hope so!

  • [x] Implement all the covariance functions.
  • [x] Exponential Quadratic kernel.
  • [x] Constant kernel.
  • [x] WhiteNoise kernel.
  • [x] Matern 1/2 kernel.
  • [x] Matern 3/2 kernel.
  • [x] Matern 5/2 kernel.
  • [x] Rational Quadratic kernel.
  • [x] Polynomial kernel.
  • [x] Linear kernel.
  • [x] Gibbs kernel.
  • [x] Coregion kernel.
  • [x] Kronecker kernel.
  • [x] Cosine kernel.
  • [x] Implement all the mean functions.
  • [x] Zero Mean function.
  • [x] Constant Mean function.
  • [x] Linear mean function.

Conclusion

This week has been a successful and enjoyable one for me. I was even able to go on a small road trip after about three months of quarantine and still deliver on my goals. A pretty good start for me, I guess!