Learning Mixtures of Linear Dynamical Systems
Designed, implemented, and reproduced experiments on unsupervised clustering of trajectories drawn from linear dynamical systems.
Introduction
- Time-series data is ubiquitous across machine learning domains: financial markets, user behavior in shopping and entertainment apps, robotics, and so on. Unfortunately, representing time series by their raw observations leads to very high-dimensional data that is hard to work with.
- Learning low-dimensional, or more generally parsimonious, representations of time-series data is therefore an important problem. One reasonable assumption often made to achieve this is that the trajectories are generated by a fixed but unknown set of models: each trajectory is produced by exactly one model from this set. This is known as a mixture model.
- A popular class of time-series models is the linear dynamical system (LDS): the next datapoint is a linear transformation of the previous datapoint, plus some noise. It is the “natural linear model” for time series, much as linear regression is the natural linear model in the static (non-sequential) setting. The parameters of the linear transformation and of the noise define a particular LDS.
- The paper by Chen and Poor (ICML 2022) provides a principled method for clustering time-series trajectories from linear dynamical systems according to whether they come from the same model, using only the raw trajectory data and without prior knowledge of the underlying set of models governing the dataset.
- This is a hard chicken-and-egg problem: if one knew which trajectories came from the same model, one could learn the underlying set of models; if one knew the set of models, one could determine the clusters easily. Doing both without knowing either is the hard problem the paper solves, essentially by developing a principled dimensionality-reduction method that compresses the trajectories before standard clustering is applied (a toy reduce-then-cluster pipeline is sketched below).
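
To make the setup concrete, here is a minimal, self-contained sketch in Python. It assumes each trajectory follows x_{t+1} = A_k x_t + w_t for one of K unknown transition matrices A_k, and then clusters the trajectories with a deliberately naive featurize-reduce-cluster pipeline. This is an illustration of the problem and a crude baseline, not Chen and Poor's algorithm; all names and parameter values below are made up for the example.

```python
# Toy sketch (NOT the paper's algorithm): simulate trajectories from a mixture
# of K linear dynamical systems, featurize each trajectory with a least-squares
# estimate of its transition matrix, then reduce dimension and cluster.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
d, T, K, n_traj = 4, 200, 2, 60   # state dim, trajectory length, #models, #trajectories

# K random stable transition matrices A_k (spectral radius rescaled to 0.9).
models = []
for _ in range(K):
    A = rng.normal(size=(d, d))
    A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))
    models.append(A)

def simulate(A, T, noise=0.1):
    """Simulate x_{t+1} = A x_t + w_t with Gaussian noise w_t."""
    x = np.zeros((T, d))
    x[0] = rng.normal(size=d)
    for t in range(T - 1):
        x[t + 1] = A @ x[t] + noise * rng.normal(size=d)
    return x

labels = rng.integers(K, size=n_traj)              # hidden model assignment per trajectory
trajs = [simulate(models[k], T) for k in labels]

# Naive per-trajectory feature: least-squares estimate of the transition
# matrix, flattened into a vector.
feats = np.array([
    np.linalg.lstsq(x[:-1], x[1:], rcond=None)[0].T.ravel()
    for x in trajs
])

# Reduce dimension, then cluster -- a crude stand-in for the paper's
# principled dimensionality-reduction step.
reduced = PCA(n_components=3).fit_transform(feats)
pred = KMeans(n_clusters=K, n_init=10).fit_predict(reduced)
print("ARI vs. true labels:", adjusted_rand_score(labels, pred))
```

The interesting part of the paper is precisely the step this sketch fudges: replacing the crude least-squares-plus-PCA featurization with a principled, theoretically analyzed dimensionality reduction.
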
What I did
- Choosing the problem and communicating theoretical intuition: This was a team project; I chose this problem because of how mathematically and empirically rich it is. To start, I explained the theoretical intuition to my teammates and familiarized everyone with the paper.
- Developed repository for time-series clustering: The repository can now be used to perform time-series clustering under a linear dynamical system model; I wrote about a third of the code.
- Discovered discrepancies between theory and code: I found important discrepancies between the theory and the experiments (especially regarding the numerical linear algebra choices made) and corresponded with the authors about them.
- New principled heuristics for choosing hyperparameters: I designed new theory-informed heuristics to determine hyperparameters such as the number of models underlying the dataset, which is also a priori unknown in practice. We then implemented them and obtained satisfactory results, recovering the number of models the authors had chosen in their experiments (a generic version of such a heuristic is sketched after this list).
- Ablation study and comparison to alternatives: We ran an ablation study testing this method against random projections for dimensionality reduction. Random projections are known to perform well in some scenarios, but did much worse than Chen and Poor's method in our experiments (an illustrative random-projection baseline is sketched after this list).
- Inspired original published research: This project also related to my own work on clustering time-series data under a Markov model assumption, which was selected for a live oral presentation at ICML 2023, along with four other papers in our category.
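
The heuristic we used for choosing the number of models is not spelled out above, so here is a hedged, generic illustration of one common idea: an eigengap criterion on a similarity matrix built from per-trajectory features. It conveys the flavor of such a heuristic rather than the exact one implemented in the project; the feature matrix, the kernel bandwidth `sigma`, and the cap `k_max` are all assumptions of the example.

```python
# Generic illustration only -- not necessarily the project's heuristic:
# pick the number of clusters K via the largest "eigengap" of a normalized
# pairwise-similarity matrix built from trajectory features.
import numpy as np

def estimate_num_models(features, sigma=1.0, k_max=10):
    """Guess the number of models from an (n_traj, dim) feature matrix."""
    # Gaussian-kernel similarity between trajectory features.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2 * sigma ** 2))

    # Symmetric normalized graph Laplacian L = I - D^{-1/2} S D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

    # Eigengap heuristic: the number of near-zero Laplacian eigenvalues,
    # located by the largest gap among the smallest eigenvalues.
    eigvals = np.sort(np.linalg.eigvalsh(L))[:k_max + 1]
    return int(np.argmax(np.diff(eigvals))) + 1
```

On the toy data from the earlier sketch, one would call `estimate_num_models(feats)` and hope to recover K = 2 when the clusters are well separated.
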
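
Similarly, the random-projection baseline from the ablation can be sketched in a few lines with scikit-learn. This is an illustrative stand-in for the actual experimental code: `features`, the target dimension, and the cluster count are assumed for the example.

```python
# Illustrative random-projection baseline: replace the principled
# dimensionality reduction with a Gaussian random projection, then cluster.
from sklearn.random_projection import GaussianRandomProjection
from sklearn.cluster import KMeans

def random_projection_baseline(features, n_components=3, n_clusters=2, seed=0):
    """Cluster trajectory features after a Gaussian random projection."""
    projected = GaussianRandomProjection(
        n_components=n_components, random_state=seed
    ).fit_transform(features)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(projected)
```

Swapping this projection in for the principled reduction is the comparison referred to in the ablation item above.
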
Code
As mentioned above, the repository can be used to perform time-series clustering under a linear dynamical system model. You can find the repository here and a report on this project here.