Department of Mathematical Sciences

Carnegie Mellon University

7126 Wean Hall

Pittsburgh, PA 15213

Tel: (412) 268 5617

Email: mthorpe "at" andrew "dot" cmu "dot" edu

**I have recently moved to the University of Cambridge as a Research Fellow in the Department of Applied Mathematics and Theoretical Physics. My new webpage can be found here.**

I joined CMU in September 2015 as a postdoctoral associate to work on optimal transportation problems with Dejan Slepcev and Gustavo Rohde as part of an NSF funded project (grant number 1421502).

Before CMU I completed my PhD in the Mathematics and Statistics Doctoral Training Centre (MASDOC) at the University of Warwick under the supervision of Florian Theil and Adam Johansen. My PhD focussed on applying variational methods to statistical inference problems. In particular, I am interested in statistical inference problems with a combinatorial structure.

I am teaching 21-470 Selected Topics in Analysis: Introduction to Mathematical Biology in Fall 2016. For course updates, lecture notes and homework please see the Blackboard course page. The syllabus can be found here.

**[10]** M. Thorpe and D. Slepcev, *Analysis of p-Laplacian Regularization in Semi-Supervised Learning*, 2017. arXiv.

**[9]** M. Thorpe and D. Slepcev, *Transportation L ^{p} Distances: Properties and Extensions*, 2017. Preprint available here.

**[8]** M. Thorpe, S. Park, S. Kolouri, G. K. Rohde and D. Slepcev, *A Transportation L ^{p} Distance for Signal Analysis*, Journal of Mathematical Imaging and Vision, 2017. Article available here, preprint available here.

**[7]** M. Thorpe and A. Johansen, *Weak Convergence of General Smoothing Splines*, Annals of the Institute of Statistical Mathematics, 2017. Article available here, preprint available here.

**[6]** M. Thorpe and F. Theil, *Asymptotic Analysis of the Ginzburg-Landau Functional on Point Clouds*, to appear in the Proceedings of the Royal Society of Edinburgh Section A: Mathematics, 2017. arXiv.

**[5]** S. Kolouri, S. Park, M. Thorpe, D. Slepcev and G. K. Rohde, *Optimal Mass Transport: Signal processing and machine-learning applications*, IEEE Signal Processing Magazine, 34(4):43-59, 2017. Article available here, preprint available here.

**[4]** M. Thorpe and A. Johansen, *Convergence and Rates for Fixed-Interval Multiple-Track Smoothing Using k-Means Type Optimization*, Electronic Journal of Statistics, 10(2):3693-3722, 2016. Article available here, preprint available here.

**[3]** M. Thorpe, *Variational Methods for Geometric Statistical Inference*, 2015. PhD thesis available here.

**[2]** M. Thorpe, F. Theil, A. Johansen and N. Cade, *Convergence of the k-Means Minimization Problem Using Gamma-Convergence*, SIAM Journal on Applied Mathematics, 75(6):2444-2474, 2015. Article available here or here, preprint available here.

**[1]** A. Gkiokas, A. Cristea and M. Thorpe, *Self-reinforced Meta Learning for Belief Generation*, Research and Development in Intelligent Systems XXXI, Springer International Publishing, pp. 185-190, 2014. Article available here.

My research interests lie within applied mathematics, covering analysis, statistics and probability. I am particularly interested in (1) the asymptotic behaviour of variational problems that arise within statistics and (2) developing optimal transport type distances. For (1) I am typically interested in unlabelled ill-posed inverse problems; that is, where one wishes to estimate multiple infinite dimensional objects from a finite, but potentially large, unlabelled data set. In (2) I develop and analyse optimal transport distances, similar to the Wasserstein or earth mover's distance, for applications in signal and image analysis. A more thorough description of my work is given below.

Many problems within statistics can be posed as minimization problems, for example maximum likelihood or maximum a posteriori estimation. As such one can expect variational methods to be a valuable tool. The power of variational methods is their ability to deal with high frequency oscillations. For example, the smoothing spline problem, which is the problem of fitting a smooth curve through data, can be written as finding the 'best' curve (in some suitable space) that fits the data. In the non-parametric setting this is an ill-posed minimization problem which requires regularization. A common method is to penalise the *L ^{p}* norm of a derivative of the curve.
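As a concrete illustration, the following minimal Python sketch fits a smoothing spline with scipy's `UnivariateSpline`; the smoothing factor `s` plays the role of the regularization weight (the test signal and parameter values here are arbitrary choices for illustration):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Noisy samples of a smooth curve.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(100)

# s bounds the sum of squared residuals: s=0 interpolates the data
# (the ill-posed limit), while larger s trades fidelity for smoothness.
spline = UnivariateSpline(t, y, s=1.0)
estimate = spline(t)
```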

In general, variational methods define an estimator to be the minimizer of an energy. Common examples of energies are the negative log-posterior in a Bayesian framework or the negative log-likelihood in a frequentist framework. One could also design custom energies with certain behaviours in mind.
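For example (a standard identity rather than anything specific to our work), the maximum a posteriori estimate is exactly the minimizer of the negative log-posterior:

```latex
% MAP estimation written as energy minimization (requires amsmath).
\hat{u} \in \operatorname*{arg\,min}_{u} E(u),
\qquad
E(u) = \underbrace{-\log p(x \mid u)}_{\text{data fidelity}}
       \underbrace{{}-\log p(u)}_{\text{regularization}}
% The prior plays the role of the regularizer in the variational problem.
```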

My own research in this area has focussed on establishing the convergence of an estimator for fitting multiple curves through unlabelled data. A data association hypothesis labels the data: given the correct data association hypothesis, the problem reduces to independently solving multiple smoothing spline problems. In general one does not know the correct data association hypothesis, which leads to a coupled data association and smoothing spline problem. Furthermore, the number of possible data association hypotheses grows combinatorially with the number of data points.

The estimator we studied was a generalisation of the *k*-means method. In Euclidean spaces data is labelled by the closest 'cluster centre' and the objective is to optimally place the cluster centres. One defines an energy

*E(μ) = (1/n) ∑_{i=1}^{n} min_{j=1,…,k} |x_{i} − μ_{j}|^{2}*

where *{x_{i}}* is the data and *μ = (μ_{1},…,μ_{k})* are the cluster centres.

Our estimator in the data association and smoothing spline problem was to treat the multiple unknown curves as the cluster centres and to define a cost function by *c(u,x) = min_{j=1,…,k} |u_{j}(t)−y|^{2}* where *x = (t,y)* is a time-observation pair and *u = (u_{1},…,u_{k})* is the collection of curves. The energy is then

*E(u) = (1/n) ∑_{i=1}^{n} c(u,x_{i}) + R(u)*

where *R* is some regularization. Our results concerned the convergence of minimizers *û* of *E* as the number of data points grows [2] and the associated rates [4].
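To make the Euclidean case concrete, here is a short numpy sketch of the *k*-means energy together with one assignment/update sweep of Lloyd's algorithm; this is the standard method, shown only to illustrate the energy, not the estimator analysed in [2,4]:

```python
import numpy as np

def kmeans_energy(x, mu):
    """E(mu) = (1/n) sum_i min_j |x_i - mu_j|^2 for data x (n,d), centres mu (k,d)."""
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (n,k) squared distances
    return d2.min(axis=1).mean()

def lloyd_step(x, mu):
    """One step of Lloyd's algorithm: label by closest centre, then recentre."""
    d2 = ((x[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # the data association hypothesis
    return np.stack([x[labels == j].mean(axis=0) if (labels == j).any() else mu[j]
                     for j in range(len(mu))])
```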

We pose our convergence theory in a more general setting that can include more complex relationships between data points and cluster centres. Our convergence theorem holds when *c* is lower semi-continuous and *R* is lower semi-continuous and coercive. For example, this would include data in time-range-angle space (e.g. data from radar).

A side project that emerged from this work was to show that smoothing splines converge weakly [7]; to our knowledge, previous convergence results used strong convergence in weaker topologies. We made use of variational methods in our proofs.

My collaborators on this project are Neil Cade (Selex ES Ltd.), Adam Johansen (Warwick) and Florian Theil (Warwick).

Graphical modelling is one way to develop a 'data-driven' classification method. Given a data set *{x_{i}} ⊂ X* and a measure of dissimilarity between data points, one constructs a weighted graph whose vertices are the data points and whose edge weights *W_{ij}* encode the similarity between *x_{i}* and *x_{j}*.

An important question that has been driven by applications in machine learning is how to partition the data set. This is a classification problem: if there are two classes then the problem is to find the labels *u:{x_{i}} → {0,1}* that define the classes.

Restricting the labels to binary values is known as hard classification; in this case *u(x_{i}) ∈ {0,1}*. The soft relaxation is to allow *u(x_{i})* to take any value in *[0,1]*.

My research has concerned estimating the boundaries of the classes. In this case the classification should approximate a hard classification, in the sense that for *0<ε ≪ 1* the labels satisfy *u(x_{i}) ∈ [0,ε] ∪ [1-ε,1]*. The implication is that, whilst we allow for uncertainty (particularly on the boundaries between classes), each data point should satisfy *u(x_{i}) ≈ 0* or *u(x_{i}) ≈ 1*.

A popular phase-transition model from material science is the Ginzburg-Landau functional. In [6] we define an analogue of the Ginzburg-Landau functional on graphs, minimizers of which we use as estimators for the classification problem. There are two terms, one representing data fidelity and the other penalising soft classifications:

*GL_{ε}(u) = ∑_{i,j} W_{ij} |u(x_{i}) − u(x_{j})| + (1/ε) ∑_{i} V(u(x_{i}))*;

we briefly explain these terms below.

We interpret data fidelity as being faithful to the graph geometry: we want nodes that are connected by an edge to be similar, so if two nodes *x_{i},x_{j}* are connected then we expect *u(x_{i})* and *u(x_{j})* to be close, with the difference penalised in proportion to the weight *W_{ij}*.

The second term penalises classifications that are not close to *{0,1}*. The motivation is that, rather than estimating the classification at each node, one really wants to estimate the class boundary. We therefore include a penalisation term which prefers hard decisions by choosing *V* to be, for example, the double-well potential *V(x) = x ^{2}(1-x)^{2}*.
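The following Python sketch evaluates this energy for a given weight matrix and labelling; the relative scaling of the two terms is a simplification chosen for illustration (see [6] for the scalings under which the convergence results hold):

```python
import numpy as np

def double_well(u):
    """V(u) = u^2 (1 - u)^2, zero exactly at the hard labels 0 and 1."""
    return u ** 2 * (1 - u) ** 2

def graph_gl_energy(W, u, eps):
    """Graph Ginzburg-Landau energy for symmetric weights W (n,n), labels u (n,)."""
    # Data fidelity: an edge between nodes with different labels costs W_ij
    # (the 0.5 corrects for counting each symmetric edge twice).
    fidelity = 0.5 * np.sum(W * np.abs(u[:, None] - u[None, :]))
    # Phase penalty: pushes each label towards the hard values {0, 1}.
    penalty = double_well(u).sum() / eps
    return fidelity + penalty
```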

Our contribution [6] was to show that, as the number of data points increases, the estimators converge to a function defined on a continuous domain (note that the class estimators for finite data sets are defined on a graph, which is a discrete domain). Furthermore, the limiting classifier is a hard classifier, i.e. it takes only the values *{0,1}*, and minimizes a weighted total variation over all hard classifiers. Our results allowed for anisotropic weights *W_{ij}*.

My collaborators on this project are Dejan Slepcev (CMU) and Florian Theil (Warwick).

Optimal transport theory, dating back to Monge in the 1700s, is extremely widely used, with applications in astronomy, computer vision and graphics, fluid mechanics, machine learning, operational research and many more, as well as in traditional mathematical subjects such as partial differential equations, geometry, probability theory and functional analysis. My motivation for the subject stems from applied uses in signal and image processing. The Lagrangian nature makes optimal transport distances very desirable, as images are compared based on the cost of spatial rearrangement, unlike *L ^{p}* distances where one makes comparisons point by point. As such, optimal transport distances are better able to represent spatial variations in signals and images than non-Lagrangian distances.

The optimal transport problem (in the Kantorovich formulation) between two probability measures *μ* and *ν* on a domain *X* is defined as

*OT(μ,ν) = min_{π ∈ Π(μ,ν)} ∫_{X^{2}} c(x,y) dπ(x,y)*

where *Π(μ,ν)* is the set of coupled measures on *X ^{2}* whose first marginal is *μ* and whose second marginal is *ν*, and *c(x,y)* is the cost of moving a unit of mass from *x* to *y* (a common choice is *c(x,y) = |x−y|^{p}*, which gives the Wasserstein distances).
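For discrete measures the Kantorovich problem is a finite linear program. Here is a minimal Python sketch using scipy, fine for small problems; in practice one would use the dedicated solvers surveyed in [5]:

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(a, b, C):
    """Minimize <C, P> over plans P >= 0 with row sums a and column sums b.

    a: (n,) source weights, b: (m,) target weights (equal total mass),
    C: (n, m) cost matrix, e.g. C[i, j] = |x_i - y_j|^p.
    """
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1   # row i of P sums to a[i]
    for j in range(m):
        A_eq[n + j, j::m] = 1            # column j of P sums to b[j]
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None))
    return res.fun, res.x.reshape(n, m)  # optimal cost and transport plan
```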

There is a substantial theory which has encouraged practitioners to adopt these distances, such as the existence of minimizers, fast computational methods and the Riemannian structure (in particular the existence of geodesics). There are also numerous extensions to optimal transport, such as partial transport, adding constraints, adapting the formulation to model networks, and many more. I was involved with a review paper written for practitioners [5].

Optimal transport does not directly measure the intensity of a signal, which means that the OT distance is insensitive to high frequency perturbations. Another drawback of optimal transport is the restrictions it places on the images/signals: one can only compare non-negative images, they must be normalised (integrate to the same value) and they cannot be multi-valued. The reason for this is that images/signals are represented as measures. A project I am involved in has been to develop what we call the *TL ^{p}* distance, which directly measures the intensity of a signal whilst retaining many of the key features of optimal transport and avoiding the restrictions stated above.

We consider the coupling *(f,μ)* where *f ∈ L ^{p}(μ)*. The function *f* carries the intensity of the signal/image while the measure *μ* carries the geometry of its domain. The *TL ^{p}* distance between two such pairs is then

*d_{TL^{p}}((f,μ),(g,ν))^{p} = min_{π ∈ Π(μ,ν)} ∫_{X^{2}} (|x−y|^{p} + |f(x)−g(y)|^{p}) dπ(x,y)*,

an optimal transport problem whose cost penalises both the distance mass is moved and the change in intensity. Since *f* and *g* are functions rather than measures, they need not be non-negative or normalised, and they may be multi-valued.
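In the discrete setting the only change from standard optimal transport is the ground cost. A short Python sketch of the *TL ^{p}* cost matrix follows (the test signals are arbitrary; the resulting OT problem can be solved with, e.g., the `discrete_ot` sketch above):

```python
import numpy as np

def tlp_cost_matrix(x, f, y, g, p=2):
    """TL^p ground cost: C[i, j] = |x_i - y_j|^p + |f(x_i) - g(y_j)|^p."""
    return (np.abs(x[:, None] - y[None, :]) ** p
            + np.abs(f[:, None] - g[None, :]) ** p)

# Two discrete signals on [0, 1]: the same bump, translated.
x = np.linspace(0, 1, 50)
f = np.exp(-100 * (x - 0.3) ** 2)
g = np.exp(-100 * (x - 0.7) ** 2)
C = tlp_cost_matrix(x, f, x, g)
# With uniform marginals a = b = 1/50, the TL^p distance is the p-th
# root of the optimal cost of the OT problem with this cost matrix.
```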

With collaborators I demonstrated the improvement in performance of *TL ^{p}* over distances such as *L ^{p}* and standard optimal transport in signal and image analysis applications [8].

My collaborators on this project are Soheil Kolouri (CMU), Serim Park (CMU), Gustavo Rohde (CMU) and Dejan Slepcev (CMU).