Center for Nonlinear Analysis

## 2013 Summer Undergraduate Applied Mathematics Institute

### Projects:

• Introduction to Subword Pattern Avoidance, Munira Shahir
Abstract: The student defined correlation polynomials and autocorrelation polynomials of words. She used them to find two generating functions that count words avoiding given patterns, and then used one of these generating functions to find the expected time of first occurrence of a given word in a random sequence of letters.
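The expected first-occurrence time mentioned above can be computed directly from a word's autocorrelation set: for a word $w$ over a $q$-letter alphabet with uniform i.i.d. letters, the expected waiting time is the sum of $q^j$ over all lengths $j$ at which $w$'s prefix equals its suffix (a classical consequence of the correlation-polynomial machinery of Guibas and Odlyzko, also known as Conway's leading-number formula). A minimal sketch, with function names of our own choosing:

```python
def autocorrelation_set(w):
    """Lengths j such that the length-j prefix of w equals its length-j suffix."""
    n = len(w)
    return [j for j in range(1, n + 1) if w[:j] == w[n - j:]]

def expected_first_occurrence(w, alphabet_size):
    """Expected waiting time for the first occurrence of w in an i.i.d.
    uniform letter stream: sum of alphabet_size**j over the autocorrelation set."""
    return sum(alphabet_size ** j for j in autocorrelation_set(w))

# Classic coin-flip examples: "HH" takes 6 flips on average, "HT" only 4,
# because "HH" overlaps itself at length 1 while "HT" does not.
print(expected_first_occurrence("HH", 2))  # 6
print(expected_first_occurrence("HT", 2))  # 4
```

The asymmetry between "HH" and "HT" is exactly what the autocorrelation polynomial captures: self-overlapping words take longer to appear on average.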

• Building Words from Exactly the Right Set of Subwords, Carson Sestili
Abstract: The student worked on a problem that combined containment and avoidance: counting the words of any given length that avoid one given pattern while being built from a set of other patterns. This problem was completely solved using a result of Guibas and Odlyzko. He then worked on the distribution of words with the same subword complexity, and found the generating function of this distribution using two different methods. The first combined the Guibas-Odlyzko result with inclusion-exclusion; the second used de Bruijn graphs and their subgraphs.
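The subword complexity mentioned above assigns to each word the number of distinct subwords (factors) it contains at each length; grouping words by this profile gives the distribution the student studied. A small illustrative sketch (not the student's method, just the definition made concrete):

```python
from collections import Counter
from itertools import product

def complexity_profile(w):
    """Number of distinct subwords (contiguous factors) of each length 1..len(w)."""
    return tuple(len({w[i:i + k] for i in range(len(w) - k + 1)})
                 for k in range(1, len(w) + 1))

# Distribution of binary words of length 4 grouped by complexity profile.
dist = Counter(complexity_profile("".join(w)) for w in product("01", repeat=4))
for profile, count in sorted(dist.items()):
    print(profile, count)
# e.g. (1, 1, 1, 1) -> 2, contributed by "0000" and "1111"
```

Counting how many words share each profile, for every length, is what the generating function of the distribution encodes.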

• Asymptotic Self-Similar 2D Coarsening Models, Daniel Chupin, Zachary Greenberg, Jourdain Lamperski, Alexander Voet
Abstract: The goal of this project was to investigate the existence of self-similar scaling limits for random coarsening and redistribution processes. More specifically, the group considered a coarsening process induced by the removal and repositioning of nodes in a two-dimensional, periodic Voronoi diagram. At each step of the process, a random pair of adjacent nodes is removed and replaced with a single node located at the midpoint of the removed pair. The Voronoi diagram is then reconstructed for the new set of nodes and the process is repeated. In effect, adjacent cells merge, but then shed some fraction of their total area, which is redistributed to the surrounding cells.

The first part of the project was to conduct numerical simulations of the process using MATLAB. The simulations suggested that the distribution of cell sizes in the Voronoi diagram approached a self-similar form as the number of steps in the process became very large. The second part of the project was to develop a mean-field model for the process which exhibited scaling limits with the same qualitative properties as the Voronoi process. The idea was to model a simplified process without the geometric constraints of a Voronoi diagram, but with similar redistribution properties. For the new model, we established, analytically, the existence of self-similar solutions which deviate only slightly from the self-similar form observed in the numerical Voronoi computations. A new and interesting feature of the analysis is the derivation of mean-field equations which have a structure connected with delay differential equations.
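The basic merging step of the process can be sketched in a few lines. The sketch below is ours, not the group's MATLAB code, and it makes one simplifying assumption: it pairs a random node with its nearest neighbour on the unit torus as a crude stand-in for true Voronoi-cell adjacency (a faithful version would rebuild the Voronoi diagram, e.g. with `scipy.spatial.Voronoi`, and read adjacency from shared cell edges):

```python
import math
import random

def torus_disp(a, b):
    """Signed displacement from coordinate a to b on the unit circle (mod 1)."""
    return ((b - a + 0.5) % 1.0) - 0.5

def torus_dist(p, q):
    """Distance between two points on the unit torus."""
    return math.hypot(torus_disp(p[0], q[0]), torus_disp(p[1], q[1]))

def coarsen_step(nodes, rng):
    """Remove a random node and its nearest neighbour (stand-in for Voronoi
    adjacency) and insert the periodic midpoint of the pair."""
    i = rng.randrange(len(nodes))
    p = nodes[i]
    j = min((k for k in range(len(nodes)) if k != i),
            key=lambda k: torus_dist(p, nodes[k]))
    q = nodes[j]
    mid = tuple((p[d] + torus_disp(p[d], q[d]) / 2) % 1.0 for d in range(2))
    return [n for k, n in enumerate(nodes) if k not in (i, j)] + [mid]

rng = random.Random(0)
nodes = [(rng.random(), rng.random()) for _ in range(200)]
for _ in range(50):
    nodes = coarsen_step(nodes, rng)
print(len(nodes))  # 150: each step removes two nodes and adds one back
```

Tracking the distribution of cell areas of the reconstructed Voronoi diagram along such a run is what suggested the self-similar limiting form.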

• Using Regression Models to Predict Redshifts, Cathleen Gillette, Cary Morgan, Elizabeth Spencer
Abstract: In this project, we developed and deployed statistical regression models for predicting the redshift of a galaxy using only five images of that galaxy, taken through a set of five filters.

Redshift estimation from such data (the "photometric redshift estimation problem") is a critical challenge facing researchers in astronomy and cosmology. The redshift of a galaxy is a proxy for its distance, and also for the time at which the galaxy is being observed. By studying how galaxies change with redshift, one studies the evolution of the Universe over time; this is possible only with sufficiently accurate redshift estimates.

We worked with three well-established regression techniques: linear models, generalized additive models, and regression trees. Linear models are classic tools, and their consideration for this purpose is largely to illustrate the basic concepts; it was immediately clear that these simple models are inadequate for the complexity of the relationship between redshift and the photometric data. Generalized additive models are built on regression splines, and hence are a flexible, nonparametric approach to estimation. Regression trees arise largely from the machine learning literature, and are another nonparametric approach to regression.

A big issue in this work was the construction of relevant predictor variables. The raw data comes in the form of five images per galaxy; these images must be translated into real-valued quantities before they can be used in the models. This required careful consideration of the most important information encoded in each image. For example, it is natural to expect that the image intensity towards the center of the galaxy holds more important information than the edges of the image. But the irregular shapes of galaxies complicate the construction of these predictors.
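One hypothetical predictor of the kind described, a center-weighted mean intensity that down-weights the image edges, can be sketched as follows. This is an illustration of the idea, not the project's actual feature set; the Gaussian weighting and the function name are our own, and a real pipeline would also have to cope with the irregular, off-center shapes mentioned above:

```python
import math

def center_weighted_intensity(image, sigma=2.0):
    """Gaussian-weighted mean pixel intensity: pixels near the image center
    count more than pixels near the edge. `image` is a 2D list of values;
    `sigma` controls how quickly the weight decays with distance."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    num = den = 0.0
    for y in range(h):
        for x in range(w):
            wgt = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
            num += wgt * image[y][x]
            den += wgt
    return num / den

# Toy 5x5 "galaxy": a bright central pixel on a dim background. The weighted
# feature exceeds the plain mean because the bright pixel sits at the center.
img = [[10 if (y, x) == (2, 2) else 1 for x in range(5)] for y in range(5)]
print(center_weighted_intensity(img))
```

Each of the five filter images would yield its own such feature, giving a handful of real-valued predictors per galaxy.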

Another issue in this work was varying data quality. Looking through the thousands of galaxy images that comprise the data set, it is immediately clear that some of the images are of higher quality than others. Some of the images appear to be purely noise, with little signal from the galaxy itself. Other images appear to actually include two galaxies. It is important that low quality images be downweighted in the analysis. Hence, a significant amount of effort was put into quantifying the quality of an image, translating quality into a weight that could be attached to each galaxy.

A final theme that was stressed throughout the work was the risk of overfitting regression models. It is critical that any model that is constructed perform well at predicting redshifts of galaxies that were not included in the fitting of the model. This is, after all, the point of constructing the regression model. Overfitting was avoided by the use of cross-validation techniques, and separating the original data into training and test sets.
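The overfitting danger described above can be made concrete with a deliberately extreme pair of models on a held-out test set. The sketch below is ours (it uses 1-nearest-neighbour regression as a stand-in for an overfit model, not any of the project's three techniques): a model that memorizes its training data scores a perfect zero on the training set yet does worse than a trivial baseline on data it has never seen.

```python
import random

def one_nn_predict(train, x):
    """1-nearest-neighbour regression: return the y of the closest training x.
    It fits the training data perfectly, which is exactly how overfitting looks."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mean_predict(train, x):
    """A deliberately rigid baseline: always predict the training mean."""
    return sum(y for _, y in train) / len(train)

def mse(model, data, train):
    """Mean squared prediction error of `model` on `data`."""
    return sum((model(train, x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(1)
# Pure-noise data: y has no real dependence on x, so nothing is learnable.
data = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(200)]
train, test = data[:100], data[100:]

print(mse(one_nn_predict, train, train))  # 0.0: the training set is memorised
print(mse(one_nn_predict, test, train))   # roughly twice the noise variance
print(mse(mean_predict, test, train))     # roughly the noise variance
```

Cross-validation generalizes this idea: every point takes a turn in the held-out set, so the error estimate is not flattered by memorization.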

Finally, the results showed that the generalized additive models were the most effective at predicting redshifts. Predictors that were functions of higher wavelength bands seemed to be the most important to the models.

• Searching for Combinatorial Interpretations Using the Bhargava Defined Factorial, Joseph Gault, Samantha Davies
Abstract: The goal of this project was to understand Bhargava's generalization of the standard factorial function and to investigate the corresponding generalization of the binomial coefficient. Bhargava showed that the generalized binomial coefficient is an integer for any Bhargava factorial, but no combinatorial interpretation of the coefficient itself is known. We focused on the case where the factorial is defined over a geometric progression with initial term 1 and common ratio a power of a prime, and showed that n! in this case equals the order of $GL_n(F_q)$ (where $q$ is the common ratio). We then investigated the Pascal's triangle of the generalized binomial coefficients and discovered (and proved) several interesting properties of these coefficients.
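The identification above can be checked numerically: $|GL_n(F_q)| = \prod_{i=0}^{n-1}(q^n - q^i)$ counts the invertible $n \times n$ matrices over $F_q$, and the generalized binomial coefficient built from these factorials is always an integer, as Bhargava's theorem guarantees. A small sketch (function names are ours):

```python
def gl_order(n, q):
    """Order of GL_n(F_q): the number of invertible n x n matrices over F_q,
    computed as the product of (q^n - q^i) for i = 0..n-1 (empty product = 1)."""
    out = 1
    for i in range(n):
        out *= q ** n - q ** i
    return out

def gen_binomial(n, k, q):
    """Generalized binomial coefficient using n! = |GL_n(F_q)|, the factorial
    over the geometric progression 1, q, q^2, ... as in the project."""
    num = gl_order(n, q)
    den = gl_order(k, q) * gl_order(n - k, q)
    assert num % den == 0, "Bhargava's theorem: these quotients are integers"
    return num // den

print(gl_order(2, 2))        # (4-1)(4-2) = 6: six invertible 2x2 binary matrices
print(gen_binomial(4, 2, 2)) # 20160 / 36 = 560
```

Tabulating `gen_binomial(n, k, q)` row by row produces the generalized Pascal's triangle whose properties the students investigated.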
Abstract: The goal of this project was to understand Bhargava's generalization of the standard factorial function and to investigate the corresponding generalization of the binomial coefficient. Bhargava showed that the binomial coefficient was integral using any Bhargava factorial, but there is no known significance of the binomial coefficient itself. We focused on the case where the factorial is defined over a geometric progression with initial term 1 and common ratio a power of a prime and showed that n!, in this case, was equal to the order of $GL_n(F_q)$ (where $q$ is the common ratio). They then proceeded to investigate Pascal's triangle of the generalized binomial coefficients and discovered (and proved) several interesting properties about the binomial coefficients.