Designer Algorithms for Astronomy
Large archives of astronomical data (images, spectra and catalogues) are being assembled into a database that will soon be accessible worldwide as part of the Virtual Observatory. This calls for techniques that allow fast, automated classification and extraction of key physical properties from very large datasets, as well as the ability to visualise the structure of highly multi-dimensional data and to extract and study substructures flexibly. Since 2003, I have coordinated an effort between the School of Physics and Astronomy and the School of Computer Science at the University of Birmingham to develop a number of innovative algorithms for astronomical data analysis and data mining. We have employed various machine learning techniques, including kernel methods, support vector machines, independent component analysis and latent variable modelling. We have also worked on astronomical applications of genetic programming and evolutionary computation, and of computer vision research.
At the School of Physics and Astronomy, the members involved in this activity are Somak Raychaudhury, Trevor Ponman, Alastair Sanderson, Ian Stevens and Bill Chaplin, together with graduate students Matt Lazell and Aurelia Pascut. Past members include Louisa Nolan, Rowan Temple and Habib Khosroshahi. The principal collaborators from the School of Computer Science are Peter Tino, Ata Kabán, Ela Claridge and Xin Yao. This activity has yielded an E-science award from PPARC (Jianyong Sun, 2005-08) and a PhD studentship from STFC (Matt Lazell, 2009-12). It has also produced the PhD theses of Juan Cuevas-Tello and Xiaoxia Wang, and parts of the theses of Nikolaos Gianniotis and Steve Spreckley.
Latent variable and Bayesian modelling; kernel methods and support vector machines: Measuring time delays between the multiple images in a gravitational lens system, using long-term monitoring with high-resolution radio or optical imaging, is difficult in the presence of correlated noise on various scales. We are investigating several methods for doing this. Bayesian methods developed with Markus Harva (Helsinki) have met with some success, and a kernel-fitting approach using latent variables is the subject of a PhD thesis (Juan Cuevas-Tello) that I am jointly supervising with Dr Peter Tino. In principle, such modelling could allow us to measure time-delayed signals from unresolved images. The method has much wider applicability, and our next goal is to apply it to automatically determining redshifts from millions of galaxy spectra and to quantifying the widths of emission and absorption lines.
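To make the kernel-fitting idea concrete, here is a minimal Python sketch, assuming noise-free, evenly sampled light curves and no flux ratio between the two images. For each trial delay, the light curve of image B is shifted back in time and pooled with that of image A; the delay whose pooled data are best predicted by a leave-one-out Gaussian-kernel smoother is kept. This is a simplified stand-in, not the latent-variable kernel model or the Bayesian method described above; the function names (`loo_kernel_error`, `estimate_delay`), the bandwidth and the synthetic sinusoidal light curves are illustrative assumptions.

```python
import math


def loo_kernel_error(times, values, bandwidth):
    """Mean squared leave-one-out error of a Gaussian-kernel (Nadaraya-Watson) smoother."""
    n = len(times)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave point i out of its own prediction
            w = math.exp(-0.5 * ((times[i] - times[j]) / bandwidth) ** 2)
            num += w * values[j]
            den += w
        if den > 0.0:  # skip points with no neighbours within the kernel's reach
            total += (values[i] - num / den) ** 2
    return total / n


def estimate_delay(t_a, flux_a, t_b, flux_b, trial_delays, bandwidth=0.5):
    """Return the trial delay at which the pooled light curves look most like one smooth curve.

    Assumes (hypothetically) that image B lags image A, flux_b(t) = flux_a(t - delay),
    with no magnification ratio and no noise.
    """
    best_delay, best_err = None, float("inf")
    for delay in trial_delays:
        # Shift image B back by the trial delay and pool it with image A.
        times = list(t_a) + [t - delay for t in t_b]
        values = list(flux_a) + list(flux_b)
        err = loo_kernel_error(times, values, bandwidth)
        if err < best_err:
            best_delay, best_err = delay, err
    return best_delay


if __name__ == "__main__":
    days = list(range(100))
    image_a = [math.sin(d / 5.0) for d in days]        # toy, noise-free light curve
    image_b = [math.sin((d - 7) / 5.0) for d in days]  # the same curve delayed by 7 days
    print(estimate_delay(days, image_a, days, image_b, range(16)))  # 7
```

With real monitoring data one would also have to fit the magnification ratio between the images and model the correlated noise mentioned above; both are omitted here to keep the sketch short.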
Independent component analysis of galaxy spectra: Elliptical galaxies were once believed to consist of a single population of old stars formed coevally at high redshift, followed by predominantly passive evolution. However, more recent hierarchical structure formation models suggest that they form through low-redshift mergers of disk galaxies accompanied by significant star formation, and recent analyses of galaxy spectra seem to indicate the presence of significant younger stellar populations in at least some elliptical galaxies. Detailed physical modelling of such populations via spectral fitting is computationally expensive, which inhibits detailed analysis of the several million galaxy spectra that will become available over the next few years. Together with Ata Kabán, Markus Harva and Louisa Nolan, I have developed a data-driven method for decomposing the spectra of galaxies into those of several stellar populations, without the use of detailed physical models. The method includes a Bayesian way of filling in missing data in an ensemble of spectra, and the interpretation of the independent components in terms of old and young stellar populations has already yielded spectacular results [1,5].
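The underlying linear model treats each observed spectrum as a weighted sum of component spectra, and ICA recovers the components by seeking projections that are as statistically independent as possible. The sketch below illustrates only that idea, using a textbook two-component ICA (whitening followed by a search for the rotation that maximises the absolute excess kurtosis of the outputs); it is not the Bayesian ICA with missing-data handling used in this work. The "old" and "young" templates are toy signals (a sinusoid and a square wave over 400 hypothetical wavelength bins), not real stellar-population spectra, and all function names are illustrative.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def whiten(x1, x2):
    """Centre two mixtures, then rotate and scale them to unit variance and zero correlation."""
    m1, m2 = _mean(x1), _mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    a = _mean([v * v for v in x1])
    b = _mean([u * v for u, v in zip(x1, x2)])
    c = _mean([v * v for v in x2])
    phi = 0.5 * math.atan2(2.0 * b, a - c)  # rotation angle that diagonalises the 2x2 covariance
    cp, sp = math.cos(phi), math.sin(phi)
    lam1 = a * cp * cp + 2.0 * b * sp * cp + c * sp * sp  # variance along (cos phi, sin phi)
    lam2 = a * sp * sp - 2.0 * b * sp * cp + c * cp * cp  # variance along (-sin phi, cos phi)
    z1 = [(cp * u + sp * v) / math.sqrt(lam1) for u, v in zip(x1, x2)]
    z2 = [(-sp * u + cp * v) / math.sqrt(lam2) for u, v in zip(x1, x2)]
    return z1, z2


def excess_kurtosis(y):
    """Fourth moment minus 3; valid here because y has zero mean and unit variance."""
    return _mean([v ** 4 for v in y]) - 3.0


def ica_2d(x1, x2, steps=360):
    """Two-component ICA: pick the rotation of the whitened data maximising |kurtosis|."""
    z1, z2 = whiten(x1, x2)
    best = None
    for i in range(steps):
        theta = 0.5 * math.pi * i / steps  # the contrast repeats every pi/2
        ct, st = math.cos(theta), math.sin(theta)
        y1 = [ct * u + st * v for u, v in zip(z1, z2)]
        y2 = [-st * u + ct * v for u, v in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]


def correlation(u, v):
    mu, mv = _mean(u), _mean(v)
    cov = _mean([(p - mu) * (q - mv) for p, q in zip(u, v)])
    var_u = _mean([(p - mu) ** 2 for p in u])
    var_v = _mean([(q - mv) ** 2 for q in v])
    return cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    n = 400  # hypothetical number of wavelength bins
    old = [math.sin(2 * math.pi * k / 50) for k in range(n)]   # toy "old population" template
    young = [1.0 if k % 80 < 40 else -1.0 for k in range(n)]  # toy "young population" template
    spec1 = [0.8 * o + 0.3 * y for o, y in zip(old, young)]    # two observed toy "galaxy spectra"
    spec2 = [0.4 * o + 0.9 * y for o, y in zip(old, young)]
    c1, c2 = ica_2d(spec1, spec2)
    for name, src in (("old", old), ("young", young)):
        print(name, max(abs(correlation(src, comp)) for comp in (c1, c2)))
```

ICA returns components only up to order, sign and scale, which is why the comparison uses absolute correlations; turning components of real spectra into stellar populations also needs physical interpretation of the kind described above.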
Hierarchical visualisation of high-dimensional data: work with Jianyong Sun, Peter Tino and Ata Kabán (more to come).
Inversion techniques for spectral mapping: An inversion technique for recovering physical parameters from multi-colour images, already applied successfully in medical imaging, has been applied to X-ray images extracted in a set of optimal energy bands, in order to map the spectral properties of hot gas in clusters of galaxies. This effort, in collaboration with Ela Claridge, Mark O'Dwyer and Trevor Ponman, will facilitate extensive statistical studies of the physical properties of galaxies and clusters from large X-ray archives without detailed model fitting.
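As a rough illustration of pixel-by-pixel inversion, the Python sketch below assumes a toy photon spectrum dN/dE proportional to exp(-E/kT), with no absorption, Gaunt factor or detector response, and two hypothetical energy bands (0.5-2 keV and 2-7 keV). Under these assumptions the hard-to-soft count ratio increases monotonically with kT, so each pixel's temperature can be recovered by bisection without fitting a full spectrum. The band edges and function names are assumptions made for the example; the work described above uses a set of optimal energy bands, which this sketch does not attempt to reproduce.

```python
import math

SOFT = (0.5, 2.0)  # keV, hypothetical soft band
HARD = (2.0, 7.0)  # keV, hypothetical hard band


def band_counts(kt, band):
    """Counts in [e_lo, e_hi] for a toy spectrum dN/dE = exp(-E/kT) (arbitrary normalisation)."""
    e_lo, e_hi = band
    return kt * (math.exp(-e_lo / kt) - math.exp(-e_hi / kt))


def hardness(kt):
    """Forward model: hard-to-soft count ratio as a function of temperature kT (keV)."""
    return band_counts(kt, HARD) / band_counts(kt, SOFT)


def invert_hardness(ratio, kt_lo=0.1, kt_hi=50.0, tol=1e-6):
    """Inverse model: bisection works because the toy ratio rises monotonically with kT."""
    if not hardness(kt_lo) <= ratio <= hardness(kt_hi):
        raise ValueError("ratio outside the range covered by the model")
    while kt_hi - kt_lo > tol:
        mid = 0.5 * (kt_lo + kt_hi)
        if hardness(mid) < ratio:
            kt_lo = mid
        else:
            kt_hi = mid
    return 0.5 * (kt_lo + kt_hi)


def temperature_map(soft_img, hard_img):
    """Apply the inversion pixel by pixel to two band images given as lists of rows."""
    return [[invert_hardness(h / s) for s, h in zip(srow, hrow)]
            for srow, hrow in zip(soft_img, hard_img)]


if __name__ == "__main__":
    true_kt = [[1.0, 2.0], [4.0, 8.0]]  # toy 2x2 temperature map, keV
    soft = [[1e4 * band_counts(t, SOFT) for t in row] for row in true_kt]
    hard = [[1e4 * band_counts(t, HARD) for t in row] for row in true_kt]
    print(temperature_map(soft, hard))  # close to true_kt
```

Precomputing `hardness` on a grid and interpolating would be faster for large images; bisection is used here only because it is short and easy to check.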
Genetic algorithms for model discovery: As data improve, the analytical forms traditionally used to model galaxies and their clusters prove inadequate, and departures from such simple forms may contain important information on structure and evolution. To provide a more flexible and sophisticated suite of models, in collaboration with Prof. X Yao and Dr H Khosroshahi, we are examining the use of genetic algorithms, which allow the models themselves to evolve in a manner modelled on biological evolution, to fit photometric observations of both galaxies and clusters.
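Evolving the functional form of a model (genetic programming) does not fit in a short example, so the Python sketch below shows a simpler relative: a genetic algorithm that evolves only the three parameters of a fixed Sérsic surface-brightness profile, I(R) = I_e exp(-b_n[(R/R_e)^(1/n) - 1]), to fit a synthetic radial profile. The approximation b_n ≈ 2n - 1/3, the parameter bounds, the population size and the operators (tournament selection, blend crossover, Gaussian mutation, elitism) are illustrative assumptions, not the scheme used in this collaboration.

```python
import math
import random

# Hypothetical search ranges for the Sersic parameters (I_e, R_e, n).
BOUNDS = [(0.1, 10.0), (0.5, 20.0), (0.5, 8.0)]


def log_sersic(r, i_e, r_e, n):
    """Natural log of a Sersic profile, using the approximation b_n ~ 2n - 1/3."""
    b_n = 2.0 * n - 1.0 / 3.0
    return math.log(i_e) - b_n * ((r / r_e) ** (1.0 / n) - 1.0)


def misfit(params, radii, profile):
    """Sum of squared residuals in log intensity (tames the large dynamic range)."""
    return sum((log_sersic(r, *params) - math.log(p)) ** 2 for r, p in zip(radii, profile))


def evolve(radii, profile, pop_size=50, generations=100, seed=1):
    """Genetic algorithm: tournament selection, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)

    def cost(ind):
        return misfit(ind, radii, profile)

    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        children = [pop[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1 = min(rng.sample(pop, 3), key=cost)  # tournament of three
            p2 = min(rng.sample(pop, 3), key=cost)
            child = []
            for (lo, hi), g1, g2 in zip(BOUNDS, p1, p2):
                w = rng.random()
                gene = w * g1 + (1.0 - w) * g2            # blend crossover
                gene += rng.gauss(0.0, 0.05 * (hi - lo))  # Gaussian mutation
                child.append(min(hi, max(lo, gene)))      # keep the gene inside its bounds
            children.append(child)
        pop = children
    return min(pop, key=cost)


if __name__ == "__main__":
    radii = [0.5 * k for k in range(1, 41)]
    true_params = (2.0, 5.0, 4.0)  # toy de Vaucouleurs-like galaxy (n = 4)
    profile = [math.exp(log_sersic(r, *true_params)) for r in radii]
    best = evolve(radii, profile)
    print(best, misfit(best, radii, profile))
```

Because the best individual is always carried forward, the misfit of the returned solution can never be worse than that of the best member of the initial random population.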
Publications in this field
Full List of publications