Supermassive black holes are found at the centres of most galaxies. As these black holes pull in material from the surrounding galaxy, they produce jets of material that we can observe with radio telescopes. To investigate a black hole properly, we need to know which galaxy it belongs to. We observe galaxies at infrared wavelengths, so the task is essentially to match the jets observed at radio wavelengths to the corresponding infrared galaxy. This matching is currently performed manually by "citizen scientists" in the Radio Galaxy Zoo project, but because of the large amount of astronomical data that needs to be matched, we would like computers to match jets and galaxies automatically.
We can interpret the matching as a binary classification problem, in which each candidate infrared source is either associated with an observed jet or not. Data from Radio Galaxy Zoo can then be treated as a labelled training set for this problem.
Using the Radio Galaxy Zoo data in this way raises two questions. Firstly, which examples should we show to citizen scientists to maximise the usefulness of their labels? This is the problem of "active learning". Secondly, citizen scientists may mislabel the data, so how should we handle labels that are sometimes incorrect? This is the problem of "label noise".
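To make the binary classification framing concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from the project's code: the candidate names, the single feature (a made-up angular offset between each infrared source and the radio emission), the toy values, and the choice of logistic regression are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_examples(candidates, host_name):
    """Turn one radio source's candidates into (features, label) pairs.

    candidates: dict mapping a (hypothetical) infrared source name to its
                feature list.
    host_name:  the candidate that volunteers identified as the host galaxy.
    """
    return [(features, 1 if name == host_name else 0)
            for name, features in candidates.items()]


def train_logistic(examples, lr=0.1, epochs=500):
    """Fit logistic regression with plain stochastic gradient descent."""
    w = [0.0] * len(examples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_host(candidates, w, b):
    """Return the candidate with the highest predicted probability."""
    def score(features):
        return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
    return max(candidates, key=lambda name: score(candidates[name]))


# Toy data (assumed): one feature per candidate, the offset from the jet.
source_a = {"a1": [0.2], "a2": [2.5], "a3": [1.8]}
source_b = {"b1": [2.0], "b2": [0.1], "b3": [3.0]}
examples = build_examples(source_a, "a1") + build_examples(source_b, "b2")

w, b = train_logistic(examples)

# A new radio source with two candidate infrared sources.
new_source = {"c1": [2.2], "c2": [0.3]}
predicted = predict_host(new_source, w, b)
```

Each candidate is classified independently, and the host is then taken to be the highest-scoring candidate for that radio source. A real feature set would presumably be much richer than a single offset.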
The project initially focuses on the classification problem. It can then be extended to address either active learning or label noise.
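As a rough illustration of the two possible extensions, the sketch below shows one common baseline for each: majority voting over volunteer labels as a simple way to reduce label noise, and uncertainty sampling as a simple active learning strategy. Both are stand-in techniques chosen for illustration, not methods stated in this description, and the function names and inputs are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Combine several volunteers' labels for one example.

    Returns the most common label and the fraction of volunteers who chose it.
    Ties are broken in favour of the label that was seen first.
    """
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)


def most_uncertain(probabilities, k=1):
    """Uncertainty sampling for a binary classifier.

    Returns the indices of the k examples whose predicted probability is
    closest to 0.5, i.e. the ones the classifier is least sure about and
    that might be most useful to show to volunteers next.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Three volunteers labelled the same candidate; two said "host" (1).
consensus, agreement = majority_vote([1, 1, 0])

# Predicted host probabilities for four unlabelled candidates.
to_label = most_uncertain([0.9, 0.48, 0.1, 0.6], k=2)
```

The agreement fraction returned by `majority_vote` could also serve as a crude confidence weight for each training label.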
All code for this project can be found on GitHub.