Simon Overell
I completed a PhD in Information Retrieval at Imperial College London between 2005 and 2008. Since then I have kept close to my research roots, working in early-stage startups on new technologies. Prior to my PhD I received an MEng in Computer Science with Artificial Intelligence from Imperial College London.
My PhD aims to augment the Geographic Information Retrieval process with information extracted from world knowledge. This aim is approached from three directions: classifying world knowledge, disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across the Internet: a significant proportion of web documents and web searches contain geographic entities, and Internet-enabled mobile devices are proliferating. Traditional information retrieval treats these geographic entities in the same way as any other textual data. I augment the retrieval process with geographic information and show that methods built upon world knowledge outperform methods based on heuristic rules.
I employ Wikipedia as a source of world knowledge. Wikipedia has become a phenomenon of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: first, I classify Wikipedia articles into broad categories; this is followed by a much finer classification in which Wikipedia articles are disambiguated as specific locations.
My thesis concludes by proposing the Steinberg hypothesis. By analysing Wikipedias in a range of languages, I demonstrate that a fish-eye, localised view of the world is ubiquitous and inherently part of human nature.
The core contributions of my work are in the areas of extracting information from Wikipedia, supervised placename disambiguation, and providing a quantitative model of how people view the world. The findings have a direct impact on applications such as geographically aware search engines. In a broader context, documents can be automatically annotated with machine-readable metadata, and dialogue can be enhanced with a model of how people view the world. This could reduce ambiguity and confusion in dialogue between people or computers.
Supervisors: Stefan Ruger
Papers by Simon Overell
I have evaluated three approaches to applying co-occurrence to place name disambiguation:
1. Assign a co-occurrence index to place triplets.
2. Infer co-occurrence classifiers from the ground truth.
3. Represent the places occurring in the training data as vectors in a high-dimensional space.
The talk will begin with a description of place name disambiguation techniques and the use of Wikipedia as a corpus. It will then describe my probabilistic models, which use first- and higher-order co-occurrence. The talk will conclude with my intended future work: expanding beyond place names to all named entities.
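To illustrate the general idea behind first-order co-occurrence, here is a minimal sketch, not the models described in the talk. It assumes a hypothetical toy training set in which each document is a list of already-disambiguated place identifiers (written here as Wikipedia-style article titles). It counts how often pairs of places appear in the same document, then resolves an ambiguous name by choosing the candidate location that co-occurs most often with the other places in the context. All names (`TRAINING_DOCS`, `cooccurrence_counts`, `score`, `disambiguate`) are illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy ground truth: each document lists disambiguated places.
TRAINING_DOCS = [
    ["London, England", "Paris, France", "Berlin, Germany"],
    ["London, England", "Manchester, England"],
    ["London, Ontario", "Toronto, Ontario"],
    ["Paris, Texas", "Dallas, Texas"],
]


def cooccurrence_counts(docs):
    """Count how often each unordered pair of places shares a document."""
    counts = Counter()
    for doc in docs:
        # Sorting makes each pair key order-independent.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def score(candidate, context, counts):
    """Sum the first-order co-occurrence counts of a candidate with the context."""
    total = 0
    for place in context:
        if place == candidate:
            continue
        total += counts[tuple(sorted((candidate, place)))]
    return total


def disambiguate(candidates, context, counts):
    """Return the candidate that co-occurs most with the context places."""
    return max(candidates, key=lambda c: score(c, context, counts))


counts = cooccurrence_counts(TRAINING_DOCS)
# "London" is ambiguous; the same document also mentions Toronto, Ontario.
print(disambiguate(["London, England", "London, Ontario"],
                   ["Toronto, Ontario"], counts))  # prints London, Ontario
```

Raw counts keep the sketch short. A practical system would likely normalise them (for example into conditional probabilities) and back off to a default sense when no context place has been seen in training.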