These proceedings contain the papers presented at ADCS 2013, the Eighteenth Australasian Document Computing Symposium, hosted by the Queensland University of Technology and held in Brisbane, Queensland, Australia.
The quality of submissions was again very high this year. Of the 23 papers submitted, 12 were accepted for full presentation at the symposium and 5 were accepted for short presentation. The full written version of each submission received at least three anonymous reviews by independent, qualified international experts in the area. Dual submissions were explicitly prohibited. The accepted contributions cover a diverse range of topics in document computing, including efficiency, evaluation, enterprise search, use of social media for crisis management, information retrieval models and text classification.
The symposium includes many formal presentations that allow ample time for questions and discussion, as well as many opportunities to share ideas during interactive sessions and informal gatherings. This exchange of ideas is certainly the greatest benefit for document computing practitioners. Once again we have co-located with the Australasian Language Technology Workshop (ALTA), sharing a keynote talk, a joint paper session, a poster session, and social events.
Proceeding Downloads
Economic models of search
Searching is inherently an interactive process usually requiring a number of queries to be submitted and a number of documents to be assessed in order to find the desired amount of relevant information. While numerous models of search have been proposed,...
Using eye tracking for evaluating web search interfaces
Using eye tracking in the evaluation of web search interfaces can provide rich information on users' information search behaviour, particularly in the matter of user interaction with different informative components on a search results screen. One of ...
Efficient top-k retrieval with signatures
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with acceptable measure ...
An enterprise search paradigm based on extended query auto-completion: do we still need search and navigation?
Enterprise query auto-completion (QAC) can allow website or intranet visitors to satisfy a need more efficiently than traditional searching and browsing. The limited scope of an enterprise makes it possible to satisfy a high proportion of information ...
Classifying microblogs for disasters
Monitoring social media in critical disaster situations can potentially assist emergency and media personnel to deal with events as they unfold, and focus their resources where they are most needed. We address the issue of filtering massive amounts of ...
ADCS reaches adulthood: an analysis of the conference and its community over the last eighteen years
- Bevan Koopman,
- Guido Zuccon,
- Lance De Vine,
- Aneesha Bakharia,
- Peter Bruza,
- Laurianne Sitbon,
- Andrew Gibson
How influential is the Australasian Document Computing Symposium (ADCS)? What do ADCS articles speak about and who cites them? Who is the ADCS community and how has it evolved?
This paper considers eighteen years of ADCS, investigating both the conference ...
Merging algorithms for enterprise search
Effective enterprise search must draw on a number of sources, for example web pages, telephone directories, and databases. Doing this means we need a way to make a single sorted list from results of very different types.
Many merging algorithms have ...
Power walk: revisiting the random surfer
Measurement of graph centrality provides us with an indication of the importance or popularity of each vertex in a graph. When dealing with graphs that are not centrally controlled (such as the Web, social networks and academic citation graphs), ...
Exploring the magic of WAND
Web search services process thousands of queries per second, and filter their answers from collections containing very large amounts of data. Fast response to queries is a critical service expectation. The well-known WAND processing strategy is one way ...
Integrated instance- and class-based generative modeling for text classification
Statistical methods for text classification are predominantly based on the paradigm of class-based learning that associates class variables with features, discarding the instances of data after model training. This results in efficient models, but ...
Choices in batch information retrieval evaluation
Web search tools are used on a daily basis by billions of people. The commercial providers of these services spend large amounts of money measuring their own effectiveness and benchmarking against their competitors; nothing less than their corporate ...
Conditional collocation in Japanese
Analysis of collocation is targeted at Natural Language Processing (NLP). From a linguistic perspective, collocation provides us with a way to place words close together in a natural manner. By this approach, we can examine the deep structure of semantics ...
Visual summarisation of text for surveillance and situational awareness in hospitals
Nosocomial infections (NIs, any infection that a patient contracts in a healthcare institution) cost 100,000 lives and five billion dollars per year for 300 million Americans alone. Surveillance in hospitals holds the potential of reducing NI rates by ...
Quality biased thread retrieval using the voting model
Thread retrieval is an essential tool in knowledge-based forums. However, forum content quality varies from excellent to mediocre and spam; thus, search methods should find not only relevant threads but also those with high quality content. Some studies ...
Malformed UTF-8 and spam
In this paper we discuss some of the document encoding errors that were found when scaling our indexer and search engine up to large collections crawled from the web, such as ClueWeb09. We describe the encoding errors, what effect they ...
Crisis management knowledge from social media
More and more crisis managers, crisis communicators and laypeople use Twitter and other social media to provide or seek crisis information. In this paper, we focus on retrospective conversion of human-safety related data to crisis management knowledge. ...
Towards information retrieval evaluation with reduced and only positive judgements
This paper proposes a document distance-based approach to automatically expand the number of available relevance judgements when those are limited and reduced to only positive judgements. This may happen, for example, when the only available judgements ...
Managing short postings lists
Previous work has examined space saving and throughput increasing techniques for long postings lists in an inverted file search engine. In this contribution we show that highly sporadic terms (terms that occur in 1 or 2 documents) are a high proportion ...