LNCS 11360 Journal Subline

Transactions on Large-Scale Data- and Knowledge-Centered Systems XL

Abdelkader Hameurlain • Roland Wagner
Editors-in-Chief

Franck Morvan • Lynda Tamine
Guest Editors

Lecture Notes in Computer Science 11360
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
More information about this series at http://www.springer.com/series/8637
Editors-in-Chief
Abdelkader Hameurlain, IRIT, Paul Sabatier University, Toulouse, France
Roland Wagner, FAW, University of Linz, Linz, Austria

Guest Editors
Franck Morvan, IRIT, Paul Sabatier University, Toulouse, France
Lynda Tamine, IRIT, Paul Sabatier University, Toulouse, France

Lecture Notes in Computer Science
ISSN 0302-9743, ISSN 1611-3349 (electronic)

Transactions on Large-Scale Data- and Knowledge-Centered Systems
ISSN 1869-1994, ISSN 2510-4942 (electronic)

ISBN 978-3-662-58663-1, ISBN 978-3-662-58664-8 (eBook)
https://doi.org/10.1007/978-3-662-58664-8

Library of Congress Control Number: 2018966822

© Springer-Verlag GmbH Germany, part of Springer Nature 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer-Verlag GmbH, DE,
part of Springer Nature
The registered company address is: Heidelberger Platz 3, 14197 Berlin, Germany
Preface

This volume contains five fully revised selected regular papers, covering a wide range
of very hot topics in the fields of social networks, data stream systems, and linked data.
These include personalized social query expansion approaches, continuous query on
social media streams, elastic processing systems, and semantic interoperability for
smart grids and NoSQL environments. We would like to sincerely thank the editorial
board and the external reviewers for their thorough reviews of the submitted papers and
for ensuring the high quality of this volume.
Special thanks go to Gabriela Wagner for her availability and her valuable work in
the realization of this TLDKS volume.

October 2018
Abdelkader Hameurlain
Franck Morvan
Lynda Tamine
Roland Wagner
Organization

Editorial Board
Reza Akbarinia Inria, France
Bernd Amann LIP6 – UPMC, France
Dagmar Auer FAW, Austria
Djamal Benslimane Lyon 1 University, France
Stéphane Bressan National University of Singapore, Singapore
Mirel Cosulschi University of Craiova, Romania
Dirk Draheim Tallinn University of Technology, Estonia
Johann Eder Alpen Adria University Klagenfurt, Austria
Anastasios Gounaris Aristotle University of Thessaloniki, Greece
Theo Härder Technical University of Kaiserslautern, Germany
Sergio Ilarri University of Zaragoza, Spain
Petar Jovanovic Universitat Politècnica de Catalunya, BarcelonaTech, Spain
Dieter Kranzlmüller Ludwig-Maximilians-Universität München, Germany
Philippe Lamarre INSA Lyon, France
Lenka Lhotská Technical University of Prague, Czech Republic
Vladimir Marik Technical University of Prague, Czech Republic
Jorge Martinez Gil Software Competence Center Hagenberg, Austria
Franck Morvan Paul Sabatier University, IRIT, France
Torben Bach Pedersen Aalborg University, Denmark
Günther Pernul University of Regensburg, Germany
Soror Sahri LIPADE, Descartes Paris University, France
A Min Tjoa Vienna University of Technology, Austria
Shaoyi Yin Paul Sabatier University, France
Osmar Zaiane University of Alberta, Edmonton, Canada

External Reviewers
José María De Fuentes Universidad Carlos III de Madrid, Spain
Tamer Elsayed Qatar University, Qatar
Evangelos Kanoulas University of Amsterdam, The Netherlands
Riad Mokadem Paul Sabatier University, France
Eric Pardede La Trobe University, Australia
Contents

Personalized Social Query Expansion Using Social Annotations
Mohamed Reda Bouadjenek, Hakim Hacid, and Mokrane Bouzeghoub

A Data Services Composition Approach for Continuous Query on Social Media Streams
Guiling Wang, Xiaojiang Zuo, Marc Hesenius, Yao Xu, Yanbo Han, and Volker Gruhn

DABS-Storm: A Data-Aware Approach for Elastic Stream Processing
Roland Kotto Kombi, Nicolas Lumineau, Philippe Lamarre, Nicolo Rivetti, and Yann Busnel

SSG: An Ontology-Based Information Model for Smart Grids
Khouloud Salameh, Richard Chbeir, and Haritza Camblong

Bridging the Semantic Web and NoSQL Worlds: Generic SPARQL Query Translation and Application to MongoDB
Franck Michel, Catherine Faron-Zucker, and Johan Montagnat

Author Index


Personalized Social Query Expansion
Using Social Annotations

Mohamed Reda Bouadjenek¹, Hakim Hacid²(✉), and Mokrane Bouzeghoub³

¹ Department of Mechanical and Industrial Engineering, University of Toronto,
Toronto, Canada
mrb@mie.utoronto.ca
² Zayed University, Dubai, United Arab Emirates
hakim.hacid@zu.ac.ae
³ University of Versailles, Versailles, France
mokrane.bouzeghoub@uvsq.fr

Abstract. Query expansion is a query pre-processing technique that
adds to a given query terms that are likely to occur in relevant documents,
in order to improve information retrieval accuracy. A key problem
to solve is "how to identify the terms to be added to a query?" While
considering social tagging systems as a data source, we propose an approach
that selects terms based on (i) the semantic similarity between tags
composing a query, (ii) a social proximity between the query and the user
for a personalized expansion, and (iii) a strategy for expanding, on the
fly, user queries. We demonstrate the effectiveness of our approach by an
intensive evaluation on three large public datasets crawled from delicious,
Flickr, and CiteULike. We show that the expanded queries built by our
method provide more accurate results than the initial queries,
increasing the MAP by 10 to 16% on the three datasets.
We also compare our method to three state-of-the-art baselines, and we
show that our query expansion method yields a significant improvement
in the MAP, with a boost of between 5 and 18%.

Keywords: Personalization · Social Information Retrieval · Social networks · Query expansion

CR Subject Classification: H.3.3 [Information Systems]: Information
Storage and Retrieval · Information Search and Retrieval

© Springer-Verlag GmbH Germany, part of Springer Nature 2019
A. Hameurlain et al. (Eds.): TLDKS XL, LNCS 11360, pp. 1–25, 2019.
https://doi.org/10.1007/978-3-662-58664-8_1

1 Introduction

Web 2.0 has strengthened the end users' position on the Web through their
integration in the heart of the content generation ecosystem. This has been made
possible mainly through the availability of tools such as social networks, social
bookmarking systems, social news sites, etc., impacting the way information is
produced, processed, and consumed by both humans and machines. As a result,
on the one hand, the user is no longer able to digest the large quantity of
information he has access to and is generally overwhelmed by it. On the other hand,
most popular Information Retrieval (IR) systems lack efficient personalization
techniques that provide users only with the information
that fulfills their needs. Two types of constraints make the situation more
complex: information-dependent constraints and user-dependent constraints. The
first class of constraints includes (i) the large scale due to the continuous
activities of users and their ability to generate new content, (ii) information diversity
or heterogeneity, since different types of media are used to communicate, e.g.,
text, image, video, etc., (iii) versatility, since information is dynamic and is
continuously updated (confirmed, contradicted, etc.), (iv) disparity, since information
can be in different places, and as a result (v) the variation in the quality of
information. The second class of constraints is mainly related to users' diversity and
the high dynamics in their profiles.
To improve the IR process and reduce the amount of irrelevant documents,
there are mainly three possible improvement tracks: (i) query reformulation using
extra knowledge, i.e., expansion or refinement of the user query, (ii) post filtering
or re-ranking of the retrieved documents (based on the user profile or context),
and (iii) improvement of the IR model, i.e., reengineering of the IR process to
integrate contextual information and relevant ranking functions. In this paper,
we focus on query reformulation, especially on personalized query expansion for
personalized search, i.e., personalizing the reformulation of queries.
Query expansion consists of enriching the user's initial query with additional
information so that the IR system may propose suitable results that better satisfy
the user's needs [14,15,19]. We explore the possibility of using the data available in
social networks, and more precisely data of social bookmarking systems, as a
source of explicit feedback information. The latter enable users to freely add,
annotate, edit, and share bookmarks of web resources, e.g., web pages. Basically,
we propose an approach which reuses the users' vocabulary (the terms used to
annotate web pages) in order to expand their queries in a personalized way and
thus increase their satisfaction regarding the quality of search. Exploiting social
knowledge for improving web search has a number of advantages:
– Feedback information in social networks is provided directly by the user,
so accurate information about users' interests can be harvested as people actively
express their opinions on social platforms. Thus, this user interest can be
easily modeled to provide personalized services.
– A huge amount of social information is published and available with the
agreement of the publishers. Exploiting this information should not violate
user privacy, in particular social tagging information, which doesn't contain
sensitive information about users.
– Finally, social resources are often publicly accessible, as most social
networks provide APIs to access their data (even if often, a contract must be
established before any use of the data).

Our approach in this work¹ consists of three main steps: (i) determining
similar and related tags to a given query term through their co-occurrence over
resources and users, (ii) constructing a profile of the query issuer based on his
tagging activities, which is maintained and used to compute expansions, and
finally, (iii) expanding the query terms, where each term is enriched with the
most interesting tags based on their similarities and their interest to the user.
The problem we are tackling in this paper is strongly related to personalization,
since we want to expand queries in a personalized way and consequently
propose adapted search results. Personalization makes it possible to differentiate
between individuals by emphasizing their specific domains of interest and their
preferences. It is a key point in IR, and the demand for it from users who want
results adapted to them is constantly increasing [3]. Several techniques exist to
provide personalized services, among them user profiling. The user profile is a
collection of personal information associated with a specific user that makes it
possible to capture his interests. Details of how we model user profiles are given
in Sects. 2 and 3.1.4.
The main contributions of this work can be summarized as follows:

1. We propose an approach in which we use social knowledge as explicit feedback
information for the expansion process. Reusing such social knowledge aims
at expanding user queries with their own vocabularies instead of using a public
thesaurus, which is made by people who are not aware of the individual users'
needs and expectations.
2. We propose a Personalized Social Query Expansion framework called PSQE.
The latter provides a user-dependent query expansion based on social
knowledge, i.e., for the same query of two different users, PSQE will provide two
different expanded queries, which will be processed by a search engine.
3. Using an evaluation on real data gathered from three different large
bookmarking systems, we demonstrate the effectiveness of our framework for
socially driven query expansion compared to many state-of-the-art approaches.

The rest of this paper is organized as follows: in Sect. 2 we introduce all the
concepts that we use throughout this paper. Section 3 introduces our method
of query expansion using folksonomy. In Sect. 4, we discuss the different experi-
ments that evaluate the performance of our approach. Related work is discussed
in Sect. 5. Finally, we conclude and provide some future directions in Sect. 6.

2 Background and Notations

In this section, we formally define the basic concepts that we use throughout this
paper, namely a bookmark, a folksonomy, and a user profile. We also provide a
formal definition of the problem we intend to solve.

¹ This is an extended and revised version of a preliminary conference report that was
presented in [12].

2.1 Background

Social bookmarking websites are based on the techniques of social tagging or
collaborative tagging. The principle behind social bookmarking platforms is to
provide the user with a means to annotate resources on the Web, e.g., URIs in
delicious², videos in YouTube³, images in Flickr⁴, or academic papers in
CiteULike⁵. These annotations (also called tags) can be shared with others. This
unstructured (or better, free structured) approach to classification, with users
assigning their own labels, is variously referred to as a folksonomy [21,28]. A
folksonomy is based on the notion of bookmark, which is formally defined as
follows:

Definition 1 (Bookmark). Let U, T, R be respectively the set of Users, Tags
and Resources. A bookmark is a triplet (u, t, r) such that u ∈ U, t ∈ T, r ∈ R,
which represents a user u who used a tag t to annotate a resource r.

Then, a group of bookmarks which forms a folksonomy is formally defined
as follows:

Definition 2 (Folksonomy). Let U, T, R be respectively the set of Users,
Tags and Resources. A folksonomy F(U, T, R) is a subset of the Cartesian product
U × T × R such that each triple (u, t, r) ∈ F is a bookmark.

A folksonomy can then be naturally represented by a tripartite-graph where
each ternary edge represents a bookmark. In particular, the graph representation
of the folksonomy F is defined as a tripartite graph G(V, E) where V = U ∪ T ∪ R
and E = {(u, t, r)|(u, t, r) ∈ F}. Figure 1 shows seven bookmarks provided by
two users on three resources using three tags.

Fig. 1. Example of a folksonomy. The triples (u, t, r) are represented as ternary edges
connecting users, resources and tags.
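
Definitions 1 and 2 and the graph view G(V, E) can be made concrete with a small sketch. The snippet below is only an illustration under stated assumptions: the `Bookmark` type, the variable names, and the toy data are hypothetical and are not taken from the paper or from Fig. 1.

```python
from typing import NamedTuple


class Bookmark(NamedTuple):
    """One (user, tag, resource) triple, as in Definition 1."""
    user: str
    tag: str
    resource: str


# Hypothetical toy folksonomy: a set of bookmarks, i.e. a subset of U x T x R.
folksonomy = {
    Bookmark("u1", "search", "r1"),
    Bookmark("u1", "web", "r1"),
    Bookmark("u1", "web", "r2"),
    Bookmark("u2", "search", "r1"),
    Bookmark("u2", "photo", "r3"),
}

# The vertex set V = U ∪ T ∪ R of the tripartite graph G(V, E).
users = {b.user for b in folksonomy}
tags = {b.tag for b in folksonomy}
resources = {b.resource for b in folksonomy}
vertices = users | tags | resources

# Each bookmark is one ternary edge, so E coincides with the folksonomy itself.
edges = set(folksonomy)
```

Storing the folksonomy as a set of triples keeps the representation close to Definition 2; the three vertex sets are derived from it rather than maintained separately.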

² http://www.delicious.com/.
³ http://www.youtube.com/.
⁴ http://www.flickr.com/.
⁵ http://www.citeulike.org/.

Folksonomies have proven to be a valuable source of knowledge for user profiling
[17,35,41,43], especially because users tag information that is interesting and
relevant to them with keywords that may be a good summary of their interests. Hence, in
this paper, and in the context of folksonomies, the profile includes all the terms
used as tags along with their weights to capture the user's tagging activities. It is
formally defined as follows:
Definition 3 (User Profile). Let U, T, R be respectively the set of Users, Tags
and Resources of a folksonomy F(U, T, R). A profile $p_u$ assigned to a user u ∈ U
is modeled as a weighted vector $p_u$ of m dimensions, where each dimension
represents a tag the user employed in his tagging actions. More formally,
$p_u = \{w_{t_1}, w_{t_2}, \ldots, w_{t_m}\}$ such that $w_{t_m}$ is the weight of $t_m$, where
$t_m \in T \wedge (\exists r \in R \mid (u, t_m, r) \in F)$.
Thus, the profile includes the most relevant terms for the user and not all his
activities, i.e., the documents that he has tagged. A value is associated to each
term of the profile expressing its strength and importance for the given user.
Later in Sect. 3.1.4, we propose a method to assign weights to each term in
the user profile in order to better define his interests.

2.2 Problem Definition


As mentioned before, query expansion consists of enriching the initial query
with additional information. This expansion is generally expected to provide
better search results. However, providing merely a uniform expansion to all users
is, from our point of view, neither suitable nor efficient, since the relevance of
documents is relative to each user. Thus, a simple and uniform query expansion
is not enough to provide satisfactory search results for each user. Hence, given
a folksonomy F(U, T, R), the problem we are addressing can be formalized as
follows:
For a given user u ∈ U who issued a query $q = \{t_1, t_2, \ldots, t_n\}$, how to provide
for each term $t_i \in q$ a ranked list of related terms $L = \{t_{i1}, t_{i2}, \ldots, t_{ik}\}$, such that
when expanding the term $t_i$ with the top k of L, the most relevant documents
are put earlier in the ranking?

3 Social Query Expansion Approach


The approach we are proposing aims at expanding users' queries in a personalized
way. It can be decomposed into two parts: (i) an offline part and (ii) an online part.
The offline part performs the heavy computation, which consists of transforming
the whole social graph of a folksonomy F into a graph of tags where two tags
are linked if they are semantically related. This part is also responsible for the
construction and the update of the users' profiles, for serving the online part. The
online part of the approach is responsible for computing the concrete expansion
using the graph of tags and the users' profiles constructed in the offline part. In
the following, we describe each part in more detail and we explicitly highlight
our contributions.

3.1 Offline Part

The offline part is also decomposed into two facets: (i) the transformation of
the social graph of a folksonomy F into a graph of tags, representing similarities
between tags that either occur on the same resources or are shared by the same
users, and (ii) the computation of the users’ profiles to highlight their interests
for personalizing their queries.
The approach is based on the creation and the maintenance of a graph of
tags that represents all the similarities that exist between the tags of F. There
exist two kinds of approaches that propose to achieve that: (i) an approach based
on the co-occurrence of tags over resources, and (ii) an approach based on their
co-occurrence over users.

3.1.1 Extracting Semantics from Resources


In the first category of approaches, [24,30,33] state that semantically related
tags are expected to occur over the same resources. For example, the tags that occur
most often for google.com on delicious are: search, google, engine, web, internet.
Thus, extracting semantically related tags can be carried out by computing
similarities. There exist many similarity measures [30], but all of them need a
pre-processing step that consists of reducing the dimensionality of the tripartite graph F
into a bipartite graph. This reduction is generally performed through aggregation
methods. From the study of existing aggregation methods proposed in [30],
we have chosen the projectional aggregation along with the Jaccard, the Dice,
and the Overlap similarity measures to compute the similarity between tags.
We choose this aggregation method because of its simplicity, and because it is one
of those that give better results in semantic information extraction [30]. Hence, we follow the
same process as [30] to extract a graph of related tags from F according to their
co-occurrence over resources:

1. Applying a function F to the whole folksonomy F performs a projectional
aggregation over the user dimension, resulting in a bipartite graph Tag-Resource.
2. Then, applying a function G to the resulting bipartite graph Tag-Resource
provides a graph of tags $T_R$, in which each link is weighted with the similarity
between tags computed using the Jaccard, the Dice, or the Overlap
metric [30].

Therefore, we may obtain a graph of tags $T_R$ using either the Jaccard, the
Dice, or the Overlap measure. Note that we do not merge the similarity measures in the
same graph of tags, meaning that a graph of tags is constructed using only one
similarity measure.
We end up with an undirected weighted graph in which nodes represent tags,
and an edge between two tags represents the fact that these tags occur together
on at least one resource. The weights associated with edges are computed from
similarities between tags as explained beforehand. This first step is illustrated
in the upper left part of Fig. 2.
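
The two steps above (projection over the user dimension, then pairwise similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy bookmarks are hypothetical, and the Jaccard, Dice, and Overlap formulas are the standard set-based definitions, which are assumed here to match those of [30].

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bookmarks (user, tag, resource); not data from the paper.
bookmarks = [
    ("u1", "search", "r1"),
    ("u2", "search", "r1"),
    ("u1", "web", "r1"),
    ("u1", "web", "r2"),
    ("u2", "engine", "r2"),
]


def project_on_resources(bookmarks):
    """Drop the user dimension: map each tag to the set of resources it annotates."""
    tag_resources = defaultdict(set)
    for _user, tag, resource in bookmarks:
        tag_resources[tag].add(resource)
    return dict(tag_resources)


def jaccard(a, b):
    return len(a & b) / len(a | b)


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    return len(a & b) / min(len(a), len(b))


def tag_graph(tag_sets, measure):
    """Weighted undirected graph: an edge exists only if two tags co-occur."""
    graph = {}
    for t1, t2 in combinations(sorted(tag_sets), 2):
        if tag_sets[t1] & tag_sets[t2]:
            graph[(t1, t2)] = measure(tag_sets[t1], tag_sets[t2])
    return graph


tag_resources = project_on_resources(bookmarks)
t_r = tag_graph(tag_resources, jaccard)
```

Passing `dice` or `overlap` instead of `jaccard` yields the other two variants of the graph; as the text notes, each graph is built with a single measure.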

Fig. 2. Summary of the graph reduction process, which transforms the whole folksonomy
F into a graph of tags $T_{UR}$. The similarity values in the figure are computed using
the Jaccard measure on both graphs $T_R$ and $T_U$, and using α = 0.5 on the graph $T_{UR}$.

3.1.2 Extracting Semantics from Users


In the second category, [4,33] state that correlated tags are also used by the
same users to annotate resources. For example, the tags Collaborative and Blog
have been used 13,557 times together by users in our delicious dataset.
This observation is more likely to hold in certain folksonomies, where
users are encouraged to upload their personal resources, which leads to the generation of
private bookmarks, e.g., folksonomies such as CiteULike, Flickr, or YouTube,
where users are expected to upload respectively their research papers, images,
and videos. Therefore, similarly to the previous approach, [33] proposes to
extract semantically related tags using the following process:

1. Applying a function G′ to the folksonomy F performs a projectional aggregation
over the resource dimension to obtain a bipartite graph Tag-User.
2. Then, the function F′ is used to get another graph of tags $T_U$ where
similarities between tags are computed using one of the three previous similarity
measures.

This process is illustrated in the right upper part of Fig. 2. Notice that the
structure of the graph of tags TR is different from the one of the graph of
tags TU .

3.1.3 Construction of the Graph of Tag Similarities


Using only one of the two previous methods to construct a graph representing
similarities between tags leads to a loss of information on one side or the other.
For example, if we choose to extract related tags according to their co-occurrence
over resources, we neglect the fact that there are some tags which are expected
to be shared by the same users and vice versa.
Therefore, we propose to use a function M which is applied on the graphs of
tags $T_R$ and $T_U$ to merge them and to get a unique graph of tags $T_{UR}$ where the
new similarity values are computed by merging the values using the Weighted
Borda Fuse (WBF) [18]. This merge is summarized in Eq. 1, where $0 \le \alpha \le 1$:

\[
\mathrm{Sim}_{T_{UR}}(t_i, t_j) = \alpha \times \mathrm{Sim}_{T_R}(t_i, t_j) + (1 - \alpha) \times \mathrm{Sim}_{T_U}(t_i, t_j) \qquad (1)
\]

where $\mathrm{Sim}_{T_{UR}}(t_i, t_j)$ is the similarity between two tags relying on the
two other types of nodes, i.e., users and resources. The parameter α represents
the importance one wants to give to each of the two types of graphs, i.e., resources or
users, in the similarity calculation. In fact, depending on the
context, when computing the similarity between two tags, one may want to give
a higher importance to users sharing these two tags than to documents having these
tags as common tags. Another user may want to give more importance to their
co-occurrence over resources than to the users sharing these tags. Depending on
the nature of the folksonomy, we set α to its optimal value in order to maximize
the tags semantics extraction. Finally, it should be noted that the merge is
performed between graphs generated with the same similarity measure.
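
A small sketch of the merge in Eq. 1 may help. It assumes each tag graph is stored as a dictionary keyed by a sorted tag pair, and that a pair missing from one graph has similarity 0 there; both are assumptions made for illustration, since the text does not spell out these details. The function name and values are hypothetical.

```python
def merge_tag_graphs(t_r, t_u, alpha=0.5):
    """Combine two tag graphs edge by edge as in Eq. 1.

    A pair missing from one graph is treated as similarity 0 there
    (an assumption; the text does not cover this case explicitly).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    merged = {}
    for pair in set(t_r) | set(t_u):
        merged[pair] = alpha * t_r.get(pair, 0.0) + (1 - alpha) * t_u.get(pair, 0.0)
    return merged


# Hypothetical similarity values, keyed by sorted tag pairs.
t_r = {("search", "web"): 0.5, ("engine", "web"): 0.25}
t_u = {("search", "web"): 1.0, ("blog", "web"): 0.5}

t_ur = merge_tag_graphs(t_r, t_u, alpha=0.5)
```

With α = 0.5 both sources count equally; α = 1 reduces the result to the resource-based graph and α = 0 to the user-based graph.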
This step of the offline part extracts semantics from the whole social graph
of F without a loss of information, i.e., by exploiting the co-occurrences of tags
over resources and users. This step leads to the creation of a graph of tags, where
edges represent semantic relations between tags. This graph will be further used
to extract terms that are semantically related to a given term of a query to
perform the query expansion. The contribution at this stage is the combination
of the graphs resulting from resources and users to construct a better graph of
tag similarities without loss of information. This is different from the existing
approaches where only one graph is used.
In the following, we introduce our method of constructing and weighting the
user profiles in order to personalize the expansions.

3.1.4 Construction of the User Profile


To achieve a personalized expansion, we also propose to build a user profile
that consists of capturing information regarding real user interests. There are
different ways to build user profiles [23,40,41]. For example, a person may be
modeled as a vector of attributes of his online personal profiles including the
name, affiliation, and interests. Such simple factual data provides an inadequate
description of the individual, as they are often incomplete, mostly subjective and
do not reflect dynamic changes [23].
Since we focus on folksonomies, the user feedback is expected to be mostly
explicit (because of the tagging action, where the user explicitly assigns tags to
resources).
Thus, in a folksonomy, users are expected to tag and annotate resources
that are interesting to them using tags that summarize their understanding of
resources. In other words, these tags are in turn expected to be a good summary
of the user’s topics of interests as also discussed in [2,17,23,35,37,43]. Hence,
each user can be modeled as a set of tags and their weights.
The definition of a user profile is given in Definition 3. The main challenge
here is how to define the weight of each tag in the user profile. We propose
to use an adaptation of the well-known tf-idf measure to estimate this weight.
Hence, we define the weight $w_{t_i}$ of the term $t_i$ in the user profile as the user
term frequency, inverse user frequency (utf-iuf), which is computed as follows:

\[
\text{utf-iuf}_{t_i,u_j} = \frac{n_{t_i,u_j}}{\sum_{t_k \in p_{u_j}} n_{t_k,u_j}} \times \log\left(\frac{|U|}{|U_{t_i}|}\right) \qquad (2)
\]

where $n_{t_i,u_j}$ is the number of times the user $u_j$ used the tag $t_i$, the sum runs
over the tags $t_k$ of the profile of $u_j$, and $U_{t_i}$, by analogy with tf-idf, is taken
to be the set of users who used $t_i$.
A high value of utf-iuf is reached by a high user term frequency and a low
user frequency of the term in the whole set of users. Note that we perform a
stemming on tags before computing the profiles, to eliminate the differences
between terms having the same root to better estimate the weight of each term.
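
The utf-iuf weighting can be sketched as below. This is an illustration under assumptions, not the authors' code: the function name and toy data are hypothetical, the natural logarithm is used because the text does not state a base, |U| is taken as the number of users with at least one bookmark, and tags are assumed to be stemmed beforehand.

```python
import math
from collections import Counter, defaultdict


def utf_iuf_profiles(bookmarks):
    """Return {user: {tag: weight}} following Eq. 2.

    bookmarks is an iterable of (user, tag, resource) triples. Tags are
    assumed to be stemmed already; stemming is out of scope here.
    """
    tag_counts = defaultdict(Counter)  # n_{t,u}: times user u used tag t
    tag_users = defaultdict(set)       # U_t: users who used tag t
    for user, tag, _resource in bookmarks:
        tag_counts[user][tag] += 1
        tag_users[tag].add(user)

    n_users = len(tag_counts)
    profiles = {}
    for user, counts in tag_counts.items():
        total = sum(counts.values())
        profiles[user] = {
            tag: (n / total) * math.log(n_users / len(tag_users[tag]))
            for tag, n in counts.items()
        }
    return profiles


# Hypothetical toy data.
bookmarks = [
    ("u1", "web", "r1"),
    ("u1", "web", "r2"),
    ("u1", "python", "r3"),
    ("u2", "web", "r1"),
]
profiles = utf_iuf_profiles(bookmarks)
```

In this toy example the tag "web" is used by every user, so its inverse user frequency, and hence its weight, is zero, while "python" keeps a positive weight for u1.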
User profiles are created offline and maintained incrementally. This is moti-
vated by the fact that profiles and tagging actions are not evolving as quickly
as query formulation on the system. As an analogy, it is well known that 90%
of users in the social Web consume the content (i.e., query formulation), 9%
update content, and 1% generate new content (profile updates) [34]. Thus, we
have decided to handle the profile construction as an offline task while providing
a maintenance process for keeping it up to date.
In summary, at the end of the offline part, we have built two assets: (i) a graph
of tag similarities, which represents the semantic relatedness of terms,
and (ii) user profiles, which are leveraged in the personalization step.

3.2 Online Part


The online part of the approach is responsible for computing the concrete expan-
sion using the graph TU R and the profiles constructed in the offline part. Before
presenting our algorithm of query expansion, we propose a method to compute,
on the fly, the interest of a user to a given tag.

3.2.1 Interest Measure to Tag


Having computed the similarity graph between tags and built users’ profiles
containing the degree to which a set of tags are representative of a user, it
becomes possible to compute a degree of interest a user may have to other tags,
e.g., query tags. This is useful in our approach to compute, in real time, the
suitable expansions of a tag w.r.t. a given user. In our approach, this interest
is seen as a similarity between the user profile pu and a tag ti . Intuitively, the
computed similarity captures the interest of the user u in the query term ti
denoted I_u(t_i):

$$
I_u(t_i) = \sum_{t_j \in p_u} \left( Sim_{T_{UR}}(t_i, t_j) \times w_j \right) \tag{3}
$$

where Sim(ti , tj ) is the similarity between the term ti and tj , the j th term of the
user profile, and wj is the weight of the term tj in the profile computed during
the previous process. Notice that any similarity measure can be used for com-
puting Sim(ti , tj ), as discussed in [30]. In this work, we consider the Jaccard, the
Overlap, and the Dice similarity measures, as discussed in the previous sections.
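
The interest measure of Eq. 3 can be sketched as follows. This is only an illustration under stated assumptions: each tag is represented by the set of entities (for instance resources) it is attached to, the three measures use their standard set-based definitions, and the names `jaccard`, `dice`, `overlap`, and `interest` are hypothetical. The combination of users and resources through α (Eq. 1) is not reproduced here.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of entities."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)."""
    smallest = min(len(a), len(b))
    return len(a & b) / smallest if smallest else 0.0


def interest(tag, profile, sim):
    """Eq. 3: sum over the profile tags t_j of sim(tag, t_j) * w_j.

    profile maps each profile tag to its weight; sim is any function
    taking two tags and returning their similarity.
    """
    return sum(sim(tag, t_j) * w_j for t_j, w_j in profile.items())
```

A usage example with toy data: given `entities = {"java": {1, 2, 3}, "jvm": {2, 3}, "web": {3, 4}}` and `profile = {"jvm": 0.5, "web": 0.25}`, the call `interest("java", profile, lambda x, y: jaccard(entities[x], entities[y]))` sums the two weighted similarities of "java" to the profile tags.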

3.2.2 Effective Query Expansion


In this step of query expansion, we consider that the similarity between two
terms ti (a query term) and tj (a potential candidate for the expansion of ti ), to
be influenced by two main features: (i) the semantic similarity between ti and
tj (the semantic strength between the two terms), and (ii) the extent to which
the tag tj is likely to be interesting to the considered user.
Once these two similarities are computed, a merge operation is necessary to
obtain a final ranking value that indicates the similarity of tj with ti w.r.t. the
user u. For this, several aggregation methods and algorithms exist. We choose
the Weighted Borda Fuse (WBF) as summarized in Eq. 4, where 0 ≤ γ ≤ 1 is
a parameter that controls the strength of the semantic and social parts of our
approach. Using Eq. 4, we can rank a list of terms L, which are semantically
related to a given term ti from a user perspective.
$$
Rank^{u}_{t_i}(t_j) = \underbrace{\gamma \times Sim_{T_{UR}}(t_i, t_j)}_{\text{Semantic Part}} + \underbrace{(1 - \gamma) \times I_u(t_j)}_{\text{Social Part}} \tag{4}
$$

The effective social query expansion is summarized in Algorithm 1. For a query
q = t1 ∧ t2 ∧ ... ∧ tm issued by a user u, we first get the user’s profile,
which is computed as explained above (Sect. 3.1.4 and line 1 in Algorithm 1).
The purpose is then to enrich each term ti of q with related terms (line 2).
To do so, we get all the neighboring tags tj of ti in the tag graph T_UR (line 3).
For each tj (line 4), we compute the ranking value that indicates its similarity
with ti w.r.t. the user u using Eq. 4 (line 5). Next, the neighbor list is sorted
according to the computed values and only the top k tags are kept (line 7).
Then, ti and its remaining neighbors are linked with the OR (∨) logical connector
(line 8), the new terms are weighted (line 9), and the result is inserted into
q′ (line 10).

Algorithm 1. Effective Social Query Expansion


Require: A folksonomy F; u: a user; q = {t1, t2, ..., tn}: a query.
1: pu[m] ← extract profile of u from F
2: for all ti ∈ q do
3:   L ← list of neighbors of ti in the tag graph T_UR
4:   for all tj ∈ L do
5:     tj.Value ← compute the ranking score Rank^u_{ti}(tj)
6:   end for
7:   Sort L according to tj.Value and keep only the top k terms in L
8:   Make a logical OR (∨) connection between ti and all terms of L
9:   Set the weight of the new terms tj as the tj.Value or the TF-IDF value,
     depending on the chosen strategy (see Section 3.2.3)
10:  Insert L in q′
11: end for
12: return q′

Example 1. If a user issues a query q = t1 ∧ t2 ∧ ... ∧ tm, it will be expanded to
q′ = {(t1 ∨ t11 ∨ ... ∨ t1l) ∧ (t2 ∨ t21 ∨ ... ∨ t2k) ∧ ... ∧ (tm ∨ tm1 ∨ ... ∨ tmr)}, where
tij is a term that is semantically related to ti ∈ q and socially to u.
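
The expansion procedure of Algorithm 1 can be sketched in a few lines. The sketch below is an illustration under assumptions, not the authors' code: the tag graph is given as a nested dictionary of precomputed similarities, a tag is assumed to have similarity 1 with itself when it appears in the profile, the first weighting strategy (ranking value as weight) is used, and the names `expand_query`, `graph`, `gamma`, and `k` are hypothetical.

```python
def expand_query(query, profile, graph, gamma=0.5, k=4):
    """Return [(term, [(expansion_term, score), ...]), ...].

    query:   list of query tags t_i
    profile: {tag: weight}, the user profile p_u
    graph:   {tag: {neighbor: similarity}}, standing in for the tag graph
    Terms inside one group are meant to be OR-ed; groups are AND-ed.
    """
    def sim(a, b):
        if a == b:
            return 1.0  # assumption of this sketch: a tag fully matches itself
        return graph.get(a, {}).get(b, 0.0)

    def interest(tag):
        # Eq. 3: similarity of the tag to the profile, weighted by w_j
        return sum(sim(tag, t_j) * w_j for t_j, w_j in profile.items())

    expanded = []
    for t_i in query:
        # Eq. 4: merge the semantic part and the social part
        scored = [
            (t_j, gamma * s + (1 - gamma) * interest(t_j))
            for t_j, s in graph.get(t_i, {}).items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        expanded.append((t_i, scored[:k]))  # keep only the top k neighbors
    return expanded
```

With γ = 1 the ranking reduces to the semantic similarity alone, so every user obtains the same expansion; lowering γ lets the profile reorder the candidates.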

It should be noted that in this paper, we consider that the selection of each
query term is determined independently, without considering latent term
relations. Most past work on modeling term dependencies has analyzed three
different underlying dependency assumptions: full independence, sequential
dependence [39], and full dependence [32]. Taking term dependencies into account
is part of our future work.

3.2.3 Terms Weighting


Term weighting in query expansion is challenging since there is no formal method
for assigning weights to new terms. Indeed, appropriately weighting terms should
result in better retrieval performance. Thus, we experiment with the following
two strategies for weighting new terms:

– Using the ranking values of Formula 4 as the weight of the new expanded
terms. This strategy provides personalized term weight assignment while con-
sidering both semantic strength and user interest.
– Using the Term Frequency-Inverse Document Frequency (TF-IDF) [1] as the
weight of the new expanded terms as follows:

$$
\text{tf-idf}_{t_i,q} = tf_{t_i} \times \log\left(\frac{|D|}{|D_{t_i}|}\right) \tag{5}
$$

where tf_{t_i} denotes the term frequency of ti in the query q, |D| is the
number of documents in the collection, and |D_{t_i}| is the number of documents
containing ti. This strategy provides a uniform term weight to the query while
keeping the personalizing aspect in choosing terms. Notice that weights are
assigned to terms in line 9 of Algorithm 1.
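
For the second strategy, a minimal sketch of Eq. 5 is given below. The function name `tf_idf`, the document-frequency dictionary, and the natural logarithm are assumptions of the example; every weighted term is assumed to occur in at least one document.

```python
import math


def tf_idf(term, query_terms, doc_freq, n_docs):
    """Eq. 5: term frequency in the query times inverse document frequency.

    query_terms: list of terms of the (expanded) query
    doc_freq:    {term: number of documents containing the term}
    n_docs:      total number of documents in the collection
    """
    tf = query_terms.count(term)
    return tf * math.log(n_docs / doc_freq[term])
```

Unlike the ranking value of Formula 4, this weight depends only on the query and the collection, so two users who obtain the same expansion term also obtain the same weight for it.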

4 Evaluations
In this section, we describe the two types of evaluations we performed on our
approach: (i) an estimation of the parameters of our approach to provide insights
regarding their potential impact on the system, and (ii) a comparison study,
where our approach is compared to the closest state of the art approaches to
provide insights about the obtained results and position the proposal.

4.1 Datasets
A number of social bookmarking systems exist [21]. We have selected three
datasets to perform an offline evaluation: delicious, Flickr and CiteULike. These
datasets are publicly available. The interest of using such data instead of
crawled data is to work on widely accepted datasets, to reduce the risk of noise,
and to allow others to reproduce the evaluations and to compare our approach
to other approaches on “standardized datasets”. The different datasets are
described hereafter.
– Delicious: a social bookmarking web service for storing, sharing, and
discovering web bookmarks. We have used a dataset which is described and
analyzed in [42] (http://data.dai-labor.de/corpus/delicious/).
– Flickr: an image hosting, tagging and sharing website. The Flickr dataset is
the one used and studied in [38]
(http://www.tagora-project.eu/data/#flickrphotos).
– CiteULike: an online bookmarking service that allows users to bookmark
academic articles. This dataset is the one provided by the CiteULike website
(http://static.citeulike.org/data/2007-05-30.bz2).
Before the experiments, we performed three data preprocessing tasks: (1) Several
annotations are too personal or meaningless, such as “toread”, “Imported
IE Favorites”, “system:imported”, etc. We removed some of them manually. (2)
Although the annotations from delicious are easy for users to read and
understand, they are not designed for machine use. For example, some users may
concatenate several words to form an annotation such as “java.programming”
or “java/programming”. We split this kind of annotation before using it
in the experiments. (3) The list of terms undergoes stemming by means of
Porter’s algorithm [36] in order to eliminate the differences between
terms having the same root. At the same time, the system records the relations
between stemmed terms and original terms. As for the delicious dataset, we
added two other data preprocessing tasks: (i) we downloaded all the available web
pages while removing those which are no longer available, and (ii) we removed
all the non-English web pages. This operation was performed using the Apache Tika
toolkit. Table 1 gives a description of these datasets.
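
The first two preprocessing tasks can be sketched as follows. This is an illustration only: the stop list contains just the three examples quoted above, lowercasing is an added assumption, the function name `split_annotation` is hypothetical, and stemming (task 3) is left out because it would require a Porter stemmer implementation.

```python
import re

# Uninformative annotations quoted in the text; a real list would be longer.
STOP_TAGS = {"toread", "imported ie favorites", "system:imported"}


def split_annotation(annotation):
    """Drop uninformative annotations and split concatenated ones.

    Splits on '.' and '/', e.g. "java.programming" or "java/programming".
    Lowercasing is an assumption of this sketch.
    """
    cleaned = annotation.strip().lower()
    if cleaned in STOP_TAGS:
        return []
    return [part for part in re.split(r"[./]", cleaned) if part]
```

Note that the stop list is checked before splitting, so an annotation such as "system:imported" is removed as a whole rather than being broken into parts.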

Table 1. Corpus details

Dataset      Bookmarks     Users      Resources    Tags
Delicious    9,675,294     318,769    425,183      1,321,039
Flickr       22,140,211    112,033    327,188      912,102
CiteULike    16,164,802    107,066    3,508,847    712,912

4.2 Evaluation Methodology


Making evaluations for personalized search is a challenge in itself since relevance
judgements can only be assessed by end-users themselves [17]. This is difficult to
achieve at a large scale. Different contributions [5,8,25,31] state that the tagging
behavior of a user of folksonomies closely reflects his behavior of search on the
Web. In other words, if a user u tags a resource r with a tag t, he will choose
to access the resource r if it appears in the result obtained by submitting t as a
query to the search engine. Thus, we can easily state that any bookmark (u, t, r)
can be used as a test query for evaluations. The main idea of the experiments is
based on the following assumption:

Proposition 1. For a personalized query q = {t} issued by a user u with a
query term t, the relevant documents are those tagged by u with t.

Hence, in the offline study, for each evaluation, we randomly select 2,000
pairs (u, t), which are considered to form a personalized query set. For each
corresponding pair (u, t), we remove all the bookmarks (u, t, r) ∈ F, ∀r ∈ R in
order not to promote the resource r in the obtained results. For each pair, the
user u sends the query q = {t} to the system. Then, the query q is enriched and
transformed into q′ following our approach. For the delicious dataset, documents
that match q′ are retrieved, ranked and sorted using Apache Lucene. For the
Flickr and CiteULike datasets, we retrieve all resources that are annotated with
tags of q′ and represent them according to the Vector Space Model (VSM).
Then, the cosine similarity is used to compute the similarity between a query q′
and a resource rj.
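
The VSM ranking used for Flickr and CiteULike can be sketched as follows. The sparse dictionary representation of vectors and the names `cosine` and `rank_resources` are assumptions of this example; how the resource vectors are weighted is not specified here and is left to the caller.

```python
import math


def cosine(query_vec, resource_vec):
    """Cosine similarity between two sparse vectors given as {tag: weight}."""
    dot = sum(w * resource_vec.get(t, 0.0) for t, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_r = math.sqrt(sum(w * w for w in resource_vec.values()))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def rank_resources(query_vec, resources):
    """Rank resources sharing at least one tag with the query.

    resources: {resource_id: {tag: weight}}
    Returns [(resource_id, score), ...] sorted by decreasing score.
    """
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in resources.items()]
    scored = [(rid, score) for rid, score in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this representation, the weights of the expanded terms (ranking values or TF-IDF values) enter directly as the components of `query_vec`.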
For the Flickr and CiteULike datasets, we rank all the retrieved resources
using values of the cosine similarity and we consider that relevant resources are
those tagged by u using tags of q′ to assess the obtained results. The random
selection was carried out 10 times independently, and we report the average
results.
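
The construction of the test queries can be sketched as follows. It is a minimal illustration of the protocol described above, assuming that the folksonomy is available as a set of (user, tag, resource) triples; the function name `make_test_queries` and the fixed seed are hypothetical.

```python
import random


def make_test_queries(bookmarks, n_queries, seed=0):
    """Sample (user, tag) pairs as queries and hold out their bookmarks.

    bookmarks: set of (user, tag, resource) triples
    Returns (queries, training), where queries is a list of
    (user, tag, relevant_resources) and training is the folksonomy
    without the bookmarks of the sampled pairs.
    """
    relevant = {}
    for user, tag, resource in bookmarks:
        relevant.setdefault((user, tag), set()).add(resource)

    rng = random.Random(seed)
    pairs = sorted(relevant)  # sorted for reproducible sampling
    sample = rng.sample(pairs, min(n_queries, len(pairs)))
    chosen = set(sample)

    # Remove every bookmark (u, t, r) of a sampled pair (u, t).
    training = {(u, t, r) for (u, t, r) in bookmarks if (u, t) not in chosen}
    queries = [(u, t, relevant[(u, t)]) for (u, t) in sample]
    return queries, training
```

Repeating the call with different seeds and averaging the resulting scores corresponds to the repeated random selection described above.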
A query expansion is expected to provide more resources as an answer to a
query because of its enrichment, which generally causes an increase in the total
recall. In our evaluation, we are more interested in studying the ability of the
method to push relevant documents to the top of the ranking. Thus, we use
the Mean Average Precision (MAP) and the Mean Reciprocal Rank (MRR), two
performance measures that take into account the ranking of relevant resources.
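
Both measures can be computed as in the sketch below, which uses their standard definitions: average precision is normalized by the number of relevant resources, and the reciprocal rank is 0 when no relevant resource is retrieved. The function names and the input format are assumptions of the example.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant items."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs):
    """runs: non-empty list of (ranked_list, relevant_set), one per query.

    Returns (MAP, MRR) averaged over the queries.
    """
    n = len(runs)
    mean_ap = sum(average_precision(r, rel) for r, rel in runs) / n
    mrr = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    return mean_ap, mrr
```

Both measures reward rankings that place relevant resources early, which is why they suit the question of whether the expansion pushes relevant documents to the top.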

4.3 Study of the Parameters


We intend here to observe the parameters of our approach and estimate their
optimal values. These parameters are:
– γ, which controls the semantic part and the social part in the ranking of
tags for an expansion (see Eq. 4). The higher its value is, the stronger is the
semantic part in tag similarity ranking, and vice versa.
– The number of tags which are suitable for the expansion.
– α, which gives a higher importance either to resources or to users when
computing the graph of tags T_UR. We set this parameter such that the higher
its value is, the stronger the resource links are, and thus the weaker the user
links are, and vice versa (see Eq. 1).
– We evaluate two strategies for weighting the expanded terms (see Sect. 3.2.3).
– Finally, we observe the impact of the similarity measures over the search
results.
We refer to our approach in Figs. 3, 4, 5, and 6 as Personalized Social Query
Expansion (PSQE). Also, all the figures contain the results for each
similarity measure, and for each similarity measure, the results of the two
weighting strategies are shown (this results in six curves per graph).

4.3.1 Impact of the Social Interest (γ)


The results showing the impact of the user interest w.r.t. the semantic similarity
are given in Fig. 3. The figure shows the evolution of the MAP and the MRR
for different values of γ, while fixing α = 0.5 and the query size to 4 for our
three datasets, and using the three similarity measures. We note that the smaller
the value of γ is, the better the performance is. This can be explained by the
fact that the higher the weight of the user interest part, the more the resources
that the user tags are highlighted (probably because other users tag them with the
same tags), and the higher the values of the MAP and the MRR are. However, we
consider that neglecting the semantic part of Eq. 4 is not suitable for the
following reasons: (i) first, if we fix γ to 0, we neglect the semantic part and
may lose the query sense (even if the potential terms to expand the query are
those related to the query terms); (ii) second, if we fix γ to 0, we face
cold start problems, since new users do not have an initial profile that allows
us to rank terms. Thus, we choose to fix γ to 0.5 for the rest of the evaluations.

Fig. 3. Measuring the impact of the social interest (γ). For different values of γ, we fix
α = 0.5, query size = 4 and we use the three similarity measures and the two weighting
strategies for new terms averaged over 1000 queries, using the VSM.

4.3.2 Impact of the Query Size


The objective here is to check whether the length of a query impacts the obtained
results. The results are illustrated in Fig. 4. Across all the experiments we
have performed, the maximum performance is achieved when adding 4 to 6 related
terms to the query. Adding more than 6 related terms has no impact on the quality
of the results when using the values of Eq. 4 as weights for the new terms, and
even has a negative impact when using TF-IDF values for term weighting, as Fig. 4
shows. In the first case, this is due to the fact that the weight of the added
terms is close to 0 (recall that the weight of the added terms is the value of
Eq. 4). Hence, it is natural to pick a value in the interval between 4 and 6.

Fig. 4. Evaluating the impact of the query size on the expansion. For different values
of the query size, we use γ = 0.5, α = 0.5 and our two strategies of weighting new
terms.

4.3.3 Impact of the Users and Resources (α)


The importance of users and resources on the way the expansion is performed
can be tuned by the parameter α of Eq. 1. Fixing α = 0 considers only links
between tags based on common users while fixing α = 1 considers only links
between tags based on common resources. The results regarding this parameter
are illustrated in Fig. 5, where the MAP and the MRR’s behaviors are quite
different on the three datasets.

Fig. 5. Evaluating the impact of the users/resources on the expansion. For values of α,
using the three similarity measures, γ = 0.5, query size = 4 and for our two strategies
of weighting new terms.

Indeed, in the delicious dataset, the values of the MAP and MRR increase
with the value of α using both the Jaccard and the Dice similarities,
achieving an optimal performance at α = 1. As for Flickr and CiteULike, the
optimal performance is achieved for α = 0.2 and α = 0.5, respectively. We believe
that this is due to the fact that in social bookmarking systems like delicious,
users are expected to share and annotate the same resources (URLs in delicious),
which gives rise to fewer private resources. Therefore, annotations are expected
to occur more on resources than on users. However, in social bookmarking systems
like Flickr and CiteULike, users are expected to upload their own resources
(images and papers), resulting in more private resources. Thus, annotations are
expected to occur more on users than on resources, a property which has also
been observed and reported in [16].

4.3.4 Impact of the Weight of Terms


In Sect. 3.2.3, we explain that we experiment with two strategies for weighting
the new expanded terms, using either (i) the value of Formula 4, or (ii) the
TF-IDF value of Formula 5. We note that the performances follow almost the same
distribution while varying γ and α in Figs. 3 and 5, for our three similarity
measures over our three datasets. However, each time, the TF-IDF weighting
strategy provides better performance. Hence, we conclude that personalizing the
term weighting is less advantageous and less efficient than the uniform
weighting approach used in the second strategy.

4.3.5 Impact of the Similarity Measures


The behavior of the performance seems to be the same for the three similarity
measures, with each time a small advantage for the Dice measure. Hence, taking
into account the ratio between all the entities to which two tags are associated
together versus the union of these entities leads to a better estimation of the
similarity in folksonomies.

4.4 Comparison with Existing Approaches


Our objective here is to estimate how well our approach meets the users’ infor-
mation needs and compare its retrieval quality to that of other approaches,
objectively. Our approach is evaluated using the optimal values computed in the
previous section and using our two strategies of term weighting as explained in
Sect. 3.2.3. The results are illustrated in Fig. 6 as “PSQE-W = Ranking” for the
first strategy and “PSQE-W = TFIDF” for the second strategy, where we select
four baselines for comparison as described in the following. Note that we choose
the parameters that give the optimal performance for each of these baselines.

4.4.1 PSQE vs NoQE


The first approach for comparison is that with no query expansion or personal-
ization. Documents that match queries are retrieved, and ranked as explained
above. We report the following improvements:

Delicious dataset: we obtain an improvement of almost 13% of the MAP
and 18% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 16% of the MAP and 24%
of the MRR for our second strategy of term weighting using the Dice similarity
measure.
Flickr dataset: we obtain an improvement of almost 13% of the MAP and
21% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 14% of the MAP and 21%
of the MRR for our second strategy of term weighting using the Dice similarity
measure.
CiteULike dataset: we obtain an improvement of almost 10% of the MAP
and 7% of the MRR for our first strategy of term weighting using the Jaccard
similarity measure, and an improvement of almost 15% of the MAP and 14%
of the MRR for our second strategy of term weighting using the Overlap
similarity measure.

Thus, it is clear that query expansion has an evident advantage compared
to a strategy with no expansion. We refer to this approach as NoQE in Fig. 6.

Fig. 6. Comparison with the different baselines of the MAP and MRR, while fixing
γ = 0.5 and query size = 4, using the delicious, Flickr, and CiteULike datasets. We
choose the optimal value of α for each similarity measure.

4.4.2 PSQE vs N-BasedExp


The second approach is the neighborhood based approach, which is based on
the co-occurrence of terms over resources. This approach consists of enriching
the query q with the most related terms without considering the user profile.
Thus, queries are enriched similarly for each user. Our approach significantly
outperforms the neighborhood based approach as follows:

Delicious dataset: we obtain an improvement of almost 12% of the MAP
and 19% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 14% of the MAP and 22%
of the MRR for our second strategy of term weighting using the Dice similarity
measure.
Flickr dataset: we obtain an improvement of almost 8% of the MAP and
12% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 9% of the MAP and 12% of
the MRR for our second strategy of term weighting using the Dice similarity
measure.
CiteULike dataset: we obtain an improvement of almost 8% of the MAP
and 5% of the MRR for our first strategy of term weighting using the Jaccard
similarity measure, and an improvement of almost 13% of the MAP and 12%
of the MRR for our second strategy of term weighting using the Overlap
similarity measure.

Therefore, we conclude that our personalized query expansion brings a
considerable improvement compared to an approach based on the most related
terms. We refer to this approach as N-BasedExp in Fig. 6.

4.4.3 PSQE vs ExSemSe


The third approach, proposed in [4], is a strategy that uses semantic search
with query expansion, named Expanded Semantic Search. In summary, this strategy
consists of adding to the query q the k possible expansion tags with the largest
similarity to the original tags in order to enrich its results. For each query
issued by the query initiator u, results are ranked using BM25 and tag similarity
scores. We implemented this strategy and evaluated it over our datasets. We refer
to this approach as ExSemSe in Fig. 6. We report the following improvements:

Delicious dataset: we obtain an improvement of almost 5% of the MAP
and 7% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 7% of the MAP and 10% of
the MRR for our second strategy of term weighting using the Dice similarity
measure.
Flickr dataset: we obtain an improvement of almost 11% of the MAP and
16% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 12% of the MAP and 16%
of the MRR for our second strategy of term weighting using the Dice similarity
measure.
CiteULike dataset: we obtain an improvement of almost 12% of the MAP
and 10% of the MRR for our first strategy of term weighting using the Jaccard
similarity measure, and an improvement of almost 17% of the MAP and 17%
of the MRR for our second strategy of term weighting using the Overlap
similarity measure.

4.4.4 PSQE vs TagRank


The fourth approach, proposed in [6], is an algorithm called TagRank that
automatically determines which tags best expand a list of tags in a given query.
We implemented this strategy and evaluated it over our datasets. We refer to
this approach as TagRank in Fig. 6. We report the following improvements:

Delicious dataset: we obtain an improvement of almost 18.10% of the MAP
and 21.79% of the MRR for our first strategy of term weighting using the
Overlap similarity measure, and an improvement of almost 20.83% of the
MAP and 26.42% of the MRR for our second strategy of term weighting
using the Dice similarity measure.
Flickr dataset: we obtain an improvement of almost 12.20% of the MAP and
16.67% of the MRR for our first strategy of term weighting using the Overlap
similarity measure, and an improvement of almost 12.94% of the MAP and
17.58% of the MRR for our second strategy of term weighting using the Dice
similarity measure.
CiteULike dataset: we obtain an improvement of almost 10.23% of the
MAP and 8.79% of the MRR for our first strategy of term weighting using
the Jaccard similarity measure, and an improvement of almost 16.49% of the
MAP and 18.35% of the MRR for our second strategy of term weighting using
the Overlap similarity measure.

In summary, the obtained results show that our approach of personalization
in query expansion using social knowledge may significantly improve web search.
By comparing the PSQE framework to the closest state of the art approaches, we
show that it is a very competitive approach that may provide high quality results
whatever the dataset used. Finally, we notice that the best performance is
obtained with the Dice similarity measure and using TF-IDF for term weighting
over our three datasets.

5 Related Work
Current models of information retrieval are blind to the social context that
surrounds information resources, e.g., the authorship and usage of information
sources, and the social context of the user that issues the query, i.e., his social
activities of commenting, rating and sharing resources in social platforms. There-
fore, recently, the fields of Information Retrieval (IR) and Social Networks Anal-
ysis (SNA) have been bridged resulting in Social Information Retrieval (SIR)
models [20]. These models are expected to extend conventional IR models to
incorporate social information [11].
In this paper, we are mainly interested in how to use social information to
improve classic web search, in particular the query expansion process. Hence, we
cite in the following, the main works that deal with social query expansion:
Biancalana et al. [7] proposed Nereau, a query expansion strategy where
the co-occurrence matrix of terms in documents is enhanced with meta-data
retrieved from social bookmarking services. The system can record and interpret
users’ behavior in order to provide personalized search results according to
their interests, in such a way that allows the selection of terms that are
candidates for the expansion based on the original terms inserted by the user.
Bender et al. [4] consider SIR from both the query expansion and results
ranking and propose a model that deals more with ranking results than query
expansion. Lioma et al. [27] provide Social-QE by considering the query expan-
sion (QE) as a logical inference and by considering the addition of tags as an
extra deduction to this process. In the same spirit, Jin et al. [24] propose a
method in which the used expansion terms are selected from a large amount of
social tags in folksonomy. A tag co-occurrence method for similar terms selec-
tion is used to choose good expansion terms from the candidate tags directly
according to their potential impact on the retrieval effectiveness. The work in
[29] proposes a unified framework to address complex queries on multi-modal
“social” collections. The approach they proposed includes a query expansion
strategy that incorporates both textual and social elements. Finally, Lin et al.
[26] propose to enrich the source of expansion terms, initially composed of
relevance feedback data, with social annotations. In particular, they propose a
learning-based term ranking approach built on this source in order to enhance
the IR performance. Note that in these works, there is no personalization
of the expansion process.
Bertier et al. [6] propose the TagRank algorithm, an adaptation of the celebrated
PageRank algorithm, which automatically determines which tags best expand
a list of tags in a given query. This is achieved by creating and maintaining a
TagMap matrix, a central abstraction that captures the personalized
relationships between tags, which is constructed by dynamically computing the
estimation of a distance between taggers, based on cosine similarity between
tags and items. From our point of view, the proposed solution is not really
suitable, since it requires creating and maintaining a TagMap matrix for each
user and executing a high-complexity algorithm for determining close users.
Finally, a more recent work by Zhou et al. [44] proposes first a model to con-
struct user profiles using tags and annotations together with documents retrieved
from an external corpus. The model integrates the word embeddings text repre-
sentation, with topic models in two groups of pseudo-aligned documents. Based
on user profiles, the authors built two query expansion techniques based on:
(i) topical weights-enhanced word embeddings, and (ii) the topical relevance
between the query and the terms inside a user profile.

6 Conclusion and Future Work


This paper discusses a contribution to the area of query expansion leveraging
the social context of the Web. We proposed a new approach based on social
personalization to transform an initial query q to another query q′ enriched
with close terms that are mostly used by not only a given user but also by
his social relatives. Given a social graph (folksonomy), the proposed approach