LNCS 11360 Journal Subline
Transactions on Large-Scale Data- and Knowledge-Centered Systems XL
Abdelkader Hameurlain • Roland Wagner
Editors-in-Chief
Franck Morvan • Lynda Tamine
Guest Editors
Lecture Notes in Computer Science 11360
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology Madras, Chennai, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
More information about this series at http://www.springer.com/series/8637
Editors-in-Chief
Abdelkader Hameurlain, IRIT, Paul Sabatier University, Toulouse, France
Roland Wagner, FAW, University of Linz, Linz, Austria
Guest Editors
Franck Morvan, IRIT, Paul Sabatier University, Toulouse, France
Lynda Tamine, IRIT, Paul Sabatier University, Toulouse, France
This volume contains five fully revised selected regular papers, covering a wide range
of very hot topics in the fields of social networks, data stream systems, and linked data.
These include personalized social query expansion approaches, continuous query on
social media streams, elastic processing systems, and semantic interoperability for
smart grids and NoSQL environments. We would like to sincerely thank the editorial
board and the external reviewers for their thorough reviews of the submitted papers and
for ensuring the high quality of this volume.
Special thanks go to Gabriela Wagner for her availability and her valuable work in
the realization of this TLDKS volume.
Editorial Board
Reza Akbarinia Inria, France
Bernd Amann LIP6 – UPMC, France
Dagmar Auer FAW, Austria
Djamal Benslimane Lyon 1 University, France
Stéphane Bressan National University of Singapore, Singapore
Mirel Cosulschi University of Craiova, Romania
Dirk Draheim Tallinn University of Technology, Estonia
Johann Eder Alpen Adria University Klagenfurt, Austria
Anastasios Gounaris Aristotle University of Thessaloniki, Greece
Theo Härder Technical University of Kaiserslautern, Germany
Sergio Ilarri University of Zaragoza, Spain
Petar Jovanovic Universitat Politècnica de Catalunya, BarcelonaTech, Spain
Dieter Kranzlmüller Ludwig-Maximilians-Universität München, Germany
Philippe Lamarre INSA Lyon, France
Lenka Lhotská Technical University of Prague, Czech Republic
Vladimir Marik Technical University of Prague, Czech Republic
Jorge Martinez Gil Software Competence Center Hagenberg, Austria
Franck Morvan Paul Sabatier University, IRIT, France
Torben Bach Pedersen Aalborg University, Denmark
Günther Pernul University of Regensburg, Germany
Soror Sahri LIPADE, Descartes Paris University, France
A Min Tjoa Vienna University of Technology, Austria
Shaoyi Yin Paul Sabatier University, France
Osmar Zaiane University of Alberta, Edmonton, Canada
External Reviewers
José María De Fuentes Universidad Carlos III de Madrid, Spain
Tamer Elsayed Qatar University, Qatar
Evangelos Kanoulas University of Amsterdam, The Netherlands
Riad Mokadem Paul Sabatier University, France
Eric Pardede La Trobe University, Australia
Contents
Bridging the Semantic Web and NoSQL Worlds: Generic SPARQL Query
Translation and Application to MongoDB
Franck Michel, Catherine Faron-Zucker, and Johan Montagnat
1 Introduction
Web 2.0 has strengthened end users' position on the Web by integrating them into
the heart of the content generation ecosystem. This has been made
possible mainly through the availability of tools such as social networks, social
bookmarking systems, social news sites, etc., impacting the way information is
produced, processed, and consumed by both humans and machines. As a result,
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
A. Hameurlain et al. (Eds.): TLDKS XL, LNCS 11360, pp. 1–25, 2019.
https://doi.org/10.1007/978-3-662-58664-8_1
2 M. R. Bouadjenek et al.
on the one hand, users can no longer digest the large quantity of information
they have access to and are generally overwhelmed by it. On the other hand,
most popular Information Retrieval (IR) systems lack efficient personalization
techniques that provide users with only the information that fulfills their
needs. Two types of constraints make the situation more complex:
information-dependent constraints and user-dependent constraints. The first
class of constraints includes (i) the large scale of the data, due to the
continuous activities of users and their ability to generate new content,
(ii) information diversity or heterogeneity, since different types of media are
used to communicate, e.g., text, image, video, etc., (iii) versatility, since
information is dynamic and continuously updated (confirmed, contradicted, etc.),
(iv) disparity, since information can reside in different places, and, as a
result, (v) variation in the quality of information. The second class of
constraints is mainly related to users’ diversity and the highly dynamic nature
of their profiles.
To improve the IR process and reduce the number of irrelevant documents,
there are mainly three possible improvement tracks: (i) query reformulation using
extra knowledge, i.e., expansion or refinement of the user query, (ii) post filtering
or re-ranking of the retrieved documents (based on the user profile or context),
and (iii) improvement of the IR model, i.e., reengineering of the IR process to
integrate contextual information and relevant ranking functions. In this paper,
we focus on query reformulation, especially on personalized query expansion for
personalized search, i.e., personalizing the reformulation of queries.
Query expansion consists of enriching the user’s initial query with additional
information so that the IR system may propose suitable results that better satisfy
the user’s needs [14,15,19]. We explore the possibility of using the data available
in social networks, and more precisely the data of social bookmarking systems, as a
source of explicit feedback information. These systems enable users to freely add,
annotate, edit, and share bookmarks of web resources, e.g., web pages. Basically,
we propose an approach that reuses the users’ vocabulary (the terms used to
annotate web pages) in order to expand their queries in a personalized way and
thus increase their satisfaction with the quality of search. Exploiting social
knowledge for improving web search has a number of advantages:
– Feedback information in social networks is provided directly by the user,
so accurate information about users’ interests can be harvested as people
actively express their opinions on social platforms. Thus, this user interest
can be easily modeled to provide personalized services.
– A huge amount of social information is published and available with the
agreement of the publishers. Exploiting this information should not violate
user privacy, in particular for social tagging information, which doesn’t
contain sensitive information about users.
– Finally, social resources are often publicly accessible, as most social
networks provide APIs to access their data (even if a contract must often be
established before any use of the data).
Personalized Social Query Expansion Using Social Annotations 3
Our approach in this work¹ consists of three main steps: (i) determining
similar and related tags to a given query term through their co-occurrence over
resources and users, (ii) constructing a profile of the query issuer based on his
tagging activities, which is maintained and used to compute expansions, and
finally, (iii) expanding the query terms, where each term is enriched with the
most interesting tags based on their similarities and their interest to the user.
The problem we are tackling in this paper is strongly related to personalization,
since we want to expand queries in a personalized way and consequently
propose adapted search results. Personalization makes it possible to differentiate
between individuals by emphasizing their specific domains of interest and their
preferences. It is a key point in IR, and users increasingly demand it to adapt
their results [3]. Several techniques exist to provide personalized services,
among which is user profiling. The user profile is a collection of personal
information associated with a specific user that makes it possible to capture his
interests. Details of how we model user profiles are given in Sects. 2 and 3.1.4.
The main contributions of this work can be summarized as follows:
The rest of this paper is organized as follows: in Sect. 2 we introduce all the
concepts that we use throughout this paper. Section 3 introduces our method
of query expansion using folksonomy. In Sect. 4, we discuss the different experi-
ments that evaluate the performance of our approach. Related work is discussed
in Sect. 5. Finally, we conclude and provide some future directions in Sect. 6.
In this section, we formally define the basic concepts that we use throughout this
paper, namely, a bookmark, a folksonomy, and a user profile. We also provide a
formal definition of the problem we intend to solve.
¹ This is an extended and revised version of a preliminary conference report that
was presented in [12].
2.1 Background
² http://www.delicious.com/.
³ http://www.youtube.com/.
⁴ http://www.flickr.com/.
⁵ http://www.citeulike.org/.
The offline part is also decomposed into two facets: (i) the transformation of
the social graph of a folksonomy F into a graph of tags, representing similarities
between tags that either occur on the same resources or are shared by the same
users, and (ii) the computation of the users’ profiles to highlight their interests
for personalizing their queries.
The approach is based on the creation and the maintenance of a graph of
tags that represents all the similarities that exist between the tags of F. Two
kinds of approaches can be used to build it: (i) an approach based
on the co-occurrence of tags over resources, and (ii) an approach based on their
co-occurrence over users.
Therefore, we may obtain a graph of tags T_R using either the Jaccard, the
Dice, or the Overlap similarity measure. Note that we do not merge the similarity
measures in the same graph of tags, meaning that a graph of tags is constructed
using only one similarity measure.
We end up with an undirected weighted graph in which nodes represent tags,
and an edge between two tags represents the fact that these tags occur together
on at least one resource. The weights associated with edges are computed from
the similarities between tags as explained beforehand. This first step is
illustrated in the upper left part of Fig. 2.
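As an illustration of this step, the following minimal Python sketch builds a resource-based graph of tags from (user, tag, resource) triples. It assumes the standard set-based definitions of the Jaccard, Dice, and Overlap coefficients, applied to the sets of resources annotated with each tag; the function names and the toy bookmarks are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice coefficient of two non-empty sets."""
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    """Overlap coefficient of two non-empty sets."""
    return len(a & b) / min(len(a), len(b))


def build_tag_graph(bookmarks, measure=jaccard):
    """Return {(tag1, tag2): weight} for tags sharing at least one resource.

    bookmarks is an iterable of (user, tag, resource) triples. Only one
    similarity measure is used per graph, as stated in the text.
    """
    resources_of = defaultdict(set)
    for _user, tag, resource in bookmarks:
        resources_of[tag].add(resource)

    graph = {}
    for t1, t2 in combinations(sorted(resources_of), 2):
        if resources_of[t1] & resources_of[t2]:  # edge only if tags co-occur
            graph[(t1, t2)] = measure(resources_of[t1], resources_of[t2])
    return graph


# Hypothetical toy folksonomy.
bookmarks = [
    ("u1", "java", "r1"), ("u1", "programming", "r1"),
    ("u2", "java", "r2"), ("u2", "python", "r3"),
    ("u3", "programming", "r2"), ("u3", "programming", "r3"),
]
tag_graph = build_tag_graph(bookmarks)
```

The user-based graph can be obtained in the same way by collecting, for each tag, the set of users who employed it instead of the set of resources.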
Fig. 2. Summary of the graph reduction process, which transforms the whole folksonomy
F into a graph of tags T_UR. The similarity values in the figure are computed using
the Jaccard measure on both graphs T_R and T_U, and using α = 0.5 on the graph T_UR.
This process is illustrated in the upper right part of Fig. 2. Notice that the
structure of the graph of tags T_R is different from that of the graph of
tags T_U.
where Sim_{T_UR}(t_i, t_j) is the similarity between two tags computed from the
two other types of nodes, i.e., users and resources. The parameter α represents
the relative importance given to the two types of graphs, i.e., resources or
users, in the similarity calculation. In fact, depending on the
context, when computing the similarity between two tags, one may want to give
a higher importance to users sharing these two tags than to documents having
these tags in common. Another user may want to give more importance to their
co-occurrence over resources than to the users sharing these tags. Depending on
the nature of the folksonomy, we set α to its optimal value in order to maximize
the extraction of tag semantics. Finally, it should be noted that the merge is
performed between graphs generated with the same similarity measure.
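The merging formula itself is not reproduced in this excerpt. The sketch below therefore assumes a simple convex combination in which α weights the resource-based similarity and 1 − α weights the user-based similarity, which seems consistent with the role of α described above but may differ from the authors' exact definition. The function name and the sample weights are hypothetical.

```python
def merge_tag_graphs(graph_r, graph_u, alpha=0.5):
    """Merge two tag graphs given as {(tag1, tag2): similarity} dictionaries.

    ASSUMPTION: merged weight = alpha * Sim_TR + (1 - alpha) * Sim_TU, with a
    missing edge counted as similarity 0. Both graphs are expected to have
    been built with the same similarity measure.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    merged = {}
    for edge in set(graph_r) | set(graph_u):
        sim_r = graph_r.get(edge, 0.0)
        sim_u = graph_u.get(edge, 0.0)
        merged[edge] = alpha * sim_r + (1 - alpha) * sim_u
    return merged


# Hypothetical similarity values for illustration only.
graph_r = {("java", "programming"): 0.6}
graph_u = {("java", "programming"): 0.2, ("java", "python"): 0.4}
merged = merge_tag_graphs(graph_r, graph_u, alpha=0.5)
```

Under this assumption, α = 1 keeps only the resource-based similarities and α = 0 keeps only the user-based ones.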
This step of the offline part extracts semantics from the whole social graph
of F without a loss of information, i.e., by exploiting the co-occurrences of tags
over resources and users. This step leads to the creation of a graph of tags, where
edges represent semantic relations between tags. This graph will be further used
to extract terms that are semantically related to a given term of a query to
perform the query expansion. The contribution at this stage is the combination
of the graphs resulting from resources and users to construct a better graph of
tag similarities without loss of information. This is different from the existing
approaches where only one graph is used.
In the following, we introduce our method of constructing and weighting the
user profiles in order to personalize the expansions.
where n_{t_i,u_j} is the number of times the user u_j used the tag t_i.
A high value of utf-iuf is reached by a high user term frequency and a low
user frequency of the term in the whole set of users. Note that we perform a
stemming on tags before computing the profiles, to eliminate the differences
between terms having the same root to better estimate the weight of each term.
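The utf-iuf formula is not shown in this excerpt. As a rough sketch, the code below assumes it mirrors the classical tf-idf scheme: the user term frequency is the share of a user's tagging actions that use a tag, and the inverse user frequency is the logarithm of the number of users divided by the number of users who used the tag. This is an assumption, the function name is hypothetical, and stemming is omitted for brevity.

```python
import math
from collections import Counter, defaultdict


def utf_iuf_profiles(bookmarks):
    """Return {user: {tag: weight}} from (user, tag, resource) triples.

    ASSUMPTION: weight = (n_tu / sum of the user's tag counts)
                         * log(|U| / number of users who used the tag).
    """
    counts = defaultdict(Counter)  # counts[user][tag] = n_{t,u}
    for user, tag, _resource in bookmarks:
        counts[user][tag] += 1

    n_users = len(counts)
    users_with = Counter(tag for tags in counts.values() for tag in tags)

    profiles = {}
    for user, tags in counts.items():
        total = sum(tags.values())
        profiles[user] = {
            tag: (n / total) * math.log(n_users / users_with[tag])
            for tag, n in tags.items()
        }
    return profiles


# Hypothetical toy data: "web" is used by every user, "java" only by u1.
toy_bookmarks = [
    ("u1", "java", "r1"), ("u1", "java", "r2"), ("u1", "web", "r3"),
    ("u2", "web", "r3"),
]
profiles = utf_iuf_profiles(toy_bookmarks)
```

In this toy example the tag used by all users receives a weight of zero, while the tag that is frequent for u1 and rare among users receives a positive weight, which matches the behavior described above.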
User profiles are created offline and maintained incrementally. This is moti-
vated by the fact that profiles and tagging actions are not evolving as quickly
as query formulation on the system. As an analogy, it is well known that 90%
of users in the social Web consume the content (i.e., query formulation), 9%
update content, and 1% generate new content (profile updates) [34]. Thus, we
have decided to handle the profile construction as an offline task while providing
a maintenance process for keeping it up to date.
In summary, at the end of the offline part, we have built two assets: (i) a graph
of tag similarities, which represents the semantic relatedness of terms,
and (ii) user profiles, which are leveraged in the personalization step.
where Sim(t_i, t_j) is the similarity between the term t_i and t_j, the j-th term
of the user profile, and w_j is the weight of the term t_j in the profile computed
during the previous process. Notice that any similarity measure can be used for
computing Sim(t_i, t_j), as discussed in [30]. In this work, we consider the
Jaccard, the Overlap, and the Dice similarity measures, as discussed in the
previous sections.
that indicates its similarity with t_i w.r.t. the user u using Formula 4 (line 5).
Next, the neighbor list is sorted according to the computed values and
only the top k tags are kept (line 7). Finally, t_i and its remaining neighbors
are linked with the OR (∨) logical connector (line 8) and updated in q′.
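The ranking, top-k selection, and OR-linking steps described above can be sketched as follows. Since neither Algorithm 1 nor Formula 4 is reproduced in this excerpt, the scoring function is passed in as a parameter and stands in for Formula 4; the function name, the neighbor lists, and the scores are all hypothetical.

```python
def expand_query(query_terms, neighbors, score, k=2):
    """Expand each query term with its k best-scored neighboring tags.

    neighbors: dict mapping a term to its neighboring tags in the tag graph.
    score: callable (term, candidate) -> float; a stand-in for Formula 4.
    Returns one OR-group (as a string) per original query term.
    """
    groups = []
    for term in query_terms:
        ranked = sorted(
            neighbors.get(term, ()),
            key=lambda candidate: score(term, candidate),
            reverse=True,
        )
        kept = [term] + ranked[:k]  # the original term is always retained
        groups.append(" OR ".join(kept))
    return groups


# Hypothetical neighbors and personalized scores for a single user.
neighbors = {"java": ["programming", "coffee", "jvm"]}
scores = {
    ("java", "programming"): 0.9,
    ("java", "coffee"): 0.1,
    ("java", "jvm"): 0.5,
}
expanded = expand_query(["java"], neighbors, lambda t, c: scores[(t, c)], k=2)
```

Each query term is handled independently here, which corresponds to the independence assumption the text makes about query terms.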
It should be noted that in this paper, we consider that the selection of each
query term is determined independently, without considering latent term
relations. Most past work on modeling term dependencies has analyzed three
different underlying dependency assumptions: full independence, sequential
dependence [39], and full dependence [32]. Taking term dependencies into account
is part of our future work.
– Using the ranking values of Formula 4 as the weight of the new expanded
terms. This strategy provides personalized term weight assignment while con-
sidering both semantic strength and user interest.
where tf_{t_i} denotes the term frequency of t_i in the query q. This strategy
provides a uniform term weight to the query while keeping the personalizing
aspect in choosing terms. Notice that weights are assigned to terms in
line 9 of Algorithm 1.
4 Evaluations
In this section, we describe the two types of evaluations we performed on our
approach: (i) an estimation of the parameters of our approach to provide insights
regarding their potential impact on the system, and (ii) a comparison study,
where our approach is compared to the closest state-of-the-art approaches to
provide insights about the obtained results and position the proposal.
4.1 Datasets
A number of social bookmarking systems exist [21]. We have selected three
datasets to perform an offline evaluation: delicious, flickr and CiteULike. These
datasets are publicly available. The benefit of using such data instead of
crawled data is that we work on widely accepted datasets, reduce the risk of
noise, and allow others to reproduce the evaluations and compare our approach
to other approaches on “standardized datasets”. The datasets are described
below.
– Delicious: a social bookmarking web service for storing, sharing, and
discovering web bookmarks. We have used a dataset which is described and
analyzed in [42]⁶.
– Flickr: an image hosting, tagging and sharing website. The Flickr dataset is
the one used and studied in [38]⁷.
– CiteULike: an online bookmarking service that allows users to bookmark
academic articles. This dataset is the one provided by the CiteULike website⁸.
Before the experiments, we performed three data preprocessing tasks: (1) Several
annotations are too personal or meaningless, such as “toread”, “Imported
IE Favorites”, “system:imported”, etc. We removed some of them manually. (2)
Although the annotations from delicious are easy for users to read and
understand, they are not designed for machine use. For example, some users may
concatenate several words to form an annotation such as “java.programming”
⁶ http://data.dai-labor.de/corpus/delicious/.
⁷ http://www.tagora-project.eu/data/#flickrphotos.
⁸ http://static.citeulike.org/data/2007-05-30.bz2.
Hence, in the offline study, for each evaluation, we randomly select 2,000
pairs (u, t), which are considered to form a personalized query set. For each
corresponding pair (u, t), we remove all the bookmarks (u, t, r) ∈ F, ∀r ∈ R in
order not to promote the resource r in the obtained results. For each pair, the
user u sends the query q = {t} to the system. Then, the query q is enriched and
transformed into q′ following our approach. For the delicious dataset, documents
that match q′ are retrieved, ranked and sorted using Apache Lucene. For the
Flickr and CiteULike datasets, we retrieve all resources that are annotated with
tags of q′ while representing them according to the Vector Space Model (VSM).
Then, the cosine similarity is used to compute the similarity between a query q
and a resource r_j.
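For the VSM-based setting, ranking by cosine similarity can be illustrated with the short sketch below. It uses the standard cosine definition on sparse term-weight vectors stored as dictionaries; the vectors, weights, and resource identifiers are hypothetical, and how the paper weights the resource vectors is not specified in this excerpt.

```python
import math


def cosine_similarity(query_vec, resource_vec):
    """Cosine of the angle between two sparse vectors given as {term: weight}."""
    dot = sum(w * resource_vec.get(term, 0.0) for term, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_r = math.sqrt(sum(w * w for w in resource_vec.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_q * norm_r)


# Hypothetical expanded query and tagged resources.
query = {"java": 1.0, "jvm": 0.5}
resources = {
    "r1": {"java": 1.0},
    "r2": {"java": 1.0, "jvm": 0.5},
    "r3": {"python": 1.0},
}
ranking = sorted(
    resources,
    key=lambda r: cosine_similarity(query, resources[r]),
    reverse=True,
)
```

Resources that share no term with the query get a similarity of zero and end up at the bottom of the ranking.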
For the Flickr and CiteULike datasets, we rank all the retrieved resources
using values of the cosine similarity and we consider that relevant resources are
those tagged by u using tags of q to assess the obtained results. The random
selection was carried out 10 times independently, and we report the average
results.
A query expansion is expected to provide more resources as an answer to a
query because of its enrichment, which generally causes an increase in the total
recall. In our evaluation, we are more interested in studying the ability of the
method to push relevant documents to the top of the ranking. Thus, we use
the Mean Average Precision (MAP) and the Mean Reciprocal Rank (MRR), two
performance measures that take into account the ranking of relevant resources.
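Both measures can be computed as in the following sketch, which uses their standard definitions: average precision is the mean of the precision values at the ranks of the relevant resources, and the reciprocal rank is the inverse of the rank of the first relevant resource. The function names and the two toy runs are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranked list, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Two hypothetical queries with their rankings and relevant resources.
runs = [
    (["r1", "r2", "r3"], {"r1", "r3"}),
    (["r4", "r5"], {"r5"}),
]
map_score = mean_over_queries(average_precision, runs)
mrr_score = mean_over_queries(reciprocal_rank, runs)
```

Both scores increase when relevant resources move toward the top of the ranking, which is the property the evaluation focuses on.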
Fig. 3. Measuring the impact of the social interest (γ). For different values of γ, we fix
α = 0.5, query size = 4 and we use the three similarity measures and the two weighting
strategies for new terms averaged over 1000 queries, using the VSM.
Fig. 4. Evaluating the impact of the query size on the expansion. For different values
of the query size, we use γ = 0.5, α = 0.5 and our two strategies of weighting new
terms.
for new term. This has even a negative impact when using TF-IDF values for
term weighting as Fig. 4 shows. For the first case, this is due to the fact that the
weight of the added terms is close to 0 (recall that the weight of the added
terms is the value of Eq. 4). Hence, it is natural and intuitive to pick a
value in the provided interval, between 4 and 6.
Fig. 5. Evaluating the impact of the users/resources on the expansion. For values of α,
using the three similarity measures, γ = 0.5, query size = 4 and for our two strategies
of weighting new terms.
Indeed, in the delicious dataset, the values of the MAP and MRR increase
by increasing the value of α using both the Jaccard and the Dice similarities
achieving an optimal performance at α = 1. As for Flickr and CiteULike, the
optimal performance is achieved for α = 0.2 and α = 0.5 respectively. We believe
that this is due to the fact that in social bookmarking systems like delicious,
users are expected to share and annotate the same resources (URLs in delicious),
giving rise to fewer private resources. Therefore, annotations are expected to
occur more on resources than on users. However, in social bookmarking systems
like Flickr and CiteULike, users are expected to upload their own resources
(images and papers) resulting in more private resources. Thus, annotations are
expected to occur more on users than on resources, a property which has been
also observed and reported in [16].
Fig. 6. Comparison with the different baselines of the MAP and MRR, while fixing
γ = 0.5 and query size = 4, using the delicious, Flickr, and CiteULike datasets. We
choose the optimal value of α for each similarity measure.
of the MRR for our second strategy of term weighting using the Dice similarity
measure.
CiteULike dataset: we obtain an improvement of almost 10% of the MAP
and 7% of the MRR for our first strategy of term weighting using the Jaccard
similarity measure, and an improvement of almost 15% of the MAP and 14%
of the MRR for our second strategy of term weighting using the Overlap
similarity measure.
Thus, it is clear that query expansion has an evident advantage compared
to a strategy with no expansion. We refer to the latter as NoQE in Fig. 6.
5 Related Work
Current models of information retrieval are blind to the social context that
surrounds information resources, e.g., the authorship and usage of information
sources, and the social context of the user that issues the query, i.e., his social
activities of commenting, rating and sharing resources in social platforms. There-
fore, recently, the fields of Information Retrieval (IR) and Social Networks Anal-
ysis (SNA) have been bridged resulting in Social Information Retrieval (SIR)
models [20]. These models are expected to extend conventional IR models to
incorporate social information [11].
In this paper, we are mainly interested in how to use social information to
improve classic web search, in particular the query expansion process. Hence, we
cite in the following, the main works that deal with social query expansion:
Biancalana et al. [7] proposed Nereau, a query expansion strategy where
the co-occurrence matrix of terms in documents is enhanced with meta-data
retrieved from social bookmarking services. The system can record and interpret
users’ behavior, in order to provide personalized search results, according to their
interests in such a way that allows the selection of terms that are candidates of
the expansion based on original terms inserted by the user.
Bender et al. [4] consider SIR from the perspectives of both query expansion and
results ranking, and propose a model that deals more with ranking results than
with query expansion. Lioma et al. [27] provide Social-QE by considering query
expansion (QE) as a logical inference and by considering the addition of tags as
an extra deduction to this process. In the same spirit, Jin et al. [24] propose a
method in which the expansion terms are selected from a large amount of
social tags in a folksonomy. A tag co-occurrence method for similar term
selection is used to choose good expansion terms from the candidate tags directly
according to their potential impact on the retrieval effectiveness. The work in
[29] proposes a unified framework to address complex queries on multi-modal
“social” collections. The proposed approach includes a query expansion
strategy that incorporates both textual and social elements. Finally, Lin et al.
[26] propose to enrich the source of expansion terms, initially composed of
relevance feedback data, with social annotations. In particular, they propose a
term-ranking learning approach based on this source in order to enhance and
boost IR performance. Note that in these works, there is no personalization
of the expansion process.
Bertier et al. [6] propose the TagRank algorithm, an adaptation of the celebrated
PageRank algorithm, which automatically determines which tags best expand
a list of tags in a given query. This is achieved by creating and maintaining a
TagMap matrix, a central abstraction that captures the personalized relation-
ships between tags, which is constructed by dynamically computing the estima-
tion of a distance between taggers, based on cosine similarity between tags and
items. From our point of view, the proposed solution is not really suitable, since
it needs the creation and the maintenance of a TagMap matrix for each user and
the execution of an algorithm for determining close users with a high complexity.
Finally, a more recent work by Zhou et al. [44] proposes first a model to con-
struct user profiles using tags and annotations together with documents retrieved
from an external corpus. The model integrates the word embeddings text repre-
sentation, with topic models in two groups of pseudo-aligned documents. Based
on user profiles, the authors built two query expansion techniques based on:
(i) topical weights-enhanced word embeddings, and (ii) the topical relevance
between the query and the terms inside a user profile.