MiTexCube: MicroTextCluster Cube for online analysis of text cells and its applications

Published: 01 June 2013

Abstract

A fundamental problem in multidimensional text database analysis is supporting a variety of online applications, such as summarizing the content of a text cell or comparing contents across multiple text cells, both efficiently and effectively. In this paper, we propose a new infrastructure, the MicroTextCluster Cube (MiTexCube), which supports efficient online text analysis on multidimensional text databases by using micro-clusters of text documents as a compact representation of text content. Experimental results on real multidimensional text databases show that (i) MiTexCube can be materialized efficiently with reasonable space overhead, and (ii) applications built on the materialized MiTexCube are more efficient than the baseline method, which analyzes the document units in each cell directly, without sacrificing much analysis quality. Moreover, MiTexCube naturally accommodates a flexible trade-off between efficiency and quality of analysis.
© 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 6: 243-259, 2013
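The abstract describes micro-clusters only as a compact representation of a cell's documents. As a rough illustration, and assuming a BIRCH-style additive summary (a document count plus summed term frequencies) rather than the paper's actual data structure, the following Python sketch shows why such summaries suit online cell analysis: a rolled-up cell can reuse the micro-clusters of its child cells instead of rescanning documents, a cell can be summarized by its most frequent terms, and two cells can be compared by the cosine similarity of their centroids. The names `TextMicroCluster`, `cell_profile`, `top_terms`, and `compare_cells`, the whitespace tokenizer, the choice of cosine similarity, and the example documents are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class TextMicroCluster:
    """Hypothetical BIRCH-style summary of a group of documents.

    Assumption: only the document count and the summed term-frequency
    vector are stored, so two summaries merge by simple addition.
    """
    n_docs: int = 0
    term_sums: Counter = field(default_factory=Counter)

    @classmethod
    def from_docs(cls, docs):
        mc = cls()
        for doc in docs:
            mc.n_docs += 1
            # Assumption: naive lowercase whitespace tokenization.
            mc.term_sums.update(doc.lower().split())
        return mc

    def merge(self, other):
        # Additivity: the summary of a union is the sum of the summaries.
        return TextMicroCluster(self.n_docs + other.n_docs,
                                self.term_sums + other.term_sums)

    def centroid(self):
        # Average term frequency per document (empty dict if no documents).
        return {t: c / self.n_docs for t, c in self.term_sums.items()}


def cell_profile(micro_clusters):
    """Merge all micro-clusters of a (possibly rolled-up) cell."""
    total = TextMicroCluster()
    for mc in micro_clusters:
        total = total.merge(mc)
    return total


def top_terms(micro_clusters, k=3):
    """Summarize a cell by its k most frequent terms."""
    return [t for t, _ in cell_profile(micro_clusters).term_sums.most_common(k)]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def compare_cells(cell_a, cell_b):
    """Compare two cells via the cosine similarity of their centroids."""
    return cosine(cell_profile(cell_a).centroid(),
                  cell_profile(cell_b).centroid())


# Two hypothetical cells, each summarized into micro-clusters ahead of time.
cell_2011 = [TextMicroCluster.from_docs(["engine failure on takeoff",
                                         "engine fire warning"]),
             TextMicroCluster.from_docs(["runway incursion at night"])]
cell_2012 = [TextMicroCluster.from_docs(["engine failure during climb"])]

print(top_terms(cell_2011, k=1))                  # ['engine']
print(round(compare_cells(cell_2011, cell_2012), 3))  # 0.416
```

Under these assumptions, both operations touch only the stored micro-cluster summaries rather than the raw documents, which is one plausible source of the efficiency gain the abstract reports. The sketch does not model how MiTexCube forms its micro-clusters or how it trades analysis quality for speed.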



Information

Published In

Statistical Analysis and Data Mining, Volume 6, Issue 3
June 2013
114 pages
ISSN: 1932-1864
EISSN: 1932-1872

Publisher

John Wiley & Sons, Inc.
United States

Author Tags

1. MiTexCube
2. multidimensional text database
3. text mining

Qualifiers

• Article
