Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Aggregating inconsistent information: Ranking and clustering

Published: 05 November 2008 Publication History

Abstract

We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the extent of disagreement with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.

References

[1]
Ailon, N. 2008. Aggregation of partial rankings, p-ratings and top-m lists. Algorithmica. DOI 10.1007/s00453-008-9211-1.
[2]
Ailon, N., and Charikar, M. 2005. Fitting tree metrics: Hierarchical clustering and phylogeny. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Pittsburgh, PA). IEEE Computer Society Press, Los Alamitos, CA, 73--82.
[3]
Ailon, N., Charikar, M., and Newman, A. 2005. Aggregating inconsistent information: Ranking and clustering. In Proceedings of the 37th Annual Symposium on the Theory of Computing (STOC) (Boston, MA). ACM, New York, 684--693.
[4]
Ailon, N., and Mohri, M. 2008. An efficient reduction of ranking to classification. In Conference on Learning Theory (COLT) (Helsinki, Finland).
[5]
Alon, N. 2006. Ranking tournaments. SIAM J. Disc. Math. 20, 1, 137--142.
[6]
Alon, N., and Spencer, J. H. 1992. The Probabilistic Method. Wiley, New York.
[7]
Arora, S., Frieze, A., and Kaplan, H. 1996. A new rounding procedure for the assignment problem with applications to dense graph arrangement problems. In Proceedings of the 37th Annual Symposium on the Foundations of Computer Science (FOCS) (Burlington, VT). IEEE Computer Society Press, Los Alamitos, CA, 24--33.
[8]
Balcan, M.-F., Bansal, N., Beygelzimer, A., Coppersmith, D., Langford, J., and Sorkin, G. B. 2007. Robust reductions from ranking to classification. In Proceedings of the Conference on Learning Theory (COLT). Lecture Notes in Computer Science, vol. 4539. Springer-Verlag, New York, 604--619.
[9]
Bang-Jensen, J., and Thomassen, C. 1992. A polynomial algorithm for the 2-path problem in semicomplete graphs. SIAM J. Disc. Math. 5, 3, 366--376.
[10]
Bansal, N., Blum, A., and Chawla, S. 2004. Correlation clustering. Mach. Learn. J. (Special Issue on Theoretical Advances in Data Clustering) 56, 1--3, 89--113. (Extended abstract appeared in FOCS 2002, pages 238--247.)
[11]
Bartholdi, J., Tovey, C. A., and Trick, M. 1989. Voting schemes for which it can be difficult to tell who won the election. Social Choice Welf. 6, 2, 157--165.
[12]
Borda, J. C. 1781. Mémoire sur les élections au scrutin. Histoire de l'Académie Royale des Sciences.
[13]
Cai, M.-C., Deng, X., and Zang, W. 2001. An approximation algorithm for feedback vertex sets in tournaments. SIAM J. Comput. 30, 6, 1993--2007.
[14]
Charikar, M., Guruswami, V., and Wirth, A. 2003. Clustering with qualitative information. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (Boston, MA). IEEE Computer Society Press, Los Alamitos, CA, 524--533.
[15]
Chaudhuri, K., Chen, K., Mihaescu, R., and Rao, S. 2006. On the tandem duplication-random loss model of genome rearrangement. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA'06). ACM, New York, 564--570.
[16]
Condorcet, M.-J. 1785. Éssai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix.
[17]
Coppersmith, D., Fleischer, L., and Rudra, A. 2006. Ordering by weighted number of wins gives a good ranking for weighted tournaments. In Proceedings of the 17th Annual ACM-SIAM Symposium on Discrete Algorithm (SODA'06). ACM, New York, 776--782.
[18]
Diaconis, P., and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Stat. Soc., Ser. B 39, 2, 262--268.
[19]
Dinur, I., and Safra, S. 2002. On the importance of being biased. In Proceedings of the 34th Annual Symposium on the Theory of Compututing (STOC). ACM, New York, 33--42.
[20]
Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001a. Rank aggregation methods for the web. In Proceedings of the 10th International Conference on the World Wide Web (WWW10) (Hong Kong, China), 613--622.
[21]
Dwork, C., Kumar, R., Naor, M., and Sivakumar, D. 2001b. Rank aggregation revisited. Manuscript. (Available from: http://www.eecs.harvard.edu/~michaelm/CS222/rank2.pdf.)
[22]
Even, G., Naor, J. S., Sudan, M., and Schieber, B. 1998. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica 20, 2, 151--174.
[23]
Fagin, R., Kumar, R., and Sivakumar, D. 2003. Efficient similarity search and classification via rank aggregation. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, CA). ACM, New York, 301--312.
[24]
Filkov, V., and Skiena, S. 2003. Integrating microarray data by consensus clustering. In Proceedings of International Conference on Tools with Artificial Intelligence (ICTAI) (Sacramento, CA). 418--425.
[25]
Frieze, A., and Kannan, R. 1999. Quick approximations to matrices and applications. Combinatorica 19, 2, 175--220.
[26]
Gionis, A., Mannila, H., and Tsaparas, P. 2005. Clustering aggregation. In Proceedings of the 21st International Conference on Data Engineering (ICDE) (Tokyo, Japan).
[27]
Hästad, J. 2001. Some optimal inapproximability results. J. ACM 48, 798--859.
[28]
Karp, R. M. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Plenum Press, New York, 85--104.
[29]
Kemeny, J. G. 1959. Mathematics without numbers. Daedalus 88, 571--591.
[30]
Kemeny, J., and Snell, J. 1962. Mathematical Models in the Social Sciences. Blaisdell, New York. (Reprinted by MIT Press, Cambridge, 1972.)
[31]
Kenyon-Mathieu, C., and Schudy, W. 2007. How to rank with few errors. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC) (New York, NY). ACM, New York, 95--103.
[32]
Newman, A. 2000. Approximating the maximum acyclic subgraph. M.S. thesis, Massachusetts Institute of Technology, Cambridge, MA.
[33]
Newman, A., and Vempala, S. 2001. Fences are futile: On relaxations for the linear ordering problem. In Proceedings of the 8th Conference on Integer Programming and Combinatorial Optimization (IPCO). 333--347.
[34]
Potts, C. N. 1980. An algorithm for the single machine sequencing problem with precedence constraints. Math. Prog. 13, 78--87.
[35]
Seymour, P. 1995. Packing directed circuits fractionally. Combinatorica 15, 281--288.
[36]
Speckenmeyer, E. 1989. On feedback problems in digraphs. Graph Theoretic Concepts in Computer Science, Lecture Notes in Computer Science, vol. 411, Springer-Verlag, New York, 218--231.
[37]
Strehl, A. 2002. PhD dissertation. Ph.D. thesis, University of Texas at Austin.
[38]
Wakabayashi, Y. 1998. The complexity of computing medians of relations. Resenhas 3, 3, 323--349.
[39]
Williamson, D. P., and van Zuylen, A. 2007. Deterministic algorithms for rank aggregation and other ranking and clustering problems. In Proceedings of the 5th Workshop on Approximation and Online Algorithms (WAOA).

Cited By

View all
  • (2024)Robust Correlation Clustering Problem with Locally Bounded DisagreementsTsinghua Science and Technology10.26599/TST.2023.901002729:1(66-75)Online publication date: Feb-2024
  • (2024)A phase transition in Arrow’s theorem with three alternativesThe Annals of Applied Probability10.1214/24-AAP204934:4Online publication date: 1-Aug-2024
  • (2024)Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant FactorJournal of the ACM10.1145/363945371:2(1-41)Online publication date: 10-Apr-2024
  • Show More Cited By

Index Terms

  1. Aggregating inconsistent information: Ranking and clustering

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image Journal of the ACM
      Journal of the ACM  Volume 55, Issue 5
      October 2008
      164 pages
      ISSN:0004-5411
      EISSN:1557-735X
      DOI:10.1145/1411509
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 November 2008
      Accepted: 01 August 2008
      Revised: 01 May 2008
      Received: 01 January 2006
      Published in JACM Volume 55, Issue 5

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. Rank aggregation
      2. consensus clustering
      3. correlation clustering
      4. minimum feedback arc-set
      5. tournaments

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)219
      • Downloads (Last 6 weeks)47
      Reflects downloads up to 13 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Robust Correlation Clustering Problem with Locally Bounded DisagreementsTsinghua Science and Technology10.26599/TST.2023.901002729:1(66-75)Online publication date: Feb-2024
      • (2024)A phase transition in Arrow’s theorem with three alternativesThe Annals of Applied Probability10.1214/24-AAP204934:4Online publication date: 1-Aug-2024
      • (2024)Fitting Distances by Tree Metrics Minimizing the Total Error within a Constant FactorJournal of the ACM10.1145/363945371:2(1-41)Online publication date: 10-Apr-2024
      • (2024)Scalable Multitask Learning Using Gradient-based Estimation of Task AffinityProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671835(1542-1553)Online publication date: 25-Aug-2024
      • (2024)Wise Fusion: Group Fairness Enhanced Rank FusionProceedings of the 33rd ACM International Conference on Information and Knowledge Management10.1145/3627673.3679649(163-174)Online publication date: 21-Oct-2024
      • (2024)Understanding the Cluster Linear Program for Correlation ClusteringProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649749(1605-1616)Online publication date: 10-Jun-2024
      • (2024)Combinatorial Correlation ClusteringProceedings of the 56th Annual ACM Symposium on Theory of Computing10.1145/3618260.3649712(1617-1628)Online publication date: 10-Jun-2024
      • (2024)Software Architecture Recovery from Multiple Dependency ModelsProceedings of the 39th ACM/SIGAPP Symposium on Applied Computing10.1145/3605098.3635917(1185-1192)Online publication date: 8-Apr-2024
      • (2024)Noisy Sorting CapacityIEEE Transactions on Information Theory10.1109/TIT.2024.342528170:9(6121-6138)Online publication date: 8-Jul-2024
      • (2024)Distributed Differentially Private Ranking AggregationIEEE Transactions on Computational Social Systems10.1109/TCSS.2022.322509611:1(503-513)Online publication date: Feb-2024
      • Show More Cited By

      View Options

      Get Access

      Login options

      Full Access

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media