CoBITs: a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives

Jenq-Haur Wang¹ & Hung-Chi Chang²

Abstract

More and more valuable content, especially cultural heritage material, is being digitized and stored in digital archives, and digitization and archiving can require considerable effort. To meet the requirements of a digital archiving system, several issues must be addressed. First, each individual digital archive usually needs its own computation and storage resources to maintain its service. Second, the archived content would be more useful if it could easily be used to provide services such as search across multiple archives. Current approaches usually adopt metadata harvesting to build a centralized index from separate digital libraries, and they often suffer from metadata inconsistency. In this paper, we propose a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives. To reduce the load on each archive, we dynamically distribute the tasks of crawling, indexing, and query processing according to response time. A distributed crawler-based approach can simplify the design of the indexing and query processing steps by keeping the data to be indexed local to the machine that crawled it. It can also facilitate efficient archiving and indexing by automatically following the link structure of content published on the Web. Furthermore, it enables a simpler implementation and easier support for cross-archive applications such as search and copy detection. Experimental results show the potential of the proposed approach for load balancing with appropriate task distribution.
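As a rough illustration of two ideas in the abstract (assigning work according to response time, and indexing crawled data on the machine that crawled it), the following Python sketch is offered under stated assumptions; it is not the CoBITs implementation. The class `ArchiveNode` and the functions `pick_node` and `distributed_search` are hypothetical names. Smoothing response times with an exponentially weighted moving average (factor `alpha`), whitespace tokenization, and scoring documents by the number of matching query terms are simplifications chosen for the example, not details taken from the paper.

```python
# Hypothetical sketch, not the CoBITs implementation: the names, the EWMA
# smoothing of response times, and the term-count scoring are assumptions.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArchiveNode:
    """A node that crawls pages, indexes them locally, and answers queries."""
    name: str
    avg_response: float = 0.0   # smoothed response time in seconds
    samples: int = 0            # number of response times observed so far
    index: dict = field(default_factory=lambda: defaultdict(set))  # term -> URLs

    def record_response(self, seconds: float, alpha: float = 0.3) -> None:
        if self.samples == 0:
            self.avg_response = seconds  # first observation seeds the average
        else:
            # Exponentially weighted moving average of response times.
            self.avg_response = alpha * seconds + (1 - alpha) * self.avg_response
        self.samples += 1

    def index_page(self, url: str, text: str) -> None:
        # The page stays on the node that crawled it; only this index changes.
        for term in text.lower().split():
            self.index[term].add(url)

    def search(self, query: str) -> dict:
        # Score each local document by how many query terms it contains.
        scores = defaultdict(int)
        for term in query.lower().split():
            for url in self.index.get(term, ()):
                scores[url] += 1
        return scores


def pick_node(nodes):
    # Send the next crawling/indexing task to the fastest-responding node.
    # Nodes with no samples have avg_response 0.0, so they are tried first.
    return min(nodes, key=lambda n: n.avg_response)


def distributed_search(nodes, query, k=10):
    # Broadcast the query to every node, then merge the partial scores.
    merged = defaultdict(int)
    for node in nodes:
        for url, score in node.search(query).items():
            merged[url] += score
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    nodes = [ArchiveNode("archive-a"), ArchiveNode("archive-b")]
    nodes[0].record_response(0.8)
    nodes[1].record_response(0.2)
    crawler = pick_node(nodes)  # archive-b responds faster
    crawler.index_page("http://example.org/a", "bronze vessel Song dynasty")
    nodes[0].index_page("http://example.org/b", "Song dynasty painting")
    print(distributed_search(nodes, "song dynasty vessel"))
    # prints [('http://example.org/a', 3), ('http://example.org/b', 2)]
```

In this sketch, broadcasting each query to every node keeps all indexed data local to the node that crawled it, at the cost of contacting every archive per query; the response-time average only decides where new crawling and indexing work is sent.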

Similar content being viewed by others

System of HPC content archiving (Article, 24 November 2017)

MX-tree: A Double Hierarchical Metric Index with Overlap Reduction

How to Search the Internet Archive Without Indexing It

Notes

http://www.ndap.org.tw/, the homepage for the first phase of the digital archives project in Taiwan. The second-phase projects are available at: http://www.teldap.tw/
http://catalog.digitalarchives.tw/
http://www.archive.org/
http://www.netpreserve.org/
http://crawler.archive.org/
http://lucene.apache.org/nutch/
http://www.petitcolas.net/fabien/watermarking/stirmark/

Acknowledgment

We would like to thank the National Science Council, Taiwan, for its support under grant number NSC101-2219-E-027-005.

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei, Taiwan
Jenq-Haur Wang
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Hung-Chi Chang

Corresponding author

Correspondence to Jenq-Haur Wang.

About this article

Cite this article

Wang, JH., Chang, HC. CoBITs: a distributed indexing approach to collaborative content-based multimedia retrieval across digital archives. Multimed Tools Appl 74, 2639–2658 (2015). https://doi.org/10.1007/s11042-013-1461-5

Published: 12 April 2013
Issue Date: April 2015
DOI: https://doi.org/10.1007/s11042-013-1461-5
