
Scalable Metadata Management Through OSD+ Devices

Published: 01 February 2014

Abstract

We present the design and implementation of both a new, enhanced type of OSD device, the OSD+ device, and a metadata cluster based on it. OSD+ devices support data objects and directory objects. Unlike data objects, which are already present in traditional OSDs, directory objects store file names and attributes, and support metadata-related operations. By using OSD+ devices, we show how the metadata cluster of the Fusion Parallel File System (FPFS) can effectively be managed by all the servers in a system, improving the performance, scalability and availability of the metadata service. We also describe how a directory with millions of files, accessed by thousands of clients at the same time, is efficiently distributed across several servers to provide high IOPS rates. The performance of our metadata cluster based on OSD+s has been evaluated and compared with that achieved by Lustre. The results show that our proposal obtains better throughput than Lustre when both use a single metadata server, easily achieving improvements of 60–80% or more, and that the performance scales with the number of OSD+s. They also show that FPFS is able to provide a sustained throughput of more than 70,000 creates per second, and more than 120,000 stats per second, for huge directories on a cluster with just 8 OSD+s.
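The abstract does not spell out how the entries of a huge directory are mapped onto OSD+ devices. As a purely illustrative sketch, assuming a simple hash-based placement of file names onto the OSD+s that hold the directory (the function osd_for_entry, its parameters, and the example paths are hypothetical, not FPFS's actual mechanism), such a distribution could look like this:

    # Minimal, hypothetical sketch: hash each file name of a huge directory
    # onto one of the OSD+ devices that store that directory. This is NOT
    # FPFS's actual placement function, only an illustration of the idea of
    # spreading a single directory's entries across several servers.
    import hashlib

    def osd_for_entry(directory: str, filename: str, num_osds: int) -> int:
        """Pick the OSD+ responsible for a single entry of a distributed directory."""
        key = f"{directory}/{filename}".encode("utf-8")
        digest = hashlib.sha1(key).digest()
        # Interpret the first 8 bytes of the digest as an integer and map it onto the devices.
        return int.from_bytes(digest[:8], "big") % num_osds

    # Example: entries of a (hypothetical) huge directory spread over 8 OSD+s.
    for name in ("file-000001", "file-000002", "file-000003"):
        print(name, "->", osd_for_entry("/scratch/huge_dir", name, 8))

With a mapping of this kind, creates and stats for different entries of the same directory land on different OSD+s, which is what lets the aggregate metadata throughput grow with the number of devices, as the results above indicate.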


Cited By

  • Scalable Metadata Management Techniques for Ultra-Large Distributed Storage Systems – A Systematic Review. ACM Computing Surveys 51(4), 1–37 (31 Jul 2018). https://doi.org/10.1145/3212686


Information

Published In

International Journal of Parallel Programming, Volume 42, Issue 1
February 2014
237 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 February 2014

Author Tags

  1. Fusion parallel file system
  2. Management of huge directories
  3. Metadata cluster
  4. OSD+
  5. Scalability

Qualifiers

  • Article
