
Scalable Metadata Management Through OSD+ Devices

Published: 01 February 2014

Abstract

We present the design and implementation of both a new, enhanced type of OSD device, the OSD+ device, and a metadata cluster based on it. OSD+ devices support data objects and directory objects. Unlike data objects, which are already present in traditional OSDs, directory objects store file names and attributes, and support metadata-related operations. By using OSD+ devices, we show how the metadata cluster of the Fusion Parallel File System (FPFS) can effectively be managed by all the servers in a system, improving the performance, scalability and availability of the metadata service. We also describe how a directory with millions of files, accessed by thousands of clients at the same time, is efficiently distributed across several servers to provide high IOPS rates. The performance of our metadata cluster based on OSD+s has been evaluated and compared with that achieved by Lustre. The results show that our proposal obtains better throughput than Lustre when both use a single metadata server, easily achieving improvements of 60–80% or more, and that the performance scales with the number of OSD+s. They also show that FPFS is able to provide a sustained throughput of more than 70,000 creates per second, and more than 120,000 stats per second, for huge directories on a cluster with just 8 OSD+s.
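The abstract does not spell out how the entries of a huge directory are mapped onto OSD+ devices. As a purely illustrative sketch, assuming a simple hash-based placement of file names onto the OSD+s that hold the directory (the function osd_for_entry, its parameters, and the example paths are hypothetical, not FPFS's actual mechanism), such a distribution could look like this:

    # Minimal, hypothetical sketch: hash each file name of a huge directory
    # onto one of the OSD+ devices that store that directory. This is NOT
    # FPFS's actual placement function, only an illustration of the idea of
    # spreading a single directory's entries across several servers.
    import hashlib

    def osd_for_entry(directory: str, filename: str, num_osds: int) -> int:
        """Pick the OSD+ responsible for a single entry of a distributed directory."""
        key = f"{directory}/{filename}".encode("utf-8")
        digest = hashlib.sha1(key).digest()
        # Interpret the first 8 bytes of the digest as an integer and map it onto the devices.
        return int.from_bytes(digest[:8], "big") % num_osds

    # Example: entries of a (hypothetical) huge directory spread over 8 OSD+s.
    for name in ("file-000001", "file-000002", "file-000003"):
        print(name, "->", osd_for_entry("/scratch/huge_dir", name, 8))

With a mapping of this kind, creates and stats for different entries of the same directory land on different OSD+s, which is what lets the aggregate metadata throughput grow with the number of devices, as the results above indicate.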


Cited By

  • Scalable Metadata Management Techniques for Ultra-Large Distributed Storage Systems – A Systematic Review. ACM Computing Surveys 51(4), 1–37 (31 Jul 2018). https://doi.org/10.1145/3212686


Information

Published In

International Journal of Parallel Programming, Volume 42, Issue 1
February 2014
237 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 February 2014

Author Tags

  1. Fusion parallel file system
  2. Management of huge directories
  3. Metadata cluster
  4. OSD+
  5. Scalability

Qualifiers

  • Article
