Regression with small data sets: a case study using code surrogates in additive manufacturing

Published: 01 November 2018

Abstract

In recent years there has been increasing interest in mining massive data sets whose sizes are measured in terabytes. However, in some problems collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is building code surrogates: a computer simulation is run at many different values of its input parameters, and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and the design of experiments, but running expensive simulations at many sample points can be costly. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets we can build good-quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains, and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.
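The surrogate workflow described above (choose input samples, run the simulation at each one, fit a regression model relating outputs to inputs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: `toy_simulation` is a hypothetical cheap function standing in for an expensive additive-manufacturing simulation, the samples come from a simple variant of best-candidate (farthest-point) sampling, and the regressor is a Gaussian process with a fixed RBF kernel whose hyper-parameters are not tuned. All function and class names are illustrative.

```python
import numpy as np


def best_candidate_samples(n_samples, n_dims, n_candidates=20, seed=0):
    """Space-filling design in [0, 1)^n_dims (a simple best-candidate variant).

    Each new point is the one, out of n_candidates uniform random candidates,
    whose distance to its nearest already-chosen point is largest.
    """
    rng = np.random.default_rng(seed)
    points = [rng.random(n_dims)]
    while len(points) < n_samples:
        candidates = rng.random((n_candidates, n_dims))
        chosen = np.array(points)
        # Distance from each candidate to its nearest already-chosen point.
        gaps = np.linalg.norm(candidates[:, None, :] - chosen[None, :, :], axis=2).min(axis=1)
        points.append(candidates[np.argmax(gaps)])
    return np.array(points)


def toy_simulation(x):
    """Hypothetical cheap stand-in for an expensive simulation; x has shape (n, 2)."""
    return np.sin(3.0 * x[:, 0]) + 0.5 * np.cos(5.0 * x[:, 1])


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)


class GPSurrogate:
    """Gaussian process regression with a fixed RBF kernel (hyper-parameters not tuned)."""

    def __init__(self, length_scale=0.3, jitter=1e-4):
        self.length_scale = length_scale
        self.jitter = jitter  # small diagonal term that keeps the solve well conditioned

    def fit(self, x, y):
        self.x_train = x
        self.y_mean = y.mean()  # constant prior mean taken from the training outputs
        k = rbf_kernel(x, x, self.length_scale) + self.jitter * np.eye(len(x))
        self.alpha = np.linalg.solve(k, y - self.y_mean)
        return self

    def predict(self, x):
        """Posterior mean at the rows of x."""
        k_star = rbf_kernel(x, self.x_train, self.length_scale)
        return self.y_mean + k_star @ self.alpha


# Small budget: 30 "simulation runs" at space-filling input points.
x_train = best_candidate_samples(30, 2)
y_train = toy_simulation(x_train)
surrogate = GPSurrogate().fit(x_train, y_train)

# Evaluate the surrogate on points the model has not seen.
x_test = np.random.default_rng(1).random((500, 2))
y_test = toy_simulation(x_test)
rmse = np.sqrt(np.mean((surrogate.predict(x_test) - y_test) ** 2))
print(f"Surrogate RMSE on 500 held-out points: {rmse:.3f} (output std {y_test.std():.3f})")
```

Because each call to a real simulator is expensive, the choice of sample points matters: a space-filling design typically leaves fewer large unexplored gaps in the input domain than the same number of purely random points, which tends to help a smooth regressor such as a Gaussian process. Swapping in other sampling schemes or regression algorithms at the same fixed budget gives the kind of comparison the abstract describes.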

    Published In

    Knowledge and Information Systems, Volume 57, Issue 2
    November 2018
    244 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. Code surrogates
    2. Regression
    3. Sampling
    4. Small data sets

    Qualifiers

    • Article

    Cited By

    • (2023) A critical review on applications of artificial intelligence in manufacturing. Artificial Intelligence Review 56(Suppl 1):661-768. doi:10.1007/s10462-023-10535-y. Online publication date: 1-Jul-2023
    • (2022) Efficient Sampling Algorithm for Electric Machine Design Calculations incorporating Empirical Knowledge. 2022 International Conference on Electrical Machines (ICEM), pp 1089-1095. doi:10.1109/ICEM51905.2022.9910814. Online publication date: 5-Sep-2022
    • (2022) A review of machine learning techniques for process and performance optimization in laser beam powder bed fusion additive manufacturing. Journal of Intelligent Manufacturing 34(8):3249-3275. doi:10.1007/s10845-022-02012-0. Online publication date: 15-Sep-2022
    • (2022) Fast and accurate prediction of temperature evolutions in additive manufacturing process using deep learning. Journal of Intelligent Manufacturing 34(4):1701-1719. doi:10.1007/s10845-021-01896-8. Online publication date: 7-Jan-2022
