Pitfalls and protocols of data science in manufacturing practice

Abstract

Driven by the ongoing migration to Industry 4.0, the increasing adoption of artificial intelligence, big data analytics, cloud computing, the Internet of Things, and robotics has empowered smart manufacturing and digital transformation. However, the growing application of machine learning and data science (DS) techniques raises a range of procedural issues, including those involving data, assumptions, methodologies, and applicable conditions. Each of these issues can make implementation in practice more difficult, especially when it interacts with manufacturing characteristics and domain knowledge. Yet little research has examined and resolved these issues systematically. The gaps in existing studies can be traced to the lack of a framework within which the pitfalls of implementation procedures can be identified and appropriate procedures for employing effective methodologies can be suggested. This study aims to develop a five-phase analytics framework that facilitates the investigation of pitfalls in intelligent manufacturing and suggests protocols to empower practical applications of DS methodologies, from descriptive and predictive analytics to prescriptive and automating analytics, in various contexts.

Funding

Funding was provided by the Ministry of Science and Technology, Taiwan (Grant Nos. MOST 106-2218-E-031-001 and MOST 109-2634-F-007-019).

Author information

Authors and Affiliations

Department of Information Management, National Taiwan University, Taipei City, 10617, Taiwan
Chia-Yen Lee
Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan City, 701, Taiwan
Chia-Yen Lee
Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, 30013, Taiwan
Chen-Fu Chien

Authors

Chia-Yen Lee
Chen-Fu Chien

Corresponding author

Correspondence to Chia-Yen Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, CY., Chien, CF. Pitfalls and protocols of data science in manufacturing practice. J Intell Manuf 33, 1189–1207 (2022). https://doi.org/10.1007/s10845-020-01711-w

Received: 17 November 2019
Accepted: 02 November 2020
Published: 23 November 2020
Issue Date: June 2022
DOI: https://doi.org/10.1007/s10845-020-01711-w
