Abstract
Driven by the ongoing migration to Industry 4.0, the increasing adoption of artificial intelligence, big data analytics, cloud computing, the Internet of Things, and robotics has empowered smart manufacturing and digital transformation. However, the growing application of machine learning and data science (DS) techniques raises a range of procedural issues, including those involving data, assumptions, methodologies, and applicable conditions. Each of these issues can make implementation in practice more difficult, especially when manufacturing characteristics and domain knowledge are involved. Yet little research has examined and resolved these issues systematically. The gaps in existing studies can be traced to the lack of a framework within which the pitfalls in implementation procedures can be identified and appropriate procedures for employing effective methodologies can be suggested. This study aims to develop a five-phase analytics framework that facilitates the investigation of pitfalls in intelligent manufacturing and suggests protocols to empower practical applications of DS methodologies, from descriptive and predictive analytics to prescriptive and automating analytics, in various contexts.
Funding
The funding was provided by Ministry of Science and Technology, Taiwan (Grant Nos. MOST 106-2218-E-031-001 and MOST 109-2634-F-007-019).
Cite this article
Lee, CY., Chien, CF. Pitfalls and protocols of data science in manufacturing practice. J Intell Manuf 33, 1189–1207 (2022). https://doi.org/10.1007/s10845-020-01711-w
DOI: https://doi.org/10.1007/s10845-020-01711-w