
Composition Prediction of a Debutanizer Column using Equation Based Artificial Neural Network Model

2014, Neurocomputing

Neurocomputing 131 (2014) 59–76. Journal homepage: www.elsevier.com/locate/neucom. DOI: 10.1016/j.neucom.2013.10.039

Nasser Mohamed Ramli a,b, M.A. Hussain b,c,*, Badrul Mohamed Jan b, Bawadi Abdullah a

a Chemical Engineering Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 31750 Tronoh, Perak, Malaysia
b Chemical Engineering Department, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia
c UMPDEC, University of Malaya, Malaysia

Article history: Received 27 May 2013; received in revised form 17 September 2013; accepted 28 October 2013; communicated by J. Zhang; available online 7 January 2014.

Abstract

The debutanizer column is an important unit operation in the petroleum refining industry. Online composition prediction using a neural network helps improve product quality monitoring in an oil refinery by predicting the top and bottom compositions of n-butane for the column simultaneously and accurately.
The single dynamic neural network model can be designed to overcome the delay introduced by laboratory sampling and is also suitable for monitoring purposes. The objective of this work is to investigate and implement an artificial neural network (ANN) that predicts the top and bottom product compositions of a distillation column simultaneously. The major contribution of the current work is the development of these n-butane composition predictions using equation based neural network (NN) models. The composition predictions from this method are compared with those from partial least square (PLS) and regression analysis (RA) methods to show its superiority over these conventional methods. Based on statistical analysis, the results indicate that the neural network equation, which is more robust in nature, predicts better than the PLS and RA equation based methods.

© 2014 Elsevier B.V. All rights reserved.

Keywords: Statistical analysis; Neural network; Partial least square analysis; Regression analysis; Debutanizer column

1. Introduction

The distillation column is one of the most common unit operations in the chemical industry. However, its complex behaviour and highly unpredictable nature make it a unit operation that is complicated and difficult for engineers to handle [1]. Hence it becomes more important to attain the desired product purity by manipulating the top and bottom compositions of the distillation column accurately. To maintain and control the composition at its optimum value, it is necessary to predict it with high accuracy and precision, and with fast response. The chemical process industries also encounter many problems in monitoring the debutanizer column. Open loop instability, non-linearity, multivariable interactions and the difficulty of measuring certain variables directly are the key factors complicating composition prediction.
The composition at the top and bottom of the column is currently measured by routine laboratory sampling, which is tedious and time consuming. It has been found that composition prediction by neural network is fast and accurate compared with laboratory sampling, which in industry normally takes one day to measure the composition. In this context, the need for a software-based online analyzer that provides both speed and accuracy has become pressing. This research deals with the online prediction of the composition using equation based artificial neural network models, compared with partial least square and regression models.

* Corresponding author at: Chemical Engineering Department, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia. E-mail address: mohd_azlan@um.edu.my (M.A. Hussain).

In relation to the use of online sensors, an adaptive soft sensor for online monitoring of melt index (MI), an important variable determining the product quality in the industrial propylene polymerization (PP) process, has been proposed by Zhang and Liu [2]. The fuzzy neural network (FNN) served as the basic model because of its nonlinear approximation ability and its learning method. To overcome the difficulty of determining the structure of the FNN, an adaptive fuzzy neural network (A-FNN) was subsequently developed to determine the number of fuzzy rules, where a novel adaptive method dynamically changes the structure of the model according to predefined thresholds. To obtain better generalization ability of the soft sensor, support vector regression (SVR) is introduced for parameter tuning, where the output function is transformed into an SVR based optimization problem.
The soft sensors including the SVR, FNN–SVR and A-FNN–SVR models are compared in detail, and the proposed soft sensor achieves good performance in the industrial MI prediction process.

Nomenclature

At            actual value
x_measured    measured value
Cp            Pearson correlation coefficient
x_predicted   predicted product
Di            difference between actual and average actual
Ea            actual value
Ep            predicted value

Three soft sensor models, based on radial basis function (RBF), support vector machine (SVM) and independent component analysis–support vector machine (ICA–SVM) methods, have been developed by Yan and Liu [3] to infer the chemical oxygen demand (COD) of the quench water produced from a pesticide waste incinerator. An optimization model of COD is further proposed based on the aforementioned soft sensor models, and a chaos genetic algorithm is introduced to solve it. A novel soft sensor model with principal component analysis (PCA), radial basis function (RBF) neural network and multi-scale analysis (MSA) has been proposed by Shi J and Liu [4]. The purpose is to infer the melt index of manufactured products from real process variables, where PCA is carried out to select the most relevant process features and to eliminate the correlations among the input variables, MSA is used to introduce more information and reduce the uncertainty of the system, and RBF networks are used to characterize the nonlinearity of the process. A black-box modeling scheme to predict melt index (MI) in the industrial propylene polymerization process has also been developed by Liu and Zhao [5]. MI is one of the most important quality variables determining product specification and is influenced by a large number of process variables.
In their work, a faster statistical modeling method is proposed to predict MI online, combining a fuzzy neural network, the particle swarm optimization (PSO) algorithm and an online correction strategy (OCS). Furthermore, an adaptive soft sensor based on systematically selected key process variables has been proposed for inferential control by Ma Ming et al. [6]. The key variables are selected by a statistical approach, stepwise linear regression, and online plant measurements serve as the key features for estimating tardily detected variables. The parameters of the linear inferential model are adapted as online and offline data become available. To improve the numerical characteristics of the algorithm, a square root filter is used because of the multi-collinearity problem involved. The soft sensor has been implemented on an o-xylene purification column, where the inferential model accurately predicts the real plant data, which is useful for industrial application in distillation columns. The stepwise regression technique relates the fast-measured variables to a few key variables so that the model is easy to maintain. By introducing adaptation, the model structure reflects the current operation of the plant and the accuracy of the soft sensor can be improved.

In this respect, the artificial neural network (ANN) offers an alternative, powerful and fast tool for modeling non-linear processes such as the debutanizer column, and can be utilized as an efficient soft sensor. An ANN has the ability to learn the relationship between the outputs and the inputs of a system. Developing a process model using an ANN requires a suitable network architecture and appropriate training data. The literature reports some work on debutanizer column modeling using neural networks.
For example, in the work of Prasad and Bequette [7], a nonlinear state space model is used to represent the inputs and outputs, and singular value decomposition (SVD) is used to remove redundant nodes and reduce the model.

Nomenclature (continued)

yi     difference between predicted and average predicted
Ēa     average actual value
Ēp     average predicted value
s2     variance
Ft     predicted value
K      number of free model parameters
MSE    mean square error
N      number of observations
R2     R squared
T      number of parameters

The design of dynamic neural network soft sensors to improve product quality in a debutanizer column has also been reported by Fortuna and co-workers, using a three-step predictive method to evaluate the top product concentration [8]. The approach uses lagged values of the inputs and the composition in the neural network prediction. Real time estimates of plant variables such as the composition are used for monitoring purposes, and the number of neurons in the hidden layer of the neural network was determined by trial and error. An ANN estimator based on the Levenberg–Marquardt (LM) algorithm has been tested on binary as well as multi-component mixtures by Singh and co-workers [9]. The LM algorithm suits both cases very well and gives more accurate and sensitive results than the Steepest Descent Back Propagation (SDBP) algorithm. For a complex chemical plant having hundreds of parameters, the LM approach works efficiently. Using these parameters, the quality of the product can be estimated and corrective actions taken at the same time. ANN has also been utilized widely in the crude fractionation section of the oil refinery industry, where the neural network output is the naphtha temperature rather than the composition, as in the work of Zilochian and Bawazir [10].
Neural networks have been used for a number of chemical engineering applications involving sensor analysis, fault detection and nonlinear process control, both in simulation and in online implementation, as reported in the literature by Hussain [11]. Partial least square regression (PLSR) together with an artificial neural network (ANN) trained with the back propagation (BP) algorithm has also been proposed by Xuefeng [12]. The neural networks were trained to extract quantitative information from the training samples for a preflash tower. A Hybrid Artificial Neural Network (HANN) was employed to develop the naphtha dry point soft sensor, which is the most important intermediate product concentration soft sensor in the p-xylene (PX) oxidation reaction. An optimization framework to obtain optimal operation of dynamic processes under process-model mismatches has been developed by Mujtaba and Hussain [13]. To model these mismatches, neural networks have been utilized for a binary batch distillation process with only one specified product. In another work, Greaves and co-workers proposed a framework to optimize the operation of a batch system, using an artificial neural network (ANN) based process model in the optimization of a pilot-plant middle-vessel batch column [14]. The maximum-product problem is formulated and solved by optimizing the column operating parameters, such as the batch time and the reflux and reboil ratios. The ANN based model was capable of reproducing the actual plant dynamics with good accuracy, and allowed a large number of optimization studies to be carried out with little computational effort.

Partial least square (PLS), an extension of PCA, provides model parameters together with diagnostic tools, and increasing the number of X variables can improve the precision of the PLS model [15]. The literature also contains some modeling work on debutanizer columns using PLS. For example, dynamic partial least square regression is used in the inferential model for composition prediction in a multicomponent distillation column by Kano et al. [16]. Measurements from past sampling times are used as input variables to capture the process dynamics. PLS was also used to predict the composition profile in a simulated batch distillation column by Zamprogna et al. [17]. The inputs are temperature measurements and the outputs are the compositions in the distillate and bottom streams. The estimator performance is evaluated based on the pre-processing of the calibration and validation data sets. The measurements used as sensor inputs include lagged measurements. A simple augmentation of the conventional PLS regression approach is based on the development and sequential use of multiple regression models. A soft sensor for a chemical process using PLS that can handle correlations among a number of process variables and nonlinearities, based on the smoothness concept, has also been proposed by Park and Han [18]. The proposed method builds a soft sensor for a distillation column based on multivariate smoothing using locally weighted regression. Two types of cases, with nonlinear and linear behavior, were applied to the distillation column, and the soft sensor is used online to estimate important variables such as temperature and composition. Process monitoring using modified PLS through an independent component analysis (ICA) approach has also been developed by Zhang Yingwei and Zhang Yang [19]. The method applies a kernel to ICA-PLS to handle the non-linearity in the data set, and the original algorithm is modified by giving the regression coefficient matrix and residual matrix to the ICA-PLS to reduce computation time.
An application of PLS as a soft sensor has been developed by Sharmin and co-workers to predict the melt flow index of an industrial autoclave reactor from measured process variables [20]. Developing a detailed first principle model for free radical polymerization is not an easy task since a large number of reactions and kinetic parameters are involved. Multivariate regression models are used to solve this problem, and the melt index can be successfully predicted using these statistical tools. A multivariate statistical soft sensor for online estimation of product quality in an industrial batch polymerization process has also been proposed by Facco et al. [21]. For each estimation, PLS sensors are designed and their performance is evaluated against actual plant data. The estimations are evaluated by augmenting the process variables with lagged measurements. The projection method, using PLS regression, is used to design a soft sensor for the online estimation of the resin quality properties. Multivariate statistical (MVS) techniques have proven to be an excellent tool for analyzing and monitoring processes where the volume of process data is huge. An online soft sensor was proposed by Ge and Song using three different just-in-time learning (JITL) methods, based on PLS, support vector regression (SVR) and least squares support vector regression (LSSVR) [22]. The real time performance strategy enhances the online efficiency of the JITL based soft sensor for a distillation column, and the JITL methods are suitable for real time performance. Modeling with SVR is not difficult because it only requires a quadratic programming optimization, and the efficiency can be improved by LSSVR. A least squares support vector machines (LS-SVM) soft-sensor model of the propylene polymerization process has been developed by Shi and Liu to infer the MI of polypropylene [23].
Considering that a cost function without regularization might lead to less robust estimates, the weighted least squares support vector machines (weighted LS-SVM) approach is further proposed for the propylene polymerization process to obtain a robust estimation of the melt index. Reliable estimation of melt index (MI) for the production of polypropylene has also been proposed by Shi and co-workers [24]. The propylene polymerization process is highly nonlinear and has a multi-scale nature, with a huge number of variables and much information that are highly correlated and obtained at different sample rates from different sensors. A novel soft-sensor architecture based on radial basis function (RBF) networks, combining independent component analysis (ICA) and multi-scale analysis (MSA), is proposed to infer the MI of polypropylene from other process variables. An RBF neural network soft-sensor model for the polypropylene process has been developed by Li and Liu to infer the MI from a number of process variables [25]. Since the PP process is too complicated for an RBF neural network with a general set of parameters, a new ant colony optimization (ACO) algorithm, N-ACO, and its adaptive version, A-N-ACO, were proposed to optimize the structure parameters of the RBF neural network. An optimal soft sensor, named the least squares support vector machines with Ant Colony-Immune Clone Particle Swarm Optimization (AC-ICPSO-LSSVM), has also been proposed by Jiang and co-workers, which combines the high accuracy of LSSVM with the fast convergence of PSO [26]. Furthermore, the immune clone (IC) method is introduced into the PSO algorithm to make the particles of ICPSO diverse and to enhance global search capability, avoiding the premature convergence and local optimization of the conventional PSO algorithm.
Another novel soft-sensor approach for the prediction of the melt index (MI) in the propylene polymerization industry has been developed by Jiang and co-workers, using an accurate optimal predictive model of the MI values based on the relevance vector machine (RVM) method [27]. The RVM is employed to build the MI prediction model, and a modified particle swarm optimization (MPSO) algorithm is introduced to optimize the parameters of the RVM, giving the MPSO-RVM approach. An online correcting strategy (OCS) is further carried out to update the modeling data and to revise the model's parameters self-adaptively whenever model mismatch happens.

In this paper we demonstrate the use of a single ANN to predict the composition of n-butane at the top and bottom of a debutanizer column simultaneously, and compare it with predictions using PLS and regression analysis. One of the significant and novel contributions of this work is the use of an equation based neural network model, whereas the other works mentioned previously utilize the neural network as a black box model only. The equation based neural network is more reliable and robust than the conventional method and at the same time gives better prediction than other methods such as PLS and regression analysis. This equation based approach is also a concrete, fast and practical way of utilizing neural network models as a soft sensor for this system. Furthermore, we utilize a combination of online data, both open loop and closed loop, together with simulated data, and further validate these data using the closed loop system. This further enhances the reliability and online capability of the NN model when it is applied online as a software sensor.

The paper is organized as follows. Section 2 contains the description of the column and plant, Section 3 describes the theoretical background and Section 4 describes the methodology for the online composition prediction.
Finally, Section 5 presents the overall analysis of the online composition prediction.

2. Description of Crude Oil Processing Plant and Debutanizer Column

The crude oil processing plant, shown in Fig. 1, consists of a refinery process, condensate fractionation and a reforming aromatics section. The feedstock of the refinery process is mainly crude oil, while the products are petroleum products, liquefied petroleum gas, naphtha and low sulphur waxy residue.

Fig. 1. Block diagram for the oil refinery industry.

The refinery has two main process units, the Catalytic Reforming Unit (CRU) and the Crude Distillation Unit (CDU). The Crude Oil Terminal provides the feedstock, and the crude oil is preheated in heat exchangers to within the range of 190 °C – 210 °C. It is then further heated in a furnace to 340 °C – 342 °C before being routed to the CDU. The crude oil is separated into a number of fractions, which are heavy Straight Run Naphtha as overhead vapour, untreated kerosene, straight run kerosene and straight run diesel. From the crude tower, three side-cut streams are drawn to a stripper column, which consists of a kerosene stripper, a naphtha stripper and a diesel stripper. From the CDU, the pretreater feed, Heavy Straight Run Naphtha (HSRN), is mixed with hydrogen from the reformer, heated to the reaction temperature in a heater and fed into the pretreater catalytic reactor. The reactions involved are denitrification and desulphurization, which protect the reformer catalyst from poisoning. The product from the reactor is transferred to the pretreater stripper. The bottom product of the stripper is the feed to the reforming unit, and the feed to the reformer reactors is the treated naphtha, which is heated to the reaction temperature. Effluent from the reactor is collected in a reformer separator where it is cooled.
A portion of the separated gas is recycled to the reactor feed stream while the remainder is transferred to an absorber. In the absorber, at the raw naphtha feed, hydrogen gas is purged and recycled to the pretreater heater. The feed into the LPG absorber is in the liquid phase; it is drawn off and the liquid fraction is pumped into a stabiliser. Before being sent to storage, reformate is withdrawn from the stabiliser bottom for cooling. Overhead vapours from the stabiliser are cooled, condensed and recovered from the stabiliser reflux drum.

Table 1. Column specification.

Specification                     Value
Number of trays of the column     35
Feed tray (stage number)          23
Type of tray used                 Valve
Column diameter                   1.3 m
Column height                     23.95 m
Condenser type                    Partial
Feed mass flowrate                44106 kg hr^-1
Feed temperature                  113 °C
Feed pressure                     823.8 kPa
Overhead vapor mass flowrate      11286 kg hr^-1
Overhead liquid mass flowrate     5040 kg hr^-1
Condenser pressure                823.8 kPa
Reboiler pressure                 853.2 kPa

The debutanizer column is the main column for producing the main product, which is liquefied petroleum gas (LPG). The debutanizer column is located in the CDU section, depicted at the top right of Fig. 1. The unit is used to recover light gases and LPG from the overhead distillate before producing light naphtha. The light gases, mainly C2, are used to refine fuel gas and are mixed with LPG. The feed to the debutanizer column, which has 35 valve trays, is the deethanizer bottom product. The debutanizer condenser condenses the overhead vapor, and the debutanizer overhead pressure control valves, with two split ranges, control the overhead system. The reflux from the top of the debutanizer consists of the collected condensed hydrocarbon, while the reboiler section is used to strip lighter hydrocarbons. There are three manipulated variables for the column: the feed flow rate, the reflux flow rate and the reboiler flow rate.
The feed flow rate controls the feed to the column, the debutanizer reboiler control valve controls the reboiler temperature and the debutanizer bottom level controller controls the bottom product level. The debutanizer reflux control valve controls the ratio of the liquid and distillate flow rates at the top of the column. This column is a challenging process because it is non-linear, highly multivariable, involves a great deal of interaction between the variables and has lag in many of its control loops, all of which make it difficult to model with linear techniques. Hence non-linear methods such as the neural network equation based model are highly appropriate for this process. Table 1 outlines the column specification while Table 2 describes in detail all the variables surrounding the column. The measured variables are the Feed flow, Pressure 1 (Debutanizer receiver overhead pressure), Flow 2 (LPG flow to storage), Flow 1 (Light Naphtha flow to storage), Level 2 (Debutanizer condenser level), Level 1 (Debutanizer level) and Temp 5 (Reboiler outlet temperature to column). The top and bottom compositions of the column are currently measured by laboratory sampling using gas chromatography. Fig. 2 shows the configuration of the debutanizer column under study in this work.

3. Theoretical background

The artificial neural network (ANN) is at present a popular and reliable tool for problems involving the prediction of variables in engineering [14]. It comprises a great number of interconnected neurons arranged in a series of layers, each with a number of nodes. Every node receives signals from its network links, and the signals are added together before being applied to a specific transfer function to produce the output. The output signal is sent on to other nodes until it reaches the network output.
Table 2. Description of the variables for the column.

Tag          Description                                      Units
Temp 1       Debutanizer top temperature                      °C
Temp 2       Debutanizer bottom temperature                   °C
Temp 3       Debutanizer receiver bottom temperature          °C
Temp 4       Light Naphtha temperature after condenser E 1    °C
Temp 5       Reboiler outlet temperature to column            °C
Temp 6       Debutanizer feed temperature                     °C
Level 1      Debutanizer level                                %
Level 2      Debutanizer condenser level                      %
Level 3      Debutanizer level indicator                      %
Level 4      Condenser level indicator                        %
Flow 1       Light Naphtha flow to storage                    m3/hr
Flow 2       LPG flow to storage                              m3/hr
Pressure 1   Debutanizer receiver overhead pressure           kPa

Nodes, called neurons, are the basic processors of a neural network. Each connection between two nodes carries a real value called a weight, and the values of the weights are obtained by training on a set of input and output correlations. The weights are adapted by the learning rule and act as the long-term memory of the network. The advantage of ANNs lies in their ability to be used as an arbitrary function approximation mechanism that learns from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential. The first main criterion is the choice of model, which depends on the representation of the data and on the application. The second criterion is the learning algorithm, where there are numerous trade-offs between algorithms. Furthermore, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation to ensure the robustness of the selected model. If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust and give the correct implementation. It can be used naturally in online learning and large data set applications.
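The node computation described in this section (incoming signals are weighted, summed, passed through a transfer function and sent on to the next layer) can be illustrated with a minimal sketch. The function names, the tanh transfer function and the numerical weights are assumptions chosen for illustration only; they are not the trained values or the architecture reported in this work.

```python
import numpy as np

def node_output(signals, weights, bias, transfer=np.tanh):
    """One node: weight the incoming signals, add them up, apply the transfer function."""
    return transfer(np.dot(weights, signals) + bias)

def forward(signals, layers, transfer=np.tanh):
    """Send signals through successive layers until the network output is reached.

    `layers` is a list of (weight_matrix, bias_vector) pairs, one pair per layer;
    each row of a weight matrix holds the weights of one node.
    """
    for weight_matrix, bias_vector in layers:
        signals = np.array([
            node_output(signals, w_row, b, transfer)
            for w_row, b in zip(weight_matrix, bias_vector)
        ])
    return signals

# Illustrative weights only (hypothetical, not taken from the paper).
layers = [
    (np.array([[0.5, -0.2], [0.1, 0.3]]), np.array([0.0, 0.1])),  # 2 inputs -> 2 nodes
    (np.array([[1.0, -1.0]]), np.array([0.05])),                  # 2 nodes -> 1 output
]
prediction = forward(np.array([0.4, 0.6]), layers)
```

Writing the node as its own function keeps the sketch close to the verbal description; in practice each layer would normally be evaluated as a single matrix-vector product.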
However, the main argument against the widespread use of the neural network is that it is a black box model that can only be represented by the NN structure and is difficult to represent by algorithmic equations, which are cumbersome in nature. In this work, it is shown that with the appropriate use of the activation functions and proper pruning of the weights, an equation based neural network model can be obtained and used to predict the column compositions. The general equation for the output of the neural network (for a 3 layer network) can be given as

$$y = f^{i}\left(LW^{3,i}\, f^{i}\left(LW^{2,i}\, f^{i}\left(IW^{1,i}\, p + b^{1}\right) + b^{2}\right) + b^{3}\right) \qquad (1)$$

where
$IW^{1,i}$ = input weight at layer 1 (input layer)
$b^{1}$ = bias values at layer 1
$LW^{2,i}$ = layer weight at layer 2 (hidden layer)
$b^{2}$ = bias values at layer 2
$LW^{3,i}$ = layer weight at layer 3 (output layer)
$b^{3}$ = bias values at layer 3
$p$ = vector inputs to the neural network
$y$ = vector outputs from the neural network
$f^{i}$ = activation function at layer i

Fig. 2. Debutanizer column configuration.

This equation based neural network model is more robust and stable than the black box based model frequently used by researchers and practitioners, and it is the highlight of our research work in this paper.

PLS regression is a method that generalizes and combines features from principal component analysis and multiple regression. It is very useful in data analysis for systems which are collinear and have incomplete variables. The precision of a PLS model is a function of the number of input variables. PLS is often useful in predicting a set of dependent variables (Y) from a large set of independent variables or predictors (X), and it has been proven reliable in process monitoring and optimization prediction. The PLS interpretation can relate a matrix vector multiplication to a set of bivariate regressions.
It provides the connection between two operations in matrix algebra and statistics. PLS can provide the foundation of a multivariable system. It can also produce projection models as long as there is similarity between the variables [15]. Based on PLS, the general regression equation is given as

$$ Y = \bar{y} + X W^{*} C + F \tag{2} $$

where $\bar{y}$ is the variable average for Y, $W^{*}C$ are the loading weights and F is the residual in Y. A disadvantage of PLS is that, as the size of the data sets increases further, inadequacies start to appear in these multivariate methods, both in their efficiency and in their interpretability. PLS coefficients are of interest because the model can be simplified when it has several components, but a disadvantage of the coefficients of the PLS equation is that information regarding the correlation structure among the responses is unknown.

Multivariate regression is the other conventional method used to obtain the relationship between the input variables, X, and the output variable, Y. Y can be predicted as a function of X using an equation of the following form:

$$ Y' = a + b_{0} X_{0} + b_{1} X_{1} + \dots + b_{n} X_{n} \tag{3} $$

In Eq. (3), Y' is the predicted value of the Y variable, a is the intercept and $b_{0}$ is the slope representing the predicted change in Y for a one-unit increase in $X_{0}$ [28]. The performance of regression analysis methods in practice depends on the form of the data-generating process and on how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes not testable if a large amount of data is to be utilised. Regression models for prediction are still useful even when the assumptions are moderately violated, although they may not perform optimally. However, the main disadvantage of these regression methods in many applications is that they can give misleading results when causality exists in the observation data.

4. Methodology

4.1. Model data generation

Although most of the online open-loop responses surrounding the column are available from the plant, some of the open-loop variables are not. In this work, dynamic simulation of the debutanizer column is performed using the plant process simulator HYSYS to obtain the data sets that are unavailable from the plant; the variables that are not available are Temp 5, Pressure 1 and the composition at both ends of the column. The simulated closed-loop response of the n-butane composition at the top and bottom of the column was also established for comparison with the online closed-loop data. The steady state of the column needs to be developed before the transition from the steady state to the dynamic state. Steady-state simulations can be cast easily into dynamic simulations by specifying additional engineering details, including pressure/flow relationships and equipment dimensions. The necessary information, such as feed conditions, feed compositions, reflux ratio, condenser pressure and reboiler pressure, has to be provided to the selected unit operation in the simulation.

The simulation data were generated using step tests similar to those in the plant to obtain the fluctuation of the process variables under open-loop response, where the manipulated variables are the reboiler and reflux flow rates. The data generated for the process are taken over 541 minutes with a 1 minute sampling interval, which amounts to a total of 5410 data points, as will be seen in later sections. The data available from the actual plant are large and therefore need to be screened by performing principal component analysis (PCA) and partial least squares (PLS), whereby the important variables for the column are obtained and used for monitoring the composition of n-butane. Table 2 outlines all the variables surrounding the column. For each step test, PCA is used to determine the important variables surrounding the column. Once the process variables have been determined, the important variables affecting the composition of n-butane are further analysed using PLS analysis. The raw process data are scaled to between 0.05 and 0.95 using the following equation:

$$ \text{scaled value} = (0.95 - 0.05)\left(\frac{\text{actual value} - \text{min value}}{\text{max value} - \text{min value}}\right) + \text{min value} \tag{4} $$

Hence the actual value is given by

$$ \text{actual value} = \frac{(\text{scaled value} - \text{min value})\,(\text{max value} - \text{min value})}{(0.95 - 0.05)} + \text{min value} \tag{5} $$

4.2. Neural network, partial least squares (PLS) and regression analysis (RA) data sets

One of the objectives of this work is to develop online composition predictions using a neural network, partial least squares and regression analysis. The composition at the top and bottom of the column in the refinery is currently measured by normal laboratory sampling. Therefore the neural network, PLS and RA are used as alternative online methods to predict the composition, as they are expected to produce more robust, stable and precise results in a shorter time. Open-loop responses of the reboiler and reflux data sets, which include the composition of n-butane, are used to develop the dynamic neural network architecture. The selected input variables to the network, including the composition of n-butane, are time delayed since the models are dynamic in nature, and the outputs are the future predictions of n-butane. Only one past value is considered for each input variable. The number of past values was determined by trial and error, and it was found that one past value per variable gives the optimum performance and also reduces the complexity of the dynamic model.
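The data preparation described above (scaling to the 0.05 to 0.95 range and pairing each variable with one past value) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `scale`, `unscale` and `build_lagged` are hypothetical, the variable names follow Table 3, and the sketch assumes that the offset added in Eq. (4) is the lower bound of the target range (0.05), which is what places the scaled values between 0.05 and 0.95.

```python
import numpy as np

LO, HI = 0.05, 0.95  # target range stated in the text


def scale(x, x_min, x_max):
    """Map raw values linearly so that x_min -> LO and x_max -> HI.

    Assumption: the offset in Eq. (4) is taken as LO (0.05).
    """
    return (HI - LO) * (x - x_min) / (x_max - x_min) + LO


def unscale(s, x_min, x_max):
    """Inverse of scale(), following the form of Eq. (5)."""
    return (s - LO) * (x_max - x_min) / (HI - LO) + x_min


def build_lagged(series):
    """Build one-lag inputs and one-step-ahead targets.

    series: dict mapping variable name -> 1-D array, all sampled at the
    same instants. Returns (P, T) where each row of P is
    [mv2(k), mv2(k-1), mv3(k), mv3(k-1), f(k), f(k-1),
     p_top(k), p_top(k-1), p_bot(k), p_bot(k-1)]
    and each row of T is [p_top(k+1), p_bot(k+1)].
    """
    names = ["mv2", "mv3", "f", "p_top", "p_bot"]
    n = len(series["mv2"])
    k = np.arange(1, n - 1)  # instants with both k-1 and k+1 available
    cols = []
    for name in names:
        v = np.asarray(series[name], dtype=float)
        cols += [v[k], v[k - 1]]  # current value, then its single lag
    P = np.column_stack(cols)
    T = np.column_stack([
        np.asarray(series["p_top"], dtype=float)[k + 1],
        np.asarray(series["p_bot"], dtype=float)[k + 1],
    ])
    return P, T
```

With one past value per variable, five measured signals give the 10 network inputs and 2 outputs used in this work; the first and last samples are dropped because they lack a lag or a future value.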
The type of dynamic network used for this case is the Nonlinear Autoregressive Network with Exogenous inputs (NARX), and the training algorithm used is the Levenberg-Marquardt method. In addition, the adaptation learning function with momentum is used, and the performance function evaluated is the mean square error criterion.

The data sets are partitioned into two sets, training and validation, with 65% of the data used for training and 35% for validation. Network training and validation are carried out using the mean square error performance with a specified number of epochs (training cycles). The network has 10 inputs and 2 outputs, and the transfer function is linear for all layers. The architecture consists of 3 layers: the input, hidden and output layers. The weight and bias values used in the neural network equation are obtained after training and validation of the neural network. The number of hidden nodes is selected by trial and error. The neural network is first trained with an initial guess of 8 hidden nodes, and the number of hidden nodes is then increased in steps of 2 until it reaches 40. The Root Mean Square Error (RMSE) is monitored, and the number of hidden nodes with the lowest RMSE is selected. Fig. 3 shows the profile of the RMSE with the change in the number of hidden nodes in the hidden layer. Analysis of variance (ANOVA) for the NN is also carried out with the Statistical Toolbox in MATLAB using the F test statistic. In this work, the number of neurons that gives optimum predictions of the outputs is found to be 10, as seen in Fig. 3.

Table 3 shows the important variables involved in the neural network, where the open-loop responses of the reboiler flow rate and reflux flow rate data sets are obtained from the plant and from simulation. The simulated data are the compositions of n-butane, and the rest of the variables are obtained from actual plant data. The inputs to the neural network run from mv2(k) to p_bot(k-1), while the outputs are the variables p_top(k+1) and p_bot(k+1), as decided by the data pretreatment using PCA and PLS mentioned earlier.

Multivariate data are measured based on observations and variables from the step tests in the input variables, and the data generated for PLS are similar to the data generated for the NN. The PLS analysis is performed using the multivariate software SIMCA-P. Two classes of variables are defined: the primary variables and the observation variables. The primary variables consist of the 10 variables surrounding the column, and the observation variables are the top and bottom n-butane compositions. Once the work set has been developed, the PLS model is fitted with the partial least squares equation, which involves the loading weights and the residual in terms of the composition of n-butane, and the average value of the composition of n-butane.

The data generated for regression analysis (RA) are also similar to the data generated for the NN and PLS. The regression data are analysed using the data analysis tool in Excel. The important elements of the RA modelling are the ranges of inputs and outputs of the data analysed, where the confidence level is set at 95%. Once all the required inputs and outputs are fed to the regression analysis, it calculates the predicted output, the RA equation and the residual analysis. The regression is based on a multivariate linear equation, and the input variables are shown generally in Eq. 3 in terms of the X variables.

4.3. Model adequacy test for NN, PLS and RA models

The performance of the predictions by the different methods is determined and compared using the Root Mean Square Error (RMSE) method.
The RMSE is given by

$$ \mathrm{RMSE} = \sqrt{\frac{\sum \left(x_{measured} - x_{predicted}\right)^{2}}{N}} \tag{6} $$

Correct Directional Change (CDC) measures the accuracy of a model in its prediction of the subsequent actual change of a predicted variable. The formula for CDC is

$$ \mathrm{CDC} = \frac{100}{N} \sum_{i=1}^{N} D_{i} \tag{7} $$

where $D_{i}$ is defined as $D_{i} = y_{i} \times y_{i}$.

The best known information criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are given as

$$ \mathrm{AIC} = \mathrm{MSE} + s^{2}\,\frac{2K}{T} \tag{8} $$

$$ \mathrm{BIC} = \mathrm{MSE} + \log(N)\, s^{2}\,\frac{2K}{T} \tag{9} $$

The coefficient of determination, which also measures the goodness of fit, is defined as

$$ R^{2} = 1 - \frac{\sum_{t=L}^{T} \left(y_{t} - \hat{y}_{t}\right)^{2}}{\sum_{t=L}^{T} \left(y_{t} - \bar{y}_{t}\right)^{2}} \tag{10} $$

Mean Absolute Percentage Error (MAPE) is a measure of the accuracy of a fitted time series, given by

$$ \mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left|F_{t} - A_{t}\right|}{A_{t}} \times 100\% \tag{11} $$

The Pearson Correlation Coefficient ($C_{p}$) measures the goodness of the regression fit; the closer the value is to one, the higher the accuracy. It is given by

$$ C_{p} = \frac{\sum_{j=1}^{N_{S}} \left(E_{p,j} - \bar{E}_{p,j}\right)\left(E_{a,j} - \bar{E}_{a,j}\right)}{\sqrt{\sum_{j=1}^{N_{S}} \left(E_{p,j} - \bar{E}_{p,j}\right)^{2} \sum_{j=1}^{N_{S}} \left(E_{a,j} - \bar{E}_{a,j}\right)^{2}}} \tag{12} $$

Theil's Inequality Coefficient (TIC) evaluates the model through the difference between the model output and the actual output, which is considered as the error:

$$ \mathrm{TIC} = \frac{\sqrt{\sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2}}}{\sqrt{\sum_{i=1}^{N} y_{i}^{2}} + \sqrt{\sum_{i=1}^{N} \hat{y}_{i}^{2}}} \tag{13} $$

Based on the statistical analysis described above, acceptable performance is judged on the deviation between the actual composition and the composition predicted by the NN, PLS and RA, using the following criteria: low RMSE, CDC approaching 100, small AIC and BIC, R² approaching 1, low MAPE, $C_{p}$ approaching 1 and low TIC. Eqs. 6–10 are taken from [29], Eq. 11 from [30], Eq. 12 from [31] and Eq. 13 from [24].

Table 3
Variables involved in the PLS analysis, regression analysis and neural network.

Variable       Description
mv2(k)         Manipulated reboiler flow rate
mv2(k-1)       Lag mv2
mv3(k)         Manipulated reflux flow rate
mv3(k-1)       Lag mv3
f(k)           Debutanizer feed temperature
f(k-1)         Lag feed temperature
p_top(k)       Top composition n-butane
p_top(k-1)     Lag top composition
p_bot(k)       Bottom composition n-butane
p_bot(k-1)     Lag bottom composition
p_top(k+1)     Future predictions top composition n-butane
p_bot(k+1)     Future predictions bottom composition n-butane

Fig. 3. Profile of the RMSE with respect to number of hidden nodes. (RMSE profile of n-butane; x-axis: number of hidden nodes, 8 to 40; y-axis: RMSE, 0 to 0.0012; series: bottom, top.)

5. Results and Discussion

5.1. Step tests for reboiler flow rate

Figs. 4–7 show some of the step tests of the reboiler flow rate data sets. To generate the input-output data for neural network training, various step changes are applied to the inputs to obtain the corresponding outputs; the input for this system is the reboiler flow rate. The step test of the reboiler flow rate, which is one of the manipulated variables, is generated using multi-amplitude rectangular pulses [32]. The step test is important for observing the effect on, and the fluctuations of, the process variables when the reboiler flow rate is changed. Temp 1, Flow 1 and Pressure 1 (see Figs. 4–6) increase and decrease as the reboiler flow rate changes. Level 1 (see Fig. 7) does not fluctuate as the reboiler flow rate is stepped, which indicates that the level does not affect the composition of n-butane. The step test for the reflux flow rate, the other manipulated variable, is done in the same way, but only the step tests for the reboiler flow rate are shown in this paper.

Fig. 4. Temp 1 Debutanizer top temperature. (Step test Temp 1; axes: time (min), reboiler flow rate (m3/hr), temperature (°C).)

Fig. 5. Flow 1 Light Naphtha flow to storage. (Step test Flow 1; axes: time (min), reboiler flow rate (m3/hr), flow rate (m3/hr).)

Fig. 6. Pressure 1 Debutanizer receiver overhead pressure. (Step test Pressure 1; axes: time (min), reboiler flow rate (m3/hr), pressure (kPa).)

Fig. 7. Level 1 Debutanizer level. (Step test Level 1; axes: time (min), reboiler flow rate (m3/hr), level (%).)

5.2. Online closed-loop composition validation and simulation

Figs. 8 and 9 show the differences between the online and simulated top and bottom compositions of n-butane in the column under normal operating conditions. The calculated Root Mean Square Error (RMSE) for the top and bottom compositions is 0.0251 and 0.0082 respectively, and the Mean Square Error for the top and bottom compositions is 0.00063 and 6.697 × 10⁻⁵ respectively for n-butane. These results show that there is a small deviation between the online and simulation data; the purpose of the closed-loop response is to validate the simulation against the online data. Once the closed-loop results have been verified against the simulation results, the open-loop response of the variables that are not available from the plant can be obtained by simulation, based on the same step size of the manipulated variable as in the plant; this involves variables such as Temp 5, Pressure 1 and composition. The combined plant and simulation data are then used to develop the neural network model, represented by the equations shown in the next section. Similar data sets are also used to generate the PLS and regression models for comparison with the neural network predictions of the top and bottom n-butane compositions.

Fig. 8. Top composition n-butane closed loop. (Axes: time (min), composition (mole fraction); series: simulation, online.)

Fig. 9. Bottom composition n-butane closed loop. (Axes: time (min), composition (mole fraction); series: simulation, online.)

5.3. Neural network, PLS and RA modeling

5.3.1. Neural network equation-based model

As mentioned in Section 4.2, the final configuration of the neural network model obtained from the training and validation exercise is a 10-10-2 network. By applying the general Eq. (1) to this network with the linear activation function, we get the following equation for the top and bottom composition prediction of n-butane
where $y_{1}$ refers to the top composition and $y_{2}$ refers to the bottom composition:

$$ y = \begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} = LW_{2,1}\, f^{1}\left(IW_{1,1}\, p + b^{1}\right) + b^{2} \tag{14} $$

$IW_{1,1}$ = input weight at layer 1 (input layer)
$b^{1}$ = bias values at layer 1
$LW_{2,1}$ = layer weight at layer 2 (hidden layer)
$b^{2}$ = bias values at layer 2

The values of the input weights $IW_{1,1}$, layer weights $LW_{2,1}$, $b^{1}$ and $b^{2}$ obtained after validation are given in Appendix A. Here p is the vector of inputs to the neural network, which for this case study is

$$ p = \begin{bmatrix} mv2(k) & mv2(k-1) & mv3(k) & mv3(k-1) & f(k) & f(k-1) & p_{top}(k) & p_{top}(k-1) & p_{bot}(k) & p_{bot}(k-1) \end{bmatrix}^{T} $$

On applying the values of the respective weights and biases of the validated optimum neural network model to Eq. (14), and with further pruning of the values, we get the following equation to represent the neural network model for the composition prediction:

$$ y = \begin{bmatrix} y_{1} \\ y_{2} \end{bmatrix} = \begin{bmatrix} -0.29 & 0.15 & 0.37 & 0.23 & 0.38 & 0.40 & -0.50 & 0.97 & 0.12 & -0.31 \\ -0.09 & 0.006 & 0.31 & -0.10 & 0.02 & -0.019 & -0.42 & -0.12 & 0.36 & -0.08 \end{bmatrix} p + \begin{bmatrix} -0.28 \\ -0.22 \end{bmatrix} \tag{15} $$

Eq. (15) is obtained by simplifying the general Eq. 1, considering only the hidden layer with input weights $IW_{1,1}$ and the output layer with layer weights $LW_{2,1}$. Initially the input weight matrix $IW_{1,1}$ is multiplied by the input vector p and added to the bias values $b^{1}$. Since the activation function $f^{1}$ is unity (linear), the resulting matrix is then multiplied by the layer weight $LW_{2,1}$ and added to the bias values at layer 2, $b^{2}$. By pruning out the small resulting values, the equation is simplified to the version in Eq. (15). Eq. (15) is a multi-input multi-output equation-based representation of the neural network model for composition prediction of the debutanizer column. This equation is robust in nature and can easily be used for online estimation of the composition in the column, without having to resort to the complex structure of the neural network, which is normally difficult to use in an online measurement system.

5.3.2. PLS model

After validation, the PLS equation for the prediction of n-butane at the top composition is given as

$$ Y_{1,PLS} = 0.1335 + \begin{bmatrix} mv2(k) \\ mv2(k-1) \\ mv3(k) \\ mv3(k-1) \\ f(k) \\ f(k-1) \\ p\_top(k) \\ p\_top(k-1) \\ p\_bot(k) \\ p\_bot(k-1) \end{bmatrix}^{T} \begin{bmatrix} 0.07 \\ -0.07 \\ -0.06 \\ 0.06 \\ -0.06 \\ -0.11 \\ 0.06 \\ -0.01 \\ 0.68 \\ -0.83 \end{bmatrix} + \begin{bmatrix} -0.003 \\ 0.0007 \\ -0.0006 \\ -0.001 \\ -0.001 \\ -0.0007 \\ -0.0004 \\ -0.000076 \\ 0.0003 \\ 0.001 \\ 0.002 \\ 0.004 \\ 0.018 \\ 0.004 \\ \vdots \\ -0.0003 \end{bmatrix} \tag{16} $$

and the PLS equation for the prediction of n-butane at the bottom composition is given as

$$ Y_{2,PLS} = 0.05276 + \begin{bmatrix} mv2(k) \\ mv2(k-1) \\ mv3(k) \\ mv3(k-1) \\ f(k) \\ f(k-1) \\ p\_top(k) \\ p\_top(k-1) \\ p\_bot(k) \\ p\_bot(k-1) \end{bmatrix}^{T} \begin{bmatrix} 0.002 \\ -0.002 \\ -0.0012 \\ 0.0004 \\ -0.004 \\ 0.002 \\ 0.06 \\ 0.17 \\ 1.64 \\ -0.073 \end{bmatrix} + \begin{bmatrix} -0.004 \\ -0.004 \\ -0.004 \\ -0.003 \\ -0.0007 \\ -0.001 \\ -0.0011 \\ -0.0002 \\ 0.001 \\ 0.002 \\ 0.002 \\ 0.001 \\ -0.0002 \\ -0.0017 \\ \vdots \\ 0.0012 \end{bmatrix} \tag{17} $$

The F residual of each PLS equation consists of 301 data points, for both the top and bottom compositions.

5.3.3. Regression model

For the regression model, the equations for the top and bottom predictions of n-butane are given below:

$$ Y_{1,RA} = 0.0008\,mv2(k) - 0.0007\,mv2(k-1) + 0.0004\,mv3(k) - 0.0006\,mv3(k-1) - 0.0011\,f(k) + 0.0019\,f(k-1) + 1.01\,p\_top(k) - 0.051\,p\_top(k-1) + 0.002\,p\_bot(k) - 0.01\,p\_bot(k-1) - 0.078 \tag{18} $$

$$ Y_{2,RA} = 0.0019\,mv2(k) - 0.0018\,mv2(k-1) - 0.002\,mv3(k) + 0.001\,mv3(k-1) + 0.004\,f(k) - 0.006\,f(k-1) + 0.30\,p\_top(k) - 0.23\,p\_top(k-1) + 0.81\,p\_bot(k) - 0.059\,p\_bot(k-1) + 0.27 \tag{19} $$

5.3.4. Analysis of variance (ANOVA) results for neural network model

Table 4
ANOVA of the n-butane top composition for NN model.

Regression statistics
Multiple R            0.9943
R Square              1.00
Adjusted R Square     0.9884
Standard Error        0.0018
Observations          301

ANOVA         df     SS        MS            F           Significance F
Regression    10     0.0906    0.0090        2562.012    2.3449E-276
Residual      290    0.0010    3.5399E-06
Total         300    0.0917

Table 5
ANOVA of n-butane bottom composition for NN model.

Regression statistics
Multiple R            0.9526
R Square              1.00
Adjusted R Square     0.9383
Standard Error        0.0060
Observations          301

ANOVA         df     SS        MS            F           Significance F
Regression    10     0.0467    0.0046        127.565     5.7343E-100
Residual      290    0.0106    3.6617E-05
Total         300    0.0573

5.3.4.1. Top composition. From Table 4, the adjusted R² is smaller than the R² value since the number of cases is relatively small and the number of predictor variables is relatively large. There is a total of 301 sample observations. The regression sum of squares is calculated to be 0.0906 and the total sum of squares is calculated to be 0.0917. The multiple R is calculated as the square root of the ratio between these 2 values. The multiple R is proportional to the total variance in the actual and predicted values. The standard error is the ratio of the standard deviation to the square root of the number of observations. The degrees of freedom (df) reflect the difference between the sample size and the number of groups, at a confidence level of 95%. The sum of squares (SS) consists of the regression, residual and total terms. It is explained by the difference between each group mean and the overall mean. The mean squares (MS) are obtained from the ratio of the sum of squares (SS) to the degrees of freedom (df). The F value is obtained from the ratio of the MS of regression to the MS of residual. From the ANOVA analysis outlined in Table 4, the F value obtained is 2562. This indicates that the between-groups estimate is more than 2562 times the within-group estimate. The standard deviation (s) may also be determined from the MS of the residual, and the s value is 1.88 × 10⁻³.

5.3.4.2. Bottom composition. Table 5 also shows that the R² value is greater than the adjusted R², because the number of cases is small and the number of predictor variables is large. The sample observations consist of 301 data points. From the ANOVA analysis in Table 5, the F value is 127. This indicates that the between-groups estimate is more than 127 times the within-group estimate. The significance F value is very small, which indicates that the population means differ. The F value is larger than 1.83, which indicates that all the variables involved in the composition prediction are important and related to each other. The standard deviation s can also be determined from the MS of the residual and has the value of 6.05 × 10⁻³. The ANOVA analysis for the top and bottom compositions is used to test the hypothesis relating the actual and predicted values of the n-butane composition. The F test in ANOVA provides a single test of the hypothesis that all the population means are assumed to be equal. The F test was used to assess differences between two groups, where the two groups are the regression and the residual.

5.4. Comparison NN, PLS and RA

Fig. 10 shows the observed versus predicted values of the top composition of n-butane as predicted by the neural network equation. It is apparent that all the points fall close to the 45 degree line. The calculated RMSE for the NN equation is 6.6 × 10⁻⁷, where an R² of one indicates an excellent fit of the data.
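The NN equation evaluated in this comparison is the collapsed form of Eq. (14). Because every activation in the 10-10-2 network is linear, the two layers reduce algebraically to a single affine map, $y = (LW_{2,1} IW_{1,1})\,p + (LW_{2,1} b^{1} + b^{2})$, which has the form of Eq. (15). The sketch below illustrates that reduction under stated assumptions: the function name `collapse_linear_network` and the magnitude threshold `prune_tol` are hypothetical, since the paper does not state the rule used to prune small values.

```python
import numpy as np


def collapse_linear_network(IW, b1, LW, b2, prune_tol=0.0):
    """Collapse a two-layer network with linear activations.

    y = LW @ (IW @ p + b1) + b2  ==  W @ p + b
    with W = LW @ IW and b = LW @ b1 + b2.

    prune_tol is an assumed pruning rule: entries whose magnitude is
    below this threshold are set to zero (0.0 disables pruning).
    """
    W = LW @ IW              # effective 2 x 10 coefficient matrix
    b = LW @ b1 + b2         # effective bias vector of length 2
    W = np.where(np.abs(W) < prune_tol, 0.0, W)
    b = np.where(np.abs(b) < prune_tol, 0.0, b)
    return W, b


def predict(W, b, p):
    """Evaluate the collapsed equation for one input vector p."""
    return W @ np.asarray(p, dtype=float) + b
```

Without pruning, the collapsed map reproduces the two-layer output exactly; pruning trades a small approximation error for a shorter equation, so the threshold should be checked against validation data before use.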
Fig. 11 shows the composition line plot of the actual values and the neural network equation for the n-butane top composition. Fig. 12 shows the observed versus predicted values of the bottom composition of n-butane from the NN equation. It is apparent that all the points fall close to the 45 degree line. The calculated RMSE for the NN equation is 3.88 × 10⁻⁷. Fig. 13 shows the composition line plot of the actual values and the neural network equation for the n-butane bottom composition. The CDC value is calculated to be 26.33 for the top composition and 100 for the bottom composition, where a high CDC value indicates better prediction. The regression value of R for the top and bottom compositions is 1, and thus the actual and predicted values are similar. The Cp values for the bottom and top compositions are calculated to be 1, and the MAPE values for the top and bottom are calculated to be 0.0005 and 0.00132 respectively. The TIC values for the bottom and top compositions are calculated to be 3.56 × 10⁻⁶ and 2.45 × 10⁻⁶ respectively.

Fig. 14 shows the observed versus predicted values of the top composition of n-butane using the PLS equation. It is apparent that all the points fall close to the 45 degree line. The calculated RMSE for the PLS equation is 0.002 and the R² is 0.9851, but the scattered data points around the regression line are an indication of poor prediction. Fig. 15 shows the composition plot of the actual values and the PLS equation for the n-butane top composition. Fig. 16 shows the observed versus predicted values of the bottom composition of n-butane from the PLS equation. The calculated RMSE for the PLS equation is 0.0059 and the value of the R² is 0.8117. Again, the scattered data points around the regression line are an indication of poor prediction by the PLS equation. Fig. 17 shows the composition plot of the actual values and the PLS equation for the n-butane bottom composition. For the PLS equation, the CDC value is calculated to be 17.66 for the top composition and 56.66 for the bottom composition.
The regression value of R for the top and bottom compositions is 0.99 and 0.9 respectively, with Cp values close to 1. The Cp values for the bottom and top compositions are calculated to be 0.9 and 0.99 respectively, and the MAPE values for the top and bottom are calculated to be 0.034 and 0.97. The TIC values for the bottom and top compositions are calculated to be 5.51 × 10⁻² and 7.9 × 10⁻³ respectively.

Fig. 10. Prediction versus actual value neural network equation top composition n-butane. (R² = 1; axes: actual composition (mole fraction), predicted composition (mole fraction).)

Fig. 11. Prediction and actual value for top composition n-butane. (Neural network prediction; axes: time (min), composition (mole fraction); series: actual, NN eq.)

Fig. 12. Prediction and actual value equation based neural network bottom composition n-butane. (R² = 1; axes: actual composition (mole fraction), predicted composition (mole fraction).)

Fig. 13. Prediction and actual value for bottom composition n-butane. (Neural network prediction; axes: time (min), composition (mole fraction); series: actual, NN eq.)

Fig. 14. Prediction versus actual value equation based PLS top composition n-butane. (R² = 0.9851; axes: actual composition (mole fraction), predicted composition (mole fraction).)

Fig. 15. Prediction and actual value for top composition n-butane. (PLS prediction; axes: time (min), composition (mole fraction); series: actual, PLS eq.)

Fig. 16. Prediction versus actual value equation based PLS bottom composition n-butane. (R² = 0.8117; axes: actual composition (mole fraction), predicted composition (mole fraction).)

Fig. 17. Prediction and actual value for bottom composition n-butane. (PLS prediction; axes: time (min), composition (mole fraction); series: actual, PLS eq.)

Fig. 18 shows the observed versus predicted values of the n-butane top composition using the regression analysis equation. Most of the data points fall close to the 45 degree line, but with more scatter than in the neural network case. The calculated RMSE for the regression equation is 0.0021 and the value of the R² is 0.9888. Fig. 19 shows the composition plot of the actual values and the RA equation for the n-butane top composition. Fig. 20 shows the observed versus predicted values of the n-butane bottom composition using the regression analysis equation. The points are scattered, as shown in the figure, which indicates poor prediction by the RA equation. The calculated RMSE for the normal regression equation is 0.0064 and the value of the R² is 0.8148. Fig. 21 shows the composition plot of the actual values and the RA equation for the n-butane bottom composition. The CDC value is calculated to be 17.33 for the top composition and 56.66 for the bottom composition. The Cp values for the bottom and top compositions are calculated to be 0.89 and 0.99 respectively. The MAPE values for the top and bottom are calculated to be 0.058 and 2.67. The TIC values for the RA prediction of the bottom and top compositions are calculated to be 5.46 × 10⁻² and 6.86 × 10⁻³ respectively.

Fig. 18. Prediction versus actual value equation based RA top composition n-butane. (R² = 0.9888; axes: actual composition (mole fraction), predicted composition (mole fraction).)

The Akaike information criterion (AIC) relates the square of the residual to the number of free model parameters. Its purpose is to weigh the error of the model against the number of parameters. The BIC is similar to the AIC except that it is motivated by Bayesian model selection principles. The AIC values depend on the mean square error, the variance, the number of free model parameters and the number of parameters. The BIC values depend on the mean square error, the variance, the number of observations, the number of free model parameters and the number of parameters. The AIC and BIC for the NN prediction of the top composition are calculated to be 2572 and 2555 respectively, while the AIC and BIC for the bottom composition are calculated to be 1957 and 1942 respectively. The AIC and BIC for the PLS prediction of the top composition are calculated to be 2573 and 2558 respectively, and for the bottom composition 2073 and 2059 respectively. The AIC and BIC for the RA prediction of the top composition are calculated to be 2580 and 2560 respectively, and for the bottom composition 2074 and 2058 respectively. These values can be seen in Table 6, which shows that the neural network equation, with smaller AIC and BIC values, still gives the optimum prediction even with slightly more parameters in its formulation.
Fig. 19. Prediction and actual value for top composition n-butane. (Regression prediction; axes: time (min), composition (mole fraction); series: actual, RA eq.)

Fig. 20. Prediction versus actual value equation based RA bottom composition n-butane. (R² = 0.8148; axes: actual composition (mole fraction), predicted composition (mole fraction).)

Fig. 21. Prediction and actual value for bottom composition n-butane. (Regression prediction; axes: time (min), composition (mole fraction); series: actual, RA eq.)

From the statistical analysis outlined in Table 6, the NN equation gives better predictions of the n-butane composition than the PLS equation and RA equation, as the calculated RMSE is small, the CDC is high, R² is close to 1, MAPE is close to 0, Cp is close to 1 and TIC is close to zero. The CDC values for the NN are larger than those for PLS and RA, and the high CDC indicates that the subsequent actual change of the predicted variable is high. The R and Cp values for the NN show the optimum performance, as the neural network prediction matches the actual data. The MAPE values indicate that the NN prediction is the optimum, as its values are closest to 0, whereas the MAPE values for PLS and RA are larger. For a perfect fit, MAPE is zero. The percentage error calculated for MAPE is used to compare the error of the fitted time series. The difference between the actual value and the predicted value, divided by the actual value, determines the MAPE. The absolute value is summed over every fitted point in time and divided by the number of fitted points. The TIC values indicate that the prediction by the NN is the best, as its TIC values are small compared to those of PLS and RA.
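The adequacy measures discussed above can be computed directly from paired actual and predicted series. The following is a minimal sketch of Eqs. (6), (10), (11), (12) and (13); the function names are illustrative and are not taken from the paper, and the MAPE form assumes strictly positive actual values (as is the case for mole fractions in this column). CDC is omitted because the definition of $D_{i}$ is not fully specified in the text.

```python
import numpy as np


def _pair(actual, predicted):
    """Return both series as float arrays of equal length."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if a.shape != p.shape:
        raise ValueError("actual and predicted must have the same shape")
    return a, p


def rmse(actual, predicted):
    """Root mean square error, Eq. (6)."""
    a, p = _pair(actual, predicted)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def r_squared(actual, predicted):
    """Coefficient of determination, Eq. (10)."""
    a, p = _pair(actual, predicted)
    return float(1.0 - np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2))


def mape(actual, predicted):
    """Mean absolute percentage error in percent, Eq. (11)."""
    a, p = _pair(actual, predicted)
    return float(np.mean(np.abs(p - a) / a) * 100.0)


def pearson(actual, predicted):
    """Pearson correlation coefficient, Eq. (12)."""
    a, p = _pair(actual, predicted)
    da, dp = a - a.mean(), p - p.mean()
    return float(np.sum(dp * da) / np.sqrt(np.sum(dp ** 2) * np.sum(da ** 2)))


def tic(actual, predicted):
    """Theil's inequality coefficient, Eq. (13)."""
    a, p = _pair(actual, predicted)
    num = np.sqrt(np.sum((a - p) ** 2))
    den = np.sqrt(np.sum(a ** 2)) + np.sqrt(np.sum(p ** 2))
    return float(num / den)
```

For a perfect prediction these functions return RMSE = 0, MAPE = 0, TIC = 0, R² = 1 and a Pearson coefficient of 1, which matches the acceptance criteria listed in Section 4.3.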
These statistical analyses show that the proposed NN model gives the best prediction performance, better than the other conventional methods.

Table 6
Statistical analysis of NN equation, PLS equation and RA equation for top and bottom n-butane predictions.

Metric         NN eq       PLS eq      RA eq
rmse_bottom    3.88E-07    0.0059      0.0064
rmse_top       6.6E-07     0.0020      0.0021
CDC_bottom     100         56.66       56.66
CDC_top        26.33       17.66       17.33
R_bottom       1           0.90        0.89
R_top          1           0.99        0.99
AIC_bottom     1957.26     2073.63     2074.26
AIC_top        2572.72     2573.78     2580.29
BIC_bottom     1942.43     2059.8      2058.44
BIC_top        2555.89     2558.96     2560.46
MAPE_bottom    0.00132     0.97        2.67
MAPE_top       0.0005      0.034       0.058
Cp_bottom      1           0.90        0.89
Cp_top         1           0.99        0.99
TIC_bottom     3.56E-06    5.51E-02    5.46E-02
TIC_top        2.45E-06    7.90E-03    6.86E-03

Table 7
Computing time.

                 NN eq        PLS eq        RA eq
Computing time   5 seconds    45 seconds    1 minute

Fig. 22. Residual analysis for neural network equation, PLS equation and regression analysis equation, top composition n-butane (residual composition, mole fraction, versus time, min; series: PLS, RA, NN, with the NN residual on a separate axis).

The computing times of the different approaches are shown in Table 7. The NN model takes less than 5 seconds to compute, which is faster than the PLS method (45 seconds) and the RA method (1 minute). Hence it is suitable for online measurement, since the industrial method takes more than 1 day to analyse and compute.

5.5. Residual analysis

Figs. 22 and 23 show the residuals of the neural network equation, the PLS equation and the regression analysis (RA) equation for the top and bottom composition of n-butane respectively. From the plots, the residual of the neural network equation is smaller than those of the PLS equation and the RA equation.
This shows that the neural network is able to predict the top and bottom composition of n-butane with higher accuracy and smaller error than PLS and RA. Residual analysis is important for evaluating the deviation between the actual and predicted values for all three models.

Fig. 23. Residual analysis for neural network equation, PLS equation and regression analysis equation, bottom composition n-butane (residual composition, mole fraction, versus time, min; series: PLS, RA, NN, with the NN residual on a separate axis).

6. Conclusion

This paper presents the prediction of the composition of n-butane at the top and bottom of a debutanizer column using the equation based neural network model, which is then compared to other methods such as PLS and regression analysis. All of the methods give good results in predicting the n-butane compositions, but it can be concluded from the statistical analyses that the NN equation gives the best n-butane prediction compared to the other models. The proposed equation based NN model is useful for online composition prediction since it is robust and versatile with fast computing time, and hence can easily be applied as a soft sensor for the distillation column. It could also easily be further applied as an inverse controller in equation form, especially for nonlinear systems where linear controllers are not able to perform successfully. The proposed model based NN method is also easier to visualize and to apply in various applications than the black box neural network structure, which is cumbersome and non-portable in nature. Furthermore, it is a MIMO based model that can predict both the top and bottom composition through the use of a single vector equation.

Acknowledgment

The authors would like to acknowledge PETRONAS for providing the required data and information for the research and the University of Malaya for providing the research grant (PS107/2010B).

Appendix A

See Table A1.

Table A1
Input weight and bias values for n-butane with partition.

Input weight 1,1 for the first layer:
-0.86 -0.55 0.23 0.70 0.74 0.95 -0.81 -0.25 -0.65 0.13
-0.82 0.51 -0.40 -0.59 -0.22 0.05 0.59 0.17 0.92 0.11
0.98 0.36 0.11 0.30 -0.72 -0.95 0.44 0.25 -0.47 0.96

b1 = biases at layer 1:
-0.09 -0.08 -0.16 -0.17 0.34 0.31 0.45 0.16 1.01 -0.07
0.34 0.97 0.21 0.81 0.63 0.91 -0.43 -0.24 -0.07 0.85
0.97 -0.62 -0.19 -0.13 -0.77 0.77 -0.50 0.72 -0.71 -0.63
0.96 0.23 -0.45 0.95 -0.62 0.08 0.35 -0.89 -0.37 -0.73
0.04 0.15 0.34 0.39 -0.77 0.68 -0.80 0.84 0.64 -0.82
0.65 -0.36 0.63 -0.56 -0.30 0.81 0.36 0.13 -1.08 0.77
-0.16 -0.13 0.71 -0.40 0.50 -0.10 0.86 -0.93 0.14 0.66

Layer weight 2,1 for the second layer:
0.28 -0.09 -0.17 0.24 0.07 0.18 -0.11 0.53 0.43 -0.10
-0.20 0.37 -0.32 0.86 -0.84 -0.57

b2 = biases at layer 2:
-0.11 0.02 -0.50 0.24 0.16 -0.07

Further values (group label not recoverable):
-0.63 0.35 0.04 0.55 0.67 -0.37 0.57 -0.08 0.29 -1.08

Nasser Mohamed Ramli is a PhD student in the Chemical Engineering Department, Faculty of Engineering, University of Malaya. He obtained his bachelor’s degree in chemical engineering from Loughborough University, United Kingdom and his master’s degree from University of Queensland, Australia. His area of research is in artificial intelligence, process modeling and control.

Dr Mohd Azlan Hussain joined the Department of Chemical Engineering, University of Malaya in 1987 as a lecturer and obtained his Ph.D in Chemical Engineering from Imperial College, London in 1996. He is a member of the American Institute of Chemical Engineers and British Institute of Chemical Engineers. At present he holds the post of Professor in the Department of Chemical Engineering. His main research interests are in modelling, process controls, nonlinear control systems analysis and applications of artificial intelligence techniques in engineering systems. He has published more than 250 papers in book chapters, journals and conferences within these areas. He has also published and edited a book on “Application of Neural Networks and other learning Technologies in Process Engineering”, published by Imperial College Press in 2001.

Dr Badrul Mohamed Jan, SPE is a researcher and academic lecturer attached to the Department of Chemical Engineering, University of Malaya, Malaysia. He holds BS, MS and PhD degrees in petroleum engineering from New Mexico Institute of Mining and Technology. Jan’s research areas and interests include the development of super lightweight completion fluid for underbalance perforation, ultra low interfacial tension microemulsion for enhanced oil recovery, and conversion of palm oil mill effluent into super clean fuel for diesel replacement.
He has worked closely with industry on oil and gas projects with companies such as 3M Asia Pacific and BCI Chemical Corporation. He has also published numerous technical conference and journal papers. Jan is the deputy director of the University Malaya Center of Innovation & Commercialization. His responsibilities include providing an environment at the University of Malaya conducive to researchers bringing their research outputs to a commercialization-ready level.

Dr Bawadi Abdullah is a Senior Lecturer in the Chemical Engineering Department, Faculty of Engineering, Universiti Teknologi PETRONAS. He is also a Professional and a Chartered Engineer. He obtained his bachelor’s degree in chemical engineering from University of Wales, Swansea, United Kingdom and his master’s degree from Dalhousie University, Canada. He obtained his PhD degree from the University of New South Wales, Australia. He teaches undergraduate level courses such as Transport Phenomena, Chemical Engineering Thermodynamics and Chemical Analysis. His area of research is reaction engineering.