The data and complementary material for this benchmark were kindly provided by Dr. Arthur Kordon, Advanced Analytics leader at The Dow Chemical Company and author of "Applying Computational Intelligence: How to Create Value."
A significant problem in the chemical industry is the optimal handling of intermediate products. Of special interest are cases where intermediate products from one process can be used as raw materials for another process at a different geographical location. The case study is based on a real industrial application of intermediate-product optimization between two plants of The Dow Chemical Company, one in Freeport, Texas, and the other in Plaquemine, Louisiana. The objective is to maximize the intermediate product flow from the plant in Texas and to use it effectively as a feed in the plant in Louisiana. Using a large fundamental model for "what-if" scenarios in production schedule planning proved unfavorable because of the specialized knowledge it required and its slow execution speed (about 20-25 minutes per prediction). Empirical emulators are a viable alternative. The goal is to develop an empirical model that emulates the existing fundamental model with acceptable accuracy (R² of about 0.9) and that cuts the calculation time to less than 1 second.
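The accuracy target stated above can be checked with a few lines of code. The following is a minimal sketch, assuming R² means the usual coefficient of determination; the function names `r_squared` and `meets_target` and the example numbers are illustrative and do not come from the benchmark data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def meets_target(r2, target=0.9):
    """True if the emulator reaches the accuracy target (0.9 is the value quoted in the text)."""
    return r2 >= target


# Made-up numbers for illustration only (not benchmark data).
fundamental = [10.0, 20.0, 30.0, 40.0]   # outputs of the slow reference model
emulator = [12.0, 18.0, 31.0, 39.0]      # outputs of a hypothetical fast emulator
score = r_squared(fundamental, emulator)
print(score, meets_target(score))
```

Here the slow fundamental model plays the role of ground truth, so the score measures how faithfully the emulator reproduces it rather than how well it matches plant measurements.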
Ten input variables (different product flows) were selected by the experts from several hundred parameters in the fundamental model. The two output variables represent the amount of intermediate product output (in pounds per hour). Both responses must be predicted so that they can be used in subsequent process optimization (the paper cited below uses 12 outputs). The assumption was that the behavior of the process can be captured with these most significant variables and that a representative empirical model could be built for each output. A 32-run Plackett-Burman experimental design with 10 factors at four levels was used as the DOE strategy. The training data set consists of 320 data points. The test data set includes 275 data points whose inputs were randomly generated within the training range. (Note: The extreme-range data from the paper below is not available, but the training and test data can be recombined and split differently to check the extrapolative properties of models.)
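One possible way to act on the note about recombining the data is sketched below. It pools the input rows, treats points that lie near the edge of any variable's observed range as an extrapolation set, and keeps the remaining points for model building. This is only an illustration under stated assumptions: the function name `split_interior_exterior`, the `margin` parameter, and the toy rows are hypothetical and are not part of the benchmark or the paper.

```python
def split_interior_exterior(rows, margin=0.1):
    """Split row indices into interior and exterior sets.

    rows   -- list of input vectors (one list of floats per data point)
    margin -- fraction of each variable's range treated as "extreme" at both ends
              (hypothetical tuning parameter, not taken from the source)
    """
    n_vars = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_vars)]
    highs = [max(r[j] for r in rows) for j in range(n_vars)]
    interior, exterior = [], []
    for i, row in enumerate(rows):
        inside = True
        for j in range(n_vars):
            span = highs[j] - lows[j]
            lo = lows[j] + margin * span
            hi = highs[j] - margin * span
            if not (lo <= row[j] <= hi):
                inside = False
                break
        (interior if inside else exterior).append(i)
    return interior, exterior


# Toy pooled inputs with two variables (illustrative values only).
pooled = [[0.0, 5.0], [0.5, 5.5], [1.0, 6.0], [0.4, 5.4]]
fit_idx, extrapolation_idx = split_interior_exterior(pooled, margin=0.1)
print(fit_idx, extrapolation_idx)
```

A model fitted on the interior rows and scored on the exterior rows gives a rough indication of extrapolation behavior. With ten inputs, even a small margin can move a large share of the points into the exterior set, so the margin may need tuning.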
The presented data set is a subset of the case study considered in this article:
Kordon, A., Kalos, A., and Adams, B. Empirical emulators for process monitoring and optimization. In Proceedings of the IEEE 11th Conference on Control and Automation MED2003, Greece.
Download folder with data. The folder contains the article, the training data, and the test data in xls format.