This dataset is kindly provided by Dr. Arthur Kordon, Advanced Analytics leader at The Dow Chemical Company and author of "Applying Computational Intelligence: How to Create Value."
The dataset comes from an industrial problem of modeling gas chromatography measurements of the composition of a distillation tower. This Tower problem contains 4999 records and 25 potential input variables. The response variable to be estimated is the propylene concentration at the top of the distillation tower. The samples are from a gas chromatograph and are taken every 15 minutes. The 25 potential inputs are temperatures, flows, and pressures related to the distillation tower. The inputs are actually sampled once per minute, but 15-minute averages of the inputs are used for model development to synchronize them with the output measurement.
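The averaging step described above can be illustrated with a minimal sketch. It assumes non-overlapping blocks of 15 consecutive one-minute readings; how the original windows were aligned with the gas chromatograph timestamps is not stated here, and the function name `block_average` and the sample values are hypothetical.

```python
from statistics import fmean


def block_average(samples, window=15):
    """Average consecutive, non-overlapping blocks of `window` samples.

    Assumption: trailing samples that do not fill a complete block
    are dropped. The original preprocessing may have differed.
    """
    n_blocks = len(samples) // window
    return [
        fmean(samples[i * window:(i + 1) * window])
        for i in range(n_blocks)
    ]


# Made-up one-minute readings for a single input (30 minutes of data).
one_minute = [float(i) for i in range(30)]
fifteen_minute = block_average(one_minute)
```

Each input variable would be reduced this way so that one averaged input row lines up with one 15-minute output measurement.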
The measurements (4999 for each variable) are not treated as time series, but simply used as samples for a regression model. The propylene concentration needs to be modeled as a function of relevant inputs only. The range of the measured propylene concentration is very broad and covers most of the expected operating conditions in the distillation tower.
The TowerData file contains 26 tab-delimited columns. The first 25 columns correspond to the input variables x1-x25; the last column is the response variable y.
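As a rough illustration of this layout, the sketch below parses tab-delimited rows into inputs and response using only the Python standard library. The function name `load_tower_data` is hypothetical, and whether the file starts with a header row is not stated here, so the code simply skips any row that is not fully numeric.

```python
import csv
import io


def load_tower_data(stream, n_inputs=25):
    """Parse tab-delimited rows into (X, y) lists of floats."""
    X, y = [], []
    for row in csv.reader(stream, delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        try:
            values = [float(v) for v in row]
        except ValueError:
            # Assumption: a non-numeric row is a header line; skip it.
            continue
        if len(values) != n_inputs + 1:
            raise ValueError(
                f"expected {n_inputs + 1} columns, got {len(values)}"
            )
        X.append(values[:n_inputs])   # inputs x1..x{n_inputs}
        y.append(values[n_inputs])    # last column is the response y
    return X, y


# Tiny synthetic example with 3 inputs instead of 25 (made-up numbers).
sample = "x1\tx2\tx3\ty\n1.0\t2.0\t3.0\t0.5\n4.0\t5.0\t6.0\t0.7\n"
X, y = load_tower_data(io.StringIO(sample), n_inputs=3)
```

For the real data, the same function could be called on a file opened with `open(path, newline="")`, where `path` is wherever the TowerData file was saved; with the default `n_inputs=25` it should yield 4999 rows if the file matches the description above.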
This problem was used as a benchmark in the following publications (please contact us if we should add yours to the list):
- E. Vladislavleva, G. Smits, D. den Hertog. Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via Pareto Genetic Programming. In IEEE Transactions on Evolutionary Computation, Volume 13, Issue 2, Pages 333-349, 2009, ISSN: 1089-778X, DOI: 10.1109/TEVC.2008.926486.
- Sean Stijven, Wouter Minnebo, Katya Vladislavleva. Separating the wheat from the chaff: on feature selection and feature importance in regression random forests and symbolic regression. In Steven Gustafson and Ekaterina Vladislavleva, editors, 3rd Symbolic Regression and Modeling Workshop for GECCO 2011, pages 623-630, Dublin, Ireland, 2011. ACM.
- S. Stijven and W. Minnebo. Empowering knowledge computing with variable selection: On variable importance and variable selection in regression random forests and symbolic regression. Master's thesis, Antwerp University, Antwerp, Belgium.