Over recent years, a large library of data mining algorithms has been developed to tackle problems in fields as diverse as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could usefully be applied to data-rich environmental problems.
Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, often to address difficult and important problems. Other methods, for example Classification and Association Rule Extraction, have not been taken up by environmental modellers on any wide scale. Classical statistical techniques for data analysis, such as regression, time series analysis and principal components analysis, are also suitable and have been applied to mine environmental data. Finally, integrating techniques from machine learning and statistics into a single data mining process may yield significant improvements over either approach alone, and remains an open issue to be explored.
Several high-quality software packages have been developed that make it easy to investigate data using multiple techniques. WEKA is one of the most widely used; it is open source and freely available for download. GESCONDA is an environmental-science-specific package currently under development. While a small number of environmental science projects have taken advantage of such technology and of the wealth of recent data mining research, most of these have been undertaken by or with data mining specialists, and the majority of environmental modellers remain unaware of the tools available.
In this workshop, we introduce interested parties to a range of data mining techniques and to a small selection of software packages. We aim to bring those working in environmental modelling into contact with data mining software and its developers, both to make data mining techniques more accessible to modellers and to give developers a better idea of the needs and wishes of the modelling community. The WEKA and GESCONDA packages will receive particular attention, though discussion will not be limited to them. We also invite workshop participants to present interesting applications of data mining to environmental problems.
Work emphasizing (but not limited to) the following topics is of particular interest to the workshop:
The workshop program will include a special hands-on tutorial session in which a real data set will be analyzed with the WEKA and GESCONDA packages. Workshop participants are encouraged to attend and explore the possibilities of data mining for a real application.
Workshop Organizing Committee: