Tutorials
- April 18, AM (9:00-12:00)
- Track 1. Enterprise Data Mining: Case Studies and Integration in the E-World
- Zhexue Huang (ETI, The University of Hong Kong) and Graham J Williams (CSIRO Australia)
Abstract
Growing globalization and free trade create an ever more competitive
market environment in almost all industry sectors. The rise of
e-commerce on the back of the internet further increases the
competitive nature of international trade and finance. Business
intelligence, covering technologies from data mining, data
warehousing, OLAP, decision support systems, and customer relationship
management (CRM), is a core component of success today. However,
there are challenges with the wealth of new technology that is rapidly
moving from the research laboratories into both commercial and freely
available products. Delivering solutions often requires a significant
investment and a recognition that applying statistical and machine
learning algorithms to very large collections of data is a team
exercise. Bringing these into the enterprise offers many challenges
beyond the application of the algorithms.
This tutorial will have two themes: The first is to discuss current
practices of data mining in the business community based on consulting
experiences in Australia and South East Asia. The second theme will
explore emerging technology for the integration and delivery of data
mining solutions in the e-business world of today, in particular
through standards such as XML (the Extensible Markup Language).
- Track 2. Data Mining with Decision Trees
- Johannes Gehrke (Cornell University, USA)
Abstract
In this tutorial, we survey recent developments in learning tree-based
models for classification and regression, called predictor trees. The
tutorial has three parts: (1) A general overview of tree-based
classification and regression. (2) A survey of methods to construct
predictor trees. (3) An overview of scalable data access methods to
construct predictor trees from very large training databases.
In the first part, we motivate predictor trees and their use in a data
mining environment. We show results from real-life studies that illustrate
how predictor trees give understandable models where traditional models are
hard or counter-intuitive to interpret, and compare related methods for
classification and regression.
In the second part of the tutorial, we discuss choices involved in tree
construction, including different split selection methods and tree pruning.
Although we survey the most popular methods, including work from all KDD
sub-communities, we emphasize recent work from the statistics literature.
Wherever possible, we interleave results on real datasets. The methods
presented in this part assume that the complete training database fits into
main memory.
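As a rough illustration of what a split selection method does on an in-memory training set, the following minimal sketch scores candidate thresholds on one numeric attribute by weighted Gini impurity. It is not taken from the tutorial: the function names `gini` and `best_split` and the toy `ages`/`buys` data are assumptions made for illustration, and Gini impurity is only one of several possible split criteria.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t minimising the weighted Gini impurity of the
    two partitions {value <= t} and {value > t}.

    Returns (threshold, weighted_impurity); threshold is None when no
    split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        # Only consider thresholds between two distinct attribute values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: customer age and whether they bought a product.
ages = [22, 25, 30, 35, 40, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, buys)
print(threshold, impurity)
```

The sketch holds the whole training set in Python lists, which mirrors the main-memory assumption stated above; it does not address pruning or scalability.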
The third part of the tutorial covers scalable methods for predictor tree
construction. We first motivate the concept of scalability and then survey
recent work in the database literature on scalable data access methods for
constructing predictor trees from very large training databases.
- April 18, PM (14:00-17:00)
- Track 3: Knowledge Extraction from Texts (ECT), Application to Human Resources in Industry
- Yves Kodratoff
Listing
Links with NLP: Information retrieval, information extraction.
The needs of ECT: how to construct terms, how to build taxonomies of
concepts.
An example application and the types of rules obtained.
Handling the "free expression" part of questionnaires in Human
Resources.
- Track 4: Rough Sets in KDD: A Tutorial
- Andrzej Skowron (Warsaw University) and Ning Zhong (Yamaguchi University)
Abstract
Recent years have witnessed a rapid growth of interest in rough set
theory and its applications worldwide. The theory has been
followed by the development of several software systems that
implement rough set operations, in particular for solving
knowledge discovery and data mining tasks. Rough sets are applied
in domains such as medicine, finance, telecommunication,
vibration analysis, conflict resolution, intelligent agents,
pattern recognition, control theory, signal analysis, process
industry, and marketing.
The tutorial introduces basic notions, discusses methodologies for
analyzing data, and surveys some applications. In particular, it
presents applications of rough set methods to feature selection,
feature extraction, and the discovery of patterns and their use in the
decomposition of large data tables, as well as the relationship of
rough sets with association rules.
Boolean reasoning is crucial for all the discussed methods.
The tutorial also presents an overview of some extensions of
the classical rough set approach, such as rough mereology, developed
as a tool for the synthesis of objects satisfying
a given specification to a satisfactory degree, with potential applications
in areas such as granular computing, spatial reasoning, and data mining in
distributed environments. A survey of recent advances in granular computing
is included. The tutorial discusses applications of rough set methods
for knowledge discovery and data mining in medical databases.
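For readers new to the basic notions mentioned in this abstract, the following minimal sketch computes the lower and upper approximations of a target set under the indiscernibility relation induced by a chosen set of attributes. It is an illustration only, not material from the tutorial: the function name `approximations`, the `patients` table, and the `flu` set are hypothetical.

```python
from collections import defaultdict


def approximations(table, attributes, target):
    """Return (lower, upper) approximations of `target`.

    table:      dict mapping object id -> dict of attribute -> value
    attributes: attributes used to decide which objects are indiscernible
    target:     set of object ids to approximate
    """
    # Group objects that agree on every chosen attribute.
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical decision table: p1 and p2 are indiscernible on both attributes.
patients = {
    "p1": {"fever": "high", "cough": "yes"},
    "p2": {"fever": "high", "cough": "yes"},
    "p3": {"fever": "none", "cough": "no"},
    "p4": {"fever": "high", "cough": "no"},
}
flu = {"p1", "p4"}
lower, upper = approximations(patients, ["fever", "cough"], flu)
print(sorted(lower), sorted(upper))
```

In this toy table, p1 cannot be told apart from p2, so p1 belongs only to the upper approximation of `flu`; the difference between the two approximations is the boundary region.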
- April 18, After Workshop "International Workshop on Web Knowledge Discovery and Data Mining"