Data mining (sometimes called data or knowledge discovery) is the use of automated tools to examine and analyze data that has been stored in a database in order to find new, previously unknown relationships.

Pilot Software's White Paper (1998) explains the origin of the term as follows:

"Data mining derives its name from the similarities between searching for valuable business information in a large database...and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find exactly where the value resides."

Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.

The Virtuous Cycle of Data Mining

Identifying the Business Problem

Transforming Data into Actionable Results

Acting on the Results

Measuring the Model's Effectiveness

Tasks Solved by Data Mining

Prediction: A task of learning a pattern from examples and using the developed model to predict future values of the target variable.
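
As a minimal sketch of prediction, the example below learns a straight-line pattern from a few past examples and uses it to estimate a future value of the target variable. Ordinary least squares is only one of many possible techniques, and the function names (fit_line, predict) and the advertising figures are hypothetical, chosen purely for illustration.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply the learned model to a new input value."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: advertising spend (input) and units sold (target).
spend = [1.0, 2.0, 3.0, 4.0]
units = [12.0, 14.0, 16.0, 18.0]

model = fit_line(spend, units)
forecast = predict(model, 5.0)  # estimated units sold at a spend of 5.0
```

The model is learned once from historical records and then reused for inputs it has not seen, which is what separates prediction from simply summarizing past data.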

Classification: A task of finding a function that maps records into one of several discrete classes.
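
One simple way to obtain such a function is a nearest-neighbor rule: a new record receives the class of the most similar labeled record. The sketch below assumes numeric records and Euclidean distance. The customer segments, the field meanings, and the name nearest_neighbor_class are hypothetical.

```python
import math


def nearest_neighbor_class(training, record):
    """Return the class label of the training record closest to `record`."""
    closest = min(training, key=lambda item: math.dist(item[0], record))
    return closest[1]


# Hypothetical labeled records: (age, purchases per year) -> customer segment.
training = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((45, 20), "frequent"),
    ((50, 25), "frequent"),
]

label = nearest_neighbor_class(training, (48, 22))
```

Unlike the prediction task, the output here is one of a fixed set of discrete labels rather than a numeric value.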

Clustering: A task of identifying groups of records that are similar to one another but different from the rest of the data. Often, the variables that provide the best clustering must be identified as well.
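
The sketch below illustrates clustering with a plain k-means loop, one common technique among many. It assumes the number of groups and the starting centroids are supplied by the caller, and it does not address the question of choosing which variables to cluster on. The sample points and the name k_means are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then recompute the centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)

        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # Mean of each coordinate across the cluster's points.
                new_centroids.append(
                    tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(old)

        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical records described by two numeric variables.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(1, 1), (8, 8)])
```

No class labels are supplied here. The groups emerge from the similarity of the records themselves, which is the key difference from classification.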


Affinity Grouping: Processing transactional data in order to find groups of products that are frequently sold together. One also searches for directed association rules that identify the best product to offer alongside the products a customer has already selected.
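
To make the idea of a directed association rule concrete, the sketch below scores every ordered pair of items by support (the share of baskets containing both items) and confidence (the share of baskets containing the first item that also contain the second). It is restricted to item pairs for brevity. The baskets, the 0.5 support threshold, and the name association_rules are hypothetical.

```python
from itertools import permutations


def association_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        both = sum(1 for basket in baskets if a in basket and b in basket)
        has_a = sum(1 for basket in baskets if a in basket)
        support = both / n
        if support >= min_support:
            rules.append((a, b, support, both / has_a))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


# Hypothetical transactions, one set of purchased items per basket.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "jam"},
]

rules = association_rules(baskets)
best = rules[0]  # the rule with the highest confidence
```

The rules are directed: in this sample every basket with jam also contains butter, but not every basket with butter contains jam, so the two directions receive different confidence values.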


