Knowledge Discovery and Data Mining

Knowledge discovery in databases (KDD) and data mining attempt to extract useful information from databases, heterogeneous data sources and very large databases (VLDB).

To be distinguished from:

Tasks:

Association Rules

Example: "People who buy beer also buy newspapers."

This is not a logical "inference" because it does not hold for every person. It holds only as a statistical tendency across the population.

Rule: beer -> newspapers
Rule: X -> Y

support: (number of people who buy both X and Y) / (number of all people)

confidence: (number of people who buy both X and Y) / (number of people who buy X)

For example: if out of 1000 people there are 2 people who buy beer and these 2 also buy newspapers, then the support for beer->newspapers is low (0.2%) but the confidence is high (100%).
On the other hand, if out of 1000 people 100 buy beer and newspapers and another 500 buy beer but no newspapers, then the support for beer->newspapers is higher (100/1000 = 10%) but the confidence is lower (100/600, about 16.7%).
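
The two scenarios above can be checked with a short sketch. It assumes each person's purchases are given as a set of item names. The function name support_and_confidence and the item labels are illustrative, not taken from any particular library, and returning 0.0 as the confidence when nobody buys X is a convention chosen here (the ratio is undefined in that case).

```python
def support_and_confidence(transactions, x, y):
    """Return (support, confidence) of the rule x -> y.

    transactions: list of sets of items, one set per person.
    """
    total = len(transactions)
    # People who buy X at all.
    with_x = sum(1 for t in transactions if x in t)
    # People who buy both X and Y.
    with_both = sum(1 for t in transactions if x in t and y in t)
    support = with_both / total
    # Assumption: report 0.0 when nobody buys X (ratio undefined).
    confidence = with_both / with_x if with_x else 0.0
    return support, confidence


# First scenario: 2 of 1000 people buy beer, and both also buy newspapers.
example_1 = [{"beer", "newspapers"}] * 2 + [set()] * 998

# Second scenario: 100 buy both, another 500 buy beer but no newspapers.
example_2 = [{"beer", "newspapers"}] * 100 + [{"beer"}] * 500 + [set()] * 400

s1, c1 = support_and_confidence(example_1, "beer", "newspapers")
s2, c2 = support_and_confidence(example_2, "beer", "newspapers")
print(f"scenario 1: support={s1:.1%}, confidence={c1:.1%}")
print(f"scenario 2: support={s2:.1%}, confidence={c2:.1%}")
```

Running it prints the support and confidence of beer->newspapers for each scenario, computed directly from the two definitions.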

Applications

Major Challenge

It is fairly easy to discover new information. The question is which of the discovered rules, trends and factors are novel, interesting, plausible and understandable.