Data Mining Note 1: Data Mining Basics

Introduction

This note is going to explain some basic concepts of data mining. After reading it, you should be able to answer these questions:

  • What’s data mining?
    –> Section[What’s Data Mining]
  • How do we do data mining step by step?
    –> Section[KDD Process]
  • What does the architecture of a data mining system look like?
    –> Section[Architecture of Data Mining System]
  • What algorithms can we apply to search for the pattern (model) that we want?
    –> Section[What do we do in each part of KDD process]

What’s Data Mining

Data mining helps us to extract useful information from large databases. It’s a step within the KDD process.

Definition: Knowledge discovery in databases (KDD) is the process of finding useful information and knowledge in data.

Definition: Data mining is the use of algorithms to extract patterns or models within the KDD process.
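
To make "using an algorithm to extract patterns" concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the shopping-basket data, the function name frequent_pairs, and the min_support threshold are all hypothetical and do not come from this note. The "pattern" here is a pair of items that appears together in at least min_support transactions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that co-occur in at least min_support transactions.

    Hypothetical helper for illustration; not an algorithm named in this note.
    """
    pair_counts = Counter()
    for basket in transactions:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Made-up toy data: each inner list is one transaction.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "milk"],
]

patterns = frequent_pairs(transactions, min_support=2)
print(patterns)
```

The raw transactions are the data, the counting procedure is the algorithm, and the surviving pairs are the extracted patterns.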

KDD Process

The KDD process has six steps in total, as shown below:
[Figure: Steps of KDD]
The KDD process can be divided into three parts. The first part is data preprocessing, which covers steps 1-3. The second part is data mining, where the data mining algorithms are applied. The last part is evaluation and presentation. These notes focus mainly on the first and second parts.
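
The three-part split can be sketched as a small pipeline. This is a rough sketch, not the six steps themselves: the function names (preprocess, mine, present, run_kdd), the record layout, and the frequency-count "mining" step are all assumptions chosen to keep the example short.

```python
from collections import Counter


def preprocess(raw_records):
    """Part 1 (data preprocessing): drop incomplete records, normalize text."""
    cleaned = []
    for record in raw_records:
        if record.get("item") is None:
            continue  # skip records with a missing value
        cleaned.append(record["item"].strip().lower())
    return cleaned


def mine(items, min_count=2):
    """Part 2 (data mining): a stand-in algorithm that keeps frequent values."""
    counts = Counter(items)
    return {item: n for item, n in counts.items() if n >= min_count}


def present(patterns):
    """Part 3 (evaluation and presentation): rank the patterns and format them."""
    ranked = sorted(patterns.items(), key=lambda pair: (-pair[1], pair[0]))
    return [f"{item}: {n}" for item, n in ranked]


def run_kdd(raw_records, min_count=2):
    # Each part consumes the output of the previous one.
    return present(mine(preprocess(raw_records), min_count))


# Made-up raw data with inconsistent casing, stray spaces, and a missing value.
raw = [
    {"item": " Milk"},
    {"item": "bread"},
    {"item": None},
    {"item": "milk"},
    {"item": "Bread "},
    {"item": "eggs"},
]

report = run_kdd(raw)
print(report)
```

Without the preprocessing part, " Milk" and "milk" would be counted as different values and neither would reach the threshold, which shows why preprocessing comes before mining.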

The following figures illustrate the overall KDD process in more detail:
[Figure: KDD Process]

[Figure: KDD Process in Parts]

Bear in mind that the KDD process is extremely important for your study of data mining. At each step, you should know what you are doing from the KDD point of view. Always ask yourself why the technique you are studying leads to a better result.

Architecture of Data Mining System

Here is the architecture of a data mining system:
[Figure: Architecture of Data Mining System]

What do we do in each part of KDD process

Data Preprocessing

[Figure: Data Preprocessing]

Data Mining

[Figure: Data Mining]

Evaluation and Presentation

Not addressed in this note at this time.

Conclusion

This note briefly introduces the KDD process and the architecture of a data mining system. Later notes will cover data preprocessing and data mining algorithms in more detail. If you are interested in them, please refer to kelvin.ink.