Introduction to Data Mining
Data mining is the process of extracting and discovering patterns in large data sets, using methods at the intersection of machine learning, statistics, and database systems.
Data mining is the analysis step of the Knowledge Discovery in Databases (KDD) process.
Why mine data?
Computerization and automated data gathering have resulted in extremely large data repositories.
Raw Data -> Patterns -> Knowledge
Scalability issues and the desire for more automation make more traditional techniques less effective:
- Statistical Methods
- Relational Query Systems
- OLAP (OnLine Analytical Processing)
The Data Mining (KDD) Process
Data Mining Techniques
The more popular data mining techniques include:
- Classification
- Clustering
- Regression
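To make the three techniques above concrete, here is a minimal sketch in plain Python. The toy data, the function names, and the specific algorithm choices (1-nearest-neighbor for classification, a two-cluster k-means for clustering, ordinary least squares for regression) are assumptions picked for illustration; real projects would normally rely on a dedicated library.

```python
from statistics import mean


def nearest_neighbor_classify(train, query):
    """Classification: return the label of the closest labeled point (1-NN)."""
    closest = min(train, key=lambda item: abs(item[0] - query))
    return closest[1]


def two_means(values, iterations=10):
    """Clustering: 1-D k-means with k=2, seeded with the min and max values."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer of the two current centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar


# Hypothetical toy data, invented for this sketch.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
label = nearest_neighbor_classify(labeled, 7.5)

clusters = two_means([1.0, 1.5, 2.0, 9.0, 9.5, 10.0])

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print(label)             # high
print(clusters)          # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
print(slope, intercept)  # 2.0 1.0
```

Classification predicts a discrete label from labeled examples, clustering groups unlabeled values by similarity, and regression predicts a continuous value. The sketch keeps everything one-dimensional so that each idea fits in a few lines.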
Other significant ideas include:
- Association rule learning
- Topic identification, tracking, and drift analysis
- Concept hierarchy creation
- Relevance of content
- Anomaly detection
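Two of the ideas in this list, association rule learning and anomaly detection, are easy to sketch in a few lines of plain Python. The basket data, the sensor-style readings, the function names, and the z-score threshold of 2.0 below are all assumptions made for illustration. The z-score rule is only one simple way to flag anomalies, not the definitive method.

```python
from statistics import mean, pstdev


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def zscore_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # All values are identical, so nothing stands out.
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical shopping baskets, invented for this sketch.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667

# Hypothetical readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(zscore_anomalies(readings))  # [50]
```

Support measures how often an itemset appears at all, while confidence measures how often the rule holds when its left-hand side is present. An association rule miner searches for rules that clear minimum thresholds on both.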