Data Mining

Well, you might say, why not just create a SQL statement with joins to retrieve specific information from our data tables? That is a good start, but with SQL we can only uncover shallow data, that is, information that is easily accessible from our data sources. While query tools and data mining tools complement each other, straight query tools can't find hidden data; they can only find what we know how to ask for. As a rough rule of thumb, about 80% of all interesting information can be extracted from a data source using SQL, provided the user really knows and understands the data. The remaining 20% is hidden information that requires more advanced techniques. And for any organization, from a small local company to a large multinational corporation, this 20% can prove to be the difference between being a leader and struggling to meet the bottom line.
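To make the "shallow data" point concrete, here is a minimal sketch of the kind of question a join answers well. It is written in Python with the standard sqlite3 module rather than VB 6, simply so that it can be run anywhere, and the Customers and Orders tables, their columns, and the sample rows are all hypothetical, invented for illustration only.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY,
                         CustomerID INTEGER,
                         Amount REAL);
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# A question we already know how to ask: how much has each customer spent?
query = """
    SELECT c.Name, SUM(o.Amount) AS Total
    FROM Customers AS c
    INNER JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.Name
    ORDER BY Total DESC
"""
totals = conn.execute(query).fetchall()

for name, total in totals:
    print(name, total)
```

The query returns exactly what was asked for, the total spending per customer, and nothing more. It would not, on its own, suggest a relationship in the data that nobody thought to ask about.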

There's a cliché that says, "If you know what you are looking for, use SQL. But if you only have a fuzzy idea, or a hunch, then use data mining." It's in this context that the power of the computer really shines. Consider that the computer can do both more and less than humans can. A computer can easily compare millions of data elements in seconds; no human can come close. On the other hand, any three-year-old child can pick out a tree instantly, but it takes many CPU cycles for a computer to discern one. And even then, the computer won't know whether it is a real tree or just a picture of one. So while computers excel at number crunching, humans excel at pattern recognition.

Another way to look at data mining is to consider the medieval laborer walking across a field. He might be walking over hidden coal deposits that, several hundred years later, could be mined to power machines he could never conceive of. We want to mine for data in our various data sources, and we will let the strength of the computer sift through the data in search of the hidden gem. But only a human can tell whether what is found is really a gem or just fool's gold. Much of the history of computer science has been concerned with the collection, manipulation, and dissemination of data, and the newest rage is data mining. We want to extract some knowledge from all of that data.

There is a wider term, Knowledge Discovery in Databases (KDD), that is used for the process of, logically enough, discovering knowledge in data. This term includes finding relationships and patterns in data. The attendees at a KDD conference held in Montreal in 1995 generally agreed on this meaning. The group also agreed that the term data mining should be used only for the discovery stage of the KDD process. KDD is not a new idea; it has been on the minds of researchers for quite some time. The field draws on statistics, artificial intelligence, data visualization, machine learning, expert systems, neural networks, and other disciplines. Phew! Where do we start with these concepts in a Beginning VB 6 database programming book? We'll begin by thinking about an algorithm that models how to extract knowledge from data.

© 1998 by Wrox Press. All rights reserved.