Exploratory Data Mining and Data Cleaning
Authors: Tamraparni Dasu, Theodore Johnson
Publisher: Wiley-Interscience
Category: Book
List Price: $101.50
Buy New: $76.90
You Save: $24.60 (24%)
Buy New/Used from $63.99
Avg. Customer Rating: (1 review)
Sales Rank: 1141470
Languages: English (Original Language), English (Unknown), English (Published)
Media: Hardcover
Edition: 1
Number Of Items: 1
Pages: 224
Shipping Weight (lbs): 1.2
Dimensions (in): 9.3 x 6.4 x 0.8
ISBN: 0471268518
Dewey Decimal Number: 006.3
EAN: 9780471268512
ASIN: 0471268518
Publication Date: May 9, 2003
Availability: Usually ships in 1-2 business days
Editorial Reviews:
Product Description
- Written for practitioners of data mining, data cleaning and database management.
- Presents a technical treatment of data quality including process, metrics, tools and algorithms.
- Focuses on developing an evolving modeling strategy through an iterative data exploration loop and incorporation of domain knowledge.
- Addresses methods of detecting, quantifying and correcting data quality issues that can have a significant impact on findings and decisions, using commercially available tools as well as new algorithmic approaches.
- Uses case studies to illustrate applications in real life scenarios.
- Highlights new approaches and methodologies, such as the DataSphere space partitioning and summary based analysis techniques.
Exploratory Data Mining and Data Cleaning will serve as an important reference for serious data analysts who need to analyze large amounts of unfamiliar data, managers of operations databases, and students in undergraduate or graduate level courses dealing with large scale data analysis and data mining.
Customer Reviews:
Terrific intro to the issues
December 11, 2007
This is the best deep and practical introduction to data cleaning that I have seen. It provides an excellent overview of the practical problems in data cleaning, gives a good intuitive feeling for the core issues of outliers and robust statistics, and surveys a good set of techniques for addressing data cleaning issues in a practical but relatively deep manner. It doesn't try to provide cookbook solutions; instead, it points out the complexities and leaves the reader with a toolbox for tackling them.
The really interested reader will want to augment the book with some other reading, including (on the practical side) a book or website of tips on how to express robust statistics in SQL (the O'Reilly book on Transact-SQL has good stuff), and (on the more statistical side) a deeper introduction to robust statistics (e.g. Rousseeuw and Leroy's Robust Regression and Outlier Detection).
In a future edition it would be nice to see more discussion of time-series outliers, as well as an SQL cookbook that will run on commodity databases of modest size (which is the common case in practice, as opposed to the massive telco databases that the authors discuss).