Graphics of Large Datasets: Visualizing a Million

Springer Science & Business Media, 12 June 2007 - 271 pages

Graphics are great for exploring data, but how can they be used for looking at the large datasets that are commonplace today? This book shows how to look at ways of visualizing large datasets, whether large in numbers of cases or large in numbers of variables or large in both. Data visualization is useful for data cleaning, exploring data, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining. Data analysts, statisticians, computer scientists, indeed anyone who has to explore a large dataset of their own, should benefit from reading this book.

New approaches to graphics are needed to visualize the information in large datasets and most of the innovations described in this book are developments of standard graphics. There are considerable advantages in extending displays which are well-known and well-tried, both in understanding how best to make use of them in your work and in presenting results to others. It should also make the book readily accessible for readers who already have a little experience of drawing statistical graphics. All ideas are illustrated with displays from analyses of real datasets and the authors emphasize the importance of interpreting displays effectively. Graphics should be drawn to convey information and the book includes many insightful examples.

From the reviews:

"Anyone interested in modern techniques for visualizing data will be well rewarded by reading this book. There is a wealth of important plotting types and techniques." Paul Murrell for the Journal of Statistical Software, December 2006

"This fascinating book looks at the question of visualizing large datasets from many different perspectives. Different authors are responsible for different chapters and this approach works well in giving the reader alternative viewpoints of the same problem. Interestingly the authors have cleverly chosen a definition of 'large dataset'. Essentially they focus on datasets with the order of a million cases. As the authors point out there are now many examples of much larger datasets but by limiting to ones that can be loaded in their entirety in standard statistical software they end up with a book that has great utility to the practitioner rather than just the theorist. Another very attractive feature of the book is the many colour plates, showing clearly what can now routinely be seen on the computer screen. The interactive nature of data analysis with large datasets is hard to reproduce in a book but the authors make an excellent attempt to do just this." P. Marriott for the Short Book Reviews of the ISI

 


Contents

Introduction (Antony Unwin)
1.2 Data Visualization
1.3 Research Literature
1.4 How Large Is a Large Dataset?
1.5 The Effects of Largeness
1.5.1 Storage
1.5.2 Quality
1.5.3 Complexity
1.5.5 Analyses
1.5.7 Graphical Formats
1.7 Software
1.8 What Is on the Website
1.8.3 Datasets
1.9 Contributing Authors

Basics
2.2.1 Barcharts and Spineplots for Univariate Categorical Data
2.2.2 Mosaic Plots for Multidimensional Categorical Data
2.3 Plots for Continuous Data
2.3.2 Scatterplots, Parallel Coordinates and the Grand Tour
2.4 Data on Mixed Scales
2.5 Maps
2.6 Contour Plots and Image Maps
2.7 Time Series Plots
2.8 Structure Plots

Scaling Up Graphics (Martin Theus)
3.3 Area Plots
3.3.1 Histograms
3.3.2 Barcharts
3.3.3 Mosaic Plots
3.4 Point Plots
3.4.2 Scatterplots
3.4.3 Parallel Coordinates
3.5 From Areas to Points and Back
3.5.1 α-Blending and Tonal Highlighting
3.6 Modifying Plots
3.7 Summary

Interacting with Graphics (Antony Unwin)
4.2 Interaction
4.3 Interaction and Data Displays
4.3.2 Selection and Linking
4.3.3 Selection Sequences
4.3.4 Varying Plot Characteristics
4.3.5 Interfaces and Interaction
4.3.6 Degrees of Linking
4.3.7 Warnings and Redmarking
4.4 Interaction and Large Datasets
4.4.2 Selection, Linking and Highlighting
4.4.3 Varying Plot Characteristics for Large Datasets Zooming
4.5 New Interactive Tasks
4.5.2 Aggregation and Recoding
4.5.5 Managing Screen Layout

Applications
5.2.1 Weighted Displays and Weights in Datasets
5.3.1 Sorting and Reordering
5.3.2 Grouping, Averaging and Zooming
5.4 Mosaic Plots
5.4.1 Combinatorics of Mosaic Plots
5.4.2 Cases per Pixel and Pixels per Case
5.4.4 Grayshading
5.4.5 Rescaling Binsizes
5.4.6 Rankings

Rotating Plots (Dianne Cook and Leslie Miller)
6.1.1 Type of Data
6.1.2 Visual Methods for Continuous Variables
6.1.3 Scaling Up Multiple Views for Larger Datasets
6.2.2 Reducing the Number of Cases
6.2.3 Density Estimation
6.2.4 Screen Real Estate Indexing
6.3 Software System
6.4 Application
6.4.3 Scatterplot Matrix
6.5 Current and Future Developments
6.5.2 Software

Multivariate Continuous Data: Parallel Coordinates (Rida Moustafa and Ed Wegman)
7.2 Interpolations and Inner Products
7.3 Generalized Parallel Coordinate Geometry
7.4 A New Family of Smooth Plots
7.5 Examples
Dealing with Massive Datasets
7.6 Detecting Second-Order Structures
7.7 Summary

Networks (Graham Wills)
8.2 Layout Algorithms
8.2.1 Simple Tree Layout
8.2.2 Force Layout Methods
8.2.3 Individual Node Movement Algorithms
8.3.1 Speed Considerations
8.3.2 Interaction and Layout
8.4 NicheWorks
International Calling Fraud
8.6 Languages for Description and Layouts
8.6.2 Graph Specification via VizML
8.7 Summary

Trees (Simon Urbanek)
9.2 Growing Trees for Large Datasets
9.2.1 Scalability of the CART Growing Algorithm
9.2.2 Scalability of Pruning Methods
9.2.3 Statistical Tests and Large Datasets
9.2.4 Using Trees for Large Datasets in Practice
9.3 Visualization of Large Trees
9.3.2 Sectioned Scatterplots
9.3.3 Recursive Plots
9.4 Forests for Large Datasets
9.5 Summary

Transactions (Barbara Gonzalez-Arevalo, Felix Hernandez-Campos, Steve Marron and Cheolwoo Park)
10.2 Mice and Elephant Plots and Random Sampling
10.3 Biased Sampling
10.3.1 Windowed Biased Sampling
10.3.2 Box-Cox Biased Sampling
10.4 Quantile Window Sampling
10.5 Commonality of Flow Rates

Graphics of a Large Dataset (Antony Unwin and Martin Theus)
Data Visualization for Large Datasets
11.3 Visualizing the InfoVis 2005 Contest Dataset
11.3.2 Variables
11.3.4 Multivariate Displays
11.3.5 Grouping and Selection
11.3.6 Special Features
11.3.7 Presenting Results
11.3.8 Summary

References
Authors
Index

Popular passages

Page 3 - ... spectacle of its avenue of horsechestnut trees, a mile long, in full flower. As the hours passed by, it occurred to me to try to count the number of spikes of flowers facing the drive on one side of the long avenue — I mean all the spikes that were visible in full sunshine on one side of the road. Accordingly, I fixed upon a tree of average bulk and flower, and drew imaginary lines — first halving the tree, then quartering, and so on, until I arrived at a subdivision that was not too large...
Page 3 - I arrived at a subdivision that was not too large to allow of my counting the spikes of flowers it included. I did this with three different trees, and arrived at pretty much the same result : as well as I recollect, the three estimates were as nine, ten, and eleven. Then I counted the trees in the avenue, and, multiplying all together, I found the spikes to be just about 100,000 in number. Ever since then, whenever a million is mentioned, I recall the long perspective of the avenue of Bushey Park,...
Page 1 - illustrious" when speaking of these. They are men whom the whole intelligent part of the nation mourns when they die; who have, or deserve to have, a public funeral; and who rank in future ages as historical characters. Permit me to add a word upon the meaning of a million, being a number so enormous as to be difficult to conceive. It is well to have a standard by which to realize it. Mine will be understood by many Londoners; it is as follows: One summer day I passed the afternoon in Bushey Park...
