Mining of Massive Datasets

FREE Shipping

RRP: £99
Price: £9.9

In stock

Prerequisite: familiarity with basic probability theory (CS109, Stat116, or equivalent is sufficient but not necessary).

Our goal in this chapter is to offer methods for discovering clusters in data. We are particularly interested in situations where the data is very large, and/or where the space is either high-dimensional or not Euclidean at all. We shall therefore discuss several algorithms that assume the data does not fit in main memory. However, we begin with the basics: the two general approaches to clustering and the methods for dealing with clusters in a non-Euclidean space.

Next, we consider approximate algorithms that work faster but are not guaranteed to find all frequent itemsets. Also in this class of algorithms are those that exploit parallelism, including the parallelism we can obtain through a MapReduce formulation. Finally, we discuss briefly how to find frequent itemsets in a data stream.

Although theoretical issues are discussed where relevant, the focus of the text is clearly on practical issues. Readers interested in a more rigorous treatment of the theoretical foundations for these techniques should look elsewhere. Fortunately, each chapter contains key references to guide the more formally minded reader.

Together with each chapter there is also a set of lecture slides that we use for teaching the Stanford CS246: Mining Massive Datasets course. Note that the slides do not necessarily cover all the material covered in the corresponding chapters.

The problem of finding frequent itemsets differs from the similarity search discussed in Chapter 3. Here we are interested in the absolute number of baskets that contain a particular set of items. In Chapter 3 we wanted items that have a large fraction of their baskets in common, even if the absolute number of baskets is small.

We begin by reviewing the notions of distance measures and spaces. The two major approaches to clustering – hierarchical and point-assignment – are defined. We then turn to a discussion of the “curse of dimensionality,” which makes clustering in high-dimensional spaces difficult, but also, as we shall see, enables some simplifications if used correctly in a clustering algorithm.

Prerequisite: familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary).

The following materials are equivalent to the published book, with errata corrected to July 4, 2012.

  • Fruugo ID: 258392218-563234582
  • EAN: 764486781913
  • Sold by: Fruugo

Address: UK
All products: Visit Fruugo Shop