To summarize an item as a small, typically fixed-length value, we apply a hash function to it. This chapter covers the following recipes:
MD5 and cryptographic checksums
Using a hash table
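As a rough illustration of what "summarizing an item into a fixed-length value" means, here is a minimal sketch of a djb2-style string hash. It is not MD5 and not cryptographically secure; the name `djb2` and the choice of a 32-bit result are assumptions made for this example only.

```haskell
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- | A djb2-style string hash (illustrative only, not a cryptographic checksum).
-- Start from 5381 and fold each character in as h * 33 + c.
-- Word32 arithmetic wraps on overflow, so every input maps to 32 bits.
djb2 :: String -> Word32
djb2 = foldl' step 5381
  where
    step h c = h * 33 + fromIntegral (ord c)
```

Whatever the length of the input string, the result always fits in a single `Word32`, which is what makes such values convenient as hash table keys.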
Chapter 5: Using Trees
data Tree a = Node a (Tree a) (Tree a) | Null
Everything from creating simple binary trees to practical applications such as Huffman trees is covered in this chapter.
Height of a tree
Binary search tree
Huffman tree encoding and decoding
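The first two recipes can be sketched in a few lines. The version below assumes a tree type parameterized over its element type; the helper names `height`, `insert`, and `fromList` are chosen for this illustration and are not necessarily those used in the chapter.

```haskell
-- | A binary tree whose nodes carry a value of type 'a'.
data Tree a = Node a (Tree a) (Tree a) | Null

-- | Height counted in nodes along the longest root-to-leaf path.
-- The empty tree has height 0.
height :: Tree a -> Int
height Null = 0
height (Node _ l r) = 1 + max (height l) (height r)

-- | Insert a value into a binary search tree, ignoring duplicates.
insert :: Ord a => a -> Tree a -> Tree a
insert x Null = Node x Null Null
insert x t@(Node v l r)
  | x < v     = Node v (insert x l) r
  | x > v     = Node v l (insert x r)
  | otherwise = t

-- | Build a search tree from a list; the last element is inserted first.
fromList :: Ord a => [a] -> Tree a
fromList = foldr insert Null
```

Because `fromList` does no rebalancing, the insertion order determines the height: a sorted input degenerates into a chain, while a well-mixed input gives a shallower tree.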
Chapter 6: Using Graphs
type Graph = Table [Vertex]
A graph can represent network data such as social networks, biological gene relationships, and road topologies. Graphs are very common in data analysis, and this chapter covers some essential algorithms.
List of edges
Depth first traversal
Breadth first traversal
Visualizing a graph
Directed acyclic word graphs
Hexagonal and square grids
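The `Graph` type shown for this chapter appears to match the adjacency-table representation in `Data.Graph` from the `containers` package. Assuming that library, a graph can be built from a list of edges and explored depth first as in the sketch below; the sample edges and the name `reachableFrom` are made up for this example.

```haskell
import Data.Graph (Graph, Vertex, buildG, edges, reachable)

-- | A small directed graph on vertices 0..3 (example data only).
sample :: Graph
sample = buildG (0, 3) [(0, 1), (0, 2), (1, 3)]

-- | Vertices reachable from a start vertex, found by depth-first search.
-- The start vertex itself is included in the result.
reachableFrom :: Vertex -> [Vertex]
reachableFrom = reachable sample
```

Calling `edges sample` recovers the edge list from the adjacency table, which is a convenient check when converting between the two representations.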
let (b, m) = linearRegression xs ys
This chapter contains recipes that answer questions about data deviation from the norm, existence of linear and quadratic trends, and probabilistic values of a network.
Moving average and median
Linear and quadratic regression
Pearson correlation coefficient
Neural network perceptron
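To make the idea of a linear trend concrete, here is a minimal least-squares sketch using only the Prelude. The name `fitLine` is hypothetical, and returning the intercept before the slope is an assumption that mirrors the `(b, m)` tuple in the snippet above; a library routine may order or name things differently.

```haskell
-- | Ordinary least-squares fit of the line y = b + m * x.
-- Returns (b, m): intercept first, slope second.
-- Assumes equal-length inputs containing at least two distinct x values.
fitLine :: [Double] -> [Double] -> (Double, Double)
fitLine xs ys = (b, m)
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n                                    -- mean of x
    my  = sum ys / n                                    -- mean of y
    sxy = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx = sum (map (\x -> (x - mx) * (x - mx)) xs)
    m   = sxy / sxx                                     -- slope
    b   = my - m * mx                                   -- intercept
```

For points that lie exactly on a line, such as `fitLine [1, 2, 3] [3, 5, 7]`, the fit recovers that line's intercept and slope; for noisy data it returns the line minimizing the squared vertical error.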
Chapter 8: Clustering Data
let clusters = kmeans points
Computer algorithms are becoming better and better at analyzing large data sets. As machines become faster, so does their ability to detect interesting patterns in data.
Number of clusters
Parts of speech
Training a parts of speech tagger
Word lexemes clustering
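As a rough picture of what a call like `kmeans points` does internally, the following is a simplified sketch of Lloyd's iteration on two-dimensional points. It assumes the caller supplies the initial centroids and a fixed number of iterations; all names here (`Point`, `assign`, `kmeansSketch`, and so on) are hypothetical and do not come from any particular library.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import qualified Data.Map as Map

type Point = (Double, Double)

-- | Squared Euclidean distance between two points.
dist2 :: Point -> Point -> Double
dist2 (x1, y1) (x2, y2) = (x1 - x2) * (x1 - x2) + (y1 - y2) * (y1 - y2)

-- | Group points by the index of their nearest centroid.
assign :: [Point] -> [Point] -> Map.Map Int [Point]
assign cs ps = Map.fromListWith (++) [(nearest p, [p]) | p <- ps]
  where
    nearest p = fst (minimumBy (comparing (dist2 p . snd)) (zip [0 ..] cs))

-- | Mean of a non-empty list of points.
centroid :: [Point] -> Point
centroid ps = (sum (map fst ps) / n, sum (map snd ps) / n)
  where
    n = fromIntegral (length ps)

-- | One Lloyd step: reassign points, then recompute centroids.
-- Simplification: a centroid that attracts no points is dropped.
step :: [Point] -> [Point] -> [Point]
step ps cs = map centroid (Map.elems (assign cs ps))

-- | Run a fixed number of steps from the given starting centroids.
kmeansSketch :: Int -> [Point] -> [Point] -> [Point]
kmeansSketch iters cs0 ps = iterate (step ps) cs0 !! iters
```

A production implementation would also choose the starting centroids, stop on convergence rather than after a fixed count, and handle empty clusters more carefully.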
a <- rpar task1
This chapter covers parallel and concurrent design. Analyzing massive data sets is a very real problem, which this chapter will try to address.
Evaluating in parallel
Controlling algorithms in sequence
Parallelizing pure functions
Mapping in parallel
Accessing tuple elements in parallel
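The `rpar` call shown for this chapter comes from the Eval monad in `Control.Parallel.Strategies`. Assuming the `parallel` package is installed, evaluating two values in parallel and mapping a pure function in parallel might look like the sketch below; `fib`, `bothInParallel`, and `fibsInParallel` are illustrative names only.

```haskell
import Control.Parallel.Strategies (parMap, rpar, rseq, runEval)

-- | A deliberately slow pure function, used here only as a workload.
fib :: Integer -> Integer
fib k
  | k < 2     = k
  | otherwise = fib (k - 1) + fib (k - 2)

-- | Spark two independent computations, then wait for both results.
bothInParallel :: Integer -> Integer -> (Integer, Integer)
bothInParallel m n = runEval $ do
  a <- rpar (fib m)   -- may be evaluated on another core
  b <- rpar (fib n)
  _ <- rseq a         -- wait until a is evaluated
  _ <- rseq b         -- wait until b is evaluated
  return (a, b)

-- | Apply a pure function to every list element in parallel.
fibsInParallel :: [Integer] -> [Integer]
fibsInParallel = parMap rseq fib
```

The results are the same as in sequential code; to actually use several cores, the program has to be compiled with GHC's `-threaded` flag and run with `+RTS -N`.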
h <- connectTo "localhost" myPort
The gratifying nature of analyzing data the moment it is received is the core subject of this chapter. The following real-time data topics will be covered.
Streaming Twitter data
Polling a webserver
Responding to system events
plot X11 Data2D [Color Red] pts
Visualizing data is important in all steps of data analysis. It is always useful to have an intuitive understanding of the data, so this chapter covers many ways to graph it.
Plotting a line graph
Plotting a pie-chart
Plotting a bar graph
Displaying a scatter plot
Visualizing a graphical network
save = insertMany "item" mongoList
The last important step in data analysis is to export and present the data in a usable format. The recipes in this chapter cover how to save and present data.