# Over 130 Practical Recipes For Data Analysis and Machine Learning

• Harness Haskell in the real world
• Master data analysis techniques
• Implement machine learning

#### Chapter 3: Taming Strings

``let strs = splitOn "," "bob,joe,nick"``

Many interesting analysis techniques can be used on a large corpus of words to examine the structure of a sentence or the contents of a book.

• Base conversion
• Substring search (Boyer–Moore–Horspool, Rabin–Karp)
• Split a string
• Longest common subsequence
• Phonetic code
• Edit distance
• Jaro–Winkler distance
• Scraping text
• Fixing spelling mistakes
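The edit distance recipe listed above can be sketched with the standard Levenshtein dynamic-programming recurrence, building one row of distances per character of the first string. This is a minimal, base-only sketch; the name `editDistance` is hypothetical and not necessarily the book's implementation.

```haskell
-- | Levenshtein edit distance: the minimum number of single-character
-- insertions, deletions, and substitutions that turn one string into another.
-- Built row by row with the standard dynamic-programming recurrence.
editDistance :: String -> String -> Int
editDistance a b = last (foldl nextRow [0 .. length b] a)
  where
    nextRow [] _ = []  -- unreachable: every row has length (length b + 1)
    -- prev !! j is the distance between the part of `a` consumed so far
    -- and the first j characters of `b`; c is the next character of `a`.
    nextRow prev@(p0 : ps) c = scanl step (p0 + 1) (zip3 b prev ps)
      where
        step left (bc, diag, up) =
          minimum [ up + 1                            -- delete c
                  , left + 1                          -- insert bc
                  , diag + if bc == c then 0 else 1   -- substitute (free if equal)
                  ]
```

For example, `editDistance "kitten" "sitting"` evaluates to `3`. Only the previous row is kept, so memory grows with the length of the second string rather than with the product of both lengths.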

#### Chapter 4: Hashing Data

``let checksum = md5 file``

To summarize an item into a small, typically fixed-length value, we apply a hashing function to it. This chapter will cover the following recipes.

• Hashing data
• MD5 and cryptographic checksums
• Using a hash table
• Geohashing
• Bloom filter
• Perceptual hashing
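The `md5` call in the snippet above comes from a library. To show the same idea of summarizing data into a fixed-length value without extra dependencies, the sketch below swaps in 32-bit FNV-1a, a simple non-cryptographic hash, plus a hypothetical `bucketOf` helper of the kind a hash table uses. It assumes ASCII input and is not a replacement for MD5 or any cryptographic checksum.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- | 32-bit FNV-1a: start from the offset basis, then for each byte XOR it
-- into the hash and multiply by the FNV prime. Word32 arithmetic wraps
-- modulo 2^32, which is exactly what FNV-1a needs.
-- Assumption: input is ASCII; each Char is reduced to its low 8 bits, so
-- non-ASCII text is NOT hashed as its UTF-8 bytes.
fnv1a32 :: String -> Word32
fnv1a32 = foldl' step 2166136261            -- FNV-1a 32-bit offset basis
  where
    step h c = (h `xor` fromIntegral (ord c `mod` 256)) * 16777619  -- FNV prime

-- | Hypothetical helper: pick one of n buckets (n > 0) for a key, as a
-- hash table would.
bucketOf :: Int -> String -> Int
bucketOf n key = fromIntegral (fnv1a32 key `mod` fromIntegral n)
```

For example, `fnv1a32 "a"` is `0xe40c292c`, so `bucketOf 16 "a"` is `12`.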

#### Chapter 5: Using Trees

``data Tree a = Node a (Tree a) (Tree a) | Null``

Everything from creating simple binary trees to practical applications such as Huffman trees is covered in this chapter.

• Binary tree
• Rose tree
• Depth-first traversal
• Height of a tree
• Binary search tree
• AVL tree
• Min-heap
• Huffman tree encoding and decoding
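Several of the tree recipes above can be sketched on one polymorphic binary tree type: binary search tree insertion, depth-first (in-order) traversal, and height. This is a minimal sketch that keeps the `Null` constructor name from the chapter's snippet; the functions `insertBST`, `fromList`, `inorder`, and `height` are hypothetical names, not the book's API.

```haskell
-- | A binary tree: a Node holds a value plus left and right subtrees.
data Tree a = Node a (Tree a) (Tree a) | Null
  deriving Show

-- | Binary search tree insertion: smaller values go left, larger go right,
-- and duplicates are ignored.
insertBST :: Ord a => a -> Tree a -> Tree a
insertBST x Null = Node x Null Null
insertBST x t@(Node v l r)
  | x < v     = Node v (insertBST x l) r
  | x > v     = Node v l (insertBST x r)
  | otherwise = t

-- | Build a binary search tree by inserting the values left to right.
fromList :: Ord a => [a] -> Tree a
fromList = foldl (flip insertBST) Null

-- | Depth-first, in-order traversal (left subtree, node, right subtree).
-- On a binary search tree this lists the values in ascending order.
inorder :: Tree a -> [a]
inorder Null         = []
inorder (Node v l r) = inorder l ++ [v] ++ inorder r

-- | Height: nodes on the longest root-to-leaf path; an empty tree has height 0.
height :: Tree a -> Int
height Null         = 0
height (Node _ l r) = 1 + max (height l) (height r)
```

For example, `inorder (fromList [5, 3, 8, 1, 4])` returns `[1,3,4,5,8]`. Inserting already-sorted input such as `[1, 2, 3]` produces a degenerate tree of height 3, which is the problem self-balancing structures like AVL trees address.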

#### Chapter 6: Using Graphs

``type Graph = Table [Vertex]``

Graphs can represent network data such as social networks, biological gene relationships, and road topologies. Graphs are very common in data analysis, and this chapter will cover some essential algorithms.

• List of edges
• Topological sort
• Depth-first traversal
• Visualizing a graph
• Directed acyclic word graphs
• Hexagonal and square grids
• Maximal cliques
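The `Vertex` and `Graph` types in the snippet above match the `Data.Graph` module of the `containers` package, which ships with GHC. A minimal sketch of building a graph from a list of edges, topologically sorting it, and collecting the vertices reachable by depth-first search, assuming a small made-up graph on vertices 0 to 4:

```haskell
import Data.Graph (Edge, Graph, Vertex, buildG, reachable, topSort)

-- | Edges (from, to) of a small directed acyclic graph. The vertices 0..4
-- and the edges are invented for illustration.
sampleEdges :: [Edge]
sampleEdges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

-- | Build the adjacency-list Graph (an array indexed by Vertex) from the edge list.
sampleGraph :: Graph
sampleGraph = buildG (0, 4) sampleEdges

-- | A topological order: every edge points from an earlier to a later vertex.
topoOrder :: [Vertex]
topoOrder = topSort sampleGraph

-- | Vertices reachable from a start vertex by depth-first search
-- (the start vertex itself is included).
reachableFrom :: Vertex -> [Vertex]
reachableFrom = reachable sampleGraph
```

Several topological orders exist for this graph (1 and 2 may appear in either order), so code that consumes `topoOrder` should rely only on the ordering property, not on one specific list.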

#### Chapter 7: Statistics

``let (b, m) = linearRegression xs ys``

This chapter contains recipes that answer questions about how far data deviates from the norm, whether linear and quadratic trends exist, and the probabilistic values in a Bayesian network.

• Moving average and median
• Covariance matrix
• Pearson correlation coefficient
• Bayesian network
• Playing cards
• Markov chain
• N-grams
• Neural network perceptron
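The `linearRegression` snippet above can be sketched as an ordinary least-squares fit. This sketch assumes the returned pair `(b, m)` means (intercept, slope) of `y = b + m * x`; the implementation is our own hypothetical version, and the book's may differ.

```haskell
-- | Ordinary least-squares fit of y = b + m * x.
-- Returns (b, m), i.e. (intercept, slope). Assumes xs and ys have equal,
-- nonzero length and that the xs are not all the same value.
linearRegression :: [Double] -> [Double] -> (Double, Double)
linearRegression xs ys = (b, m)
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n
    -- Covariance and variance numerators; their common 1/n factors cancel.
    sxy   = sum [ (x - meanX) * (y - meanY) | (x, y) <- zip xs ys ]
    sxx   = sum [ (x - meanX) ^ (2 :: Int) | x <- xs ]
    m     = sxy / sxx
    b     = meanY - m * meanX
```

For points on the line `y = 1 + 2x`, such as `linearRegression [0, 1, 2, 3] [1, 3, 5, 7]`, the fit recovers `(1.0, 2.0)`.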

#### Chapter 8: Clustering Data

``let clusters = kmeans points``

Computer algorithms are becoming better and better at analyzing large data sets. As machines get faster, so does their ability to detect interesting patterns in data.

• K-means clustering
• Hierarchical clustering
• Number of clusters
• Parts of speech
• Training a parts of speech tagger
• Clustering words by lexeme
• Visualizing
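K-means clustering, the first recipe above, can be sketched with Lloyd's algorithm on 2D points: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. Unlike the one-argument `kmeans points` in the snippet, this hypothetical `kmeans` takes an iteration limit and the initial centroids explicitly; choosing those initial centroids is left to the caller.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

type Point = (Double, Double)

-- | Squared Euclidean distance between two points.
dist2 :: Point -> Point -> Double
dist2 (x1, y1) (x2, y2) = (x1 - x2) ^ (2 :: Int) + (y1 - y2) ^ (2 :: Int)

-- | Index of the centroid nearest to a point (centroid list must be non-empty).
nearest :: [Point] -> Point -> Int
nearest cs p = fst (minimumBy (comparing (dist2 p . snd)) (zip [0 ..] cs))

-- | Mean of a non-empty list of points.
centroid :: [Point] -> Point
centroid ps = (sum (map fst ps) / n, sum (map snd ps) / n)
  where n = fromIntegral (length ps)

-- | One Lloyd iteration: assign each point to its nearest centroid, then move
-- each centroid to the mean of its points (a centroid with no points stays put).
step :: [Point] -> [Point] -> [Point]
step points cs =
  [ if null members then c else centroid members
  | (i, c) <- zip [0 ..] cs
  , let members = [ p | p <- points, nearest cs p == i ] ]

-- | Iterate until the centroids stop changing or the step limit runs out.
kmeans :: Int -> [Point] -> [Point] -> [Point]
kmeans maxSteps points cs
  | maxSteps <= 0 || cs' == cs = cs
  | otherwise                  = kmeans (maxSteps - 1) points cs'
  where cs' = step points cs
```

For example, `kmeans 100 [(0,0), (0,1), (10,10), (10,11)] [(0,0), (10,10)]` converges to `[(0.0,0.5),(10.0,10.5)]`. The result depends on the initial centroids, which is one reason picking the number of clusters and their starting positions deserves its own recipe.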

#### Chapter 9: Performance

``a <- rpar task1``

This chapter covers parallel and concurrent design. Analyzing massive data sets is a very real problem, which this chapter addresses with parallelism and concurrency.

• Benchmarking runtime
• Evaluating in parallel
• Controlling algorithms in sequence
• Forking IO
• Parallelizing pure functions
• Mapping in parallel
• Accessing tuple elements in parallel
• MapReduce
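The `rpar` snippet above comes from `Control.Parallel.Strategies` in the `parallel` package. As a dependency-free sketch of the "Forking IO" idea instead, the hypothetical `forkAll` below forces each pure value in its own thread with `forkIO` and collects the results through `MVar`s in the original order. The threads actually run on several cores only when the program is compiled with `-threaded` and run with `+RTS -N`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)

-- | Evaluate each value in its own thread and return the results in the
-- original order. All threads are forked before any result is awaited.
-- Each value is forced only to weak head normal form (enough for an Int;
-- nested structures would need deeper forcing). Assumes no value throws:
-- a failing thread never fills its MVar.
forkAll :: [a] -> IO [a]
forkAll xs = do
  vars <- mapM fork xs      -- start every computation first
  mapM takeMVar vars        -- then wait for each result in order
  where
    fork x = do
      v <- newEmptyMVar
      _ <- forkIO (evaluate x >>= putMVar v)
      return v
```

For example, `forkAll [sum [1 .. n] | n <- [10, 100, 1000 :: Int]]` returns `[55,5050,500500]`. Forking everything before calling `takeMVar` matters: waiting inside the same loop would run the computations one after another.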

#### Chapter 10: Real-Time

``h <- connectTo "localhost" myPort``

Analyzing data the moment it is received is gratifying, and it is the core subject of this chapter. The following real-time data topics will be covered.

• IRC bot
• Polling a web server
• Responding to system events
• Sockets

#### Chapter 11: Visualizing

``plot X11 $ Data2D [Color Red] [] pts``

Visualizing data is important in every step of data analysis. It is always useful to have an intuitive understanding of the data, so this chapter covers many ways to graph it.

• Plotting a line graph
• Plotting a pie chart
• Plotting a bar graph
• Displaying a scatter plot
• Visualizing a graphical network
• Using D3.js

#### Chapter 12: Exporting

``save = insertMany "item" mongoList``

The last important step in data analysis is to export and present the data in a usable format. The recipes in this chapter cover how to save and present data.

• Exporting to CSV
• JSON
• SQLite
• MongoDB
• HTML
• LaTeX
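The CSV export recipe listed above can be sketched with `base` alone, following the quoting convention of RFC 4180: a field containing a comma, double quote, or line break is wrapped in double quotes, and any embedded double quote is doubled. The names `escapeField` and `toCSV` are hypothetical. For simplicity the sketch ends each row with `\n`, while RFC 4180 itself specifies CRLF.

```haskell
import Data.List (intercalate)

-- | Quote a field only when needed: fields containing a comma, double quote,
-- or line break are wrapped in double quotes, with embedded quotes doubled.
escapeField :: String -> String
escapeField f
  | any (`elem` ",\"\r\n") f = "\"" ++ concatMap dq f ++ "\""
  | otherwise                = f
  where
    dq '"' = "\"\""
    dq c   = [c]

-- | Render rows as CSV text: fields joined by commas, one "\n"-terminated
-- line per row.
toCSV :: [[String]] -> String
toCSV = concatMap (\row -> intercalate "," (map escapeField row) ++ "\n")
```

The result can be written out with `writeFile "out.csv" (toCSV rows)`. Quoting only when necessary keeps simple fields readable while still round-tripping awkward ones like `a,b`.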