# Basic Functional Programming Functions

import findspark
findspark.init()

import pyspark

sc = pyspark.SparkContext()

Spark has plenty of analogues to Python's native functional programming methods. However, as discussed in Transformations and Actions, nothing gets evaluated when we call a Transformation; operations are merely strung together into a lineage of RDDs. Only when we call some sort of Action does any actual computation happen.
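Python's own generators exhibit the same laziness, which makes the idea concrete without Spark. A minimal sketch:

```python
# A generator expression, like a Spark transformation, builds a recipe
# without doing any work up front.
nums = (x * 2 for x in range(5))  # nothing computed yet

# Only consuming the generator (analogous to an Action such as
# collect()) actually runs the computation.
result = list(nums)
print(result)  # [0, 2, 4, 6, 8]
```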

Let’s consider a simple dataset:

rdd = sc.parallelize([1, 2, 3, 4, 5])

## Familiar Functions

### Map

mapped = rdd.map(lambda x: chr(x+64))
mapped.collect()
['A', 'B', 'C', 'D', 'E']

### Filter

filtered = rdd.filter(lambda x: x != 1)
filtered.collect()
[2, 3, 4, 5]
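For reference, the built-in Python map and filter are the native analogues of these two transformations; wrapped in list, they evaluate eagerly. A sketch:

```python
data = [1, 2, 3, 4, 5]

# Built-in map: same lambda as rdd.map above.
mapped = list(map(lambda x: chr(x + 64), data))
print(mapped)    # ['A', 'B', 'C', 'D', 'E']

# Built-in filter: same predicate as rdd.filter above.
filtered = list(filter(lambda x: x != 1, data))
print(filtered)  # [2, 3, 4, 5]
```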

### Reduce

The reduce function is an Action, so it is evaluated immediately.

rdd.reduce(lambda x, y: x + y)
15
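For comparison, the native Python analogue lives in the standard library. A sketch:

```python
from functools import reduce

# functools.reduce evaluates eagerly, just like Spark's reduce
# Action: there is no lazy pipeline to trigger.
total = reduce(lambda x, y: x + y, [1, 2, 3, 4, 5])
print(total)  # 15
```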

## New Functions

### FlatMap

flatMap combines a typical map with behavior similar to Python's itertools.chain function, flattening one level of nesting.

text = sc.textFile('../data/zen.txt')

Calling map with split on our text file returns a list of lists, one sublist of words per line.

print(text.map(lambda x: x.split()).collect()[:3])
[['Beautiful', 'is', 'better', 'than', 'ugly.'], ['Explicit', 'is', 'better', 'than', 'implicit.'], ['Simple', 'is', 'better', 'than', 'complex.']]

However, using flatMap, we consolidate the many sublists into one flat list.

print(text.flatMap(lambda x: x.split()).collect()[:30])
['Beautiful', 'is', 'better', 'than', 'ugly.', 'Explicit', 'is', 'better', 'than', 'implicit.', 'Simple', 'is', 'better', 'than', 'complex.', 'Complex', 'is', 'better', 'than', 'complicated.', 'Flat', 'is', 'better', 'than', 'nested.', 'Sparse', 'is', 'better', 'than', 'dense.']
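The pure-Python equivalent of this map-then-flatten behavior is itertools.chain.from_iterable, assuming the same whitespace split. A sketch on two sample lines:

```python
from itertools import chain

lines = ['Beautiful is better than ugly.',
         'Explicit is better than implicit.']

# map + split gives a list of lists, one sublist per line ...
nested = [line.split() for line in lines]

# ... and chaining flattens one level, like flatMap.
flat = list(chain.from_iterable(nested))
print(flat[:5])  # ['Beautiful', 'is', 'better', 'than', 'ugly.']
```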

### Fold

Similar to the reduce function, fold consolidates many records into one value. The difference, though, is that we can specify a starting (zero) value to begin the reduction with.

For instance:

rdd = sc.parallelize(range(10))
rdd.fold(0, lambda x, y: x + y)
45

Here we sum the same numbers, [0, 10), but starting from 99. Note that fold applies the starting value once per partition and once more when the per-partition results are merged, so with 4 partitions the 99 is counted five times:

rdd.fold(99, lambda x, y: x + y)
540

$99 \cdot 5 + \sum\limits_{i=1}^{9}{i} = 540$
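The factor of 5 on 99 arises because fold applies the zero value once per partition and once more when merging the partition results. A pure-Python sketch of that accounting, assuming 4 roughly equal partitions (the exact split is illustrative):

```python
from functools import reduce

def fold_sim(partitions, zero, op):
    # The zero value seeds the reduction in each partition ...
    partials = [reduce(op, part, zero) for part in partitions]
    # ... and seeds the final merge of partition results as well.
    return reduce(op, partials, zero)

# range(10) split into 4 partitions (illustrative boundaries).
parts = [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
print(fold_sim(parts, 99, lambda x, y: x + y))  # 540
```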