I miss Pig

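-- Word count in Pig Latin
lines = LOAD 'file.txt' AS (line:chararray);
-- Split each line into words, one row per word
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
-- Count the rows in each group
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;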
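
// The same word count with Spark DataFrames (Scala), as typed in spark-shell,
// where sc and the toDF implicits are already in scope
val linesDF = sc.textFile("file.txt").toDF("line")
val wordsDF = linesDF
  .explode("line", "word")((line: String) => line.split(" "))
val wordCountDF = wordsDF.groupBy("word").count()
wordCountDF.show()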

I want to try it out, now what?

brew install pig
pig -x local
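
Running `pig -x local` with no script drops you into the interactive Grunt shell. If you would rather run the word count as a file, here is a rough sketch. The file name `wordcount.pig` and the sample text are made up for illustration, and the script assumes `file.txt` sits in the current directory.

```shell
# Sample input; the contents are made up for illustration.
printf 'the quick brown fox\nthe lazy dog\n' > file.txt

# Save the Pig word-count script under a hypothetical file name.
cat > wordcount.pig <<'EOF'
lines = LOAD 'file.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
EOF

# Run in local mode (local filesystem, no Hadoop cluster) if Pig is installed.
if command -v pig >/dev/null 2>&1; then
  pig -x local wordcount.pig
else
  echo "pig not found; install it first (e.g. brew install pig)"
fi
```

`DUMP` prints one `(word,count)` tuple per distinct word to the console, mixed in with Pig's log output.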

An entrepreneur, and a web expert.

Ron Reiter