The Definitive Guide to Bash Analytics

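# Page through the raw file; -S chops long lines at the screen edge instead of wrapping them
cat sales.csv | less -S
# Align the comma-separated fields into columns before paging
cat sales.csv | column -t -s "," | less -S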

Let the Data Science begin

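# First column of the first ten lines (header included)
cat sales.csv | cut -f 1 -d "," | head
# Count how often each distinct value appears in the first column
cat sales.csv | cut -f 1 -d "," | sort | uniq -c
# Order the counts from least to most frequent
cat sales.csv | cut -f 1 -d "," | sort | uniq -c | sort -n
# Drop the header row (tail -n+2 starts at line 2) so it is not counted as a value
cat sales.csv | cut -f 1 -d "," | tail -n+2 | sort | uniq -c | sort -n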
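
To try the counting pipeline without the real file, here is a minimal sketch on made-up data. The sample rows and the variable names (sample, counts, awk_counts) are assumptions for illustration, not the layout of the actual sales.csv. It also shows one possible single-pass awk variant of the same tally.

```shell
# Hypothetical sample standing in for sales.csv (column 1 = region).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country,Units
Asia,Japan,10
Europe,France,5
Asia,India,7
Asia,China,3
Europe,Spain,2
EOF

# Skip the header, keep column 1, then count and rank the distinct values.
counts=$(tail -n +2 "$sample" | cut -d "," -f 1 | sort | uniq -c | sort -n)
echo "$counts"

# The same tally in one awk pass: NR > 1 skips the header row,
# and the associative array n holds one counter per region.
awk_counts=$(awk -F ',' 'NR > 1 { n[$1]++ } END { for (r in n) print n[r], r }' "$sample" | sort -n)
echo "$awk_counts"

rm -f "$sample"
```

The awk form avoids the first sort, which can matter on large files, while the cut/sort/uniq form is easier to build up one stage at a time.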

Conditional Filtering

# Column 9 of every row whose first column is "Asia"
cat sales.csv | awk -F ',' '{ if ($1 == "Asia") print $9 }'
# The full matching rows
cat sales.csv | awk -F ',' '{ if ($1 == "Asia") print $0 }'
# How many rows match
cat sales.csv | awk -F ',' '{ if ($1 == "Asia") print $0 }' | wc -l
# Sum column 9 over the matching rows
cat sales.csv | awk -F ',' '{ if ($1 == "Asia") print $9 }' | awk '{s+=$1} END {print s}'
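
The filter and the sum can also be folded into a single awk program. The sketch below runs on made-up data where the numeric column is column 3, which is an assumption for the example only. The variable names are hypothetical too.

```shell
# Hypothetical sample standing in for sales.csv.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Region,Country,Units
Asia,Japan,10
Europe,France,5
Asia,India,7
EOF

# Pattern-only form: awk prints every row for which the condition is true.
asia_rows=$(awk -F ',' '$1 == "Asia"' "$sample" | wc -l | tr -d ' ')

# Pattern-action form: add up column 3 for matching rows only.
# "s + 0" makes awk print 0 instead of an empty line when nothing matches.
asia_total=$(awk -F ',' '$1 == "Asia" { s += $3 } END { print s + 0 }' "$sample")

echo "rows: $asia_rows, total units: $asia_total"
rm -f "$sample"
```

Writing the condition as a pattern in front of the braces is equivalent to the if inside the braces used above; it is just the more common awk idiom.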

Joining

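# Download a country reference table
curl -O https://raw.githubusercontent.com/cristiroma/countries/master/data/csv/countries.csv
# Strip the double quotes so the fields compare cleanly with those in sales.csv
cat countries.csv | tr -d '"' > countries_clean.csv
# Join column 2 of sales.csv to column 1 of countries_clean.csv; each input is sorted on its join field only
join -t "," -1 2 -2 1 <(sort -t "," -k 2,2 sales.csv) <(sort -t "," -k 1,1 countries_clean.csv)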
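
Here is a small sketch of the same join on made-up files, to show what the output looks like. The file contents and the two-letter codes are illustrative assumptions, not the real columns of countries.csv. The sorted inputs go to temporary files instead of process substitution so the snippet also runs in a plain POSIX shell.

```shell
# Hypothetical inputs: sales rows (region,country,units) and a lookup (country,code).
sales=$(mktemp)
countries=$(mktemp)
sales_sorted=$(mktemp)
countries_sorted=$(mktemp)

cat > "$sales" <<'EOF'
Asia,Japan,10
Europe,France,5
Asia,India,7
EOF
cat > "$countries" <<'EOF'
Japan,JP
France,FR
India,IN
EOF

# join expects both inputs sorted on the join field, so each sort key
# is limited to that one field (-k 2,2 and -k 1,1).
sort -t "," -k 2,2 "$sales" > "$sales_sorted"
sort -t "," -k 1,1 "$countries" > "$countries_sorted"

# Output order: join field, remaining fields of file 1, remaining fields of file 2.
joined=$(join -t "," -1 2 -2 1 "$sales_sorted" "$countries_sorted")
echo "$joined"

rm -f "$sales" "$countries" "$sales_sorted" "$countries_sorted"
```

Rows whose country has no match in the lookup are silently dropped, which is the default inner-join behaviour of join.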

Histograms

# Text histogram of column 10: ten equal-width buckets (the maximum value lands in an extra final row), one * per percent of rows
cat sales.csv | cut -f 10 -d "," | tail -n+2 | python3 -c "import sys, collections; values = [float(x) for x in sys.stdin]; min_v = min(values); max_v = max(values); norm = [int(10*(x-min_v)/(max_v-min_v)) for x in values]; print('\n'.join(['%12.4f - %12.4f: (%8d) %s' % (x[0]*((max_v-min_v)/10)+min_v, (x[0]+1)*((max_v-min_v)/10)+min_v, x[1], '*' * (100 * x[1] // len(values))) for x in sorted(collections.Counter(norm).items())]))"
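
If Python is not available, the bucketing idea can be sketched in awk alone. This is one possible variant, not a port of the one-liner above: it uses a hypothetical bins variable, puts the maximum value into the last bucket instead of a separate row, and draws one * per row instead of one per percent. The input numbers are made up.

```shell
# Hypothetical values standing in for a numeric column.
values="1
2
2
3
9
10"

hist=$(printf '%s\n' "$values" | awk -v bins=3 '
  {
    x = $1 + 0                      # force numeric interpretation
    v[NR] = x
    if (NR == 1 || x < min) min = x
    if (NR == 1 || x > max) max = x
  }
  END {
    width = (max - min) / bins
    if (width == 0) width = 1       # all values equal: avoid dividing by zero
    for (i = 1; i <= NR; i++) {
      b = int((v[i] - min) / width)
      if (b >= bins) b = bins - 1   # put the maximum in the last bucket
      count[b]++
    }
    for (b = 0; b < bins; b++) {
      bar = ""
      for (j = 0; j < count[b]; j++) bar = bar "*"
      printf "%.2f - %.2f: %d %s\n", min + b * width, min + (b + 1) * width, count[b], bar
    }
  }')
echo "$hist"
```

To feed it real data, replace the printf with the same cut and tail stages used above.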

An entrepreneur, and a web expert.

Ron Reiter
