Watch “Hadoop and Big Data 1/6 Challenging Old Assumptions” on YouTube:
Watch “Using Apache Pig With Amazon Elastic MapReduce – 1 of 5” on YouTube:
Watch “Using Apache Hive With Amazon Elastic MapReduce – 2 of 2” on YouTube:
Watch “Perl Tutorial 1 – Active Perl, Perl Editor, Hello World” on YouTube:
Google Talks: Watch “Research and Education in the Clouds: Experience at the Univ” on YouTube:
Watch “Introduction to Latex and Lyx – Part 1 of 5” on YouTube:
Watch “EC6054-2012 Tutorial 8 – Setting Up Stata” on YouTube:
Watch “R Tutorial #1 – Download, Installation, Setup – Statistical Programming Language R” on YouTube:
Unix (Linux) environment.
›Not an “emulator,” but has valuable core Unix functionalities (Perl, Python, Unix Core Utilities).
§Mac/Unix–built in already.
§|STAT Statistical Data Analysis: Free Data Analysis Programs for UNIX and DOS (paper from 1982)
›Visit website, email to request access to the files (due to licensing issues).
›DISCLAIMER: I have obtained but not yet tested these programs. The |STAT author offers no warranty and says that these programs may not work on “large” datasets.
›Perhaps these tools need to be rebuilt for Big Data on distributed servers. I really like the idea of these programs—their simplicity and potential power.
§Unix (mostly servers) / Cygwin (Unix “not-an-emulator” for Windows)
›The power of scripting languages (vs. languages compiled to binary).
: Fully capable scripting languages (for-loops, variables, input/output) which can very easily execute common tasks at the command line.
§Unix Core Utilities
: cut, paste, egrep, sort -k1,1, uniq -c, zcat, cat, gzip -c, sed 's///g', join, less, zless, more, and more… Also see awk.
§Everything you would want to do in Stata, but don’t have the memory to handle.
§Perl: great for parsing text, regular expressions, scraping websites.
›Click the links above for tutorials. These tools feel unnatural to work with at first, but they are very quick, efficient, and powerful for writing and running scripts and for editing data by hand (when necessary).
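
To make the "don't have the memory" point concrete, here is a minimal Python sketch of roughly what a pipeline such as `zcat file.gz | cut -f1 | sort | uniq -c` computes: a count of each distinct value in the first tab-separated column, read one line at a time so the whole file never sits in memory. The function names and the sample data are made up for illustration, and the tab delimiter is an assumption about the input.

```python
import gzip
import io
from collections import Counter


def count_first_field(lines, delimiter="\t"):
    """Count how often each first-column value appears, one line at a time."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # unlike cut, this sketch skips blank lines
        # Keep only the text before the first delimiter (like cut -f1).
        counts[line.split(delimiter, 1)[0]] += 1
    return counts


def count_first_field_gz(path, delimiter="\t"):
    """Stream a gzip-compressed text file (like zcat) without loading it whole."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_first_field(handle, delimiter)


if __name__ == "__main__":
    # Hypothetical sample data standing in for a large file.
    sample = io.StringIO("ca\t10\nny\t7\nca\t3\n")
    # Print "count key" sorted by key, similar in spirit to sort | uniq -c.
    for key, n in sorted(count_first_field(sample).items()):
        print(n, key)
```

Only the running counts are held in memory, so this works as long as the number of distinct keys is manageable, even when the file itself is far larger than RAM.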