Running Hadoop on a Raspberry Pi 2 cluster
Last week I wrote about a 300-node cluster built from Raspberry Pi (RPi) microcomputers. But can you do useful work on such a low-cost, low-power cluster? Yes, you can. Hadoop runs on massive clusters, but you can also run it on your own highly scalable RPi cluster.
I’ve been involved with cluster computing ever since DEC introduced VAXcluster in 1984. In those days, a three-node VAXcluster cost about $1 million. Today you can build a much more powerful cluster for under $1,000, including much more storage than anyone could afford back then.
Hadoop is an open-source implementation of Google’s MapReduce and Google File System (GFS) designs, widely used for large data-crunching applications. It runs on a shared-nothing cluster: nodes share no memory or disk, so performance scales up smoothly as you add nodes.
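To make the MapReduce model concrete, here is a minimal single-process sketch in Python of its three steps: map, shuffle, and reduce. It is an illustration only, not Hadoop’s API; the function names are made up for this example, and on a real cluster each document could be mapped on a different node.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, the step a framework runs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    # Each document is independent, so the map calls could run on separate nodes.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


counts = word_count(["the pi cluster", "the PC cluster"])
print(counts)
```

Because no map call depends on another, adding nodes adds map capacity, which is the scaling property a shared-nothing design is after.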
In the paper Performance of a Low Cost Hadoop Cluster for Image Analysis, researchers Basit Qureshi, Yasir Javed, Anis Koubaa, Mohamed-Foued Sriti, and Maram Alajlan built a 20-node RPi Model 2 cluster, brought up Hadoop on it, and used it for surveillance drone image analysis. They also benchmarked the RPi cluster against a 4-node PC cluster based on 3GHz Intel i7 CPUs, each with 4GB of RAM.
The 20-node cluster was divided into four 5-node subnets, each attached to a 16-port switch that is, in turn, connected to a managed 24-port core switch. The extra switch ports enable easy cluster expansion.
Each 700MHz RPi B runs Raspbian, an ARM-optimized version of Debian Linux. Each RPi has a Class 10, 16GB SD card capable of up to 80MB/s read/write speeds. An image of the OS with Hadoop 2.6.2 was copied onto the SD cards. The Hadoop master node, which runs only the NameNode, was installed on a PC running Ubuntu 14.04 and Hadoop.
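The expansion headroom in that layout can be estimated with a little arithmetic. The Python sketch below assumes each edge switch uses exactly one port as its uplink to the core switch, which the description does not state, and it does not count whatever core port the master node occupies.

```python
EDGE_SWITCHES = 4        # one 16-port switch per subnet
NODES_PER_SUBNET = 5
EDGE_PORTS = 16
CORE_PORTS = 24
UPLINKS_PER_EDGE = 1     # assumption: one uplink port per edge switch


def spare_ports():
    """Return (free ports per edge switch, free ports on the core switch)."""
    edge_free = EDGE_PORTS - NODES_PER_SUBNET - UPLINKS_PER_EDGE
    core_free = CORE_PORTS - EDGE_SWITCHES * UPLINKS_PER_EDGE
    return edge_free, core_free


def expansion_capacity():
    """Extra nodes that fit on the existing edge switches without new hardware."""
    edge_free, _ = spare_ports()
    return EDGE_SWITCHES * edge_free


print(spare_ports(), expansion_capacity())
```

Under those assumptions the existing edge switches alone leave room for more nodes than the cluster currently has.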
You’d expect a cluster of 64-bit, 3GHz x86 CPUs to be much faster than 700MHz, 32-bit ARM CPUs, and you’d be right. The team ran a series of tests that were (a) compute-intensive (calculating Pi), (b) I/O-intensive (document word counts), and (c) both (large image file pixel counts).
Here are the word count results, taken from a figure in the paper.
In general, the x86 cluster was 10 to 20 times faster. However, the ability to put a Hadoop cluster in a backpack with a battery opens up possibilities for powerful edge computing, such as the drone video pre-processing the authors explore in their paper. Also, today we have the RPi Model 3, with a processor with almost double the clock speed of the RPi tested by the researchers.
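For a sense of what a compute-intensive Pi test looks like, here is a hedged Python sketch of a Monte Carlo estimate split into independent map tasks and one reduce step. It is not the benchmark the researchers ran; the task count, sample sizes, and function names are assumptions chosen for illustration.

```python
import random


def map_task(samples, seed):
    """Throw random points at the unit square; count those inside the quarter circle."""
    rng = random.Random(seed)  # per-task generator, so tasks stay independent
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(results):
    """Combine per-task counts into a single estimate of Pi."""
    inside = sum(r[0] for r in results)
    total = sum(r[1] for r in results)
    return 4.0 * inside / total


def estimate_pi(num_tasks=20, samples_per_task=10_000):
    # One map task per node in a 20-node cluster; here they simply run in a loop.
    results = [map_task(samples_per_task, seed) for seed in range(num_tasks)]
    return reduce_task(results)


print(estimate_pi())
```

The work is almost all arithmetic with next to no I/O, which is why a test of this kind mostly measures CPU speed.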
THE STORAGE BITS TAKE
Mobile edge clusters aren’t a thing today, but they will be, because our ability to gather data at the edge is growing much faster than network bandwidth to the edge. We’ll have to pre-process data at the edge, IoT data for example, to compact it for network transmission.
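One simple form of that pre-processing is to collapse raw sensor readings into per-window summaries before sending them upstream. The Python sketch below is an illustration under assumed inputs (a flat list of numeric readings and a fixed window size), not a method taken from the paper.

```python
from statistics import mean


def compact(readings, window):
    """Collapse raw readings into one (min, mean, max) summary per window."""
    if window <= 0:
        raise ValueError("window must be positive")
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append((min(chunk), mean(chunk), max(chunk)))
    return summaries


raw = [1, 2, 3, 4, 5, 6]
print(compact(raw, 3))
```

With a window of 60 readings, each batch shrinks to three numbers, at the cost of losing the individual samples.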
When will they be economically viable? Three things have to happen:
- Mobile processors have to get faster, while remaining power efficient.
- More power-efficient memory, whether low-power DRAM or NVRAM, must enable larger memory capacities on mobile processors.
- Mobile processors must support Universal Flash Storage (UFS), removing the current storage bottleneck of micro-SD cards.
All three will happen in the next five years. Then backpack clusters will be capable of real work out in the wild.