https://hackaday.com/2019/01/31/ai-on-raspberry-pi-with-the-intel-neural-compute-stick/

AI ON RASPBERRY PI WITH THE INTEL NEURAL COMPUTE STICK


I’ve always been fascinated by AI and machine learning. Google TensorFlow offers tutorials and has been on my ‘to-learn’ list since it was first released, although I always seem to neglect it in favor of the shiniest new embedded platform.

Last July, I took note when Intel released the Neural Compute Stick. It looked like an oversized USB stick, and acted as an accelerator for local AI applications, especially machine vision. I thought it was a pretty neat idea: it allowed me to test out AI applications on embedded systems at a power cost of about 1W. It requires pre-trained models, but there are enough of them available now to do some interesting things.

You can add a few of them in a hub for parallel tasks. Image credit: Intel Corporation.

I wasn’t convinced I would get great performance out of it, and forgot about it until last November, when Intel released an improved version. Unambiguously named the ‘Neural Compute Stick 2’ (NCS2), it was reasonably priced and promised a 6-8x performance increase over the previous model, so I decided to give it a try and see how well it worked.


I took a few days off work around Christmas to set up Intel’s OpenVino Toolkit on my laptop. The installation script provided by Intel wasn’t particularly user-friendly, but it worked well enough and included several example applications I could use to test performance. I found that face detection was possible with my webcam in near real-time (something like 19 FPS), and pose detection at about 3 FPS. So in accordance with the holiday spirit, it knows when I am sleeping, and knows when I’m awake.

That was promising, but the NCS2 was marketed as allowing AI processing on edge computing devices. I set about installing it on the Raspberry Pi 3 Model B+ and compiling the application samples to see if it worked better than previous methods. This turned out to be more difficult than I expected, and the main goal of this article is to share the process I followed and save some of you a little frustration.

First off, Intel provides a separate install process for the Raspberry Pi; the normal installer won’t work (I tried). Broadly, there are three steps to getting the NCS2 running with some application samples: initial configuration of the Raspberry Pi, installing OpenVino, and finally compiling some application samples. The last step will take 3+ hours and some samples will fail to build, so pace yourself accordingly. If you’re not installing it right this moment, it’s still worth reading through the ‘Compiling Other Examples’ section below to get a feel for what is possible.

PREPARING THE RASPBERRY PI

First, download NOOBS, unzip it to a microSD card (I used 16GB), and boot the Raspberry Pi off it. Install the default graphical environment, connect to the Internet, and update all software on the device. When done, open a terminal and run sudo raspi-config. Select Interfacing Options→Camera and enable it. Shut down, remove power, plug in your camera, and boot back up.
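If you prefer to do the update step from a terminal, it looks roughly like this (a sketch – exact commands can vary between Raspbian releases):

sudo apt-get update
sudo apt-get dist-upgrade -y   # update everything currently installed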

When installed, the camera will look something like this. The included FFC cable is relatively short.

Open a terminal again, and run sudo modprobe bcm2835-v4l2 (note that’s a lowercase L, not a 1), then confirm /dev/video0 now exists by navigating to /dev and running ls. You’ll need to run this modprobe command each time you want the camera to be accessible this way, so consider adding this to startup.
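One way to do that, assuming a standard Raspbian image where /etc/modules is read at boot:

echo "bcm2835-v4l2" | sudo tee -a /etc/modules   # load the camera driver on every boot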

Now, some of the applications we are going to compile will run out of memory, because the default swap file size is 100 megabytes. Run sudo nano /etc/dphys-swapfile and increase the CONF_SWAPSIZE value (shown below) – I changed it from 100 to 1024 and this proved sufficient. Save, reboot, and run free -h to confirm the swap size has increased. Finally, install cmake with sudo apt-get install cmake, as we’ll need it later on when compiling.
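For reference, here is the relevant line in /etc/dphys-swapfile after the change (CONF_SWAPSIZE is the setting name that file uses, in megabytes):

CONF_SWAPSIZE=1024   # default was 100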

At this stage you’re ready to begin Intel’s OpenVino install process.

INSTALLING OPENVINO TOOLKIT

In this section, we’ll be roughly following the instructions from Intel. I’ll assume you’re installing to a folder on the desktop for simplicity. Download OpenVino for Raspberry Pi (.tgz file), then copy it to /home/pi/Desktop and untar it with tar xvf filename.tgz.

The install scripts need to know explicitly where they are located, so in the OpenVino folder, enter the bin directory and open setupvars.sh in any text editor. Replace the <INSTALLDIR> placeholder with the full path to your OpenVino folder, e.g. /home/pi/Desktop/inference_engine_vpu_arm/, and save.
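As a sketch of that edit – assuming your copy of setupvars.sh uses the usual <INSTALLDIR> placeholder, as other OpenVino releases do – the line changes from:

INSTALLDIR=<INSTALLDIR>

to:

INSTALLDIR=/home/pi/Desktop/inference_engine_vpu_arm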

Later scripts need this script loaded, so run sudo nano /home/pi/.bashrc and add ‘source /home/pi/Desktop/inference_engine_vpu_arm/bin/setupvars.sh’ to the end of the file. This will load setupvars.sh every time you open a terminal. Close your terminal window and open it again to apply the change.
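Equivalently, you can append that line from the terminal instead of editing the file in nano:

echo "source /home/pi/Desktop/inference_engine_vpu_arm/bin/setupvars.sh" >> /home/pi/.bashrc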

Next we’ll set up the USB rules that will allow the NCS2 to work. First add yourself to the user group that the hardware requires with sudo usermod -a -G users "$(whoami)". Log out, then back in.

Enter the install_dependencies folder of your OpenVino install. Run sh install_NCS_udev_rules.sh. Now if you plug in your NCS2 and run dmesg, you should see it correctly detected at the end of the output.
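Putting this section’s steps together (paths assume the Desktop install location used throughout this article):

cd /home/pi/Desktop/inference_engine_vpu_arm/install_dependencies
sh install_NCS_udev_rules.sh
# plug in the NCS2, then confirm it enumerated:
dmesg | tail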

Intel’s documentation now shows us how to compile a single example application; we’ll compile more later. For now, enter deployment_tools/inference_engine/samples inside your OpenVino folder and run:

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=armv7-a"
make -j2 object_detection_sample_ssd

COMPILING OTHER EXAMPLES

Compiling the other examples is less straightforward, but due to our initial setup, we can expect some success. My goal was to get face recognition and pose estimation working, so I stopped there. Object detection, classification, and some type of speech recognition also appear to have compiled correctly.

Before we try to compile the samples, it’s important to note that the pretrained AI models for the samples aren’t included in the Raspberry Pi OpenVino installer. The normal installer has a script that downloads them all for you automatically – no such luck with the Raspberry Pi version. Luckily, you can download the relevant models for the application samples. In case that link breaks one day, all I did was look for URLs in the scripts located in the model_downloader folder of the laptop/desktop version of the OpenVino installer. Alternatively, if you have OpenVino installed on another computer, you can copy the models over. I installed them to a folder named intel_models on the desktop, and the commands below assume you’ve done the same.
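If you take the copy-from-another-computer route, something like this works over SSH (the host name and source path below are placeholders, not real locations):

mkdir -p /home/pi/Desktop/intel_models
scp -r user@desktop-host:/path/to/your/models/* /home/pi/Desktop/intel_models/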

With that out of the way, enter /home/pi/Desktop/inference_engine_vpu_arm/deployment_tools/inference_engine/samples and open build_samples.sh in any text editor. Replace everything after the last if block (after the last “fi”) with:

build_dir=/home/pi/Desktop/inference_engine_vpu_arm/deployment_tools/inference_engine/samples/build
mkdir -p $build_dir
cd $build_dir
cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=armv7-a"
make -j8
printf "\nBuild completed.\n\n"

Now run ./build_samples.sh. For me, this ran for about 3 hours before failing at 54% complete; however, several of the sample applications had compiled correctly by then. At this point, you should be able to enter deployment_tools/inference_engine/samples/build/armv7l/Release and run:

./interactive_face_detection_demo -d MYRIAD -i /dev/video0 -m /home/pi/Desktop/intel_models/face-detection-retail-0004/FP16/face-detection-retail-0004.xml

It supports emotion detection, but I always look this serious.

Or for pose estimation:

./human_pose_estimation_demo -d MYRIAD -i /dev/video0 -m /home/pi/Desktop/intel_models/human-pose-estimation-0001/FP16/human-pose-estimation-0001.xml

This can track multiple people at the same time without performance loss, but I didn’t have any volunteers at the time.

As for silly mistakes I seem to keep making, remember to use modprobe as described earlier to make the Raspberry Pi camera accessible as /dev/video0, and remember to actually plug in the NCS2.
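To take at least the first of those mistakes off the table, a small wrapper script helps – this is my own convenience sketch, not something shipped with OpenVino:

#!/bin/bash
# Load the camera driver, then launch the face detection demo on the NCS2.
sudo modprobe bcm2835-v4l2
cd /home/pi/Desktop/inference_engine_vpu_arm/deployment_tools/inference_engine/samples/build/armv7l/Release
./interactive_face_detection_demo -d MYRIAD -i /dev/video0 \
    -m /home/pi/Desktop/intel_models/face-detection-retail-0004/FP16/face-detection-retail-0004.xml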

Overall, performance is something like 18 FPS for face detection and 2.5 FPS for pose estimation, very similar to the performance on my laptop. That’s good enough to open up a few applications I had in mind.

Other than that, I’ve learned that while AI taking over the world mainly makes for very entertaining stories, with only a few afternoons of careful assistance, it is presently able to take over a sizable proportion of my desk.

16 THOUGHTS ON “AI ON RASPBERRY PI WITH THE INTEL NEURAL COMPUTE STICK”

  1. “Now run ./build_samples.sh, for me this ran for about 3 hours before failing at 54% complete.”

    Wondering if there’s a Raspberry Pi 3 Model B+ VM? Might speed things up, and lower the cost of failure.

  2. no volunteers? must have been a lonely Christmas 😦
    Great article. Thanks for the detailed log! This has been on my TODO list for a while as well. This article is just the push I needed.

    1. So as of today Intel released a newer OpenVINO which is much easier to get running on the Pi:
      >>>Compiled for Raspbian* 9 OS (ARM* CPU) including python2, python3 bindings, GStreamer* and GTK* support.

      I’m going to try it soon, but just a heads up that it probably removes a lot of the headache from the last version (released in late December, which is what I’ve been using and the author here is probably also using – unless he did all of this just this morning!).

      Best,
      Brandon

    1. Yes, that’s an unfortunate bottleneck right now (the datapath on the Pi when used with NCS2 over USB). It reduces performance by a factor of about 5 from what the NCS2 can do.

      Worse, the video has to go through the Pi CPU first, then over USB to the NCS2 (that’s the crux of the issue, actually). It’s actually why I’m making a carrier board (see other post) right now for the Raspberry Pi 3B+ Compute module, which will have the Myriad X directly.

      It’s still good though, at ~12FPS at 640×480 video doing object detection (at close to max CPU). It can just be 60FPS with 0% CPU w/ the same hardware/cost w/ the carrier board, which is why I’m making it.

  3. Great write-up! So in case you haven’t seen it yet, PINTO0309 has a great article on getting a lot of NCS2 + Raspberry Pi stuff running as well, including MobileNet-SSD object detection using the Raspberry Pi and the NCS2.

    https://github.com/PINTO0309/MobileNet-SSD-RealSense

    He also has it working with the Intel RealSense camera, which then gives you depth as well. I’ve played with it a lot (including with the RealSense), and it’s super useful.

    Here’s a GIF of it running the object detection on the Raspberry Pi + NCS2 with a webcam:

    And if you missed the other Hackaday article (written by Lewin Day about Lew Wright’s work), check this out:
    https://hackaday.com/2019/01/25/robot-cant-take-its-eyes-off-the-bottle/

    He references PINTOs github as well.

    Also I’m actually working on a custom board for this that has the Myriad X onboard, two raspberry pi camera connectors (for stereo vision, more on that later), and a slot for the Raspberry Pi Compute Module 3B+.

    Why?

    When the video has to flow through the Raspberry Pi from a camera, over USB, and to the NCS2, the Pi has to do a lot of unnecessary video shuffling – which uses up the CPU while also limiting the framerate.

    For example with 640×480 video with a Pi 3B+ and a NCS2 running MobileNet-SSD object detection, the max is around 12FPS, at a high Pi CPU use.

    With the board above the cameras go directly to the Myriad X (instead of through the Pi host first), so MobileNet-SSD runs at about 60 FPS instead of 12FPS (or a 5x improvement), and also then has 0% impact on the RPi CPU, as the whole video path is on the Myriad X.

    So a 5x improvement while also taking the RPi CPU use from ~70% to 0%.

    And even better, the Myriad X can do real-time depth estimation from stereo cameras. So the dual-camera option allows you to do that too if you want, with 0% CPU impact on the RPi, as the Myriad X SHAVE cores do all the work.

    And stereo vision + object detection is SUPER POWERFUL in robotics (and I’m sure a bunch of other applications I can’t think of), I’ve recently discovered. You can know what objects are and get a real-time answer to where they are in 3D space, so interacting with the physical world becomes super easy and real-time. And if the Pi CPU is not loaded down doing video shuffling, it leaves a lot of room for end-use processing.

    Anyways, getting a website up now for those who are interested in getting one of these PCBs. Going to open-source the hardware design and also do a Crowd Supply so folks can order it.

    Best,
    Brandon

      1. @Ostracus – that camera (the T265) is the Myriad 2 actually. It’s confusing, but the Myriad 2 is what’s in the NCS1. So it’s the older one that’s 6x to 8x slower than the Myriad X.

        This article is talking about the NCS2, which has the Myriad X in it. And my board is based on the Myriad X, not the Myriad 2, so it enables ~6x to 8x the speed of the Myriad 2 (and that camera).

        Thoughts?

        1. Also over USB the Myriad 2 (in the NCS1) gets ~5-6FPS on the Raspberry Pi 3B+ where the Myriad X (in the NCS2) gets about 11-12FPS. So this isn’t the 6x-8x you’d hope for. And the ‘why’ is the inefficient data path through the Raspberry Pi when the video has to go out through USB to the stick.

          With the carrier-board approach, the video (camera) feed can go directly to the Myriad X. So that you do see the 6x-8x speedup (and actually higher than that when used with the Raspberry Pi, as the Pi CPU limits even the NCS1).

          Thoughts?

    1. I want that board. Noticed the Google Edge TPU is going to deploy with the same format as you describe. Any idea of performance comparison between Myriad X and the Edge TPU?