
NVIDIA Jetson TX2 Dev kit vs Intel?

RTasa
New Contributor I
7,941 Views

The Jetson TX2 seems to be a very powerful small form factor PC running Linux with several camera inputs.

Is there anything on the Intel side that is comparable?

R-CNN, TensorFlow, YOLO and high-speed image analysis seem to be its strengths.

Will the RealSense D435 in concert with an Atom / Apollo Lake Celeron be comparable?

My interests are image recognition combined with the distance of the recognized objects in an outdoor environment.

0 Kudos
37 Replies
MartyG
Honored Contributor III
3,460 Views

This article about a user who paired a TX2 with a RealSense camera may be of interest to you.

http://www.jetsonhacks.com/2017/03/26/intel-realsense-camera-installation-nvidia-jetson-tx2/ Intel RealSense Camera Installation - NVIDIA Jetson TX2 - JetsonHacks

There is currently no information available on the processor compatibility of the new D-cameras, so I cannot recommend any boards that would match up to those camera models yet, unfortunately.

0 Kudos
PSnip
New Contributor II
3,460 Views

Hi Marty,

Agreed about Intel not recommending anything for the D4 yet. However, since Intel has recommended ZR300+Joule in the past, I believe it would suggest using the D4 on Joule to start with. I also believe that Intel will either release a successor to Joule with higher processing power, or provide a Joule+ASIC solution. But as you say, we can only guess; only Intel can tell with certainty.

0 Kudos
MartyG
Honored Contributor III
3,460 Views

At the time of the cancelation of Joule, Edison and Galileo, some users speculated that a newer model of Up Board called the Core may provide in a single board all the features that the canceled boards provided separately.

http://www.up-board.org/up/comparison-up-versus-edisongalileo-joule/ Up Board | Power Up Your Ideas! - Comparison UP versus Edison,Galileo & Joule

0 Kudos
PSnip
New Contributor II
3,460 Views

Thanks Marty.

I didn't know about the Up board. I've now looked into it and see that it is much more powerful - 12 cores. I wonder if it's a practical solution for a mobile application, though - it may consume too much power and generate too much heat?

As Computer Vision & AI shift strongly towards neural networks, generic processors will soon be out of the running. Atom or ARM is in no way suitable for neural computations; a graphics-oriented processor like the Jetson's will do a much better job. Intel may already be thinking about this, and discontinuing Joule may be a step in that direction. I am curious about the performance of the Movidius Compute Stick. If it does a decent job, one could just combine it with one of the Ubuntu-powered boards to do all the robotics work: the generic processor only needs to run the OS, and all the heavy-duty computation can be handled by the Movidius stick. Use multiple sticks if one is not sufficient. So you really don't need Joule; a less powerful solution may do the job.
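
Just to make that split concrete, below is a very rough host-side sketch of what I mean: the board only reads frames and queues them, and one worker thread per stick does the inference. Note that run_inference() is only a placeholder here, not the actual NCS API, and the number of sticks is made up.

import queue
import threading

NUM_STICKS = 2          # made-up figure: one worker per attached compute stick
frame_queue = queue.Queue(maxsize=8)
result_queue = queue.Queue()

def run_inference(stick_id, frame):
    # Placeholder for the real per-stick inference call (NCSDK or similar).
    return {"stick": stick_id, "detections": []}

def worker(stick_id):
    # Each worker owns one stick and pulls frames as fast as it can process them.
    while True:
        frame = frame_queue.get()
        if frame is None:      # sentinel: shut down
            break
        result_queue.put(run_inference(stick_id, frame))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_STICKS)]
for t in threads:
    t.start()

# Host side: grab frames (camera capture omitted) and enqueue them.
for frame_number in range(100):
    frame_queue.put(("frame", frame_number))

for _ in threads:              # one sentinel per worker to stop them
    frame_queue.put(None)
for t in threads:
    t.join()

print("results collected:", result_queue.qsize())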

The reason I didn't order a Movidius Compute Stick myself was that it only supports Caffe. I wish it supported TensorFlow and other model formats; I was in no mood to learn Caffe just to try the stick. But now I am giving it a second thought.

Regarding Joule, Intel is still selling the dev kits and will continue shipping them until Dec 2017. The funny thing is that the ZR300 product page officially recommends using the Joule dev kit.

https://click.intel.com/intelr-realsensetm-development-kit-featuring-the-zr300.html

So unless they plan to discontinue the ZR300 too by year end, they should put an alternative to the Joule dev kit in place :-)

0 Kudos
MartyG
Honored Contributor III
3,472 Views

I believe that if one is aware that a product is going to be withdrawn from sale in the near future then it is still a valid choice so long as (a) you are aware its software is not going to receive further updates, (b) the current state of its software meets your immediate project needs, and (c) the price of the hardware is attractive as a result of impending cancelation (like with the current RealSense kit sale in the Intel Click online store til Sept 30 or whilst stocks last).

As the D-cameras support indoor and outdoor use, that makes them a natural successor to the ZR300. As the ZR300 developer kit was only launched in March 2017, I'd expect it to be on sale for another year yet before retirement, but I have no official insight on that matter.

Intel's strategy has increasingly been to develop "reference designs" for new products and then invite manufacturers to create their own version of that design (e.g. the SR300-compatible Razer Stargazer and Creative BlasterX Senz3D cameras). This approach is also planned to be used for Intel's forthcoming 'Project Alloy' RealSense-based "merged reality" headset, assuming that it is still in development and on course for the previously announced Q4 2017 release window.

So it is conceivable that rather than manufacture a successor to the Joule, Intel may choose to put its support behind another manufacturer's product, much like how the original Up Board was bundled with the R200 camera in the RealSense Robotic Development Kit (also on heavily discounted sale right now in the Click store, plug plug!)

Yes, neural systems on modest hardware can do a lot now. I was reading yesterday about how a new neural learning program for analyzing galactic astronomy images can generate results in seconds even running on a smartphone, whereas before it could take a month to calculate the results.

0 Kudos
RTasa
New Contributor I
3,472 Views

Wading through the myriad of options is pretty rough.

It's why I was hoping Intel would create a design standard that could carry this forward.

As I try to guess a path forward, the D-camera is coming out this month, YAY. If it does some of the things I hope, I will be in line to get one. Information is still sparse.

I just watched a TED talk on YOLO, and they seem to have come very far on human recognition (over 90%, I think he mentioned). This is great news for me, but what CPU power it demands to do this in real time was not stated. Here's hoping that there will be an Intel low-power VPU alternative that can handle it.

The Up boards look very interesting.

0 Kudos
MartyG
Honored Contributor III
3,472 Views

You've probably seen this already, but Intel employee Bolous AbuJaber posted a tutorial on implementing YOLO using the Euclid's cameras and TensorFlow. The principles in that tutorial may be adaptable to the D-cameras, though the script code would probably have to be rewritten for SDK 2.0.

http://www.euclidcommunity.intel.com/static/tutorials/pdf/or_tutorial_final.pdf

0 Kudos
RTasa
New Contributor I
3,472 Views

"and it achieves around 1FPS on Euclid."

I stopped reading after this. 1 fps isn't fast enough.

https://www.ted.com/talks/joseph_redmon_how_a_computer_learns_to_recognize_objects_instantly Joseph Redmon: How computers learn to recognize objects instantly | TED Talk

About 4/5 of the way through he is doing multiple-object recognition in near real time on a phone. The Euclid can't go as fast as a phone???

0 Kudos
MartyG
Honored Contributor III
3,472 Views

I will PM Bolous AbuJaber to check whether that 1 FPS figure is correct.

0 Kudos
PSnip
New Contributor II
3,472 Views

Hi Marty, thanks for sharing. As ChicagoBob pointed out, 1 fps is not really practical. However, one option I see is to run this powerful algorithm at 1 fps or even less frequently to get a good understanding of the scene, and at the same time run a less powerful algorithm more frequently to post-process the scene in between. Whenever I get D4 kits in hand, these things will be good to try out. Keep sharing the wonderful resources.
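
Something like this interleaving is what I have in mind. It is only a minimal single-object sketch: it assumes opencv-contrib for the KCF tracker, and heavy_detect() is just a placeholder for the slow, accurate detector (YOLO, R-CNN or similar).

import cv2  # assumes opencv-contrib-python so the KCF tracker is available

DETECT_EVERY_N = 30        # run the heavy detector roughly once per second at 30 fps

def heavy_detect(frame):
    # Placeholder for the slow, accurate detector; returns one (x, y, w, h) box.
    h, w = frame.shape[:2]
    return (w // 4, h // 4, w // 2, h // 2)

cap = cv2.VideoCapture(0)
tracker = None
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % DETECT_EVERY_N == 0:
        # Heavy pass: refresh the scene understanding and re-seed the tracker.
        box = heavy_detect(frame)
        tracker = cv2.TrackerKCF_create()
        tracker.init(frame, box)
    elif tracker is not None:
        # Cheap pass: just propagate the last detection between heavy frames.
        ok, box = tracker.update(frame)
    frame_idx += 1

cap.release()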

0 Kudos
MartyG
Honored Contributor III
3,472 Views

Yes, I imagine that 1 frame per second would be like a click - click - click press of the button on an ordinary manual photography camera. You potentially miss a lot of detail in between the button presses, which would make it useless for real-time image gathering such as a moving creature in a nature photo, but it could be useful in certain circumstances where you only need a periodic snapshot of what the camera is observing.

It's going to be great to see what people do with the technology that the new D-cameras provide, such as longer range, higher frame rates and real-time sensing.

0 Kudos
RTasa
New Contributor I
3,472 Views

I have several comments about this, but the most pressing is one I hope someone could give some feedback on.

How can a cell phone get near real-time YOLO performance while full-blown CPUs cannot? What is the secret to making this happen? The GPU? The ARM? The memory speed?

CNNs should be a snap for SSE3 asm optimization, since they consist of many matrix multiplication operations, so how is a phone beating an Atom?

Maybe someone familiar with the source code for the phone side could comment on how they pulled this off. I am sure Intel has connections with the YOLO group, so maybe they can help. Intel certainly has people who can read the phone source code.

The hardware suggested at the Darknet website is completely out of date, as they are running this on a cell phone and a laptop (and no one has said what kind of laptop it is).

0 Kudos
PSnip
New Contributor II
3,472 Views

Hi Bob,

I think there are 2 parts to your question.

(1) How are some of the cellphones achieving very high accuracy on object recognition whereas CPUs can't? The high accuracy on object recognition has become possible using deep learning, and in particular CNNs. Now, to do the computations for a neural network, a generic CPU is not optimal hardware. Neural networks are math heavy, in particular lots of matrix multiplications (see the small sketch at the end of this post). The closest analogy I can give is that the computations are similar to those involved in computer graphics. Just as you need a powerful graphics processor to get good gaming performance, you need a special processor or ASIC to shine at neural computations. Google's TPU and Qualcomm's Hexagon DSP are deep-learning-friendly processors. Since NVIDIA already had graphics processors, and neural networks require similar computations, NVIDIA also has some good offerings. There are also cloud-based offerings to which you could offload the heavy processing. Intel's Movidius is a similarly powerful chip well suited to deep learning. The cellphones I know to be providing good object-recognition performance use Qualcomm's Hexagon DSP; if not for this specialized hardware, performance would be nowhere close. As MartyG mentioned in one of the posts, Intel has now announced the Myriad X VPU. This VPU should give us state-of-the-art performance, but it is not clear when it will be available, so a long wait here I suppose.

(2) The second issue I see is perceived performance. The performance depicted in promotional videos or in specific research papers is sometimes not very realistic; it works well only in a limited test environment, not in the real world. Apart from the data set and environment, two other important factors that can significantly impact the required processing power are accuracy and the number of categories.
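
Here is the small matrix-multiplication sketch I mentioned under (1). With made-up toy sizes, it shows how a convolution layer is normally lowered to one big matmul via the standard im2col trick, which is why matmul throughput is what these specialized processors are built around.

import numpy as np

# Toy sizes: one 32x32 3-channel image, 16 filters of 3x3.
H = W = 32
C_in, C_out, K = 3, 16, 3

image = np.random.rand(C_in, H, W).astype(np.float32)
filters = np.random.rand(C_out, C_in, K, K).astype(np.float32)

# im2col: unfold every KxK patch of the image into one column...
out_h, out_w = H - K + 1, W - K + 1
cols = np.empty((C_in * K * K, out_h * out_w), dtype=np.float32)
idx = 0
for y in range(out_h):
    for x in range(out_w):
        cols[:, idx] = image[:, y:y + K, x:x + K].ravel()
        idx += 1

# ...so the whole convolution collapses into a single matrix multiplication,
# exactly the operation GPUs, DSPs and VPUs are built to do fast.
W_mat = filters.reshape(C_out, C_in * K * K)
out = (W_mat @ cols).reshape(C_out, out_h, out_w)
print(out.shape)   # (16, 30, 30)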

0 Kudos
RTasa
New Contributor I
3,472 Views

I don't know about you, but the wild west of AI seems to be going in 10 directions at the same time. With resource-constrained R&D budgets, we all have to be really choosy about which road to chase.

I ordered a Movidius Compute Stick and, while I waited, found out it could only do single-object detection and did not have TensorFlow acceleration yet, so I ended up canceling the order. The Movidius X chip seems exciting, but with no release date I can't spend cycles on vaporware.

I put a lot of eggs into the TensorFlow basket since it is open source and has the most GitHub downloads. I have experimented with Keras, which is more generic and of course slower. I am slowly moving toward YOLO due to the impressive numbers.

Driven by the industry changes, our goal at my company is not thousands of generic classifications but high inference accuracy across approximately 100 to 200 classes.

I am waiting for the D435 to be released and all it brings to the game, and hoping a release date for the Movidius X chip will be announced soon, as I have to pick a direction and stick to it by the end of September.

I wonder if Intel will be bringing all this to CES like NVIDIA does? I will be there this year and would love to get the story one on one.

PSnip
New Contributor II
3,472 Views

Hi Bob, I am in the same boat as you are. I hope to get my hands on the D435 soon, so I would like to quickly finalize the platform. For initial playing around with the camera, or running some basic algorithms, I could spend a couple of months with a Linux PC, but during that period I would certainly want to settle on a platform. Myriad X has been announced recently, but I believe putting it onto dev boards and making those boards available to developers might take months. I would like to hear some dates from Intel.

As for the Movidius Compute Stick, I too decided not to order it because of the missing TensorFlow support. I have already put a lot of effort into TensorFlow training, because it is currently the most popular tool. So I am hoping that Intel realizes the importance of providing TensorFlow support whenever they offer Myriad X kits to developers.

Michael_M_Intel1
Employee
3,472 Views

A couple of points regarding the performance of the YOLO example on Euclid...

1. The default installation of TensorFlow does not seem to be optimized to use the vector units on the CPU. This is kind of annoying...

2. I don't know if a quantized network was used. Quantizing to 8-bit weights and arithmetic can lead to a significant speedup (a sketch of the idea is below).

The combination of the above means a huge performance speedup has potentially been left on the table. On top of that it may be possible to experiment with the network architecture itself to improve performance.
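
To illustrate point 2, here is a minimal NumPy sketch of affine 8-bit weight quantization. Real toolchains do this per layer and also quantize the activations and arithmetic, so treat it purely as an illustration of where the speedup comes from: 4x smaller weights and cheap integer math.

import numpy as np

w = np.random.randn(256, 256).astype(np.float32)   # toy weight matrix

# Affine (asymmetric) quantization to uint8: w is approximated by scale * (q - zero_point).
w_min, w_max = w.min(), w.max()
scale = (w_max - w_min) / 255.0
zero_point = int(round(-w_min / scale))

q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)

# Dequantize to check the error introduced by 8-bit storage.
w_hat = scale * (q.astype(np.float32) - zero_point)
print("max abs error:", np.abs(w - w_hat).max())
print("bytes: float32 =", w.nbytes, " uint8 =", q.nbytes)   # 4x smaller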

In short, the YOLO example is a tutorial on how to get OR working... NOT a tuned and optimized benchmark.

By the way, I've also been working with the Movidius Neural Compute Stick SDK and just found out that the examples they guide you through were only using 1 out of the 12 cores on the device. I was wondering why the reported perf numbers were an order of magnitude lower than the specs... The good news is that it is easy to fix, and with all the cores ticking over the performance is much, much better; I just don't understand why it's not the default setting. But I digress.

Summary: a tutorial is not (necessarily) a benchmark.

Corollary 1: when someone reports huge speedups for their implementation of algorithm X, check that they are using tuned baseline code for comparison and not unoptimized tutorial code.

Corollary 2: don't assume implementations in OSS stacks are optimized, either (OpenCV and ROS, I'm looking at you...)

RTasa
New Contributor I
3,472 Views

McCool

Are you implying that the Movidius Compute Stick can do R-CNN in real time?

In general, I guess we are all waiting to see what the power of Intel can bring as we start moving into the AI revolution.

It will be hard to go backward after we make commitments to platforms over the next several months.

0 Kudos
RTasa
New Contributor I
3,463 Views

Thinking a bit today, I was wondering: if you optimized the CNN with SSE asm, worked directly with YUV, used 16-bit floating point and the new higher-speed DDR5 memory, would just using the CPU be faster than anything else out there?

I wonder if Intel is thinking about sticking that in their IPP, or whatever it's called nowadays.
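
For what it's worth, a rough way to ballpark that question is to measure float32 matmul throughput through NumPy (which links against MKL or OpenBLAS on most installs) and divide a network's per-frame GFLOPs into it. The per-frame cost to compare against has to come from the YOLO/Darknet papers, so that part is an assumption you would fill in yourself.

import time
import numpy as np

N = 2048                         # matrix size, big enough to keep the CPU busy
a = np.random.rand(N, N).astype(np.float32)
b = np.random.rand(N, N).astype(np.float32)

a @ b                            # warm-up (thread pool start, caches)

reps = 10
t0 = time.time()
for _ in range(reps):
    a @ b
elapsed = (time.time() - t0) / reps

flops = 2.0 * N ** 3             # multiply-adds in an NxN by NxN matmul
print("%.1f GFLOP/s, %.3f s per matmul" % (flops / elapsed / 1e9, elapsed))
# Divide a network's per-frame GFLOPs by this rate for a rough fps ceiling.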

0 Kudos
PSnip
New Contributor II
3,472 Views

Hi Bob,

Your idea is valid, as a CNN uses certain functions at a low level which, if optimized for the native processor, will definitely improve performance.

For example, NVIDIA had CUDA to improve graphics performance. Now they also have cuDNN, which optimizes some of the low-level functions to better utilize the GPU architecture for CNNs/DNNs. cuDNN can be used with TensorFlow to get better performance if you are using an NVIDIA GPU.

However, the problem I see with a CPU like Atom or ARM is that they are not built with the kind of instruction set or architecture needed for these arithmetic operations. No matter how good the assembly you write, such a CPU won't excel at matrix multiplication. In my opinion you won't gain much by taking that path.

Since you want to do all the processing on a PC, does your PC already have a graphics processor? If yes, you could try turning on the GPU option under TensorFlow and see what performance you get. BTW, another point I want to make is that, in general, a PC with any GPU will give you better performance than what you get on embedded development kits like Joule or Euclid. So it would be interesting to first see what kind of performance you are already getting, and then, based on how far short you fall, bolster it with additional hardware. You could use a GPU or a Movidius NCS. But the NCS, as we discussed, won't work directly with TensorFlow models. However, I just checked the ncsforum and it appears that Intel is seriously looking into adding TensorFlow support; you might want to check the ncsforum for this.
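
A quick way to check is something like the snippet below, written against the TensorFlow 1.x API that is current right now, so adjust it for whatever version you have installed. It reports whether TensorFlow sees a GPU and logs where a large matmul (a decent proxy for conv-layer work) actually runs.

import tensorflow as tf

print("GPU available:", tf.test.is_gpu_available())

# log_device_placement prints whether the op ran on /gpu:0 or fell back to the CPU.
a = tf.random_normal([2048, 2048])
b = tf.random_normal([2048, 2048])
c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(c)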

0 Kudos
RTasa
New Contributor I
2,786 Views

I check back and forth and even post on the Movidius forum. Last I checked, someone was trying to get Tiny YOLO to work on the chip. I don't think there are enough bodies there to address the questions popping up.

I have always relied on Intel to be a steady barometer, but looking at whatever is going on with the Myriad X (link:

https://www.intel.com/content/www/us/en/analytics/artificial-intelligence/overview.html Artificial Intelligence Enables a Data Revolution )

I really wonder what's going on with their marketing/sales/direction.

I asked a question at the Movidius site about the Myriad X.

This is my post:

"Surprised no one has commented on this yet. There isn't the specificity I had hoped for to understand how the product would work in the real world. The silicon isn't released, right? Will the compiler supply direct support to accelerate TensorFlow? From the promo video it can take in multiple HD camera inputs, but what's more important is whether multiple-object YOLO or R-CNN type detection is supported. And last but not least, is it due this year? Or next? And if this isn't the place to ask this, then where?"

No reply.

Add the deletion of the small form factor boards and you start to wonder if Intel is just marketing AI or lacks the skills and manpower to promote AI.

0 Kudos