Multi face detection

NZafr · ‎04-06-2017

Hi,

I am working with RealSense SR300, DCM 3.3.27.5718, and RSSDK 10.0.26.0396.

When my application detects more than one face, I can use RSSDK to tell me that more than one face is detected (QueryNumberOfDetectedFaces), but when I call QueryLandmarks(), I only get landmark information for the first face that was detected (the face that has index 0).

For all the other faces, QueryLandmarks() returns NULL.
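One way to narrow this down would be to record, on every frame, which face indices actually yield landmarks. The following is only a minimal sketch of that check: `Face` and `Landmarks` are hypothetical stand-in types, not RSSDK classes, and a null `landmarks` pointer mimics QueryLandmarks() returning NULL.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not RSSDK types.
struct Landmarks {
    int numPoints;
};

struct Face {
    const Landmarks* landmarks;  // nullptr mimics QueryLandmarks() returning NULL
};

// Returns the indices of all detected faces that have no landmark data.
std::vector<std::size_t> facesWithoutLandmarks(const std::vector<Face>& faces) {
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < faces.size(); ++i) {
        if (faces[i].landmarks == nullptr) {
            missing.push_back(i);
        }
    }
    return missing;
}
```

In a real application the vector would be filled from the face data of the current frame, and the returned indices could be logged to confirm that only index 0 ever has landmarks.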

Anyone encountered this issue?

Is it possible to get landmarks data for the faces that were not detected first? If yes, how?

Thanks.

MartyG · ‎04-06-2017

You can track up to four faces with marked rectangles for face boundaries, and select a particular face to focus on. Only one of those faces at a time will have detectable tracking landmarks though.

I wonder if the instruction QueryFaceByID might help, as it lets you provide an ID number for the face that should be the one that is being actively tracked. It's a more specific version of QueryFaces, which returns all detected faces in an array.

https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?queryfacebyid_pxcfacedata.html Intel® RealSense™ SDK 2016 R2 Documentation

There's little other information on QueryFaceByID, but I did track down a script that uses it. That may be a useful reference for you.

https://software.intel.com/en-us/node/621659 Face recognition - Database not working correct

NZafr · ‎04-09-2017

Hi Marty,

Thank you for your response.

I am aware that the maximum number of faces with marked rectangles is 4, but thanks for clarifying that landmarks info can be provided for only one of those faces at a time.

With regard to QueryFaceByID, I am familiar with this function. I actually used QueryFaceByIndex which is a similar function.

My problem is that, although QueryFaceByIndex does return the face with the index I ask for, the face with the landmarks info is always the first detected face, regardless of which face I request from QueryFaceByIndex.

Is there a way to configure RealSense to provide landmarks info for the face I want instead of the first face it detects?

MartyG · ‎04-09-2017

In case I am getting confused, I just wanted to clear something up. Are you wanting the program to return landmark info about a registered face in the database that is not your own face?

Unless you have more than one person in front of the camera at the same time, it seems logical that it would default to detecting only the landmarks for the face that is currently in front of it. After all, if it's just you then the other people in the database are not present to have their landmarks checked by the camera to verify their identity. If it could check the details of faces that were not currently in front of the camera then it would be like a robber trying to get past the facial recognition lock on a bank vault by holding a photo of the bank manager's face up to the camera.

Something similar occurs with hand tracking. When using both hands at the same time, one hand has the index number '0' and the other hand has the index number '1'. If only one hand is being used in front of the camera, only elements that are set to index '0' respond, and elements set to index '1' become inactive if the camera cannot see the other hand.

My apologies if I have misunderstood what you are aiming to do.

NZafr · ‎04-09-2017

What I am trying to do is a lot simpler than what you describe:

I want my system to return landmarks information of the face that is closest to the camera, regardless of the face identity (I actually don't care about identities, and I don't register any face to any DB).

Now, when there's more than one face in front of the camera, I can get the number of faces as well as the average depth of each face, and deduce which face is closest to the camera.

After that I have the index of the closest face, and I use it with QueryFaceByIndex.

However, if the index of the closest face is not 0, the QueryLandmarks function returns NULL (for index 0 it returns the landmarks info).
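The selection step can be sketched roughly as follows. This is an illustration only: `FaceSample` is a hypothetical stand-in for the per-face values read from the SDK, and treating a non-positive depth as "no valid reading" is an assumption, not documented RSSDK behaviour.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for one detected face; not an RSSDK type.
struct FaceSample {
    int index;           // index that would be passed to QueryFaceByIndex
    float averageDepth;  // average depth of the face; <= 0 is assumed to mean "no reading"
};

// Returns the index of the face with the smallest valid average depth,
// or std::nullopt when no face has a valid depth reading.
std::optional<int> closestFaceIndex(const std::vector<FaceSample>& faces) {
    std::optional<int> best;
    float bestDepth = 0.0f;
    for (const FaceSample& face : faces) {
        if (face.averageDepth <= 0.0f) {
            continue;  // skip faces without a usable depth value
        }
        if (!best || face.averageDepth < bestDepth) {
            best = face.index;
            bestDepth = face.averageDepth;
        }
    }
    return best;
}
```

The returned index is the one I then pass to QueryFaceByIndex, and QueryLandmarks is called on that face.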

Hope it clarifies.

MartyG · ‎04-09-2017

I found a couple of interesting links. The first is an Intel article with downloadable source code that recognizes the face closest to the camera.

https://software.intel.com/en-us/articles/intel-realsense-sdk-based-real-time-face-tracking-and-animation Intel® RealSense™ SDK-Based Real-Time Face Tracking and Animation | Intel® Software

In the other link, the Crosswalk Project created an extension for RealSense that can detect faces based on various factors such as nearest and furthest.

https://crosswalk-project.github.io/realsense-extensions-crosswalk/spec/face.html Face Tracking And Recognition

NZafr · ‎04-09-2017

Thank you for sharing those links.

I am familiar with the information presented in them.

Theoretically my system is configured to track faces from closest to farthest, so I am really puzzled regarding why it doesn't work.

FYI, I am also in touch with someone from Intel's RealSense team, but so far she hasn't come up with a solution. I will update if she comes up with something.

Let me know if you have any other ideas.

MartyG · ‎04-09-2017

It probably becomes a lot easier if the user has to get closer to the camera, since once they are close enough then their face will fill the camera's view and block out any other people present, thereby making the closest person the one that is tracked, because the camera cannot see past the nearest person's head.

NZafr · ‎04-09-2017

Unfortunately, in my system, the closest user doesn't have to be very close to the camera.

It is possible that the camera will detect 2 (or even more) faces.

MartyG · ‎04-09-2017

You could add a Blob Tracking condition as the trigger for detection to begin. Blob Tracking is a crude form of tracking that only reacts to large flat-ish areas of skin such as the forehead, rather than precise landmarks and joints. So once a person's face got close enough to the camera, their forehead should make 'Blob Detected' true. If you made your face landmark detection routine's activation dependent on Blob Detected being true, that would require the nearest user to get closer to the cam before it took action, as you have to get much closer to the camera for Blob Tracking to be triggered than for Face Tracking.

NZafr · ‎04-09-2017

Interesting idea, but I think it has a few problems:

1. The farthest person might be too far for Blob Detection to work.

For example: the closest person will be 1m from the camera, and there will be another person standing 1.2m from the camera.

In such a case, both persons will be seen by the camera, but I suppose that Blob Detection won't work for either of them due to the large distance.

2. I suppose that the opposite might happen as well:

Blob detection might work for two persons if they are both close enough to the camera.

3. Even if what you suggest works and the Blob Detection starts working for the closest person, I am still left with the issue of getting landmarks info for a specific face of my choice. Right now I am at the point where I have the index of the closest face, and I activate the landmarks routine only for that face. The problem is that if this face index is not 0, the landmarks routine returns NULL.

So all in all, my problem is not finding the closest face, but getting the landmarks info for it.

Regardless of all the above, it is a clever idea that can be useful in other scenarios. Thanks for sharing!

Out of curiosity, do you have any idea from what distance Blob Detection starts working?

MartyG · ‎04-09-2017

I have only used the Unity game creation engine implementation of Blob Tracking, using the 'TrackingAction' Unity script that comes packaged with the RealSense SDK's Unity Toolkit, so I don't know the kind of range it has in an environment such as C# / C++. In Unity, the triggering range is pretty close. If you imagine getting down on your knees in front of your desk, with the camera on the top of the desk, that's the kind of close range.

NZafr · ‎04-09-2017

Pretty close...

I am working in a C++ environment.

In any case, distances of less than 30-40cm are not relevant at all in my application.

MartyG · ‎04-09-2017

An alternative to Blob Detection would be something written for C++ that is equivalent to the Unity TrackingAction's 'Real World Box'. The Real World Box is an imaginary box in between the user and the camera lens that determines how deep inside the box (i.e. how near to the camera) the user must be before an action is triggered.

So if you set the X, Y, Z of the box to be 100, 100, 100 cm in size, and set the box's Center to be 50, then that means that you have to get within 50 cm of the camera (the imaginary box's center) before an action will trigger. The smaller the Center value, the nearer the center is to the user and the easier it triggers. So if you set your Z to have a Center value of 70, you would have to get closer to the camera before the action would trigger.

Here's a guide about it that I wrote a long time ago.

https://software.intel.com/en-us/forums/realsense/topic/610438 Unity Tip: Knowing When To Use The RealSense TrackingAction's Real World Box Center

NZafr · ‎04-09-2017

Clever idea, but again, in my case it is possible that multiple faces will be inside the Real World Box.

Say that you have 2 faces inside this Real World Box, so both are valid for tracking. Can you choose which one of them will be tracked and yield landmarks info? If yes, then what I have today is already sufficient, since I can already iterate over all the faces that were detected and calculate which one of them is the closest.

MartyG · ‎04-10-2017

I have not actually tried multiple face tracking with the Real World Box implementation in Unity. I know it uses an ID system to differentiate between faces though, the same as with hand tracking. Here's an example from my project, where it is set to track the default face ('0') and read the bottom, tip and top landmarks of the nose.

It's usually sufficient to track just one part of the nose, but as the virtual avatar in my project has the full range of vertical movement from looking up in the air to crouching on the ground, I track all parts of the nose so that if one landmark goes out of the camera's view, one of the other landmarks takes over the tracking and prevents the tracking from stalling.
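The fallback idea can be sketched in C++ roughly like this. It is an illustration only, not the Unity TrackingAction code: `LandmarkReading` is a hypothetical stand-in type, and the priority order of the three nose landmarks is an assumption.

```cpp
#include <array>
#include <optional>

// Hypothetical stand-in for one tracked landmark; not an RSSDK or Unity Toolkit type.
struct LandmarkReading {
    bool visible;  // false when the landmark is outside the camera's view
    float x;
    float y;
};

// Candidates are listed in priority order (assumed here: nose tip, nose top, nose bottom).
// Returns the first visible one, or std::nullopt when none is visible.
std::optional<LandmarkReading> firstVisible(const std::array<LandmarkReading, 3>& candidates) {
    for (const LandmarkReading& reading : candidates) {
        if (reading.visible) {
            return reading;
        }
    }
    return std::nullopt;
}
```

Tracking only stalls in this sketch when all three landmarks are out of view at the same time.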

I believe that whatever face is first detected by the camera is automatically assigned index '0' and another face nearby that is detected afterwards would be assigned index '1'. I would guess that the face nearest the camera would get index '0' because it is the easiest one for the camera to lock on to its facial landmarks.

So like in the example above, if you wanted only the nearest face to be tracked then you would set the program to only respond to the face with index '0', and it would ignore any other faces that had index '1'.

NZafr · ‎04-10-2017

Your assumption is wrong: the face that was detected first gets index 0 (at least in my application), and it's not necessarily the face that is closest to the camera. That is despite the fact that the camera tracking strategy is configured to "closest face first".

If the face closest to the camera always got index 0, I wouldn't have any problem, because landmarks info is provided for the face with index 0.

In your application, since you configured it to always track the face with index 0, you don't see that problem. If you try indexes other than 0, you will most likely get no landmarks info.

MartyG · ‎04-10-2017

I can only speak for how it seems to work in the Unity implementation of face tracking, since that is my primary development environment. I am always happy to admit if I am mistaken though, since the learning increases my knowledge. As official documentation for RealSense in Unity is very limited, I can only go by my own experience and documentation with developing in it. Perception is not a guarantee of truth though.

idata · ‎04-14-2017

Hi noamz,

Do you still need assistance with this case? Let us know if you still have questions.

-Sergio A

NZafr · ‎04-16-2017

Hi Sergio,

Yes, I still need help with this issue.

This issue is not yet resolved, so my initial question of this thread is still relevant:

How do you retrieve landmarks information for a face which has an index greater than 0?

Whenever I call the QueryLandmarks() function on a face with an index greater than 0, the return value is NULL.

Thanks,

Noam.

idata · ‎04-17-2017

Hi Noam,

Thanks for the confirmation. We'll investigate your case further and get back to you as soon as we have an update.

Regards,

-Sergio A