You can track up to four faces with marked rectangles for face boundaries, and select a particular face to focus on. Only one of those faces at a time will have detectable tracking landmarks though.
I wonder if the instruction QueryFaceByID might help, as it lets you provide the ID number of the face that should be actively tracked. It's a more specific version of QueryFaces, which returns all detected faces in an array.
There's little other information on QueryFaceByID, but I did track down a script that uses it. That may be a useful reference for you.
Thank you for your response.
I am aware that the maximum number of faces with marked rectangles is 4, but thanks for clarifying that landmarks info can be provided for only one of those faces at a time.
With regard to QueryFaceByID, I am familiar with this function. I actually used QueryFaceByIndex which is a similar function.
My problem is that, although QueryFaceByIndex does return the face with the index I ask for, the face with the landmarks info is always the first detected face, regardless of which face I request from QueryFaceByIndex.
Is there a way to configure RealSense to provide landmarks info for the face I want instead of the first face it detects?
In case I am getting confused, I just wanted to clear something up. Are you wanting the program to return landmark info about a registered face in the database that is not your own face?
Unless you have more than one person in front of the camera at the same time, it seems logical that it would default to detecting only the landmarks for the face that is currently in front of it. After all, if it's just you then the other people in the database are not present to have their landmarks checked by the camera to verify their identity. If it could check the details of faces that were not currently in front of the camera then it would be like a robber trying to get past the facial recognition lock on a bank vault by holding a photo of the bank manager's face up to the camera.
Something similar occurs with hand tracking. When using both hands at the same time, one hand has the index number '0' and the other hand has the index number '1'. If only one hand is being used in front of the camera, only elements that are set to index '0' respond, and elements set to index '1' become inactive if the camera cannot see the other hand.
My apologies if I have misunderstood what you are aiming to do.
What I am trying to do is a lot simpler than what you describe:
I want my system to return landmarks information of the face that is closest to the camera, regardless of the face identity (I actually don't care about identities, and I don't register any face to any DB).
Now, when there's more than one face in front of the camera, I can get the number of faces as well as the average depth of each face, and from that deduce which face is closest to the camera.
After that I have the index of the closest face, and I use it with QueryFaceByIndex.
However, if the index of the closest face is not 0, the QueryLandmarks function returns NULL (for index 0 it returns the landmarks info).
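For reference, the selection step described above might look roughly like this in C++. It is only a sketch: FaceSample and closestFaceIndex are made-up names rather than RealSense SDK types, and the average depth values are assumed to have already been read from the SDK for each detected face.

```cpp
#include <vector>

// Hypothetical stand-in for one detected face in the current frame.
// This is NOT an SDK type; the fields are assumed to be filled in from
// whatever the SDK reports for each face.
struct FaceSample {
    int index;             // position of the face in the detected-face list
    bool hasDepth;         // false if no depth value was available this frame
    float averageDepthMm;  // average face depth in millimetres
};

// Returns the index of the face with the smallest valid average depth,
// or -1 if no face has a usable depth value.
int closestFaceIndex(const std::vector<FaceSample>& faces) {
    int bestIndex = -1;
    float bestDepth = 0.0f;
    for (const FaceSample& face : faces) {
        // Skip faces with missing or non-positive depth readings.
        if (!face.hasDepth || face.averageDepthMm <= 0.0f) {
            continue;
        }
        if (bestIndex < 0 || face.averageDepthMm < bestDepth) {
            bestIndex = face.index;
            bestDepth = face.averageDepthMm;
        }
    }
    return bestIndex;
}
```

With one face at 1200 mm (index 0) and another at 1000 mm (index 1), this returns 1, which is the case where the landmarks query comes back NULL.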
Hope it clarifies.
I found a couple of interesting links. The first is an Intel article with downloadable source code that recognizes the face closest to the camera.
In the other link, the Crosswalk Project created an extension for RealSense that can detect faces based on various factors such as nearest and furthest.
Thank you for sharing those links.
I am familiar with the information presented in them.
Theoretically my system is configured to track faces from closest to farthest, so I am really puzzled as to why it doesn't work.
FYI, I am also in touch with someone from Intel's RealSense team, but so far she hasn't come up with a solution. I will update if she comes up with something.
Let me know if you have any other ideas.
It probably becomes a lot easier if the user has to get closer to the camera, since once they are close enough then their face will fill the camera's view and block out any other people present, thereby making the closest person the one that is tracked, because the camera cannot see past the nearest person's head.
You could add a Blob Tracking condition as the trigger for detection to begin. Blob Tracking is a crude form of tracking that only reacts to large flat-ish areas of skin such as the forehead, rather than precise landmarks and joints. So once a person's face gets close enough to the camera, their forehead should make 'Blob Detected' true. If you made the activation of your face landmark detection routine dependent on Blob Detected being true, the nearest user would have to get closer to the camera before it took action, since you have to get much closer to the camera to trigger Blob Tracking than Face Tracking.
Interesting idea, but I think it has a few problems:
1. The farthest person might be too far for Blob Detection to work.
For example: the closest person will be 1m from the camera, and there will be another person standing 1.2m from the camera.
In such a case, both persons will be seen by the camera, but I suppose that Blob Detection won't work for either of them due to the large distance.
2. I suppose that the opposite might happen as well:
Blob detection might work for two persons if they are both close enough to the camera.
3. Even if what you suggest works and the Blob Detection starts working for the closest person, I am still left with the issue of getting landmarks info for a specific face of my choice. Right now I am at the point where I have the index of the closest face, and I activate the landmarks routine only for that face. The problem is that if this face index is not 0, the landmarks routine returns NULL.
So all in all, my problem is not finding the closest face, but getting the landmarks info for it.
Regardless of all the above, it is a clever idea that can be useful in other scenarios. Thanks for sharing!
Out of curiosity, do you have any idea from what distance Blob Detection starts working?
I have only used the Unity game creation engine implementation of Blob Tracking, using the 'TrackingAction' Unity script that comes packaged with the RealSense SDK's Unity Toolkit, so I don't know the kind of range it has in an environment such as C# / C++. In Unity, the triggering range is pretty close. If you imagine getting down on your knees in front of your desk, with the camera on the top of the desk, that's the kind of close range.
An alternative to Blob Detection would be something written for C++ that is equivalent to the Unity TrackingAction's 'Real World Box'. The Real World Box is an imaginary box in between the user and the camera lens that determines how deep inside the box (i.e. how near to the camera) the user must be before an action is triggered.
So if you set the X, Y, Z of the box to be 100, 100, 100 cm in size, and set the box's Center to be 50, then that means that you have to get within 50 cm of the camera (the imaginary box's center) before an action will trigger. The smaller the Center value, the nearer the center is to the user and the easier it triggers. So if you set your Z to have a Center value of 70, you would have to get closer to the camera before the action would trigger.
Here's a guide about it that I wrote a long time ago.
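As a rough illustration of the kind of C++ check that could stand in for the Real World Box, here is a minimal sketch. It follows only the simplified behaviour described above (an action triggers once the user is within the Center distance of the camera and inside the box's width and height). RealWorldBoxSketch and wouldTrigger are hypothetical names, and this is not the Unity Toolkit's actual logic.

```cpp
#include <cmath>

// Hypothetical, simplified stand-in for the 'Real World Box' idea.
// Assumption: coordinates are in centimetres, with x and y measured from
// the camera's optical axis and z measured as distance from the lens.
struct RealWorldBoxSketch {
    float sizeXcm;    // total width of the box
    float sizeYcm;    // total height of the box
    float centerZcm;  // trigger distance from the camera
};

// Returns true if a point (for example a face position) is laterally
// inside the box and no farther from the camera than the Center distance.
bool wouldTrigger(const RealWorldBoxSketch& box,
                  float xCm, float yCm, float zCm) {
    const bool insideX = std::fabs(xCm) <= box.sizeXcm / 2.0f;
    const bool insideY = std::fabs(yCm) <= box.sizeYcm / 2.0f;
    const bool nearEnough = zCm > 0.0f && zCm <= box.centerZcm;
    return insideX && insideY && nearEnough;
}
```

With a 100 x 100 cm box and a Center of 50, a face on the optical axis at 40 cm would trigger and one at 60 cm would not.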
Clever idea, but again, in my case it is possible that multiple faces will be inside the Real World Box.
Say that you have 2 faces inside this Real World Box, so both are valid for tracking. Can you choose which one of them will be tracked and yield landmarks info? If yes, then what I have today is already sufficient, since I can already iterate over all the faces that were detected and calculate which one of them is the closest.