It sounds like what you are trying to achieve is to combine the point cloud from each camera into a single point cloud made up of the data from all the cameras. Is that correct?
In a multiple camera webinar session that Intel held in September 2018, they suggested two ways to do this:
1. Use software called Vicalib. "It uses a board that you show to each of the cameras in turn, and it establishes overlapping regions to then minimize the pose of each of those together".
Or 2. "Vicalib can do this, but there is a simpler approach, which will work in 90% of cases. This is to take the point cloud from every one of the cameras and then do an Affine Transform. Basically, just rotate and move the point clouds, in 3D space, and then once you've done that, you append the point clouds together and just have one large point cloud".
The webinar also suggested a method for generating motion capture data with multiple cameras.
- Calibrate an inward-facing configuration of multiple cameras using the open-source Vicalib software, so that the extrinsic pose of each of the cameras can be obtained.
- Use a rigid transformation for each of the point clouds to align them in the same space.
- Run a 2D landmark detector on each camera and triangulate the detections into a single estimate for all of the body parts in the captured sequences.
- This effectively provides markerless motion capture that can be used with VR, providing depth and color information of multiple people. It can also track their body parts and provide interaction at full frame rate.
Would using the following function help achieve this?
1. Calibrate camera1 and camera2 extrinsics relative to each other from a common plane in the real world (how?).
2. For camera1's (x,y,z) and the extrinsics of camera1 calculated in step 1, get from_point.
3. For camera2's (x,y,z) and the extrinsics of camera2 calculated in step 1, get from_point.
4. Both points should be the same in the real world.
static void rs2_transform_point_to_point(float to_point[3], const struct rs2_extrinsics * extrin, const float from_point[3])
>>It sounds like what you are trying to achieve is to combine the point cloud from each camera into a single point cloud made up of the data from all the cameras. Is that correct?
Imagine a Venn diagram with two circles, A and B, that have some intersection between them. In this case A and B are my cameras, which have some overlap.
I need to work in real-world coordinates, so I need a way to convert camera1's coordinates and camera2's coordinates into the real world. The extrinsics of both cameras must be such that the coordinates of the overlapping region from both cameras map to the same (x,y,z) in the real world, so that the same object can be referenced from two perspectives. To do this, I need to come up with the extrinsics of camera1 and camera2 relative to the real world.
Does that clarify?
A comment at the end of the webinar posting also asked about using the extrinsics to bring the cameras into the same space (go right to the top of the page if you want to read the webinar transcript from the start).