I'm writing video processing software for Linux and want to use VAAPI for hardware-accelerated decoding and encoding. So far I have implemented only the decoding part: frames are decoded on the GPU, and the decoded images are then read into main memory for further processing. My question is what type of memory I should expect when the decoded frame is accessed via (a) vaDeriveImage or (b) vaGetImage. Is the memory returned by vaMapBuffer regular write-back (WB) memory, or could it be uncacheable speculative write-combining (USWC) memory that is somehow local to the GPU? In the latter case I may want to use a custom frame-copying routine based on MOVNTDQA (the SSE4.1 streaming load) to improve performance.
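A minimal sketch of what such a MOVNTDQA-based copy could look like, assuming x86 with GCC or Clang, 16-byte aligned buffers, and that `src` is the pointer returned by vaMapBuffer for the derived or fetched image. `copy_frame_from_gpu` and `copy_stream_load` are hypothetical names, not part of libva; the code falls back to plain memcpy when SSE4.1 is missing or the pointers are misaligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Copy using MOVNTDQA (_mm_stream_load_si128, SSE4.1). On USWC memory the
 * CPU may fetch a whole 64-byte line into a streaming-load buffer, so the
 * following loads from that line avoid separate uncached reads. On ordinary
 * WB memory it behaves much like a normal aligned load.
 * Assumption: src and dst are both 16-byte aligned. */
__attribute__((target("sse4.1")))
static void copy_stream_load(void *dst, const void *src, size_t size)
{
    __m128i *d = (__m128i *)dst;
    __m128i *s = (__m128i *)src; /* older GCC headers take a non-const pointer */
    size_t chunks = size / 64;

    for (size_t i = 0; i < chunks; ++i) {
        /* Read 64 bytes (one cache line if src is 64-byte aligned)... */
        __m128i x0 = _mm_stream_load_si128(s + 0);
        __m128i x1 = _mm_stream_load_si128(s + 1);
        __m128i x2 = _mm_stream_load_si128(s + 2);
        __m128i x3 = _mm_stream_load_si128(s + 3);
        /* ...and write them to the cacheable destination with normal stores. */
        _mm_store_si128(d + 0, x0);
        _mm_store_si128(d + 1, x1);
        _mm_store_si128(d + 2, x2);
        _mm_store_si128(d + 3, x3);
        s += 4;
        d += 4;
    }

    /* Copy the remaining < 64 bytes with an ordinary memcpy. */
    size_t done = chunks * 64;
    memcpy((unsigned char *)dst + done, (const unsigned char *)src + done, size - done);
}
#endif

/* Hypothetical helper: copy a mapped VA buffer into system memory,
 * using streaming loads when the CPU and the alignment allow it. */
void copy_frame_from_gpu(void *dst, const void *src, size_t size)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1") &&
        (((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
        copy_stream_load(dst, src, size);
        return;
    }
#endif
    memcpy(dst, src, size);
}
```

Since MOVNTDQA on WB memory is treated much like an ordinary load, the routine should stay correct either way; any speedup would only be expected if the mapping really is USWC.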
I haven't implemented the encoding side yet, but I suspect a similar question applies there, since I will have to pass raw frames to the GPU. Will I have to use MOVNTDQ (non-temporal stores) to upload them?
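On the upload side, a sketch of a MOVNTDQ (`_mm_stream_si128`, SSE2) copy could look like the following. It assumes the destination is a buffer mapped with vaMapBuffer (for example the image of a surface obtained through vaDeriveImage) and that it is 16-byte aligned; `copy_frame_to_gpu` is a hypothetical helper. Because non-temporal stores are weakly ordered, it issues SFENCE so the data is globally visible before the buffer is unmapped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: upload a raw frame into a mapped VA buffer using
 * MOVNTDQ non-temporal stores. Falls back to memcpy when SSE2 is not
 * available or dst is not 16-byte aligned (src may be unaligned). */
void copy_frame_to_gpu(void *dst, const void *src, size_t size)
{
#if defined(__SSE2__)
    if (((uintptr_t)dst & 15) == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t chunks = size / 64;

        for (size_t i = 0; i < chunks; ++i) {
            /* Unaligned loads from the (cacheable) source frame... */
            __m128i x0 = _mm_loadu_si128(s + 0);
            __m128i x1 = _mm_loadu_si128(s + 1);
            __m128i x2 = _mm_loadu_si128(s + 2);
            __m128i x3 = _mm_loadu_si128(s + 3);
            /* ...and non-temporal stores that bypass the cache. */
            _mm_stream_si128(d + 0, x0);
            _mm_stream_si128(d + 1, x1);
            _mm_stream_si128(d + 2, x2);
            _mm_stream_si128(d + 3, x3);
            s += 4;
            d += 4;
        }

        /* Non-temporal stores are weakly ordered; fence before anyone
         * (e.g. vaUnmapBuffer and the encoder) relies on the data. */
        _mm_sfence();

        /* Copy the remaining < 64 bytes with ordinary stores. */
        size_t done = chunks * 64;
        memcpy((unsigned char *)dst + done, (const unsigned char *)src + done, size - done);
        return;
    }
#endif
    memcpy(dst, src, size);
}
```

If the mapping turns out to be ordinary WB memory, plain memcpy may be just as fast, so it's worth benchmarking both.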