2 Replies Latest reply on May 22, 2013 9:51 AM by Fred_Intel

    DXVA2 high system memory consumption on Intel HD 2500 (i5-3570)


      I'm developing software that uses the DXVA2 interface to hardware-decode multiple H.264 (ModeH264_VLD_NoFGT) video streams on the MFX (Multi-Format Codec Engine) of the Intel HD Graphics integrated in Ivy Bridge processors. My 32-bit program runs on 64-bit Windows 7. I've run into very high system (virtual) memory consumption by the user-mode graphics driver igdumdim32: for each DXVA2 video decoder instance (IDirectXVideoDecoderService::CreateVideoDecoder), igdumdim32 allocates a large block of heap memory (tens of megabytes). Here is the allocation stack:


      0a62e120 770e5673 ntdll!NtAllocateVirtualMemory

      0a62e210 770b3cee ntdll!RtlpAllocateHeap+0xcc3

      0a62e294 17493cd7 ntdll!RtlAllocateHeap+0x23a

      WARNING: Stack unwind information not available. Following frames may be wrong.

      0a62e2b4 174c75f2 igdumdim32!OpenAdapter+0xf2cd7

      0a62e2c0 174d42b5 igdumdim32!OpenAdapter+0x1265f2

      0a62e348 174bda05 igdumdim32!OpenAdapter+0x1332b5

      0a62e384 726422c0 igdumdim32!OpenAdapter+0x11ca05

      0a62e3d4 726423b7 d3d9!CreateDecodeDeviceLH+0x74

      0a62e404 01077ffa d3d9!CDxva2Container::CreateDecodeDevice+0x104

      0a62e458 010772e9 dxva2!CVideoDecoderDevice::CVideoDecoderDevice+0x27e

      0a62e48c 015eb850 dxva2!CVideoAccelerationService::CreateVideoDecoder+0x91



      I've found that the size of the buffer depends on the resolution of the video stream (specified in DXVA2_VideoDesc) as follows:

      bufferSize = SampleWidth / 16 * (SampleHeight / 16) * 0x1224

      For a 5 MP (2592x1944) video frame, using integer division, this gives 162 * 121 * 4644 = 91,031,688 bytes (~91 MB!).
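      The arithmetic above can be reproduced with a small sketch. This is only a restatement of the observed formula, not the driver's actual code: the function and constant names are made up for illustration, and the use of truncating integer division is an assumption inferred from the 121 in the worked example (1944 / 16 = 121.5).

```cpp
#include <cassert>
#include <cstdint>

// Observed constant from the allocation size (0x1224 == 4644 bytes per 16x16 block).
// Its meaning inside the driver is unknown; the name is hypothetical.
constexpr std::uint64_t kObservedBytesPerBlock = 0x1224;

// Estimate of the per-decoder heap buffer, following the observed formula.
// Assumes truncating integer division by 16, as implied by the 5 MP example.
constexpr std::uint64_t estimated_decoder_buffer_bytes(std::uint32_t sample_width,
                                                       std::uint32_t sample_height) {
    return static_cast<std::uint64_t>(sample_width / 16) *
           (sample_height / 16) *
           kObservedBytesPerBlock;
}

// Compile-time check against the 5 MP figure quoted in the post.
static_assert(estimated_decoder_buffer_bytes(2592, 1944) == 91031688,
              "should match the observed 5 MP allocation size");
```

      Passing the SampleWidth and SampleHeight from DXVA2_VideoDesc gives the expected allocation for any other resolution, if the same formula holds there.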


      So each video decoder instance for 5 MP video makes the driver allocate about 90 MB of system memory. This severely limits the usefulness of Intel MFX for decoding multiple video streams in 32-bit applications. In theory, the MFX in the latest processors is powerful enough to decode tens of 5 MP streams simultaneously, and I've already confirmed that this is possible, but the large memory usage can complicate it significantly.
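      To put a rough number on that limit, here is a minimal sketch that divides the user-mode address space of a 32-bit process by the per-decoder allocation. The names are hypothetical. The 2 GiB and 4 GiB figures are the standard limits for a 32-bit process on 64-bit Windows (default and large-address-aware, respectively). The result is only an upper bound, since the application, the decoded surfaces, and heap fragmentation all consume address space as well.

```cpp
#include <cassert>
#include <cstdint>

// Per-decoder driver allocation for 5 MP video, as measured in the post.
constexpr std::uint64_t kPerDecoderBytes5MP = 91031688;

// User-mode address space of a 32-bit process on 64-bit Windows.
constexpr std::uint64_t kDefaultAddressSpace = 2ULL << 30;      // 2 GiB
constexpr std::uint64_t kLargeAddressAwareSpace = 4ULL << 30;   // 4 GiB

// Upper bound on decoder instances if nothing else used the address space.
constexpr std::uint64_t max_decoder_instances(std::uint64_t address_space_bytes,
                                              std::uint64_t per_decoder_bytes) {
    return per_decoder_bytes == 0 ? 0 : address_space_bytes / per_decoder_bytes;
}

static_assert(max_decoder_instances(kDefaultAddressSpace, kPerDecoderBytes5MP) == 23,
              "at most 23 decoders fit in 2 GiB");
static_assert(max_decoder_instances(kLargeAddressAwareSpace, kPerDecoderBytes5MP) == 47,
              "at most 47 decoders fit in 4 GiB");
```

      In practice the usable count would be well below these bounds, which is why the per-decoder allocation matters so much for a 32-bit process.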


      Is this buffer size justified, or is it allocated just in case and not fully used for real H.264 video streams? What is the magic constant 0x1224 hardcoded in the driver? Could it be the theoretical maximum size of a coded H.264 macroblock?

      It seems that 91 MB for a 5 MP video frame is too much for any data processing...