I have actually done a fair amount of investigation into this area myself recently. All of my reading and testing indicates that you cannot modify any of those settings in Windows Server 2008 or Windows 7, and I would assume the same is true of Windows 8.
I too was digging into why performance was so poor between Windows and Linux, though in my case the Linux side runs on a BMC rather than a full processor. I was able to adjust some parameters in the BMC and get roughly a 700% performance increase, though again that is a BMC, not a 'real' CPU. Specifically, I modified net.ipv4.tcp_rmem and net.ipv4.tcp_wmem to align more closely with the Windows TCP window sizing.
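For anyone who wants to experiment with the same knobs on a stock Linux box, the adjustment looks roughly like this. The values below are illustrative only (not what I used on the BMC); each key takes a min/default/max triple in bytes:

    # Inspect the current receive/send autotuning triples (min default max, bytes)
    sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem

    # Raise the maximums so autotuning can grow the window further
    # (example values only; size them to your bandwidth-delay product)
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

    # Add the same lines to /etc/sysctl.conf to persist across reboots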
I noticed that Linux-to-Linux and Windows-to-Windows transfers (in synthetic tests) were in line with expectations (basically line rate), but when you mix them, things get ugly. My assumption is that the two OSes use different algorithms to determine an optimal TCP window size.
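If you want to check whether receive-window autotuning is the culprit on the Windows end, it can at least be inspected and dialed up or down from an elevated prompt. This doesn't prove the mixed-OS theory, but it's a quick variable to eliminate:

    rem Show the current TCP global parameters, including the autotuning level
    netsh interface tcp show global

    rem Try a different autotuning level (disabled, highlyrestricted,
    rem restricted, normal, experimental) and re-run the mixed-OS test
    netsh interface tcp set global autotuninglevel=restricted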
I'm very much looking forward to seeing if others have similar experiences.
In my limited experience with this card, increasing the Rx/Tx buffers boosts performance the most. If that does not do the trick, monitor per-CPU load with Process Explorer. I would set RSS Queues to 4 and point the starting RSS CPU on the second port at something other than 0 (it's 16 on the PC I use, which has 16 cores / 32 hardware threads).
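On Windows Server 2012 / Windows 8 and later, those driver properties can also be set from PowerShell instead of the adapter's Advanced tab in Device Manager; on 2008/Win7 you are stuck with the GUI. Adapter names below are placeholders, and display names such as "Receive Buffers" vary by driver (these match typical Intel NICs; check Get-NetAdapterAdvancedProperty first):

    # Bump the Rx/Tx descriptor rings (display names are driver-specific)
    Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Receive Buffers" -DisplayValue 4096
    Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Transmit Buffers" -DisplayValue 4096

    # Four RSS queues per port, starting the second port's RSS CPUs at 16
    # so the two ports don't contend for the same cores
    Set-NetAdapterRss -Name "Ethernet 2" -NumberOfReceiveQueues 4 -BaseProcessorNumber 0
    Set-NetAdapterRss -Name "Ethernet 3" -NumberOfReceiveQueues 4 -BaseProcessorNumber 16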
Please report your findings. I don't believe this is optimal, but it does the trick for me. The limitation here is MTU=1500; if I could use jumbo packets, I'd reduce the CPU interrupt and DPC load dramatically and attain the 20 Gb/sec half-duplex speed.
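If your switches and peers do support jumbo frames, enabling them is one more advanced property. Again, the display name and value format (e.g. "9014 Bytes") are driver-specific, and every device on the path must agree on the MTU:

    # Enable ~9k jumbo frames on both ports (value string varies by driver)
    Set-NetAdapterAdvancedProperty -Name "Ethernet 2" -DisplayName "Jumbo Packet" -DisplayValue "9014 Bytes"
    Set-NetAdapterAdvancedProperty -Name "Ethernet 3" -DisplayName "Jumbo Packet" -DisplayValue "9014 Bytes"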
Don't forget to test with a RAM drive instead of an actual disk array, so you isolate NIC performance from storage performance.
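Better still, a memory-to-memory tester such as iperf3 (or Microsoft's ntttcp) takes storage out of the equation entirely. A minimal run looks like this (the IP address is a placeholder):

    # On the receiver
    iperf3 -s

    # On the sender: 4 parallel streams for 30 seconds, purely memory-to-memory
    iperf3 -c 192.0.2.10 -P 4 -t 30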