If I had to create a workaround, I would avoid glArrayElement like the plague.
Instead, I would allocate an index list with 6 × (number of quads) entries (perhaps a static list,
to minimize reallocations), fill it with the same indices you would otherwise pass to
glArrayElement (split into two triangles per quad), and then issue a single call to
glDrawElements, or better still, glDrawRangeElements,
at the end.
This gives the OpenGL driver the opportunity to reuse vertices, saves a lot
of transfer and transform overhead, and should result in MUCH higher performance,
provided there is a reasonably large number of primitives per draw call.
It might also fix the performance degradation you are seeing.
Not speaking for Intel.