Here is a possible explanation.
1. SCC does not allocate a cache line on write. Only read misses will fill the cache.
2. A write miss will generate a non-burst write, whereas a read miss will bring in a full 32-byte cache line. In the case of back-to-back writes, the write buffer should generate a burst write.
When you enable WRITE_FIRST, the initialization statement InputVariable[Offset] = OutputBuffer[Offset] will read all the source addresses (OutputBuffer) into the cache as cache line fills. After this, the source reads will be cache hits, but during your writes to InputVariable every write will be a byte write going to main memory. I don't think back-to-back writes will occur, since we are operating on char. Hence this will be slower.
When you do NOT enable WRITE_FIRST, the initialization statement OutputBuffer[Offset] = InputVariable[Offset] will read all the destination addresses (InputVariable) into the cache. The reads from OutputBuffer will then be cache line fills, but all your writes will be cache hits. This should be much faster.
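To make the two cases concrete, here is a rough sketch in C of a toy cache model that does not allocate on write. It is only an illustration of the reasoning above, not a model of the real hardware. Several things are my assumptions and not from your code: the cache is direct-mapped with 512 lines, each buffer is 1024 bytes, the buffer addresses are made up, and the timed loop is assumed to copy OutputBuffer into InputVariable one char at a time. The names Cache, Stats and simulate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u      /* cache line size mentioned above */
#define NUM_LINES 512u     /* assumption: 16 KB direct-mapped cache */
#define BUF_SIZE  1024u    /* assumption: size of each char buffer */
#define OUT_BASE  0x0000u  /* assumption: address of OutputBuffer */
#define IN_BASE   0x1000u  /* assumption: address of InputVariable */

typedef struct {
    uint32_t tag[NUM_LINES];
    int      valid[NUM_LINES];
} Cache;

typedef struct {
    unsigned read_hits;     /* reads served from the cache */
    unsigned line_fills;    /* read misses, each brings in one line */
    unsigned write_hits;    /* writes to a line already in the cache */
    unsigned write_misses;  /* writes that go straight to memory */
} Stats;

static int is_cached(const Cache *c, uint32_t addr)
{
    uint32_t line = addr / LINE_SIZE;
    uint32_t idx  = line % NUM_LINES;
    return c->valid[idx] && c->tag[idx] == line;
}

/* A read miss allocates the line; a read hit changes nothing. */
static void do_read(Cache *c, Stats *s, uint32_t addr)
{
    uint32_t line = addr / LINE_SIZE;
    uint32_t idx  = line % NUM_LINES;
    if (is_cached(c, addr)) {
        s->read_hits++;
        return;
    }
    c->tag[idx]   = line;
    c->valid[idx] = 1;
    s->line_fills++;
}

/* A write never allocates: a miss leaves the cache untouched. */
static void do_write(const Cache *c, Stats *s, uint32_t addr)
{
    if (is_cached(c, addr))
        s->write_hits++;
    else
        s->write_misses++;
}

/* Runs the initialization pass, then returns the counters for the
   assumed timed copy InputVariable[off] = OutputBuffer[off]. */
Stats simulate(int write_first)
{
    Cache cache;
    Stats init_stats, timed;
    uint32_t off;

    assert(BUF_SIZE % LINE_SIZE == 0u);
    memset(&cache, 0, sizeof cache);
    memset(&init_stats, 0, sizeof init_stats);
    memset(&timed, 0, sizeof timed);

    for (off = 0; off < BUF_SIZE; off++) {
        if (write_first) {   /* InputVariable[off] = OutputBuffer[off] */
            do_read(&cache, &init_stats, OUT_BASE + off);
            do_write(&cache, &init_stats, IN_BASE + off);
        } else {             /* OutputBuffer[off] = InputVariable[off] */
            do_read(&cache, &init_stats, IN_BASE + off);
            do_write(&cache, &init_stats, OUT_BASE + off);
        }
    }

    for (off = 0; off < BUF_SIZE; off++) {
        do_read(&cache, &timed, OUT_BASE + off);
        do_write(&cache, &timed, IN_BASE + off);
    }
    return timed;
}
```

Under these assumptions, simulate(1) reports every timed write as a miss (a byte write to memory), while simulate(0) reports every timed write as a hit and only one line fill per 32 bytes read from OutputBuffer. The model ignores the write buffer and timing, so it only counts hits and misses and says nothing about actual cycle counts.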
Let me know if this makes any sense or if I am analyzing it incorrectly.