The local APIC is part of the P54C itself, so accessing it does not result in externally visible bus cycles (or messages on the SCC mesh). Therefore, the LUT entry is meaningless for these addresses, as they will never be visible to the Mesh Interface Unit (MIU).
The Bypass bit is completely independent of that, though. Before using it, or even considering it, please note that this bit is buggy in current SCC silicon, so the official recommendation is not to use it at all. Setting it may lead to random data corruption, as explained here: http://communities.intel.com/docs/DOC-5405
The "local tile memory buffer" is located on the same tile as the requesting processor core. The EAS calls it "memory buffer", but it is generally called the MPB in other public documentation.
My understanding of the Bypass bit is that it was originally intended to provide a shortcut to the "nearest" MPB for a processor (both physically and in terms of latency). The SCC consists of 24 tiles, each of them having (among other things) two "GaussLake" processors (derived from P54C), a message router, and an MPB. All components are connected to the tile's MIU, which translates between the processors' bus interfaces and the other components.
The MPB of a single tile can be accessed from the processors on this tile, as well as from other tiles or the MCPC via corresponding mesh messages. That is, if a processor on tile A wants to access the MPB of tile B, it simply performs a memory read or write cycle; its MIU then generates a message and sends it over the mesh to the target tile's message router, which forwards it to that tile's MIU, which then forwards it to the MPB.
If Bypass is set to 0, the exact same thing happens when a processor on tile A wants to access the MPB of tile A: a mesh message is generated and forwarded to the router, which recognizes that the message is already at its intended destination and returns the message to the MIU; the same happens for the reverse path (when the memory access is acknowledged).
There is a performance penalty for this case, as the router introduces a delay of 4 (mesh) clock cycles in both directions that could easily be avoided: the MIU could perform the operation itself, as the target MPB is directly connected to it. This is what Bypass was originally intended for: if it is set, the packet is not forwarded to the router, but instead handled locally. This way, it stays inside the tile (more specifically: inside the tile's clock domain), so it also doesn't need to cross the clock-crossing FIFOs to the message router, which would introduce further delays.
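To make the two local paths concrete, here is a minimal Python sketch of a core accessing its own tile's MPB with and without Bypass. It only counts the router delay (4 mesh cycles per direction, the figure given above); the clock-crossing FIFOs and the MPB access itself are deliberately left out because no numbers are given for them. The function name and hop labels are hypothetical, chosen for illustration only.

```python
# Hypothetical model of a core accessing the MPB on its *own* tile.
ROUTER_DELAY_MESH_CYCLES = 4  # per direction, as described in this thread


def local_mpb_round_trip(bypass: bool) -> tuple[list[str], int]:
    """Return the hops taken and the mesh cycles added by the router.

    Only the router delay is counted; FIFO crossings and the MPB access
    itself are intentionally ignored in this sketch.
    """
    if bypass:
        # The MIU serves the request itself; the packet stays inside the
        # tile's clock domain and never reaches the router.
        hops = ["core", "MIU", "MPB", "MIU", "core"]
        return hops, 0
    # Without Bypass the packet goes to the router, which notices it is
    # already at its destination and hands it back to the MIU; the
    # acknowledgement takes the same detour in the reverse direction.
    hops = ["core", "MIU", "router", "MIU", "MPB",
            "MIU", "router", "MIU", "core"]
    return hops, 2 * ROUTER_DELAY_MESH_CYCLES


if __name__ == "__main__":
    for bypass in (False, True):
        hops, cycles = local_mpb_round_trip(bypass)
        print(f"bypass={bypass}: {' -> '.join(hops)} (+{cycles} router cycles)")
```

With Bypass cleared the router adds 8 mesh cycles to every local round trip in this model, which is the penalty the Bypass bit was meant to remove.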
When you set the bypass bit, data will bypass the local tile router and (as one of our experiments showed) will always hit the local tile's MPB. Of the address you specify, only the lower bits are used, so the access (sort of) wraps around when the address is larger than 16KB.
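The wrap-around observed in that experiment can be modeled as a simple mask. This is a sketch under the assumption that exactly the low 14 bits (16KB) of the address select the byte in the local MPB; the function name is hypothetical and the real hardware behavior is only known from the experiment described above.

```python
MPB_SIZE = 16 * 1024  # bytes of MPB per tile (16KB, as stated above)


def bypass_mpb_offset(address: int) -> int:
    """Offset into the *local* MPB hit by a bypassed access (assumed model).

    Only the low log2(MPB_SIZE) = 14 bits are kept, so any address at or
    beyond 16KB wraps around to the start of the local MPB, regardless of
    which tile the address was meant for.
    """
    return address & (MPB_SIZE - 1)
```

For example, `bypass_mpb_offset(0x4010)` yields `0x10`: an access intended for offset 0x4010 lands 16 bytes into the local MPB.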
I cannot find the document anymore, but you should *NOT* use the bypass bit, as the two Michaels pointed out during the MARC symposium in Braunschweig. It has some arbitration errors.
At least this document http://communities.intel.com/docs/DOC-5847 contains the message on Slide 4.
And I just found the original document: http://communities.intel.com/docs/DOC-5405
We've confirmed that we have a bug in the MPB bypass logic.
The bug is hit for bypass operations that get stalled due to core0/core1 arbitration.
During a bypass operation, a packet is forwarded directly to the MPB. However, a secondary signal is generated to prevent the packet from being put on the mesh.
When both cores issue requests near the same time, the MIU arbitration will stall one core while it services the other core.
Stalls on bypass requests cause the "mesh prevention signal" to not be asserted. This causes an unintended packet to be put on the mesh which ultimately can cause data corruption in the MPB. With this in mind, we recommend that the bypass feature not be used.
Severity ... Major ... Function works but has severe side effects.
P3 ... no fix planned.
=============================================================================
Edit: JanArne was just a few minutes faster ;-)
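The failure condition in the erratum boils down to a small predicate, sketched here in Python. The function and parameter names are hypothetical; the logic only restates the erratum text: a bypassed request is supposed to suppress its mesh packet via the "mesh prevention signal", and a core0/core1 arbitration stall causes that signal not to be asserted.

```python
def unintended_mesh_packet(bypass: bool, stalled_by_arbitration: bool) -> bool:
    """True if a packet reaches the mesh although it should not (per the erratum)."""
    if not bypass:
        # Normal path: the packet is *supposed* to go on the mesh, so
        # nothing unintended happens here.
        return False
    # Bypass forwards the packet to the MPB directly and should also assert
    # the mesh prevention signal; an arbitration stall (both cores issuing
    # requests near the same time) leaves that signal deasserted.
    return stalled_by_arbitration


if __name__ == "__main__":
    for bypass in (False, True):
        for stalled in (False, True):
            print(f"bypass={bypass!s:5} stalled={stalled!s:5} -> "
                  f"unintended packet: {unintended_mesh_packet(bypass, stalled)}")
```

Only the combination of Bypass and an arbitration stall is problematic, and since the stall depends on what the other core happens to be doing, software cannot reliably avoid it; hence the recommendation not to use Bypass at all.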
Thanks, Jan [edit: and Roy!], for the thorough response. "MPB" would definitely be less ambiguous than "memory buffer" in the EAS.
I am left with another question. Your explanation seems to indicate that the LUT entry for 0xfe is meaningless because it covers the memory-mapped location of the APIC, not because the bypass bit is set. If that's so, my understanding of the LUT is a little off. I thought the LUT could be arbitrarily set so that any "core physical address" could be mapped to any system address. For example, I thought that I could swap the LUT entries for 0xfe and 0x45, and then I would be able to address the APIC at 0x45e00000 rather than 0xfee00000. Are there instead restrictions on which of the physical addresses can actually be mapped by the LUT? And if so, where do the "restricted addresses" actually map to?
Your understanding of the LUTs is correct, with the one and only exception (that I know of) being the local APIC.
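For the ordinary (non-APIC) case, the remapping described in the question can be sketched as a 256-entry table indexed by the top 8 bits of the core physical address, which matches entry 0xfe covering 0xfee00000 and entry 0x45 covering 0x45e00000. The `LutEntry` layout below is a hypothetical simplification (a destination label plus a 16MB system base), not the real SCC LUT entry format, which has more fields.

```python
from dataclasses import dataclass

LUT_SHIFT = 24                       # top 8 bits of a 32-bit address pick the entry
WINDOW_MASK = (1 << LUT_SHIFT) - 1   # low 24 bits: offset within a 16MB window


@dataclass(frozen=True)
class LutEntry:
    # Hypothetical, simplified entry; the real SCC format differs.
    destination: str   # where the access is routed (tile/port), as a label
    system_base: int   # base of the 16MB system-address window


def translate(lut: list[LutEntry], core_phys: int) -> tuple[str, int]:
    """Map a core physical address to (destination, system address)."""
    entry = lut[core_phys >> LUT_SHIFT]
    return entry.destination, entry.system_base + (core_phys & WINDOW_MASK)


if __name__ == "__main__":
    lut = [LutEntry(f"dst{i}", i << LUT_SHIFT) for i in range(256)]
    print(translate(lut, 0x45E00000))
    # Swapping two entries re-routes both 16MB windows.
    lut[0x45], lut[0x10] = lut[0x10], lut[0x45]
    print(translate(lut, 0x45E00000))
```

After the swap, core address 0x45e00000 is routed through what used to be entry 0x10; this holds for any address that actually leaves the core, which the local APIC range does not.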
The LUT only influences memory accesses that would be visible on the processor's front-side bus in a traditional system; on the SCC, that means behind the L2 cache controller from the point of view of the processor. Everything before that point (APIC, L1, L2) uses core-physical addresses only and does not know about the LUTs or any SCC-specific functionality.
What is interesting about the local APIC is that it is a memory-mapped device, but present on the processor itself. For the SCC, this really means "the block labeled 'core 0/1' in the block diagram of a tile".
The presence of memory-mapped, on-core devices means that there are certain addresses in the core's 32-bit physical address space that can never be seen on the external bus. If software accesses these addresses (e.g., it writes an entry into its page table to map them somewhere into its linear address space, then performs a regular access), the access will be handled internally by the processor. Any external logic (which includes the L2 cache on the SCC, and everything beyond, like the MIU and message router) will never be able to get a bus transaction for such an address.
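The interception can be sketched as a check that runs before any external logic is involved. This assumes the P54C's default local APIC base of 0xFEE00000 with a 4KB register window; the function name and return strings are hypothetical, and cache hits (which also never leave the core) are not modeled.

```python
APIC_BASE = 0xFEE00000  # default local APIC base on P54C-class cores (assumed here)
APIC_SIZE = 0x1000      # 4KB register window


def who_sees_access(core_phys: int) -> str:
    """Which part of the system handles an uncached physical access (sketch).

    Accesses to the on-core APIC are answered inside the core, so the L2,
    the LUT, the MIU and the router never see a bus cycle for them; the
    LUT entry covering this range therefore has no effect on it.
    """
    if APIC_BASE <= core_phys < APIC_BASE + APIC_SIZE:
        return "core (local APIC)"
    return "external bus (L2, then LUT/MIU)"
```

This is why swapping LUT entries 0xfe and 0x45 cannot move the APIC to 0x45e00000: the core answers 0xfee00000 itself before any LUT lookup could happen, while 0x45e00000 is still sent to the external bus.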
Now for the strange setting of LUT 0xfe. I also noticed some strange mappings on our setup (Bypass bit set, nonexistent tile coordinates, router ports that nothing was supposed to be connected to, etc.), and I talked to the hardware developers at the Braunschweig SCC Symposium last year. The bottom line is that the physical address ranges mapped by these LUT entries are not accessed at all, so they are also never initialized with meaningful values; they contain random garbage. If you tried to access these addresses (e.g., on Linux, do an ioremap to build the necessary page table entries, then use the returned pointer), the processor would hang because the corresponding bus cycle would never be answered.