I haven't used the performance counters on the SCC myself, but here is how I interpret that description.
The processor cores of the SCC are derived from the P54C, which has a traditional bus interface. So from the viewpoint of the core, if it wants to access memory or I/O devices, it performs "bus cycles": it puts an address onto the address lines, puts data onto the data lines (for writes), asserts the external control signals, and so on. Because the L1$ is on-core, accesses to it are not visible on the bus interface. However, for the P54C, the L2$ is off-core, so any access to it needs to go via that interface.
On the SCC, there is no common bus that connects cores to memory; there is a mesh network instead. However, cores cannot interface directly to the mesh; they need a "protocol converter" that supplies a bus interface on one side and a mesh interface on the other. That is what the Mesh Interface Unit (MIU) does. There is one MIU per tile, which connects that tile's cores to that tile's router. The MIU contains the tile's configuration registers, a data path to the tile's message passing buffer, and (I think two completely independent) bus interfaces for the two cores.
Therefore, if a core wants to write memory, the following happens (I assume a non-cached, non-MPBT memory write here): it puts the address and data onto its bus, then activates the control lines for a non-cached memory write operation. This goes through the external L2$ and the write-combining buffer (it is non-MPBT, so the WC buffer does nothing) to the MIU. The MIU converts the request into a packet, using information from the configuration registers (the LUT entry corresponding to that physical address) to fill in the necessary fields, such as the upper bits of the target address, the target tile and the router port. That packet is then sent over the mesh. Until the answer arrives, the MIU cannot allow the bus cycle initiated by the core to complete, so it keeps the core waiting (it does not assert the READY line on the bus interface). Once the answer packet arrives, the MIU completes the bus cycle, and the core can continue.
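The LUT step can be sketched roughly like this. It only illustrates the idea and is not the real hardware layout: the field names are made up, and I am assuming that the top 8 bits of a 32-bit core address select the LUT entry and the low 24 bits pass through unchanged. Check the SCC documentation for the actual widths.

```python
from dataclasses import dataclass

# Assumed layout, for illustration only: the top 8 bits of a 32-bit core
# address select a LUT entry; the low 24 bits pass through unchanged.
LUT_INDEX_SHIFT = 24
OFFSET_MASK = (1 << LUT_INDEX_SHIFT) - 1


@dataclass(frozen=True)
class LutEntry:      # hypothetical field names
    tile: int        # target tile
    port: int        # router port on the target tile
    addr_ext: int    # upper bits of the target address


@dataclass(frozen=True)
class Packet:        # hypothetical packet, only the fields discussed above
    tile: int
    port: int
    address: int
    data: int


def make_write_packet(lut, core_addr, data):
    """Build the mesh packet for a write to a 32-bit core physical address."""
    if not 0 <= core_addr < (1 << 32):
        raise ValueError("core address must fit in 32 bits")
    entry = lut[core_addr >> LUT_INDEX_SHIFT]
    # Replace the LUT index bits with the address extension from the entry.
    system_addr = (entry.addr_ext << LUT_INDEX_SHIFT) | (core_addr & OFFSET_MASK)
    return Packet(entry.tile, entry.port, system_addr, data)


# Example with a single made-up LUT entry for index 0x80.
lut = {0x80: LutEntry(tile=5, port=2, addr_ext=0x123)}
pkt = make_write_packet(lut, 0x80001000, 0xDEADBEEF)
print(hex(pkt.address))  # 0x123001000
```

The core only ever sees its own 32-bit address; everything about where the request really goes comes from the LUT entry, which is why the MIU has to sit between the core's bus and the router.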
While the MIU keeps the core waiting, the core's clock is still running. So from the description you quoted, I would assume the counter counts the number of core clocks between the start and end of a bus cycle as described above; that is, the number of clocks during which the bus (between the core and the MIU) is not idle.
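As a toy model of that interpretation (not of the real counter hardware): describe each bus cycle by the core clock at which it starts and the core clock at which the MIU finally completes it. The counter would then grow by the length of each such interval. The function name and the (start, end) representation are made up for this sketch, and it assumes bus cycles do not overlap.

```python
def bus_busy_clocks(bus_cycles):
    """Sum the core clocks during which the core-to-MIU bus is not idle.

    bus_cycles: iterable of (start_clock, end_clock) pairs, one per bus
    cycle, with end_clock exclusive. Hypothetical representation; the
    cycles are assumed not to overlap.
    """
    total = 0
    for start, end in bus_cycles:
        if end < start:
            raise ValueError("bus cycle ends before it starts")
        total += end - start
    return total


# Two writes that each wait on the mesh: 80 clocks and 250 clocks.
cycles = [(100, 180), (400, 650)]
busy = bus_busy_clocks(cycles)
print(busy)  # 330
```

Under this reading, a longer mesh round trip shows up directly as a larger count, even though the core executes nothing in the meantime.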