9 Replies Latest reply on Aug 29, 2011 4:01 PM by jheld

    How do I/O ports work on an SCC core?




      I am looking for information on how I/O port accesses work on the SCC. I skimmed through the SCC Linux code and it seems to use in/out instructions to communicate with the MCPC (in include/asm-i386/mach-mcemu/mcemu_debug.h). Is there any documentation on exactly how this works? I am particularly interested in how port accesses are communicated over the mesh and how the core knows where to route them.


      Regards, Julian

        • 1. Re: How do I/O ports work on an SCC core?

          There isn't any documentation on SCC Linux that I'm aware of. The documentation really is the source code and its comments. The code was written here and so if you have some very specific questions, we can get a response from the relevant engineer. If you are trying to do something specific with the code and it's not working as you expect, please share your observations here, and we can discuss them.

          • 2. Re: How do I/O ports work on an SCC core?

            On the off chance you mean the HW rather than the OS that Ted was referring to...


            OUT and IN instructions and their variants are routed by the HW to the FPGA "chipset", which currently ships them to the MCPC.


            The sccKit software has APIs that allow capturing and responding to them. If you're going to want to use them and are willing to work with the sccKit, let me know and I'll dig out some example code. Oh, and you'll likely need to write a driver in Linux, as I believe it sets the IOPL to forbid ring3 access.

            • 3. Re: How do I/O ports work on an SCC core?

              Yes, I meant IN/OUT instructions and to put it in perspective, we are currently evaluating whether device emulation on the SCC is feasible. A desirable goal would be to run unmodified operating systems (not just SCC Linux).


              Being able to respond to I/O port accesses from the MCPC is definitely helpful and example code would be more than appreciated. But the original goal of my question was to find out whether you can route them not to the MCPC, but to another core on the mesh. Correct me if I am wrong, but it sounds like this should be possible with modifications to the "chipset". The obvious advantage would be that the latency of handling I/O port accesses is greatly reduced.

              • 4. Re: How do I/O ports work on an SCC core?

                Still working on getting some tested example code. I have found plenty of code that interfaces with the mesh, but haven't ironed out solid examples yet.


                Routing to another core is difficult because the Pentium only knows about the IN and OUT that it initiates. There is no unsolicited IO port input defined. What form did you imagine the accesses taking at the destination core?


                The MCPC does 'see' the packet interface and can receive and generate packets. It can take the IO requests and forward them in some form, presumably as an interrupt with a protocol for passing on the data through memory (e.g. MPB). Given a proven method, it could possibly be encoded into the FPGA, but it would be competing for development time and space.
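To make the "protocol through memory" idea a bit more concrete, here is a minimal sketch, assuming a hypothetical request record and a fixed-size ring that would live in shared memory (e.g. the MPB). `ForwardedIo`, `IoRing` and all field names are invented for this example and are not part of sccKit. The interrupt, cache handling and memory ordering between cores are not modelled; this only shows the queueing logic for one producer and one consumer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical record describing one forwarded I/O port access.
struct ForwardedIo {
    std::uint32_t core;    // core that issued the IN/OUT
    std::uint16_t port;    // I/O port number
    std::uint8_t  size;    // access width in bytes
    bool          isWrite; // true for OUT, false for IN
    std::uint32_t value;   // value written (unused for IN)
};

// Hypothetical single-producer/single-consumer ring. It holds at most
// N - 1 entries so that "full" and "empty" can be told apart.
template <std::size_t N>
class IoRing {
public:
    // Producer side (e.g. the forwarder): returns false when the ring is full.
    bool push(const ForwardedIo& req) {
        const std::size_t next = (head_ + 1) % N;
        if (next == tail_) {
            return false;
        }
        slots_[head_] = req;
        head_ = next;
        return true;
    }

    // Consumer side (e.g. the device-model core): returns false when empty.
    bool pop(ForwardedIo& out) {
        if (tail_ == head_) {
            return false;
        }
        out = slots_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

private:
    std::array<ForwardedIo, N> slots_{};
    std::size_t head_ = 0; // next slot to write
    std::size_t tail_ = 0; // next slot to read
};
```

In this model the forwarder would `push` each captured access and then raise an interrupt on the target core, which drains the ring with `pop`. For IN accesses a second ring in the opposite direction would be needed to carry the reply.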


                Finally, since SCC isn't a prototype product, I'm not sure that the ability to run unmodified OSs is all that important.  It will be much more straightforward to modify the OS of choice, as we have done in order to boot Linux.  Are you looking for a particular one?

                • 5. Re: How do I/O ports work on an SCC core?

                  During the MARC symposium in Braunschweig last week, I had the same idea as you: To answer I/O port accesses from other cores on the SCC, you would have to forward them via some shared memory protocol. It's not pretty, but should work. For now, I guess, I am going to prototype this on the MCPC. That should be enough to evaluate it.


                  For the use case: We have modular device models and some BIOS code lying around. It is not trivial, but straightforward to wire them together to create something that runs an out-of-the-box bootloader (e.g. GRUB or gPXE). From that point, at least our OS (L4 Fiasco) should run unmodified[1], which would be great for further experiments. I hope it is also sufficient to boot an unmodified Linux. Of course, you would still need special drivers for efficient I/O (i.e. the upcoming ethernet and framebuffer features), but the diff to vanilla Linux shrinks considerably and you wouldn't have to touch the baroque bootstrap code. The bar for porting other OSs is reduced as well.


                  Btw, the symposium in Braunschweig has helped me a lot to improve my mental picture of how the SCC works. So thanks for doing that @ Intel.

                  • 6. Re: How do I/O ports work on an SCC core?

                    Finally, some example code. Presuming you will create or modify an sccKit application, you'll need the following content or its equivalent.

                    Note that somewhere there will be creation of an object to access the sccAPI,  something like this:


                    // Invoke sccApi interface to get access to the SCI messaging
                    sccAccess = new sccApi(log);


                    Now you need to define an object with a method to handle the signal.  Call it sccIOServer for now.

                    The handler method would look something like:


                    // define some static variables
                    uInt32 msgArray[12];
                    uInt32 *msgPointer = (uInt32 *) msgArray;


                    // define the method to handle IO requests

                    void sccIOServer::slotIoRequest(uInt32 * message)
                    {
                        int tmp;
                        mem32Byte mem;
                        int sif_port_recipient;
                        int sif_port_sender;
                        int transid;
                        int byteenable;
                        int destid;
                        int routeid;
                        int cmd;
                        int answerType;

                        UINT16 size;  // number of bytes received
                        UINT16 start; // starting byte position in aligned 64b line
                        UINT32 value; // value read or written
                        UINT32 Cpu;   // CPU core doing the read/write
                        UINT32 port;  // port number to be read, written

                        // parse the message from the SCC System Interface (SIF)
                        tmp = message[8];    // contains byteenable, transid, srcid, destid
                        sif_port_sender = (tmp >> 16) & 0x0ff;
                        transid = (tmp >> 8) & 0x0ff;
                        byteenable = tmp & 0x0ff; // bitmap of bytes in the cacheline that are valid
                        mem.addr = message[9];    // 32b of the 34bit physical target address
                        tmp = message[10];    // contains 2b of physical target address,
                                              // 12b command type; 8b sccroute (x,y); 3b src/destid
                        mem.addr += ((ULONGLONG) (tmp & 0x03)) << 32;
                        destid = (tmp >> 22) & 0x07;
                        routeid = (tmp >> 14) & 0x0ff;
                        cmd = (tmp >> 2) & 0x01ff;
                        sif_port_recipient = (tmp >> 11) & 0x03;

                        // these are the actual data in the SIF frame.
                        mem.data[3] = ((ULONGLONG) message[7] << 32) + (ULONGLONG) message[6];
                        mem.data[2] = ((ULONGLONG) message[5] << 32) + (ULONGLONG) message[4];
                        mem.data[1] = ((ULONGLONG) message[3] << 32) + (ULONGLONG) message[2];
                        mem.data[0] = ((ULONGLONG) message[1] << 32) + (ULONGLONG) message[0];

                        // the SIF interface is asynchronous so we must handshake
                        message[10] = 0;    // handshake signal with SIF interface that we're done with the message array - don't touch again.

                        // determine a core number from the address
                        Cpu = PID(X_TID(routeid), Y_TID(routeid), destid);

                        if (sif_port_recipient != SIF_HOST_PORT) {
                            return; // assumed: ignore requests not addressed to the host port
                        }

                        port = (UINT32) mem.addr; // initialize with aligned address, we'll adjust that later

                        // data comes in as it would be on the P54C bus, eight byte aligned
                        // with byteenable as a bitfield indicating which bytes within that range are
                        // to be read/written
                        // determine the start position based on the byte enable
                        // after this loop, start is left at the start position, size is number of bytes
                        size = 0;
                        start = 0;
                        while ((start < 8) && byteenable) {
                            // if bit set in byteenable then increment the size
                            if (byteenable & 1) {
                                size += 1;
                            } else {
                                // else increment the address so we can tell the client an unaligned location
                                port += 1;
                                start += 1;
                            }
                            // shift the byteenable to next bit position
                            byteenable >>= 1;
                        }

                        // invoke read or write callbacks
                        if (cmd == NCIOWR) {
                            // it's an IO write so shift according to where the byteenable said it starts
                            value = (UINT32) (mem.data[0] >> (start * 8));
                            // make some use of what was written where and how big it is
                            printf("Cpu %u, port %u, value %x, size %d\n", Cpu, port, value, size);
                            answerType = MEMCMP;
                        } else {
                            // it's an IO read so get value from somewhere and return it in mem.data
                            mem.data[0] = 0xdeadbeef;
                            answerType = NCDATACMP;
                        }

                        // Prepare SCEMI message with response...
                        msgPointer[10] = (uInt32) destid << 22;
                        msgPointer[10] += (uInt32) routeid << 14;
                        msgPointer[10] += (uInt32) answerType << 2;
                        msgPointer[10] += (uInt32) ((ULONGLONG) (mem.addr) >> 32);
                        msgPointer[9] = (uInt32) (mem.addr & 0x0ffffffff);
                        msgPointer[8] = (uInt32) sif_port_sender << 24;
                        msgPointer[8] += (uInt32) sif_port_recipient << 16;
                        msgPointer[8] += (uInt32) transid << 8;
                        msgPointer[8] += (uInt32) byteenable;
                        msgPointer[1] = (uInt32) mem.data[0];
                        msgPointer[0] = (uInt32) (mem.data[0] >> 32);

                        // Send SCEMI packet
                        // ... (send call not shown here)
                    }


                    Now, to get the notifications of the IO, use the Qt APIs to replace the default handler with your own. In the call, 'this' refers to the instance of sccIOServer.


                    // connect to sccApi processing of IO from SCC cores
                    disconnect(sccAccess, SIGNAL(ioRequest(uInt32 *)), 0, 0);
                    connect(sccAccess, SIGNAL(ioRequest(uInt32 *)), this, SLOT(slotIoRequest(uInt32 *)));
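The byte-enable handling in the handler is the part that tends to confuse people, so here is a minimal standalone sketch of just that step. It assumes the enabled bytes are contiguous (as they would be for a single IN/OUT access). `ByteEnableInfo`, `decodeByteEnable` and `portFromAddress` are names made up for this example and use standard types rather than the sccKit ones.

```cpp
#include <cstdint>

// Result of decoding the byte-enable bitmap of one 8-byte-aligned access.
struct ByteEnableInfo {
    unsigned start; // index of the first enabled byte (0..7)
    unsigned size;  // number of enabled bytes
};

// Assumption: the set bits in byteenable form one contiguous run.
ByteEnableInfo decodeByteEnable(std::uint8_t byteenable) {
    ByteEnableInfo info{0, 0};
    // Skip the disabled bytes below the first enabled one.
    while (byteenable != 0 && (byteenable & 1) == 0) {
        ++info.start;
        byteenable >>= 1;
    }
    // Count the contiguous enabled bytes.
    while ((byteenable & 1) != 0) {
        ++info.size;
        byteenable >>= 1;
    }
    return info;
}

// The port number is the 8-byte-aligned address from the message
// plus the offset of the first enabled byte.
std::uint32_t portFromAddress(std::uint64_t alignedAddr, const ByteEnableInfo& info) {
    return static_cast<std::uint32_t>(alignedAddr) + info.start;
}
```

For example, a one-byte access to port 0x3FD would arrive with aligned address 0x3F8 and only bit 5 set in the byte enable, which decodes to start 5 and size 1.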

                    • 7. Re: How do I/O ports work on an SCC core?

                      Let's push this one more time...


                      First of all, thank you for this example code. I've been working on getting L4 Fiasco running on the SCC. So far, with this example code I managed to get the OS running and even a rudimentary emulation of a serial console. In order to properly document my work, it would be very nice to have some decent information about this I/O redirecting mechanism, i.e. how it works in detail, what the message data structure looks like, and so on. Is it possible to get this kind of information?


                      Thanks in advance,



                      • 8. Re: How do I/O ports work on an SCC core?

                        Since I had many problems with I/O processing during my work, I had a look at the official SoftUART implementation and found two minor differences between the code there and your example code posted above:


                        - There seems to be a "magic" acknowledgement by writing a 0 into some array

                        - "Automatic write response" is disabled while using SoftUART


                        These two changes made my I/O routine completely stable, whereas without them I had frequent hangs and crashes. This makes me think that they work around a problem where many I/O requests, such as those from printf calls, somehow congest the I/O redirection mechanism. Is that the case?


                        So again, I'd like to get some more detailed information about this I/O handling for my documentation. Mainly I'm interested in three points:


                        - How are I/O requests captured and forwarded?

                        - What does the message look like?

                        - Why do these two changes mentioned above suddenly stabilize my program?


                        I'd really appreciate some help





                        • 9. Re: How do I/O ports work on an SCC core?

                          I presume you are referring in the first case to :

                          message[10] = 0;

                          // handshake signal with SIF interface that we're done with the message array - don't touch again.


                          That arises because the system interface thread in sccKit does not queue IO request data structures. It waits on the handshake location to confirm that the data has been dealt with in the callback before storing another IO's data. Otherwise rapidly incoming data can overwrite the existing message before it has been completely copied out.
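As a rough model of that handshake, consider a single slot with a "busy" flag. This is only a sketch under stated assumptions: `SingleSlot` is a made-up class, and it uses a separate flag where sccKit reuses word 10 of the message array itself. It is meant for one storing thread and one handler thread.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Hypothetical single-slot buffer: no queue, just one message at a time.
class SingleSlot {
public:
    using Message = std::array<std::uint32_t, 12>;

    // Interface-thread side: refuses to store while the previous message
    // has not been released, so nothing is overwritten.
    bool tryStore(const Message& msg) {
        if (busy_.load(std::memory_order_acquire)) {
            return false; // handler has not finished copying yet
        }
        data_ = msg;
        busy_.store(true, std::memory_order_release);
        return true;
    }

    // Handler side: copy the message out first, then release the slot.
    // The release corresponds to the "message[10] = 0" line in the handler.
    bool tryTake(Message& out) {
        if (!busy_.load(std::memory_order_acquire)) {
            return false; // nothing pending
        }
        out = data_;
        busy_.store(false, std::memory_order_release);
        return true;
    }

private:
    Message data_{};
    std::atomic<bool> busy_{false};
};
```

If the handler never releases the slot, `tryStore` keeps failing and later requests stall, which is one plausible way a missing handshake could show up as a hang.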


                          I'm not familiar with the softUART so can't comment on the automatic write response.  I suspect it relates to the default method in SccKit that responds to unexpected writes.


                          All IO requests that are emitted by an SCC core are converted to read and write messages addressed to the system interface and therefore the FPGA. The FPGA sends them over PCIe to the MCPC driver and sccKit. The EAS and the sccKit source are the best documentation that exists for such low-level operation.
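On the question of what the message looks like: the field positions can at least be read off the example handler earlier in this thread. The sketch below restates them with standard types. It is not taken from the EAS. `SifHeader`, `parseHeader` and `buildResponseHeader` are hypothetical names, and the answer type is passed in as a plain number because the values of constants such as MEMCMP and NCDATACMP are not given here.

```cpp
#include <cstdint>

// Header fields as extracted by the example handler (words 8..10 of the
// 12-word message; words 0..7 carry the data).
struct SifHeader {
    std::uint8_t  byteenable;    // word 8, bits 0..7
    std::uint8_t  transid;       // word 8, bits 8..15
    std::uint8_t  senderPort;    // word 8, bits 16..23
    std::uint8_t  recipientPort; // word 10, bits 11..12
    std::uint16_t cmd;           // word 10, bits 2..10
    std::uint8_t  routeid;       // word 10, bits 14..21
    std::uint8_t  destid;        // word 10, bits 22..24
    std::uint64_t addr;          // word 9 plus word 10, bits 0..1 (34 bits)
};

SifHeader parseHeader(const std::uint32_t* message) {
    const std::uint32_t w8 = message[8];
    const std::uint32_t w10 = message[10];
    SifHeader h{};
    h.byteenable = static_cast<std::uint8_t>(w8 & 0xff);
    h.transid = static_cast<std::uint8_t>((w8 >> 8) & 0xff);
    h.senderPort = static_cast<std::uint8_t>((w8 >> 16) & 0xff);
    h.addr = message[9] | (static_cast<std::uint64_t>(w10 & 0x3) << 32);
    h.cmd = static_cast<std::uint16_t>((w10 >> 2) & 0x1ff);
    h.recipientPort = static_cast<std::uint8_t>((w10 >> 11) & 0x3);
    h.routeid = static_cast<std::uint8_t>((w10 >> 14) & 0xff);
    h.destid = static_cast<std::uint8_t>((w10 >> 22) & 0x7);
    return h;
}

// Mirrors how the example handler fills words 8..10 of its reply:
// the original sender moves to bits 24..31 of word 8.
void buildResponseHeader(const SifHeader& req, std::uint32_t answerType,
                         std::uint32_t* out) {
    out[10] = (static_cast<std::uint32_t>(req.destid) << 22)
            | (static_cast<std::uint32_t>(req.routeid) << 14)
            | (answerType << 2)
            | static_cast<std::uint32_t>(req.addr >> 32);
    out[9] = static_cast<std::uint32_t>(req.addr & 0xffffffffu);
    out[8] = (static_cast<std::uint32_t>(req.senderPort) << 24)
           | (static_cast<std::uint32_t>(req.recipientPort) << 16)
           | (static_cast<std::uint32_t>(req.transid) << 8)
           | req.byteenable;
}
```

Treat the bit positions as a reading of the example code only; the EAS remains the authority on the actual packet format.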