5 Replies Latest reply on May 18, 2011 6:07 PM by tedk

    Booting Custom MINIX Many Core OS on SCC hangs

    nieklinnenbank
      Dear MARC members,

      At the VU Amsterdam we are developing a modified version of MINIX to run as a many-core OS
      on the SCC. At this point we have modified MINIX such that it can start directly from an image in memory,
      thus without help from a traditional bootloader or BIOS. The problem we're having is that this image boots
      MINIX just fine in Qemu simulating the pentium, but on the SCC it "hangs" very early on in the boot code.

      We have two versions of the image: 'image' and 'image.obj'. Both contain exactly the same code,
      but in different formats. The 'image' file is a raw binary and 'image.obj' was produced by bin2obj.
      Both 'image' and 'image.obj' get loaded at exactly the same addresses (0x90000 and 0x100000).
      The 'image' file can be loaded just like a linux kernel directly into memory with Qemu (tested with 0.12.3).

      I've used the command below to boot it with Qemu:
      $ qemu -cpu pentium -serial stdio -kernel image -d cpu_reset,int -m 128
      SCC!
      loadminix(0x105e00)
      image is 3130880 bytes
      params set at 0xf000, 4 bytes
      hdr[0] @ 0x105e00 -> 0x0: name = kernel, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x1e000, data = 0x86a1, bss = 0x6611b, entry = 0x0, total = 0x8c7bc, syms = 0x0
      starting relocation of image at 0x403000
      hdr[1] @ 0x12c800 -> 0x403000: name = ds, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0xe000, data = 0x3570, bss = 0xe490, entry = 0x0, total = 0x3fa00, syms = 0x0
      hdr[2] @ 0x13e000 -> 0x443000: name = rs, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x11000, data = 0x4f70, bss = 0x342d0, entry = 0x0, total = 0x839640, syms = 0x0
      hdr[3] @ 0x154200 -> 0xc7d000: name = pm, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0xe000, data = 0x3784, bss = 0x64d9c, entry = 0x0, total = 0x96520, syms = 0x0
      hdr[4] @ 0x165c00 -> 0xd14000: name = sched, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x7000, data = 0x2524, bss = 0x2a5c, entry = 0x0, total = 0x2bf80, syms = 0x0
      hdr[5] @ 0x16f400 -> 0xd40000: name = vfs, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x18000, data = 0x4f24, bss = 0xadedc, entry = 0x0, total = 0xeae00, syms = 0x0
      hdr[6] @ 0x18c600 -> 0xe2b000: name = memory, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x9000, data = 0x1f6be4, bss = 0x8f1c, entry = 0x0, total = 0x228b00, syms = 0x0
      hdr[7] @ 0x38c400 -> 0x1054000: name = log, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0xa000, data = 0x2c64, bss = 0x1c7dc, entry = 0x0, total = 0x49440, syms = 0x0
      hdr[8] @ 0x399400 -> 0x109e000: name = tty, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x13000, data = 0x4910, bss = 0x28dd0, entry = 0x0, total = 0x606e0, syms = 0x0
      hdr[9] @ 0x3b1000 -> 0x10ff000: name = mfs, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x10000, data = 0x3784, bss = 0x1039c, entry = 0x0, total = 0x43b20, syms = 0x0
      hdr[10] @ 0x3c4a00 -> 0x1143000: name = vm, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x18000, data = 0x9050, bss = 0x1e5330, entry = 0x0, total = 0x226380, syms = 0x0
      hdr[11] @ 0x3e5e00 -> 0x136a000: name = pfs, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0xf000, data = 0x2da4, bss = 0xbaf1c, entry = 0x0, total = 0xeccc0, syms = 0x0
      hdr[12] @ 0x3f7e00 -> 0x1457000: name = init, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x8000, data = 0x22a4, bss = 0xedc, entry = 0x0, total = 0x2b180, syms = 0x0
      Invoking: minix(koff=0x106000, kcs=0x106000, kds=0x106000, bootparams=0xf000, paramsize=0x200, aout=0xf200)
      1234CSTART
      cstart
      value is 8bd3e
      value for bus is 'at'
      WE ARE AT!!!
      intr_init(1, 0)
      machine.ps_mca = 0
      main()
      initializing idle... MASK=0
      done
      initializing clock... MASK=0
      done
      initializing system... MASK=0
      done
      [output trimmed]
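
      The fields printed in the hdr[...] lines above (magic, flags, cpu, len, version, text, data, bss, entry, total, syms) match the short 32-byte MINIX a.out header. A minimal sketch of a loader-side sanity check follows, assuming that layout and little-endian encoding. The names aout_hdr, parse_aout_hdr and aout_hdr_sane are hypothetical and are not taken from the MINIX or loadr sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short-form a.out header as printed in the hdr[...] lines (32 bytes).
 * Layout is an assumption based on the fields shown in the log. */
struct aout_hdr {
    uint8_t  magic[2];   /* 0x01 0x03 in the log */
    uint8_t  flags;      /* 0x10 in the log */
    uint8_t  cpu;        /* 0x10 in the log */
    uint8_t  hdrlen;     /* "len = 32" */
    uint16_t version;
    uint32_t text, data, bss, entry, total, syms;
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 32 bytes at buf into *h. */
void parse_aout_hdr(const uint8_t *buf, struct aout_hdr *h)
{
    assert(buf != NULL && h != NULL);
    h->magic[0] = buf[0];
    h->magic[1] = buf[1];
    h->flags    = buf[2];
    h->cpu      = buf[3];
    h->hdrlen   = buf[4];
    /* buf[5] is reserved */
    h->version  = (uint16_t)(buf[6] | (buf[7] << 8));
    h->text     = le32(buf + 8);
    h->data     = le32(buf + 12);
    h->bss      = le32(buf + 16);
    h->entry    = le32(buf + 20);
    h->total    = le32(buf + 24);
    h->syms     = le32(buf + 28);
}

/* Returns 1 if the magic matches and text+data+bss fits in 'total'. */
int aout_hdr_sane(const struct aout_hdr *h)
{
    uint64_t need;

    assert(h != NULL);
    need = (uint64_t)h->text + h->data + h->bss;
    return h->magic[0] == 0x01 && h->magic[1] == 0x03 &&
           h->hdrlen >= 32 && need <= h->total;
}
```

      For the kernel entry in the log, text + data + bss (0x1e000 + 0x86a1 + 0x6611b) equals the printed total of 0x8c7bc, so a check of this kind would pass. Running the same check on the SCC before the jump to the kernel is one way to rule out a corrupted image.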

      The current boot image has the following components and load addresses in its load.map for the bin2obj utility:

      0x00090000 bootr/bootr
      0x00100000 loadr/loadr
      0xfffff000 reset_vector/reset_vector.bin

      The reset_vector.bin is exactly the same as the one found in the linuxkernel/reset_vector directory
      at the MARC SVN. It does a jump to 0x90200 where the bootr program continues. Bootr initializes
      protected mode. Then bootr jumps to the loadr program at 0x100000 where the MINIX specific initialization
      is done including setting up kernel boot parameters. Finally loadr will perform a ljmpl to the location where the
      MINIX kernel is loaded in memory, in our case 0x106000. This is the output when booting at the SCC:

      [same output, trimmed]
      hdr[11] @ 0x3e5e00 -> 0x136a000: name = pfs, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0xf000, data = 0x2da4, bss = 0xbaf1c, entry = 0x0, total = 0xeccc0, syms = 0x0
      hdr[12] @ 0x3f7e00 -> 0x1457000: name = init, magic = 0x1 0x3, flags = 0x10, cpu = 0x10, len = 32, version = 0x0, text = 0x8000, data = 0x22a4, bss = 0xedc, entry = 0x0, total = 0x2b180, syms = 0x0
      Invoking: minix(koff=0x106000, kcs=0x106000, kds=0x106000, bootparams=0xf000, paramsize=0x200, aout=0xf200)
      1234

      The above is the last output received when booting at the SCC. I started the image using the sccGui choosing the
      option 'Choose custom Linux image (file dialog)'. The possible location of EIP at that point
      should be in mpx.S (http://pastebin.com/cVRqTP5r) or start.c (http://pastebin.com/hhJ85VN0) at the beginning of cstart().
      If relevant, below are the hardware registers I got in Qemu just after printing the '4' to the serial console:

      EAX=00000034 EBX=0000f000 ECX=00000000 EDX=000003f8
      ESI=00091096 EDI=0009108a EBP=0000ef1f ESP=00027ffc
      EIP=000000ed EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
      ES =0018 00106000 ffffffff 00cf9300 DPL=0 DS   [-WA]
      CS =0030 00106000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
      SS =0018 00106000 ffffffff 00cf9300 DPL=0 DS   [-WA]
      DS =0018 00106000 ffffffff 00cf9300 DPL=0 DS   [-WA]
      FS =0018 00106000 ffffffff 00cf9300 DPL=0 DS   [-WA]
      GS =0018 00106000 ffffffff 00cf9300 DPL=0 DS   [-WA]
      LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
      TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
      GDT=     0012b340 0000003f
      IDT=     00000000 000003ff
      CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
      DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
      DR6=ffff0ff0 DR7=00000400
      EFER=0000000000000000
      FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
      FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
      FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
      FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
      FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
      XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
      XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
      XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
      XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
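
      To turn a dump like the one above into a concrete address, the segment base and EIP can be combined and the attribute column decoded. The sketch below assumes that the fourth column of each segment line holds the descriptor attribute bits in their usual x86 positions (type in bits 8-11, S in bit 12, DPL in bits 13-14, P in bit 15, D/B in bit 22, G in bit 23). All function and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bits taken from the fourth column of a segment line. */
struct seg_attr {
    unsigned type;      /* bits 8-11: segment type */
    unsigned s;         /* bit 12: 1 = code/data, 0 = system */
    unsigned dpl;       /* bits 13-14: descriptor privilege level */
    unsigned present;   /* bit 15 */
    unsigned db;        /* bit 22: 1 = 32-bit default operand size */
    unsigned gran;      /* bit 23: 1 = limit counted in 4 KiB units */
};

/* Split the attribute word into its fields. */
void decode_seg_flags(uint32_t flags, struct seg_attr *out)
{
    assert(out != NULL);
    out->type    = (flags >> 8) & 0xfu;
    out->s       = (flags >> 12) & 1u;
    out->dpl     = (flags >> 13) & 3u;
    out->present = (flags >> 15) & 1u;
    out->db      = (flags >> 22) & 1u;
    out->gran    = (flags >> 23) & 1u;
}

/* Linear address of an offset inside a segment (wraps at 4 GiB). */
uint32_t linear_addr(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}

/* CR0 bit 0 (PE): protected mode enabled. */
int cr0_protected_mode(uint32_t cr0)
{
    return (int)(cr0 & 1u);
}

/* CR0 bit 31 (PG): paging enabled. */
int cr0_paging(uint32_t cr0)
{
    return (int)((cr0 >> 31) & 1u);
}
```

      With the values from the dump, linear_addr(0x00106000, 0x000000ed) gives 0x001060ed, and CR0=0x11 has PE set and PG clear, so with paging off this would also be the physical address of the next instruction. Decoding 0x00cf9a00 yields a present, DPL 0, 32-bit code segment with type 0xa, which agrees with the "CS32 [-R-]" annotation Qemu prints.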

      The SCC installation we have outputs this header when doing a telnet to the BMC server:

        Board Serial# 01095100089
        Usable GB ETH 0110
        Software:     1.10  Build: 1228  Oct 12 2010  18:18:01
        CPLD:         1.07
        HW-ID:        0x00
        POWR1220:     0xC0000001 (master), 0x40000001 (slave)
        DDR3 modules: Present: 0 1 2 3 4 5 6 7

      Furthermore, we have the latest SCCKit 1.4.0 installed which is able to boot the Linux SCC kernel successfully.

      I already tried disabling the L2 cache in the hope that the boot would continue, but without success. Note that the
      SCC still responds as usual to the SCCKit commands. A triple fault might be happening, but at this
      early stage of the boot code MINIX doesn't have interrupts enabled yet, so I don't see an exception printed, if any.

      At this point I have the following questions:
      1) What could be the reason(s) of the "hang"?
      2) Is there any way of debugging the issue? More specifically, is there a way to attach a debugger to the SCC
      so I can reliably see the state of the CPU when the problem occurs?
      3) Another curious thing is that I'm seeing these warnings after a while in sccGui:

      WARNING: Received unexpected IO-Packet (tracing only works on address 0x2f8 & 0x3f8):
      INFO: Unexpected IO packet 138 from RC to HOST -> transferPacket(0x00, CORE0, 0x0_000003f8, NCIORD, 0x20, 0x0000000000000000_0000000000000000_0000000000000000_0000003400000034);

      Looking forward to your helpful comments,

      Niek Linnenbank



        • 1. Re: Booting Custom MINIX Many Core OS on SCC hangs
          JanArneSobania

          The warning from sccGui indicates that your kernel tries to read I/O port 0x3f8+5 (the Line Status Register of the serial port at 0x3f8), but the request is not answered. Therefore, the core hangs at the corresponding "in" instruction, waiting for a reply message from the MCPC that will never arrive [*].
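
          One way to read the "+5" out of the logged packet is sketched below. It assumes that the 0x20 argument in the transferPacket line is a byte-enable mask over the 8-byte-aligned address 0x3f8, with one bit per byte. That is an interpretation of the log, not something the log itself states, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map an 8-byte-aligned base address plus a one-hot byte-enable mask to
 * the single port being accessed. Returns -1 if the mask does not have
 * exactly one bit set. */
long io_port_from_byte_enable(uint32_t base, uint8_t byte_enable)
{
    unsigned off = 0;

    assert((base & 7u) == 0);   /* base must be 8-byte aligned */

    /* Reject empty masks and masks with more than one bit set. */
    if (byte_enable == 0 || (byte_enable & (byte_enable - 1)) != 0)
        return -1;

    /* Find the position of the single set bit. */
    while (((byte_enable >> off) & 1u) == 0)
        off++;

    return (long)base + (long)off;
}
```

          Under that assumption, io_port_from_byte_enable(0x3f8, 0x20) yields 0x3fd, which is the Line Status Register of a UART based at 0x3f8.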

           

          Are you using the "well-known" PC COM1 in MINIX? If so, you may want to disable the corresponding driver and try again. If you need the port to communicate with the OS instance, it must be emulated in software from the MCPC. Our group has created such a patch for crbif (the MCPC driver that controls the SCC via the FPGA) that adds serial ports (4 per core, at the default PC addresses). You can download it here:

           

          http://www.dcl.hpi.uni-potsdam.de/research/scc/serial.htm

           

          As you are running sccKit 1.4.0, you can simply download the modified crbif 1.1.3 sources. There is no need to check the code out of the marcbug svn, as the version posted there is too old (1.1.0, which came with sccKit 1.3.0) and does not work with the newer FPGA bitstream.

           

          Please note that our serial port does not generate interrupts; it can be used in polling mode only. To get it to run with Linux, we had to change the definition of the serial ports to use an IRQ of 0, which instructs the driver to use polling. I don't know whether the MINIX driver supports this setting, or how to turn it on.
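
          For a driver that has to work in polling mode, the transmit path can be sketched roughly as follows. This is not the MINIX or Linux driver code; the register offsets are the standard 16550 ones, while port_ops, uart_putc_polled and the fake_* helpers are hypothetical names. The port accessors are passed in so that the loop can be exercised on a host, and the fake UART stands in for real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COM1_BASE 0x3f8
#define UART_THR  0      /* transmit holding register (write) */
#define UART_LSR  5      /* line status register (read) */
#define LSR_THRE  0x20   /* transmit holding register empty */

/* Port accessors; on real hardware these would wrap "in" and "out". */
struct port_ops {
    uint8_t (*in8)(uint16_t port);
    void    (*out8)(uint16_t port, uint8_t val);
};

/* Send one byte without interrupts. Polls LSR at most max_spins times.
 * Returns 0 on success, -1 if the transmitter never became ready. */
int uart_putc_polled(const struct port_ops *io, uint16_t base,
                     uint8_t c, unsigned long max_spins)
{
    assert(io != NULL && io->in8 != NULL && io->out8 != NULL);

    while (max_spins-- > 0) {
        if (io->in8((uint16_t)(base + UART_LSR)) & LSR_THRE) {
            io->out8((uint16_t)(base + UART_THR), c);
            return 0;
        }
    }
    return -1;
}

/* Host-side stand-in for the UART, for testing the loop only. */
static uint8_t fake_lsr;       /* value returned for LSR reads */
static uint8_t fake_last_tx;   /* last byte written to THR */

uint8_t fake_in8(uint16_t port)
{
    return (port == COM1_BASE + UART_LSR) ? fake_lsr : 0xff;
}

void fake_out8(uint16_t port, uint8_t val)
{
    if (port == COM1_BASE + UART_THR)
        fake_last_tx = val;
}
```

          The spin limit bounds the number of polls, not the time each poll takes. If a single "in" is never answered, the core still stalls inside that instruction, so a bound like this only helps once the port is emulated on the MCPC side.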

           

          [*] In fact, I got the same behaviour while developing our crbif patches. The NCIORD message is re-sent up to 4 times, with ~8 seconds in between. If the reply does not arrive even after the fourth attempt, the bus transaction will complete with whatever data is present in the buffer. This allows the core to continue execution (as I said, after around 32 seconds), but the result is undefined and may confuse the driver.

          • 2. Re: Booting Custom MINIX Many Core OS on SCC hangs
            jheld

            Good suggestions on dealing with the I/O instruction.

             

            I'd also caution that you may face other platform dependencies.

            SCC Linux startup had to be modified to avoid device initialization
            (many are missing) and discovery interaction with the BIOS (there is none).

             

            The PIT, CMOS clock and keyboard controller are examples; I don't know which of them may be relevant to MINIX.

            -Jim

            • 3. Re: Booting Custom MINIX Many Core OS on SCC hangs
              tedk

              It concerned me that you said the crbif driver was not updated. I checked, and the check-in times in our internal SVN and in marcbug are the same. The files do differ, but the differences appear cosmetic. In any case, I checked into marcbug the latest mcpc_driver from our internal SVN, and I'll verify that these are the files used by sccKit 1.4.0.

              http://marcbug.scc-dc.com/svn/repository/trunk/mcpc_driver/

              • 4. Re: Booting Custom MINIX Many Core OS on SCC hangs
                tedk

                The latest crbif driver source is in crbif-dkms_1.1.3.deb. This deb file is part of the sccKit 1.4.0 released tar. You can extract the crbif source with

                dpkg -x crbif-dkms_1.1.3.deb src

                 

                The latest crbif source is always part of the sccKit released tar. We are evaluating whether we want to continue to maintain the mcpc_driver directory in our public svn.