14 Replies Latest reply on Dec 11, 2011 2:44 AM by darence

    Mapping the MPB in the kernel

    darence

      Hi,

       

      I need to do some MPB management in the kernel code. To map the MPB, I guess I should use some flavor of ioremap(). For instance, this is what rckmpb does:

       

      ioremap_prot(address, mpb_size, _PAGE_PMB|_PAGE_PWT);
      
      

       

      This works correctly, but it is very slow: several times slower than what RCCE_get achieves. I tried copying the permission flags that RCCE_get uses (from /dev/rckmpb), and the result is much, much faster (almost suspiciously fast), but it does not work correctly: the messages get corrupted in very strange ways.

       

      edit:

      Well, after some reflection I concluded that _PAGE_PCD should not be set, since we do want the content to be cached. _PAGE_PWT (write-through) should also remain unset, provided we invalidate the L1 before each put/get. _PAGE_PMB simply gets masked out underneath, and I am not sure what it is for anyway. So something like:

       

      ioremap_prot(address, mpb_size, 0);
      
      
      

       

      should work, but it just returns erroneous data for me. Maybe the problem is the way I invalidate the cache, so I will look into that... Any comments are appreciated.

        • 1. Re: Mapping the MPB in the kernel
          tedk

          Where did you find that ioremap_prot line? What file are you looking at?

          • 2. Re: Mapping the MPB in the kernel
            tedk

            It's hard to know exactly what you're doing here, Darko. I cannot find an ioremap_prot ... not to say it's not there somewhere.

            The flags you refer to ... are these permission flags? RCCE has put and get flags, but I don't think this is what you mean. If you are dealing with RCCE flags, did you set them to be single bit or byte? About the message corruption ... do you think getting a message stuck in the write-combine buffer might contribute to the problem you are seeing?

             

            In any case, do you think this is an RCCE problem, an rckmpb problem, or an SCC problem? If not, and what you want is to understand better what rckmpb does, I can start talking to whoever wrote rckmpb.

            • 3. Re: Mapping the MPB in the kernel
              mwvantol

              Ted Kubaska wrote:

               

              It's hard to know exactly what you're doing here, Darko. I cannot find an ioremap_prot ... not to say it's not there somewhere.

              The flags you refer to ... are these permission flags? RCCE has put and get flags, but I don't think this is what you mean. If you are dealing with RCCE flags, did you set them to be single bit or byte? About the message corruption ... do you think getting a message stuck in the write-combine buffer might contribute to the problem you are seeing?

               

              In any case, do you think this is an RCCE problem, an rckmpb problem, or an SCC problem? If not, and what you want is to understand better what rckmpb does, I can start talking to whoever wrote rckmpb.

              Ted, he is talking about kernel code, so this is outside of RCCE. As far as I understand, it is not an RCCE, rckmpb, or SCC problem.

               

              Darence, I did the same in the rckmem.c driver, if you are using the latest one. I map the MPB roughly like this:

               

              mpb_addr = ioremap_prot(OWN_MPB, MPB_SIZE, 0);

               

              with these values:

              #define OWN_TILEID  0xf8000100
              #define FIRST_MPB   0xc0000000
              #define OWN_MPB     0xd8000000

              #define MPBADDRBITS 13
              #define MPB_SIZE    (1<<MPBADDRBITS)

               

              This worked fine in the rckmem.c driver and should not be slow, at least not for reading. However, this mapping does not set the MPBT flag, so writing to the MPB this way would be slow. I think you have to replace the 0 with _PAGE_PSE to set the MPBT flag and thereby enable the write-combine buffer. Note that _PAGE_PSE is aliased to _PAGE_MPE in some of the sources, but this makes no difference.

              • 4. Re: Mapping the MPB in the kernel
                darence

                Well, I don't know how to set that flag (_PAGE_PSE), because here is what ioremap_prot looks like under the hood:

                 

                void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
                                                unsigned long prot_val)
                {
                        return __ioremap_caller(phys_addr, size, (prot_val & _PAGE_CACHE_MASK),
                                                __builtin_return_address(0));
                }
                

                 

                _PAGE_CACHE_MASK is actually _PAGE_PWT | _PAGE_PCD, so whatever flags you supply, it will simply keep these two and ignore the rest. So if I understand correctly, there is an MPBT bit in the page table entry that should be set?
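To make the masking concrete, here is a small user-space model of the filtering step in ioremap_prot. It is not kernel code: the bit positions are the standard x86 page-table-entry bits (PWT = bit 3, PCD = bit 4, PSE = bit 7), and the function name model_ioremap_prot_filter is hypothetical.

```c
#include <stdint.h>

/* Standard x86 PTE bits (cf. arch/x86/include/asm/pgtable_types.h). */
#define _PAGE_PWT (UINT64_C(1) << 3)  /* page write-through */
#define _PAGE_PCD (UINT64_C(1) << 4)  /* page cache disable */
#define _PAGE_PSE (UINT64_C(1) << 7)  /* large page on x86; MPBT tag on SCC cores */

/* Stock mask: only the two cache-control bits are let through. */
#define _PAGE_CACHE_MASK (_PAGE_PWT | _PAGE_PCD)

/* Hypothetical model of what ioremap_prot() does to prot_val before
 * handing it to __ioremap_caller(): every other bit is dropped. */
uint64_t model_ioremap_prot_filter(uint64_t prot_val)
{
    return prot_val & _PAGE_CACHE_MASK;
}
```

For example, model_ioremap_prot_filter(_PAGE_PSE | _PAGE_PWT) yields just _PAGE_PWT, so the MPBT request never reaches the page table.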

                 

                Anyhow, what baffles me is that when I do this ioremap (either with _PAGE_PSE or with 0, since it makes no difference), I get inconsistent data. For instance, suppose one core writes a sequence of numbers to the MPB: 1,2,3,4,5,6,7... and the other core reads it (they are synchronized, of course). What I get on the receiving core is something like 1,1,1,4,4,6,6... Smells like a cache invalidation problem.

                 

                And actually this ties in with what you said about the MPBT flag. If that flag is not set for the MPB data, the MPB cache invalidation instruction simply won't invalidate anything... which could very well explain what I am getting. Right? But that still does not solve the problem of how to actually set this _PAGE_PSE guy in the kernel.
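The reasoning above can be illustrated with a toy model of one L1 line, assuming (as described for the SCC) that the MPB invalidation instruction CL1INVMB only invalidates lines tagged MPBT. The struct and function names are hypothetical, and this is a sketch of the logic, not of the real cache hardware.

```c
#include <stdbool.h>

/* Toy model of one L1 line caching a single MPB word. */
struct l1_line {
    bool valid;  /* line currently holds a cached copy */
    bool mpbt;   /* page was mapped with the MPBT (_PAGE_PSE) bit */
    int  value;  /* cached copy of the MPB word */
};

/* Model of CL1INVMB: only MPBT-tagged lines are invalidated. */
void model_cl1invmb(struct l1_line *line)
{
    if (line->mpbt)
        line->valid = false;
}

/* Read through the toy L1: a hit returns the (possibly stale) cached
 * copy; a miss fetches the current MPB word and caches it. */
int model_read(struct l1_line *line, const int *mpb_word)
{
    if (!line->valid) {
        line->value = *mpb_word;
        line->valid = true;
    }
    return line->value;
}
```

With an untagged line, the reader keeps returning the old value after the writer has moved on, until the line happens to be evicted for some other reason; that is consistent with the repeated values in the 1,1,1,4,4,6,6 pattern.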

                • 5. Re: Mapping the MPB in the kernel
                  darence

                  Ted, the question is only how to map the MPB properly in the kernel. RCCE can be used from ordinary user processes, but working with the MPB directly from the kernel is still taboo. So if you know someone who could help, it would be nice to see him or her here...

                  • 6. Re: Mapping the MPB in the kernel
                    mwvantol

                    darence wrote:

                     

                    Well, I don't know how to set that flag (_PAGE_PSE), because here is what ioremap_prot looks like under the hood:

                     

                    void __iomem *ioremap_prot(resource_size_t phys_addr, unsigned long size,
                                                    unsigned long prot_val)
                    {
                            return __ioremap_caller(phys_addr, size, (prot_val & _PAGE_CACHE_MASK),
                                                    __builtin_return_address(0));
                    }
                    

                     

                    _PAGE_CACHE_MASK is actually _PAGE_PWT | _PAGE_PCD, so whatever flags you supply, it will simply keep these two and ignore the rest. So if I understand correctly, there is an MPBT bit in the page table entry that should be set?

                    OK, so perhaps you can just call __ioremap_caller instead. It's not clean and probably not portable across kernel versions, but if it does the job, why not :). In fact, in the 2.6.16 kernel that was supplied earlier I used the __ioremap call, but that no longer worked in 2.6.38, so I replaced it with ioremap_prot there. The MPBT bit is the same as the PSE bit, so you need to set _PAGE_PSE. Setting _PAGE_PWT is also advisable, I think, though I'm not 100% sure it is necessary.

                     

                    As for the errors you got, I think these are indeed cached results combined with an invalidate that does not work. Alternatively, you may be reading the MPB through an MPBT mapping and a non-cached mapping at the same time; earlier on, we saw that give some strange behavior, returning the wrong data.

                    • 7. Re: Mapping the MPB in the kernel
                      darence

                      __ioremap_caller is a static function... so I would have to recompile the kernel. Even if I do, I don't know what I will get, since I don't see any special treatment of _PAGE_PSE inside... There is probably a reason it is static, and there must be a more elegant way to do this (without recompiling the kernel, or am I asking for too much?). I might try to hack the kernel page table after the mapping is created; there is probably a way to change the flags on the fly.

                       

                      The last resort would be to mimic what RCCE does in mmap() by calling do_mmap(), but that creates a user virtual address and changes the page table of the active process. That is just asking for trouble, because the mapping belongs to whichever process happens to be running: call it while another process is running and you are dead.

                      • 8. Re: Mapping the MPB in the kernel
                        mwvantol

                        darence wrote:

                         

                        __ioremap_caller is a static function... so I would have to recompile the kernel. Even if I do, I don't know what I will get, since I don't see any special treatment of _PAGE_PSE inside... There is probably a reason it is static, and there must be a more elegant way to do this (without recompiling the kernel, or am I asking for too much?). I might try to hack the kernel page table after the mapping is created; there is probably a way to change the flags on the fly.

                         

                        Ah yes, static, sorry, I must have missed that. It was quite late last night when I tried to answer your question and took a quick glance at the kernel source. No special treatment of _PAGE_PSE is required, as it is already a one-bit flag that corresponds to the position that needs to be set in a page table entry. It is defined in the architecture-specific arch/x86/include/asm/pgtable_types.h.

                         

                        But yeah, the problem is that there does not seem to be an 'official' interface for such things, or at least not one that I am aware of. I guess the masking done in ioremap_prot is a kind of 'protection' against badly written code that attempts to set the wrong flags. However, for us it is in the way at the moment, as the _PAGE_PSE flag has a different meaning on the SCC's cores. Probably the cleanest way to fix this is to add the _PAGE_PSE flag to _PAGE_CACHE_MASK, also in arch/x86/include/asm/pgtable_types.h.
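A minimal user-space sketch of the proposed mask change, using the same standard x86 bit positions. The function name is hypothetical, and the model only covers the filtering step; since _PAGE_CACHE_MASK is also used elsewhere in arch/x86, widening it in the real header may affect other code paths too.

```c
#include <stdint.h>

#define _PAGE_PWT (UINT64_C(1) << 3)
#define _PAGE_PCD (UINT64_C(1) << 4)
#define _PAGE_PSE (UINT64_C(1) << 7)  /* MPBT on SCC cores */

/* Mask as proposed above: _PAGE_PSE is allowed through as well. */
#define _PAGE_CACHE_MASK (_PAGE_PWT | _PAGE_PCD | _PAGE_PSE)

/* Hypothetical model of ioremap_prot()'s filtering with the widened mask. */
uint64_t model_filter_patched(uint64_t prot_val)
{
    return prot_val & _PAGE_CACHE_MASK;
}
```

Unrelated bits (for instance bit 1, _PAGE_RW) are still filtered out; only the MPBT bit is newly preserved.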

                        • 9. Re: Mapping the MPB in the kernel
                          darence

                          I might give it a shot today. I tried some code for accessing the page tables directly, but it simply didn't work, so recompiling the kernel might be less of a hassle... If it turns out to work, it should definitely go upstream (I mean into SCC Linux). In its current state this can simply be called a bug... But let's see how it goes first.

                          • 10. Re: Mapping the MPB in the kernel
                            darence

                            Still no luck. I added _PAGE_PSE to _PAGE_CACHE_MASK and recompiled the kernel. The caching problem disappeared, but I am getting the old bad performance. Maybe some additional configuration is necessary to enable the WC buffer?
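One plausible explanation, consistent with the fix described later in this thread: in the 2.6.38 kernel, __ioremap_caller picks the page protection roughly with a switch over the exact x86 cache-type encodings, and its default branch maps the region uncached. Once _PAGE_PSE survives the mask, prot_val no longer equals any of the cases, so the mapping silently becomes uncached. The user-space model below is simplified (it ignores the memtype reservation step) and its names are hypothetical.

```c
#include <stdint.h>

#define _PAGE_PWT (UINT64_C(1) << 3)
#define _PAGE_PCD (UINT64_C(1) << 4)
#define _PAGE_PSE (UINT64_C(1) << 7)  /* MPBT on SCC cores */

/* x86 cache-type encodings used by kernels of that era. */
#define _PAGE_CACHE_WB       (UINT64_C(0))
#define _PAGE_CACHE_WC       (_PAGE_PWT)
#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD)
#define _PAGE_CACHE_UC       (_PAGE_PCD | _PAGE_PWT)

enum mapping_kind { KIND_UC, KIND_UC_MINUS, KIND_WC, KIND_WB };

/* Model of the stock switch: the whole prot_val is compared against the
 * four encodings, so any extra bit (e.g. _PAGE_PSE) lands in default: UC. */
enum mapping_kind model_old_switch(uint64_t prot_val)
{
    switch (prot_val) {
    case _PAGE_CACHE_UC_MINUS: return KIND_UC_MINUS;
    case _PAGE_CACHE_WC:       return KIND_WC;
    case _PAGE_CACHE_WB:       return KIND_WB;
    case _PAGE_CACHE_UC:
    default:                   return KIND_UC;
    }
}
```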

                            • 11. Re: Mapping the MPB in the kernel
                              darence

                              Wait a sec... You say you used this line:

                               

                              mpb_addr = ioremap_prot(OWN_MPB, MPB_SIZE, 0);

                               

                              Shouldn't it have had the same cache invalidation problem I described? Because that is what I observe when I set no flags (with the original kernel). One difference, though, is the physical address passed to ioremap_prot: you use OWN_MPB, while I calculate the MPB address for the tile from MPB_X0_Y0 plus the appropriate offset for the tile and core. Does that make a difference?

                              • 12. Re: Mapping the MPB in the kernel
                                darence

                                OK, here it is. After some very good explanations that Werner Haas gave us in Potsdam, it took about 15 minutes to fix the bug. Apart from the change to the mask discussed above (adding _PAGE_PSE), a few small modifications were necessary to a switch statement in __ioremap_caller, which assumed the old, incomplete mask. Now everything is mapped as it should be (as far as I can see). I'll submit the patch to Bugzilla.
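The actual patch is in the Bugzilla entry; the following is only a hypothetical user-space sketch of the kind of switch change described: pick the cache type from the PWT/PCD bits alone, and carry the MPBT (_PAGE_PSE) bit through separately so it still ends up in the page table entry. All names are illustrative, not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define _PAGE_PWT (UINT64_C(1) << 3)
#define _PAGE_PCD (UINT64_C(1) << 4)
#define _PAGE_PSE (UINT64_C(1) << 7)  /* MPBT on SCC cores */

#define _PAGE_CACHE_WB       (UINT64_C(0))
#define _PAGE_CACHE_WC       (_PAGE_PWT)
#define _PAGE_CACHE_UC_MINUS (_PAGE_PCD)
#define _PAGE_CACHE_UC       (_PAGE_PCD | _PAGE_PWT)

enum mapping_kind { KIND_UC, KIND_UC_MINUS, KIND_WC, KIND_WB };

struct mapping {
    enum mapping_kind kind;  /* cache type chosen from PWT/PCD only */
    bool mpbt;               /* MPBT bit to be ORed into the PTE */
};

/* Switch on the cache-control bits only, so _PAGE_PSE no longer
 * pushes the mapping into the uncached default branch. */
struct mapping model_fixed_switch(uint64_t prot_val)
{
    struct mapping m = { KIND_UC, (prot_val & _PAGE_PSE) != 0 };

    switch (prot_val & (_PAGE_PWT | _PAGE_PCD)) {
    case _PAGE_CACHE_UC_MINUS: m.kind = KIND_UC_MINUS; break;
    case _PAGE_CACHE_WC:       m.kind = KIND_WC;       break;
    case _PAGE_CACHE_WB:       m.kind = KIND_WB;       break;
    case _PAGE_CACHE_UC:
    default:                   m.kind = KIND_UC;       break;
    }
    return m;
}
```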

                                • 13. Re: Mapping the MPB in the kernel
                                  mwvantol

                                  Ah, that is great news! It indeed looked a bit more complicated in the newer kernel version where more explicit checks were done on the flags in the __ioremap_caller function. It was nice to meet you at MARC4.

                                  • 14. Re: Mapping the MPB in the kernel
                                    darence

                                    It was nice meeting you and Roy as well!

                                     

                                    Finally, here's the patch:

                                    http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=365

                                     

                                    Works for me, but let's see what the Intel guys say. Meanwhile, you can check whether it works on the TV in your room, if it is still there.