4 replies. Latest reply on Feb 17, 2016 6:45 PM by mlandry

    DCMI V1.5 default of non-random backoff delay during DHCP discovery

    mlandry

      In the DCMI V1.5 specification, non-random back-off delay is specified as the default for DHCP discovery. Why was non-random selected as the default instead of random? Random back-off delays on message retries seem like the better option for a DHCP server with many DHCP clients, as they would spread out the retry requests coming from the clients after a time-out.

       

      http://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/dcmi-v1-5-rev-spec.pdf

       

      The 840.00 release of the IBM Power systems is not in compliance with this one point of DCMI V1.5. I need to make a recommendation to the IBM Power team on whether a change should be made to move into compliance. For this, I need to understand from the architect why the default was selected as "non-random" instead of staying with "random", which is the existing implementation in DHCP clients, as required by RFC 2131:

       

      "DHCP clients are responsible for all message retransmission. The

        client MUST adopt a retransmission strategy that incorporates a

        randomized exponential backoff algorithm to determine the delay

        between retransmissions."

       

      https://www.ietf.org/rfc/rfc2131.txt
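      For comparison with the DCMI default, here is a minimal C sketch of the randomized exponential backoff that RFC 2131 (section 4.1) gives as its example for a 10 Mb/s Ethernet: 4 seconds before the first retransmission, doubled on each subsequent retransmission up to 64 seconds, with each delay randomized by a uniform value between -1 and +1 second. The function names are hypothetical, the cap is applied before the jitter (an assumption), and the jitter is passed in as a parameter so the schedule can be checked deterministically. This is an illustration, not code from any particular DHCP client.

```c
#include <assert.h>
#include <stdlib.h>

/* Values from the RFC 2131 section 4.1 example (10 Mb/s Ethernet). */
#define RFC2131_INITIAL_DELAY_S 4.0
#define RFC2131_MAX_DELAY_S     64.0

/* Un-randomized delay before retransmission number `attempt` (0 = first):
 * 4 s, doubled each time, capped at 64 s. */
double rfc2131_base_delay(unsigned attempt)
{
    double delay = RFC2131_INITIAL_DELAY_S;
    while (attempt > 0 && delay < RFC2131_MAX_DELAY_S) {
        delay *= 2.0;
        attempt--;
    }
    return delay < RFC2131_MAX_DELAY_S ? delay : RFC2131_MAX_DELAY_S;
}

/* Uniform jitter in [-1, +1] seconds. rand() keeps the sketch simple;
 * a real client would want a better, per-client random source. */
double rfc2131_jitter(void)
{
    return 2.0 * ((double)rand() / (double)RAND_MAX) - 1.0;
}

/* Randomized delay; jitter is a parameter so callers and tests can fix it. */
double rfc2131_delay(unsigned attempt, double jitter)
{
    assert(jitter >= -1.0 && jitter <= 1.0);
    return rfc2131_base_delay(attempt) + jitter;
}
```

      For example, rfc2131_delay(2, rfc2131_jitter()) yields a delay between 15 and 17 seconds. Because each client draws its own jitter, clients that time out together drift apart on their next retries, which is the separation the question above is about.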

       

      Message was edited by: Mike Landry

        • 1. Re: DCMI V1.5 default of non-random backoff delay during DHCP discovery
          stdalex

          Hi Mike,

           

          Agreed that random back-off delays are generally better in many installations. The non-random delay setting was chosen as the default configuration to simplify initial software testing and evaluation: it provides a simpler, more deterministic value to sanity-check and specify time-outs against. The setting is user-configurable, so it can be switched to random as the user wishes.

           

          Since the setting is user-configurable, I'm not sure where the issue lies, other than that it may be one more parameter among the others that the user would want or need to configure after receiving the system. If a system manufacturer has a customer who prefers that their systems arrive with a different initial setting, I see no spec conformance issue with the manufacturer shipping its systems that way, since that could be viewed the same as if the customer had changed the setting themselves.

           

          If you have more questions I would be happy to discuss this further with you.

           

          -Stewart Dale

          Intel(R) DCM Team

          intel.com/dcm

          • 2. Re: DCMI V1.5 default of non-random backoff delay during DHCP discovery
            mlandry

            Stewart,

             

            The IBM development team, when changing the FSP to be DCMI V1.5 compliant, failed to realize that the building-block DHCP client code was affected by the change. I could not find an RFC for the DHCP client that describes the non-random back-off mode, so the DCMI V1.5 specification is the only place to find that information, correct?

             

            The IBM building-block developer felt the wording describing the change was ambiguous; splitting it into two sentences would make it less awkward. However, I relayed your intent to him, and he has implemented the non-random back-off correctly, so that the interval advances to the back-off maximum in a non-random fashion.

             

            Here is the wording in question regarding the multiplier:

             

            "byte 1 – DHCP Initial timeout interval, in seconds

             

            This parameter sets the amount of time between the first attempt to reach a server and the second attempt to reach a server. Each time a message is sent the timeout interval between messages is incremented by twice the current interval multiplied by a pseudo random number between zero and one if random back-off is enabled, or multiplied by one if random back-off is disabled.

             

            The recommended default is four seconds. "
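            Written as a recurrence, a literal reading of the quoted sentence is the following (a transcription for discussion, not text from the specification), where I_n is the timeout interval after the nth message, I_0 is byte 1 (four seconds by default), and r is the pseudo-random multiplier:

```latex
I_{n+1} = I_n + 2\,r\,I_n, \qquad
r \sim \mathcal{U}[0,1] \text{ if random back-off is enabled}, \qquad
r = 1 \text{ if it is disabled}
```

            With random back-off the expected increment is 2 * E[r] * I_n = I_n, so the interval doubles on average. Taking r = 1 literally gives I_{n+1} = 3 I_n (4, 12, 36 seconds and so on), whereas reading the disabled case as plain doubling gives I_{n+1} = 2 I_n (4, 8, 16 seconds and so on). The two readings only diverge in non-random mode, which is the ambiguity the developer raises later in this thread.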

             

            Here is the interpretation chosen:

             

            "byte 1 – DHCP Initial timeout interval, in seconds

             

            This parameter sets the amount of time between the first attempt to reach a server and the second attempt to reach a server. Each time a message is sent, the timeout interval between messages is incremented by twice the current interval multiplied by a pseudo random number between zero and one if random back-off is enabled.

            If random back-off is disabled, the timeout interval between messages is incremented by twice the current interval.

             

            The recommended default is four seconds. "

             

            The developer was not sure which part of the first sentence the "or" applied to, but once I relayed the intent of this feature as you described it to me, the correct interpretation became obvious.

             

            The current status of this feature on the IBM Power FSP (the service processor for the P8 Power systems) is that IBM has shipped the product with the setting in random back-off mode. I believe this is the best setting for Power systems customers, and it does not break compliance with DCMI V1.5 because the customer can switch it as needed. That last part needs to be true for the next service pack for the system: currently, the DHCP client in the product has no way of doing "non-random" back-off. So the service pack will have two affected items: 1) a DHCP client that can switch between non-random and random back-off, and 2) DHCP configuration file changes that set the mode based on the DCMI V1.5 IPMI command that has been run.
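            Item 2 above could look something like the following hypothetical C helper, which turns the values set through the DCMI V1.5 IPMI command into dhclient.conf statements. render_dhclient_timing is an invented name, and how the FSP actually regenerates its configuration file is not described in this thread. "random-backoff" is the flag name invented for the patched client; "initial-interval" is a standard ISC dhclient.conf statement. The trailing semicolons assume the new flag follows normal dhclient.conf statement syntax.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: render the DCMI-controlled DHCP timing settings as
 * dhclient.conf statements. initial_timeout_s is DCMI byte 1 (DHCP
 * initial timeout interval, in seconds). Returns the snprintf result;
 * a value >= len means the output was truncated. */
int render_dhclient_timing(char *buf, size_t len, int random_backoff,
                           unsigned char initial_timeout_s)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "random-backoff %s;\ninitial-interval %u;\n",
                    random_backoff ? "on" : "off",
                    (unsigned)initial_timeout_s);
}
```

            With random back-off disabled and the recommended 4-second default, this writes "random-backoff off;" followed by "initial-interval 4;".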

             

            The MCP building-block team for the DHCP client was not following an RFC for the change and used only the information in the DCMI V1.5 spec. The shortcoming of this approach was that the parameter controlling non-random mode is not defined in the DCMI spec, so the developer had to invent the name of the configuration-file flag.

             

            "In the dhclient.conf file, add "random-backoff true" or "random-backoff on" to enable the random backoff. "random-backoff false" or "random-backoff off" should disable random backoff."

             

            My concern is that every DHCP client developer is going to implement the flag with a different name. Below I share the new logic and the new flag name, and would appreciate your comments. I think my team is departing from industry protocol here by not following a DHCP client RFC for these changes. In this situation, should the IBM team author a new RFC so that a standard name for the parameter in the DHCP configuration file is used?
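            To make the naming question concrete, here is a minimal hypothetical C sketch of how a client could map the flag values quoted above ("true"/"on" enable, "false"/"off" disable) to a boolean. parse_random_backoff and its -1 error convention are invented for illustration; the patched dhclient presumably handles the flag inside its existing configuration-file parser, and whether its matching is case-sensitive is not stated in the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical parser for the value of the "random-backoff" flag.
 * Returns 1 (enable), 0 (disable), or -1 for an unrecognized value.
 * Matching is case-sensitive in this sketch. */
int parse_random_backoff(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "true") == 0 || strcmp(value, "on") == 0)
        return 1;
    if (strcmp(value, "false") == 0 || strcmp(value, "off") == 0)
        return 0;
    return -1;  /* unrecognized: caller should report a config error */
}
```

            Rejecting unknown spellings rather than silently falling back to a default makes a mistyped flag visible, which matters if different clients end up using different names for the same setting.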

             

            This work is tracked in LTC Bugzilla 133908, if you have access.

             

            "

            ...

            Here's an RPM for you to test.

            /ausgsa.ibm.com/gsa/ausgsa/projects/m/mcp-fsp/shared/fld85/dhcp-133908-212016.tar

             

            Here is the bulk of the logic:

              if (increase) {
                  if (!client->interval)
                      client->interval = client->config->initial_interval;
                  else if (client->config->random_backoff)
                      /* Add a random amount in [0, 2 * interval): doubles on average. */
                      client->interval += random() % (2 * client->interval);
                  else
                      /* Non-random mode: double the interval deterministically. */
                      client->interval += client->interval;

                  /* Don't back off past the cutoff. */
                  if (client->interval > client->config->backoff_cutoff) {
                      if (client->config->random_backoff)
                          /* Pick a random value from cutoff / 2 up to about 1.5 * cutoff. */
                          client->interval = (client->config->backoff_cutoff / 2)
                              + (random() % client->config->backoff_cutoff);
                      else
                          client->interval = client->config->backoff_cutoff;
                  }
              } else if (!client->interval)
                  client->interval = client->config->initial_interval;

              log_info("Random backoff is %d, and the backoff was %ld",
                       client->config->random_backoff, (long)client->interval);

             

            In the dhclient.conf file, add "random-backoff true" or "random-backoff on" to enable the random backoff.

            "random-backoff false" or "random-backoff off" should disable random backoff.

             

            I added a log message to output the state of the random backoff toggle and what the current interval is. I figured this would help while this bug is open. Let me know if this causes issues while testing.

             

            There is some ambiguity here: "Each time a message is sent the timeout interval between
            messages is incremented by twice the current interval multiplied by a pseudo random
            number between zero and one if random back-off is enabled, or multiplied by one if random
            back-off is disabled."

            I took this to mean that if random back-off is off, the interval is added to itself. It could also be interpreted to mean that the interval is only multiplied by 1 and not incremented at all. That would leave the client retrying at the same interval indefinitely until there is a response, as it would never hit the backoff cutoff. If that's the implementation you want, let me know and I'll change it.

            ...

            "

             

            Regards,

             

            Mike Landry
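            The dhclient logic quoted in this post can be checked outside dhclient by restating it as a small function. This is a hypothetical standalone sketch: next_interval and rand_long are invented names, the client and config structures are replaced by plain arguments, only the `increase` path is modeled, and a caller-supplied random source stands in for random() so the non-random sequence can be verified exactly.

```c
#include <assert.h>
#include <stdlib.h>

/* Caller-supplied random source, standing in for random(). */
typedef long (*rand_fn)(void);

/* Default random source built on the standard rand(). */
long rand_long(void)
{
    return (long)rand();
}

/* Restatement of the patched update for the `increase` case. */
long next_interval(long interval, long initial, long cutoff,
                   int random_backoff, rand_fn rnd)
{
    assert(initial > 0 && cutoff > 0);
    assert(!random_backoff || rnd != NULL);

    if (interval == 0)
        interval = initial;                  /* first transmission */
    else if (random_backoff)
        interval += rnd() % (2 * interval);  /* adds [0, 2*interval): doubles on average */
    else
        interval += interval;                /* deterministic doubling */

    if (interval > cutoff)                   /* don't back off past the cutoff */
        interval = random_backoff
            ? cutoff / 2 + rnd() % cutoff    /* random value near the cutoff */
            : cutoff;                        /* clamp exactly to the cutoff */
    return interval;
}
```

            Starting from 0 with a 4-second initial interval and a 64-second cutoff, repeated calls in non-random mode give 4, 8, 16, 32, 64, 64 seconds and so on. Under the alternative reading the developer raised (multiplying by one), the non-random branch would leave the interval unchanged and the cutoff would never be reached.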

             

             

             

             

            • 3. Re: DCMI V1.5 default of non-random backoff delay during DHCP discovery
              allan_intel

              For discussions about data center topics, please refer to: The Data Stack

               

              Allan.

              • 4. Re: DCMI V1.5 default of non-random backoff delay during DHCP discovery
                mlandry

                Stewart,

                 

                Regarding the changes to the DHCP client needed to conform to the DCMI V1.5 external interface: should the new parameter for "non-random mode" DHCP discovery be defined in the DCMI V1.5 spec, or in an RFC for the DHCP client?

                 

                If you think the RFC approach is best, I can engage the IBM team to write this document defining the new DHCP configuration file parameter. But if you want to define it in the DCMI V1.5 spec, that would also be acceptable.

                 

                Regards,

                 

                Mike Landry