Intel’s Data Center Strategy is on track to create $650 million of value for Intel’s business by 2014.  By adopting the latest generation of Intel® Xeon® processors and deploying advanced storage, networking and facilities innovations, we have already realized 35 percent of those savings.

 

After working in Intel's Server Group for many years as an end-user product marketing manager, working with IT customers to understand how they use technology to create business value, it was cool to be able to take the plunge into the world of IT@Intel.

 

With over 100,000 servers to manage across 97 data centers, the complexity inside Intel IT is pretty dramatic.  However, once the purpose of the data centers is understood (Intel IT has four general business areas that data centers support: Design, Office, Manufacturing, Enterprise), it is clear that a top-line strategy is required to balance investment and streamline operations to maximize efficiency.  Because technology, business needs and economic times change (often very rapidly), that strategy has to be dynamic as well.

 

This video series was produced by the IT experts who were willing to tell the story behind the key elements of the Intel IT Data Center Solutions:

 

  • Strategy
  • Facility
  • Compute
  • Network
  • Storage
  • Video tours inside four of our data centers

 

Explore these resources; I hope you find them half as interesting as I do.

 

Chris

Hi all,

 

I've just recently changed jobs from the client world (Intel vPro Technology) to the Server domain at Intel.  One of the first exciting platforms I've been able to check out is coming out today, and I wanted to take a moment to share it.  This new platform is from Lenovo: the TS200v (shown below).

TS200v Image.jpg

 

Here is what I found late last night on PCMag, talking about the features and capabilities: http://www.pcmag.com/article2/0,2817,2360212,00.asp  For me this answers a question that I have received for the last 5 years: "When is AMT going to hit workstations/servers?"  Well, the time is now; the time is today.

 

Check out Lenovo's post - http://shop.lenovo.com/SEUILibrary/controller/e/web/LenovoPortal/en_US//special-offers.workflow:ShowPromo?LandingPage=/All/US/Landing_pages/ThinkServer/08/Introducing

 

If you want to read more on AMT 6.0, here's the developer link that talks about the features - http://communities.intel.com/community/openportit/vproexpert/blog/2010/02/04/intel-amt-60-new-features

 

However, nothing's complete without a video.  Check out Jason D and me scoping this out in my new lab.

 

 

 

BOOYAH!!!

 

Oh wait, it's not a Workstation.  It's a SERVER!!!!

Crashing the AntiVirus Party

Posted by JGreene Feb 19, 2010

Another week, another headline involving a significant data breach.  This week’s leading headline is another fine example of the power of malware as a tool to burrow into individual and corporate systems to harvest information.  The case outlined here shows a broad-based stealth attack that went on for 18 months and impacted hundreds of companies by capturing data, transaction records, passwords and more!  This continues to build on the trend of cybercrime: gone are the days of smash-and-grab one-time break-ins.  Today’s cybercrooks like to break in quietly and find a nice, comfortable way to “stick around” in the infrastructure looking for opportunities to gather data or perhaps subvert control of the system for other uses (such as spam generation or denial of service).  Obviously, this can be a massive problem.  And it is proving lucrative.

While the intentions and modus operandi of cybercriminals are evolving, the problem of malware is not new.  The industry has been fighting viruses and the like for nearly as long as there have been computers.  In recent years, the problem has grown to epidemic proportions as access to technology and worldwide connectivity (thank you, Internet) have grown.  Sure, the growing abundance of malware is feeding the problem.  Companies such as Symantec now estimate that more malicious software is being developed than “good” software. This is driven partly by profit: as I noted, cybercrime can be lucrative thanks to a robust underground market. It is also significantly driven by the ease of finding unwitting victims for everything from simple scams (“Hi there, Mr. Minister from some remote country offering me a share of millions of dollars”) to elegant and sophisticated social engineering using hacked or spoofed email accounts or undetectable “drive-by downloads”.  There is no question that people are a huge part of the problem, but only part of the problem.

The other part of the problem is the way we try to implement the solution today.  Most companies count on antivirus as their primary tool to stop the malware threat. It works on a principle of blacklisting. Once a virus (or one of many other types of malware) is found by the many researchers of the antivirus community, it is analyzed and a “signature” for that virus is developed.  The virus signatures are added to a database that underpins the antivirus software packages.  When you are “updating your antivirus software” you are usually getting updates to the signature database so that your local engine can recognize the latest threats. The AV software monitors code on your system and will quarantine or block any code matching these signatures that tries to execute on the system, hence the term blacklist. This has been relatively effective for a long time, but the barbarians at the gate are gaining an edge.  There are two big factors working against the future success of this approach.

The first is the delay between the release of the malware and the detection, analysis, signature development and distribution/update of the signature.  This delay can be quite long, depending on how often one updates their antivirus.

Second, and perhaps more challenging, malware developers know this process and have made themselves a much more formidable foe through the use of some new techniques.  The first one is encryption: malware can have portions of its code encrypted, making it difficult or impossible for the antivirus community to analyze. Another challenge is that some new malware has the ability to “self-morph”, or randomly alter its code structure, so that even once it is analyzed it can change such that it no longer matches the signature developed for it.

As you would expect, this creates a major challenge for a model predicated on stopping “known bad” code.  All too often, we can’t know that the code is bad until it is too late.  This model is akin to the terror watch list used in the air transportation system.  Here they are looking out for proven or suspected persons that are a threat to the system.  This generally works and is probably the only sustainable method for working in such a public facility.  But when the threat can be most grave and the circumstances are altered, then different types of security evaluations are done. Consider the case of presidential or congressional events. Some recent high profile snafus aside, here the government uses a different method of control.  For such events, they take the opposite approach. Instead of only stopping “known bad” elements from access, the approach is to only allow “known good” persons to access the event, through the use of background checks, interviews and invitations.

This different model can generally be referred to as whitelisting. This concept is gaining some measure of favor in IT today as organizations struggle to keep up with the growing threats highlighted above. Whitelisting has its challenges too: after all, few companies could accurately list every piece of software or firmware that is allowable or in use in their enterprise.  The question is: how can one use this and still get value?  That is where Intel® Trusted Execution Technology (Intel® TXT) can help.

Intel® TXT provides the ability to evaluate the launch environment of a platform and enforce the result.  Its scope is deliberately limited (it evaluates at launch time rather than throughout run time, and only up to the launch of an enabled OS or hypervisor) so that IT does not have to maintain a catalogue of all possible software. While this scope might sound limiting at first blush, the reality is that it provides valuable additional protection in two basic respects. First, the protection focus here is indeed on the very foundational components of the system software stack, where damage could be greatest. Additionally, remember that runtime anti-malware tools like AV only catch a certain percentage of malware, even on the most up-to-date systems. Intel® TXT can help limit the damage from malware that slips through by providing an opportunity to catch it upon re-launch of the environment. This is particularly helpful given the long-term nature of modern attacks.

With malware that is trying to gain root-level access and control of a system (such as a rootkit), a function like Intel® TXT adds value in that it evaluates the launch environment and is able to detect when such software tries to launch before the OS or hypervisor.  Since this malware code will be in the launch flow, the environment will no longer match the “known good” configuration and the code can be blocked from executing. Control of the system can be preserved. In our presidential event example, Intel® TXT would be the equivalent of the Secret Service checking the approved invitee list and prohibiting uninvited and unscreened guests from crashing the event.
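
To make the contrast between the two models concrete, here is a minimal sketch in Python. It is an illustration only, not how Intel® TXT is implemented in hardware: the component names, the byte-pattern signature check and the hash-chaining scheme are all assumptions chosen to keep the example small. It shows a launch sequence containing never-before-seen code passing a signature blacklist while failing a "known good" launch measurement.

```python
import hashlib

def measure(components):
    """Fold each component into a running SHA-256 digest, in launch order."""
    digest = b"\x00" * 32
    for blob in components:
        digest = hashlib.sha256(digest + hashlib.sha256(blob).digest()).digest()
    return digest.hex()

def whitelist_allows(components, known_good):
    """Whitelisting: allow the launch only if the measurement is known good."""
    return measure(components) in known_good

def blacklist_allows(components, bad_signatures):
    """Blacklisting: allow the launch unless a known-bad pattern is found."""
    return not any(sig in blob for blob in components for sig in bad_signatures)

# Hypothetical launch components, standing in for real firmware and software.
firmware, loader, hypervisor = b"firmware-v1", b"loader-v1", b"hypervisor-v1"
known_good = {measure([firmware, loader, hypervisor])}
bad_signatures = [b"old-virus-pattern"]

# A rootkit nobody has analyzed yet inserts itself into the launch flow.
tampered = [firmware, b"never-seen-before rootkit", loader, hypervisor]

print(blacklist_allows(tampered, bad_signatures))  # no signature matches it
print(whitelist_allows(tampered, known_good))      # measurement has changed
```

Because the measurement depends on every component and on their order, any unexpected code in the launch flow changes the result, whether or not anyone has written a signature for it.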

AntiVirus does work, and it is advisable to use it and to keep it current. But it is not enough.  It is becoming less effective over time, so it should be complemented by other techniques and tools to address the gaps in protection that are inevitable as technologies and use models evolve.  In the end: new use models allow new threats which require new defenses.  TXT helps meet that need for new defensive capabilities, providing a whitelisting-based model of platform evaluation and control at launch time that is a complementary layer to the blacklisting-oriented models already deployed.

In our prior blog entry we saw that a common technique to reduce lighting power costs in residential and commercial buildings is to turn lights off in unused rooms.  This concept is so widely accepted that we rarely give it a second thought, let alone challenge it.   If that’s the case, why has this concept not been applied to servers in a data center, blazing away, drawing electricity 24 by 7, 365 days a year even at times when there is no work to be done?  There are more extreme cases of “dead servers walking”, servers that are no longer associated with useful applications, but have not been unplugged.

 

Two approaches are commonly applied to reduce lighting power consumption in residential or commercial buildings: turning lights off and using dimming mechanisms.

 

Turning lights off yields the greatest power savings, assuming the room is not to be used.  There is still a small amount of residual power being drawn to power pilot lights or motion sensors to turn on the illumination if someone enters the room.

 

Dimming the lights reduces power consumption when a room is in use: it is possible to reduce the illumination level while still allowing people to occupy the room for the intended purpose.  For instance, full illumination in certain areas may not be needed because mixed daylight is in use, because zonal lighting on work areas is sufficient, or because the application calls for reduced lighting, such as in a restaurant or dining room.  Power saved through dimming will be less than that saved by turning lights off.

 

Similar mechanisms are available in servers deployed in data centers. Servers can be shut down and restarted under application control when not needed.  We call the action of shutting down servers for power management purposes server parking.  This is the equivalent of turning lights off in a room.  The capability for “dimming lights” in a server is embodied by the Intel® Enhanced SpeedStep® technology or EIST and Intel® Intelligent Power Node Manager technology or Node Manager.  EIST reduces power consumption during periods of low workload and Node Manager can cap power, that is, reduce power consumption at high workload levels under application control.

 

In tests performed at our lab, a 2-socket white box Urbanna server provisioned with Intel® Xeon® 5500 Series processors, 6 DIMMs and one hard drive consumes, under low workload, about 50 percent of its power consumption at full load, about 150 watts out of 300; this is the effect of EIST.  If the server is working under full load, the 300 watts consumed at full power can be reduced through power capping by about 30 percent, down to 210 watts or so.
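
The lab figures above can be checked with a few lines of Python. This is just the arithmetic, reading the 150-watt figure as the draw at low workload with EIST active and the 30 percent figure as the reduction available at full load; the helper name is made up for the example.

```python
FULL_LOAD_WATTS = 300  # measured draw at full load, from the lab test

def percent_of(watts, percent):
    """Return the given whole-number percentage of a wattage."""
    return watts * percent // 100

# About 50 percent of full-load power at low workload.
low_load_watts = percent_of(FULL_LOAD_WATTS, 50)

# About 30 percent shaved off the full-load draw.
capped_watts = FULL_LOAD_WATTS - percent_of(FULL_LOAD_WATTS, 30)

print(low_load_watts, capped_watts)
```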

 

There is a “dimming” effect from power capping due to the voltage and frequency scaling mechanism used to implement power capping.  However, the tradeoff between performance and power consumption is more complex than the relationships in the lighting example. If the server is not working at full load, there may be enough cycles left in the server to continue running the workload without an apparent impact on performance.   In this case, the penalty is in the amount of performance headroom available should the workload pick up.  The solution to this problem is simple.  If the extra headroom is called for, the management application setting the cap can remove it and the full performance of the server becomes available in a fraction of a second.

 

There is also a richer set of options for turning off servers than there is for turning lights off.  The ACPI standard defines at least three states suitable for server parking: S3 (sleep to memory), S4 (hibernation, where the server state is saved in a file) and S5 (soft off, where the server is powered down except for the circuitry to turn it on remotely under application control).  The specific choice depends on hardware support; not all states are supported by a specific implementation.  It also depends on application requirements.  A restart from S3, if supported by the hardware, can take place much faster than a restart from S5.  The tradeoff is that S3 draws somewhat more power than S5 because of the need to keep the DIMMs powered.
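
The choice among parking states can be sketched as a small selection function. The resume times and parked power figures below are placeholders invented for illustration, not measurements; only their ordering (S3 resumes fastest, S5 draws least) follows the discussion above, and the function name is hypothetical.

```python
# Placeholder figures for illustration only; they are not measurements.
PARKING_STATES = {
    "S3": {"resume_seconds": 5, "parked_watts": 15},
    "S4": {"resume_seconds": 60, "parked_watts": 5},
    "S5": {"resume_seconds": 120, "parked_watts": 4},
}

def pick_parking_state(supported, max_resume_seconds):
    """Pick the lowest-power supported state that resumes quickly enough.

    Returns None when no supported state meets the resume requirement,
    in which case the server should stay on.
    """
    candidates = [
        state for state in supported
        if state in PARKING_STATES
        and PARKING_STATES[state]["resume_seconds"] <= max_resume_seconds
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: PARKING_STATES[s]["parked_watts"])

# Hardware that supports S3 and S5, with an application that needs a fast restart.
print(pick_parking_state(["S3", "S5"], max_resume_seconds=10))
```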

 

Widespread use of server parking is not feasible with traditional deployments, where a hard binding exists between the application components and the hardware host, because bringing any of the hosts offline could cripple the application.  This binding gets relaxed for virtualized cloud environments that support dynamic consolidation of virtual machines into a subset of active hosts.  The sub-pool of active hosts is grown or shrunk for optimal utilization levels.  Vacated hosts are parked, the equivalent of turning lights off in a room, and as in the lighting example, once a server is in a parked state it can’t run applications.
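
Dynamic consolidation can be pictured as a packing problem. The sketch below uses first-fit decreasing, a common bin-packing heuristic picked here purely for illustration; it is not a description of how any particular virtualization manager places virtual machines. Loads and capacity are in arbitrary utilization units, and all names are made up.

```python
def consolidate(vm_loads, host_capacity):
    """Pack VM loads onto as few hosts as first-fit decreasing manages.

    Returns a list of hosts, each a list of the VM loads placed on it.
    """
    hosts = []
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError("a VM load exceeds the capacity of a single host")
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No active host has room, so bring one more host into the pool.
            hosts.append([load])
    return hosts

def parkable_hosts(vm_loads, host_capacity, total_hosts):
    """Number of hosts left vacant, and so available for parking."""
    return max(total_hosts - len(consolidate(vm_loads, host_capacity)), 0)

loads = [30, 20, 50, 10, 40, 25]
print(consolidate(loads, host_capacity=100))
print(parkable_hosts(loads, host_capacity=100, total_hosts=6))
```

In practice the capacity would be set below 100 percent to leave headroom, and the pool would be re-evaluated as the workload changes.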

 

Unlike branch circuits used for lighting where the workload is sized to never exceed the circuit’s capacity, branch circuits feeding servers may be provisioned close to capacity.  One possible application for Node Manager is to establish a guard rail for power capping to kick in if the power consumption gets close to the limit.
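
A guard rail of this kind might look like the following sketch. It does not call any Node Manager interface; the function, the 90 percent threshold and the proportional scaling rule are all assumptions made for the example. It only computes the per-server caps a management application might then apply.

```python
def guard_rail_caps(server_watts, circuit_limit_watts, guard_percent=90):
    """Return per-server power caps if total draw crosses the guard rail.

    server_watts maps a server name to its current draw in watts.
    An empty dict means the circuit is below the guard rail and no cap
    is needed.
    """
    threshold = circuit_limit_watts * guard_percent // 100
    total = sum(server_watts.values())
    if total <= threshold:
        return {}
    # Scale every server down proportionally so the total fits the threshold.
    return {name: watts * threshold // total
            for name, watts in server_watts.items()}

busy_rack = {"srv1": 300, "srv2": 300, "srv3": 300}
print(guard_rail_caps(busy_rack, circuit_limit_watts=900))
```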

Virtualization and cloud computing bring costs down by enabling the reuse and sharing of physical and application resources, which leads to more efficient and higher utilization of those resources.

 

Most IT organizations today are under enormous pressure to keep their budgets in check. Their costs are going up, but their budgets are flat to decreasing as illustrated in the figure below.  This is more true than ever in this period of economic and financial crisis. The situation is not sustainable and eventually leads to unpleasant conditions such as slower technology refresh cycles, reduced expectations for IT value delivered and layoffs. The service re-use inherent in cloud computing promises long lasting relief from the cost treadmill.

KTBR-Legacy.png

Conceptually, a portion of IT budgets is used to maintain existing projects.  It's the portion dedicated to maintaining office productivity applications, the help desk, or the organization that provides telephone services. This portion is important because it is the part that “keeps the business running” (KTBR). In most IT organizations, the KTBR portion takes the lion’s share of the budget. The downside is that the KTBR is backward looking, and it’s only the leftover portion that can be applied to grow the business. There is another problem: the KTBR portion left unchecked tends to grow faster than IT budgets overall, and the situation can't stay unchecked forever.

 

 

A number of strategies have been used in IT organizations to keep the KTBR growth in check. Perhaps the most often used in the past few years is the outsourcing of certain applications, such as payroll and HR applications like expense reports and the posting of open positions in the corporation.  When outsourcing (and perhaps off-shoring) is brought in, costs actually go up a notch as reorganizations take place and contracts are negotiated. Once the outsourcing plans are implemented costs may go down, but the problem of sustainability remains. Part of the initial cost reduction comes from salary arbitrage, especially when the service providers are in lower cost countries.  Unfortunately the cost benefit from salary arbitrage tends to diminish with time as these countries advance technically and economically.

 

KTBR-outsourcing.png

 

A third alternative comes from technology refreshes as shown below.

 

KTBR-techrefresh.png

 

The introduction of a new technology lowers the cost of doing business, seen as a cost dip. Costs can be managed through an aggressive “treadmill” of technology adoption, but this does not fix the general uptrend, and not many organizations are willing or even able to sustain this technology innovation schedule.

 

Finally, the adoption of cloud computing will likely lead to a structural and sustainable cost reduction for the foreseeable future due to the synergies of reuse. As in the outsourcing case, there is an initial bump in cost due to the upfront investment needed and while the organization readjusts and goes through the learning curve.

KTBR-costreduction.png

Cloud computing reduces both capital and operational expenses through multiple factors:

 

  • Economies of scale: The service provider becomes an expert in the field and can deliver the service more efficiently at lower administrative costs than any other provider, possibly at a lower price than the cost of implementing the same service in house.  (OpEx)
  • The infrastructure is shared across multiple tenants. (CapEx)
  • Application software licensing costs are shared across tenants. (OpEx)
  • The environment is virtualized allowing dynamic consolidation.  Servers are run at the most efficient utilization sweet spot, and hence fewer servers overall are required to deliver a given capability. (CapEx)
  • The traditional IT infrastructure is highly siloed.  Once these silos are broken, there is no need to overprovision to meet peak workloads. (CapEx)
  • Expensive and slow capital procurement processes are no longer necessary. (CapEx)
  • The IT organization can defer server purchases and decommission data centers as in-house capabilities are phased out in favor of cloud services. (CapEx)

Network World just warned IT to prepare for tremendous network traffic during the Super Bowl.  Peak demands happen routinely inside IT organizations, and IT has to be ready.  To be ready, IT must prepare in advance by having a strategy to handle and manage peak demand.

 

Intel IT deals with peak demands inside our business constantly and this sizing paper talks about some of the impacts that govern our server sizing decisions each year.

 

Follow Intel IT on twitter

With the 1st quarter of 2010 upon us, Intel is very focused on the launch of its latest Xeon processor.  This newest large scale design, codenamed Nehalem EX, will increase compute horsepower, address huge memory allocations, and include all sorts of advanced RAS features for high performance computing in the mission critical space.  Yet amidst all of this hoopla, we might forget that data center managers are still grappling with the fact that they are running out of power and cooling capacity.  Balancing performance with power consumption continues to be a challenge in data centers across the globe.

 

 

Luckily, Node Manager and Data Center Manager continue to be supported with this latest Intel processor and you are still able to set very sophisticated policies to control power use within a system, rack, row, or even the entire datacenter.  The main focus of this technology continues to be allowing higher densities based on capping power usage within the rack.  When we are capable of regulating power use, we can design data centers more efficiently.  We reduce over-provisioning and limit stranded power. Data center managers benefit from the following use cases:

 

  • Energy cost rebates from provider based on the ability of a customer to not exceed a specific power consumption level.

  • Reducing datacenter hot spots through policies that are triggered based on thermal sensors.  The reduction of processing frequencies reduces power consumption and reduces thermal output.

  • Increasing server density in a colocated environment.  The ability to cap power gives greater confidence of maintaining a per rack power cap where a customer is billed by the rack for compute resources and required to stay below a certain power maximum.

 

 

Recently I have been working with a cloud computing provider.  Based on work Intel has done around power capping using Node Manager and real internet workloads, we have seen compelling data that demonstrates power savings while still maintaining performance SLAs around response time, latency, and query success rates that ensure a continued quality end-user experience.  Also, based on customer-generated cost per watt calculations, savings from power capping were estimated as high as $100 USD per server per year.  Take those savings and multiply them by the hundreds or thousands of servers in these data centers and the savings become very exciting. Has anyone created their own cost saving estimates for the reduction in cooling and floor space, based on increased rack densities and reductions in stranded power, when server power use is decreased by 1, 5, or 10+ watts?  Something to think about in our spare time while we wait for the launch of Nehalem EX.
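
For anyone who wants to start on their own estimate, here is a minimal sketch of the multiplication in Python. The $100 per server per year default is the upper estimate quoted above; the function names and the `usd_per_watt_year` input are hypothetical, and the cost per watt is something each data center would have to supply from its own billing and cooling data.

```python
def fleet_savings_usd(server_count, savings_per_server_usd=100):
    """Yearly savings across a fleet at a fixed saving per server."""
    return server_count * savings_per_server_usd

def savings_from_watts_usd(watts_saved_per_server, server_count, usd_per_watt_year):
    """Yearly savings when each server draws a given number of watts less.

    usd_per_watt_year is the all-in yearly cost of one watt and must be
    supplied by the reader; no value is assumed here.
    """
    return watts_saved_per_server * server_count * usd_per_watt_year

for servers in (100, 1000, 10000):
    print(servers, fleet_savings_usd(servers))
```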

 

Mark
