Most large IT shops run their applications on computer systems with different CPU architectures. These systems differ not just between RISC and IA, but also between 32-bit and 64-bit processors. Strictly speaking, these are not always different CPU architectures, but in most cases the decision to move from one to the other presents similar challenges. So why do we decide to move from one architecture to another, and most importantly, what do we need to take into consideration when making such a decision? To start, I will outline some of these challenges as we faced them in a high-volume manufacturing area. No two organisations that require a large-scale computing environment will have the same issues and challenges, so there is no one solution for everyone. I do believe, however, that we all have to answer the same set of questions before we start planning the migration, executing the plan, and subsequently supporting our new environment.


When you start thinking about migrating to a different architecture, make a list of all these questions and start recording your answers. This will give you enough information to figure out whether you really have to go through all this work and whether it is really worth it. And trust me, it is a huge amount of work…

 

  1. Why are we doing this? Try to state your main drivers in simple statements. Nothing technical is required here; you are really trying to pinpoint your problem statement. It could be as simple as the application being EOL on your current architecture, or at the other extreme it could be a “political” decision. The latter is usually the easiest to justify, and most of what follows is irrelevant for that type of reasoning.
  2. Is the application going to work on the other architecture? The important piece here is to outline what is required to port your application to the new architecture and how much work that involves. It may turn out in the end that the work or cost is simply too high and it is not worth doing.
  3. Cost of ownership of new equipment? IT professionals often forget that the cost of buying new equipment is not the only cost to worry about. Ongoing service maintenance of the equipment can quickly add up to a sum that was never budgeted for, and the worst thing about it is that you have to pay it regularly for the lifetime of the equipment (I plan to write a separate blog on service contracts and maintenance, an interesting subject in itself).
  4. Operational requirements post migration? This is where you need to answer a series of questions about your ongoing support. Some of the questions you may want to ask yourself: Will your operational support model have to change once you migrate to the new equipment? Is the support organisation sufficiently skilled to support the new environment? Is the headcount adequate?
  5. How will the migration impact my day-to-day business? This is one of the most important implementation questions once you have decided that the move to the new architecture is going to happen. If your organisation runs one or more applications that are not time sensitive and your business can happily survive a few hours of downtime, then this is easy. If, on the other hand, you are running an operation that is highly time sensitive and you cannot afford prolonged downtime because your business will incur huge losses, this is where it gets interesting, and the migration strategy is something you will have to spend a considerable amount of time preparing.
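Question 3 is easier to reason about with numbers in front of you. The following is only a minimal sketch: the function name and every figure in it are hypothetical, and a real comparison would include many more cost items (power, licences, training and so on). It simply adds the one-off purchase price to the recurring costs paid over the lifetime of the equipment.

```python
def total_cost_of_ownership(purchase_price, annual_maintenance,
                            annual_support, years):
    """One-off purchase price plus recurring costs over the equipment lifetime."""
    recurring = (annual_maintenance + annual_support) * years
    return purchase_price + recurring


# Hypothetical figures, for illustration only.
# Staying put: nothing to buy, but an expensive maintenance contract.
stay = total_cost_of_ownership(0, 40_000, 60_000, years=5)
# Migrating: new hardware to buy, with cheaper maintenance and support.
move = total_cost_of_ownership(250_000, 15_000, 50_000, years=5)

print(f"stay: {stay}, move: {move}, difference: {move - stay}")
```

With these made-up inputs the recurring costs over five years are larger than the purchase price itself, which is exactly the part that tends to be left out of the budget.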

My experience here is in a high-volume manufacturing area, an environment with multiple CPU architectures running different operating systems and supporting highly time-sensitive applications. Over the next few blogs I will try to answer some of the above questions as we addressed them in our environment, so stay tuned. I hope you will find it useful.

By Joe Sartini

As both an automation engineer and an IT automation manager for many years, I’ve both contributed to and monitored the ways in which IT standard operating procedures (SOPs) can introduce errors into a system. The challenge for many IT operations teams is how to eliminate human-induced errors and provide closed-loop feedback to process developers on how to create and maintain more robust project insertions. Every IT engineer I’ve ever known has had every intention of making SOP changes flawlessly; however, we tend to find that a fair proportion of our operational incidents are the result of human error during the change process. Intel Factory Automation has strict change control procedures to help engineers through the change process and protect them from human error. Aside from all the change control processes that exist in many organizations, I believe a key to success in this area is to automate as much as possible and, where that is not feasible, to use an automated checklist.

Let me give you an example to illustrate the issues many IT orgs can experience, and a way to avoid or mitigate them by putting more IT solutions into the manual processes that will always exist.

The Problem

Suppose you have an engineer performing a standard server build or decommission. In each case your engineer would deem this a fairly straightforward task, and I’m sure you as an IT org have a documented standard operating procedure for each hardware model and O/S rev, right? The problem can arise when our engineers are multitasking on many projects at once, under time constraints. In their mind, the trivial server build/decom SOP needs to be completed before they rush to their next important meeting. So they’re in the data centre (DC) with no access to the SOP instructions unless they print them out or log in to a PC in the DC to view them. Needless to say, an engineer who is in a hurry and has performed this task many times in the past will proceed from memory. However, suppose something has changed in the process since they last performed the build/decom, or nothing has changed but they simply forget a task, such as disabling the SAN switch port for the decommissioned server. Down the road we run into SAN switch port capacity problems which shouldn’t exist. The IT org may needlessly purchase more switches to handle the perceived capacity problem, or have another engineer compare server assets against active port usage, only to find that something doesn’t add up. More time is then spent finding and disabling the unused ports, all because engineers in the past forgot to disable them as part of the server decommission SOP.
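The capacity analysis described above, comparing server assets against active port usage, boils down to a set comparison. Here is a minimal sketch of the idea; the function name, server names and port identifiers are all hypothetical, and in practice the two inputs would come from your asset database and your switch management tooling rather than hard-coded values.

```python
def find_orphaned_ports(active_servers, enabled_ports):
    """Return enabled switch ports whose server is no longer in the inventory.

    active_servers: set of server names currently in the asset inventory
    enabled_ports:  dict mapping a switch port id to the server it was zoned for
    """
    return sorted(
        port for port, server in enabled_ports.items()
        if server not in active_servers
    )


# Hypothetical data, for illustration only.
active_servers = {"app01", "app02", "db01"}
enabled_ports = {
    "sw1/p01": "app01",
    "sw1/p02": "app02",
    "sw1/p03": "old-app07",  # decommissioned, port never disabled
    "sw2/p01": "db01",
    "sw2/p02": "old-db03",   # decommissioned, port never disabled
}

print(find_orphaned_ports(active_servers, enabled_ports))
```

Running a comparison like this on a schedule would surface forgotten ports early, but it only detects the problem after the fact; it does not stop the step from being skipped in the first place.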

One Solution

From my experience as an IT engineer and manager, I focus on automated IT checklists for SOPs. Using simple, easily configurable web-based solutions, the IT manager or engineer can develop checklists for all SOPs. Engineers check each box using an online form, which can be centrally tracked via simple, standard IT reports. The IT manager can then monitor the completion and success of the SOPs via %PAS reports. Furthermore, engineers know that their names are tracked against the tasks with timestamps, so they are more inclined to follow the checklist and complete all tasks. The beauty of an online checklist is that engineers can access it wherever they have a network connection (LAN, Wi-Fi, etc.) and from any form factor device (PC, laptop, MID, iPhone, etc.). The IT manager can also easily run reports on the average time each SOP takes, which helps with resource allocation per task and provides feedback to development teams on TTM for new project insertions.

In the example above, the engineer in the data centre who was rushing to a meeting would have accessed the checklist via laptop or phone and clicked each box as each task was completed. Say the engineer still forgot to de-assign the switch port, or more typically didn’t have time to complete all tasks in one visit to the DC. In that case the checklist would not be 100% complete, and the daily or weekly operational review would show that this SOP is still in flight. The team would then follow up with the engineer to complete the checklist, since everything is centrally tracked and the loop stays open until all tasks are actually completed.
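To make the tracking idea concrete, here is a minimal sketch of the data behind such a checklist. It is not the web-based tool described above; the class names, task descriptions and engineer name are hypothetical and only illustrate recording who completed each task and when, reporting percentage complete, and listing what is still outstanding.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Task:
    description: str
    completed_by: Optional[str] = None
    completed_at: Optional[datetime] = None


@dataclass
class Checklist:
    sop_name: str
    tasks: List[Task] = field(default_factory=list)

    def complete(self, index: int, engineer: str) -> None:
        """Tick one box, recording the engineer and a UTC timestamp."""
        task = self.tasks[index]
        task.completed_by = engineer
        task.completed_at = datetime.now(timezone.utc)

    def percent_complete(self) -> float:
        """Share of tasks ticked off, as a percentage."""
        if not self.tasks:
            return 100.0
        done = sum(1 for t in self.tasks if t.completed_at is not None)
        return 100.0 * done / len(self.tasks)

    def outstanding(self) -> List[str]:
        """Descriptions of tasks nobody has ticked off yet."""
        return [t.description for t in self.tasks if t.completed_at is None]


# Hypothetical decommission SOP, for illustration only.
decom = Checklist("Server decommission", [
    Task("Power down server"),
    Task("Remove from asset inventory"),
    Task("Disable SAN switch port"),
    Task("Remove cabling"),
])
for i in (0, 1, 3):  # the engineer runs out of time before step 3
    decom.complete(i, "engineer_a")

print(decom.percent_complete(), decom.outstanding())
```

In the operational review, any checklist below 100% is still in flight, and the list of outstanding tasks tells the reviewer exactly what to follow up on.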

Let’s take the pen and paper and the human guesswork out of our IT operations and use our IT skills to develop foolproof solutions for our daily routines. That way, we’ll have a better chance of removing human errors. I’m sure you’d agree we need as much time as possible to handle the h/w and s/w errors that affect the availability and reliability of our operations.

For some time there has been no doubt that cloud computing offers many benefits over the traditional data center. For that reason, most traditional data centers have migrated to a cloud computing architecture. In addition, it has become easier to migrate existing servers to be part of a cloud. So why have not all data centers migrated?
There are some valid reasons why not, including ROI (which will be discussed in the next blog), especially when we’re talking about a production environment that has zero tolerance for downtime. In this blog I’ll talk about risk and downtime.
Here are some of the challenges we face when migrating a production environment:


1. As I mentioned above: why migrate? Most stakeholders will resist the change; for them it is “if it works, leave it”. What needs to be done to satisfy their needs after the change?


2. Of course you do not migrate all servers, so which servers will you migrate?


3. How do we make this migration transparent to the stakeholders? After all, we want the stakeholders to have the same level of support…
4. How can we avoid downtime?


5. How can we prevent the migration from being the scapegoat for unrelated failures after the migration?


There are no clear answers to those questions, but I’ll try to give some tips that can answer some of them, or at least point in a direction.
Anyone who wants to migrate existing physical production servers to virtual ones should consider the following during planning, design, and implementation:


1. Note that virtualizing production servers is a major change, so consider in advance what to migrate, and involve your stakeholders in the process to understand the business impact and get their buy-in.


2. Do not migrate every server by default. Choose well in advance which servers to migrate and avoid unwanted migrations. Start with server criticality: together with the application owners, define whether a server is critical enough that it should get “personal” treatment and not be part of the farm.


3. For the same reason as #2, decide whether to migrate a server based on its resource utilization. If the server’s utilization is close to the host’s resource capacity, the host will probably end up hosting only that server, and there is no real benefit in that.


4. When designing the virtual environment to host the migrated servers, use a capacity planning process to understand the resource requirements of each application (capacity planning is a process that checks the overall capacity usage of the physical servers). Even if capacity planning shows low resource utilization, you must take into account the resources currently installed in the servers and the server owners’ requirements. There might be a reason for the amount of physical resource, and we don’t want a production server to be short of resources, not even for one minute.


5. Since we’re talking about the production environment, we don’t want to be surprised, so include the future growth of your factory in your plan and add resources accordingly. Check the production forecast for the years ahead with your management, and together with the server/application owners check future resource needs and design the virtual environment accordingly.


6. Although ideally it would not be a consideration, note that the migration process will require system downtime. Even though the migration can be done online (some operating systems still require a server restart), it is preferable to schedule server/application downtime for each migration. So agree with your management on the possible downtime windows and plan your migration accordingly.
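Tips 2, 3 and 5 above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the 0.6 utilization threshold, the 20% headroom and all the sample figures are hypothetical, and compound annual growth is just one simple way to model a production forecast. Utilization here means the server's peak load as a fraction of the target host's capacity.

```python
def is_candidate(server, max_utilization=0.6):
    """Tips 2 and 3: skip critical servers and servers that would fill a host alone."""
    return (not server["critical"]
            and server["peak_utilization"] <= max_utilization)


def required_capacity(current_units, annual_growth, years, headroom=0.2):
    """Tip 5: capacity after compound annual growth, plus a safety headroom."""
    return current_units * (1 + annual_growth) ** years * (1 + headroom)


# Hypothetical inventory, for illustration only.
servers = [
    {"name": "mes-db01", "critical": True,  "peak_utilization": 0.35},
    {"name": "report01", "critical": False, "peak_utilization": 0.25},
    {"name": "batch01",  "critical": False, "peak_utilization": 0.90},
]

candidates = [s["name"] for s in servers if is_candidate(s)]
print(candidates)

# 100 capacity units today, 10% growth per year, planned 3 years ahead.
print(round(required_capacity(100, 0.10, 3), 2))
```

The thresholds are a starting point for the conversation with application owners, not a substitute for it: as tip 4 notes, a server with low measured utilization may still have a good reason for its current physical resources.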


To summarize: as with every technology improvement, when we’re talking about a production environment we need to look at all the considerations and find an answer to each of them.
I hope you find these tips helpful. Please share any tips you have, or let me know of any additional concerns.


Have a fun and safe migration
