Friday, January 11, 2008

One thing to try if DPM does not work in ESX Server 3.5

I was experimenting with the new DPM (Distributed Power Management) feature in VMware Infrastructure 3 the other day, and was having some problems.

I had 4 servers in my cluster, with DRS (Distributed Resource Scheduler) and HA enabled, but DPM NOT enabled. The systems were:
2 dual-socket, quad-core Intel servers
2 quad-socket, quad-core Intel servers

They were all attached to the same iSCSI storage. I had one (1) VM created and running, and it was able to VMotion between ALL machines. HA was set up so that it could handle 1 system failure. I manually had three of the servers enter Standby Mode and then powered them back on through the VirtualCenter MUI. When I tried the fourth server, it failed to enter Standby. I then enabled DPM in Automatic Mode, and nothing happened. I figured it might just need some time before it kicked in, so I went to lunch. When I came back, all physical boxes were still powered on.

I left it overnight, and the machines were still powered on. To get DPM working, I performed the following:
  • I disabled DPM
  • I repeated the "Enter Standby -> Power On" operation for ALL servers
  • This time it was successful for ALL servers
  • Lastly, I re-enabled DPM
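
The steps above can be sketched as a small, self-contained Python model. This is only an illustration of the order of operations, not a VMware API: the Host and Cluster classes, the standby_works flag, and the function name are all hypothetical stand-ins for what I did by hand in VirtualCenter. The sketch is deliberately conservative and only re-enables DPM when every host completes the cycle.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical model of an ESX host; not a real VMware object.
    name: str
    standby_works: bool = True  # assumption: whether the standby cycle succeeds
    state: str = "on"

    def enter_standby(self) -> bool:
        # Stand-in for the manual "Enter Standby" operation.
        if not self.standby_works:
            return False
        self.state = "standby"
        return True

    def power_on(self) -> bool:
        # Stand-in for the manual "Power On" operation.
        if self.state != "standby":
            return False
        self.state = "on"
        return True


@dataclass
class Cluster:
    hosts: list
    dpm_enabled: bool = False


def cycle_hosts_then_enable_dpm(cluster: Cluster) -> list:
    """Disable DPM, cycle every host, re-enable DPM only if all succeed.

    Returns the names of hosts that failed the standby/power-on cycle.
    """
    cluster.dpm_enabled = False
    failed = []
    for host in cluster.hosts:
        if not (host.enter_standby() and host.power_on()):
            failed.append(host.name)
    if not failed:
        cluster.dpm_enabled = True
    return failed
```

A non-empty return value names the hosts to look at before turning DPM back on.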

YEAH! Once I re-enabled DPM, it first sent one server to standby, then moved the 5 VMs off of the second server and put that one into standby as well.

Thank you to the DPM developers for all of their assistance. Also, an interesting tidbit from one of the developers:

...if DPM thinks some machines in the cluster can't come out of standby but others can, it can still work. It just will consider only machines that can come out of standby as candidates to evacuate and power down...
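
To make that tidbit concrete, here is a minimal Python sketch of the filtering it describes. It is only my reading of the developer's comment, not DPM's actual code: the function name and the dictionary of host names to a "can exit standby" flag are assumptions made up for the example.

```python
def standby_candidates(hosts: dict) -> list:
    """Return the hosts that may be evacuated and powered down.

    `hosts` maps a host name to True if that host is believed to be
    able to come back out of standby (hypothetical input format).
    """
    return [name for name, can_exit in hosts.items() if can_exit]


# Example: esx4 failed to leave standby earlier, so it is never a candidate.
cluster = {"esx1": True, "esx2": True, "esx3": True, "esx4": False}
print(standby_candidates(cluster))
```

In this model a host that cannot come out of standby simply stays powered on, while the rest of the cluster can still be consolidated.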

I hope this helps.

1 comment:

Steve Chambers said...

This would make a great proven practice - "My DPM configuration" or something - on :-)