VMware HP AMS service bug

Recently I ran into a problem with an HP DL380 G8 host running VMware ESXi 5.5. It was no longer possible to log on through SSH, and we could not vMotion virtual machines anymore. It looked like the SSH daemon had stopped working, and starting the SSH daemon again did not work either. The vmkernel.log file showed the following error:

WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed. caller: 0x41802a2ca2fd
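
If you want to check whether a host is logging the same warning, a small helper like the one below can count the matching lines. This is only a sketch: the function name count_heap_failures is my own, and the default path assumes the usual ESXi 5.x log location /var/log/vmkernel.log. It needs a shell that can read the log (for example the local ESXi Shell on the console) or a copy of the log file on another machine.

```shell
#!/bin/sh
# Sketch: count globalCartel-1 heap allocation failures in a vmkernel log.
# count_heap_failures is a hypothetical helper name, not an ESXi tool.
# The default path is an assumption (usual ESXi 5.x log location).
count_heap_failures() {
    log="${1:-/var/log/vmkernel.log}"
    # -F: match the text literally, -c: print the number of matching lines
    grep -F -c 'Heap_Align(globalCartel-1' "$log"
}
```

Calling count_heap_failures without arguments reads the default log; passing a path, such as a copied log file, checks that file instead.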

I found VMware KB article 2085618, titled “ESXi host cannot initiate vMotion or enable services and reports the error: Heap globalCartel-1 already at its maximum size.Cannot expand”, which sounded exactly like our problem. The problem seems to be caused by a memory leak in the HP-AMS service.

The KB article suggests logging in through SSH and stopping the HP-AMS service. The problem was that the SSH daemon was no longer running and could not be started.

We had to plan a maintenance window to reboot the ESXi host. After the reboot we could connect to the host through SSH again and disable the HP-AMS service by following these steps:

  • Log in to the host using SSH.
  • Run this command to stop the HP service (does not persist on reboot): /etc/init.d/hp-ams.sh stop
  • Run this command to remove the VIB: esxcli software vib remove -n hp-ams
  • Reboot the host.
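
The steps above can be combined into a small script. This is a sketch rather than an official procedure: the two commands are the ones from the list, while the run and remove_hp_ams functions and the DRY_RUN variable are names I made up for illustration. By default the script only prints what it would do; set DRY_RUN=0 to really execute the commands on the host.

```shell
#!/bin/sh
# Sketch: stop the HP-AMS service, remove its VIB and reboot the host.
# run, remove_hp_ams and DRY_RUN are hypothetical helpers, not part of ESXi.

# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

remove_hp_ams() {
    # Stop the service now (this alone does not persist across a reboot).
    run /etc/init.d/hp-ams.sh stop
    # Remove the VIB; skip the reboot if the removal fails.
    run esxcli software vib remove -n hp-ams || return 1
    # Reboot the host to complete the removal.
    run reboot
}
```

Running remove_hp_ams as-is prints the three commands so they can be reviewed first. Make sure the virtual machines are migrated or shut down before running it with DRY_RUN=0, because the last step reboots the host.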

After removing the VIB, the problem did not occur again.