Monthly Archives: March 2010

Linux – How to store sar output on mysql database.

This approach stores sar information in a MySQL database. The script is a work in progress and will be extended over time.

Currently the script collects the following data:

#Complete sar information [ sar -u ]
#Network_statistics_information [ sar -n DEV ]
#Process Queue Length and Load Avg [ sar -q ]
#Memory & Swap Space Utilization Statistics [ sar -r ]

Database information

GRANT ALL PRIVILEGES ON sysstat.* TO 'sysstat'@'localhost' IDENTIFIED BY 'sysstat123';

CREATE TABLE sarinformation (
hostname VARCHAR(20),
datestamp VARCHAR(15),
time VARCHAR(8),
timeformat VARCHAR(2),
cpu VARCHAR(3),
pct_user DECIMAL(10,2),
pct_nice DECIMAL(10,2),
pct_system DECIMAL(10,2),
pct_iowait DECIMAL(10,2),
pct_steal DECIMAL(10,2),
pct_idle DECIMAL(10,2)
);

CREATE TABLE net_stat_info (
hostname VARCHAR(20),
datestamp VARCHAR(15),
time VARCHAR(8),
timeformat VARCHAR(2),
iface VARCHAR(7),
rxpck_persec DECIMAL(10,2),
txpck_persec DECIMAL(10,2),
rxbyt_persec DECIMAL(10,2),
txbyt_persec DECIMAL(10,2),
rxcmp_persec DECIMAL(10,2),
txcmp_persec DECIMAL(10,2),
rxcst_persec DECIMAL(10,2)
);

CREATE TABLE queue_len_n_load_avg (
hostname VARCHAR(20),
datestamp VARCHAR(15),
time VARCHAR(8),
timeformat VARCHAR(2),
runq_sz INT,
plist_sz INT,
ldavg_1 DECIMAL(10,2),
ldavg_5 DECIMAL(10,2),
ldavg_15 DECIMAL(10,2)
);

CREATE TABLE mem_n_swap_space_utilization_stat (
hostname VARCHAR(20),
datestamp VARCHAR(15),
time VARCHAR(8),
timeformat VARCHAR(2),
kbmemfree INT,
kbmemused INT,
Per_memused DECIMAL(10,2),
kbbuffers INT,
kbcached INT,
kbswpfree INT,
kbswpused INT,
Per_swpused DECIMAL(10,2),
kbswpca INT
);

The script is as follows:


#Author Satyendra Singh
#This script stores sar data in the database.
#It takes the latest sar file (e.g. sa23) and loads its data into the sysstat database.
#The latest sar file (e.g. sa23) is copied to the temporary location /tmp/x and deleted from there after parsing.

# It is highly recommended NOT to put anything else in the temporary location /tmp/x

SAR=`which sar`
HEAD=`which head`
AWK=`which awk`
COPY=`which cp`
LATEST_DATA=`ls -tlr $SAR_LOG | tail -1 | awk '{print $9}'`
# The loop assumes the current directory is $WORKDIR (an assumption; the cd is not shown in this listing)
for file in `ls $WORKDIR`;
do
DATESTAMP=`$SAR -f $file | $HEAD -n 1 | $AWK '{print $4}' | $AWK -F '/' '{print $3"-"$1"-"$2}'`
#Complete sar information
$SAR -f $file | sed '$ d' | tr -s '[:blank:]' | sed -n '1h;2,$H;${g;s/ /,/g;p}' | sed '/Average:/ d' | sed "s/^/$HOSTNAME,$DATESTAMP,/" | sed '$d' > "$FMATDIR"/"$file"-cpuutilization.csv
#Network statistics information
$SAR -n DEV -f $file | sed '$ d' | tr -s '[:blank:]' | sed -n '1h;2,$H;${g;s/ /,/g;p}' | sed '/Average:/ d' | sed "s/^/$HOSTNAME,$DATESTAMP,/" | sed '$d' > "$FMATDIR"/"$file"-network_statistics_information.csv
#Process Queue Length and Load Avg
$SAR -q -f $file | sed '$ d' | tr -s '[:blank:]' | sed -n '1h;2,$H;${g;s/ /,/g;p}' | sed '/Average:/ d' | sed "s/^/$HOSTNAME,$DATESTAMP,/" | sed '$d' > "$FMATDIR"/"$file"-queue_length_and_load_avg.csv
#Memory & Swap Space Utilization Statistics
$SAR -r -f $file | sed '$ d' | tr -s '[:blank:]' | sed -n '1h;2,$H;${g;s/ /,/g;p}' | sed '/Average:/ d' | sed "s/^/$HOSTNAME,$DATESTAMP,/" | sed '$d' > "$FMATDIR"/"$file"-memory_and_swap_space_utilization_statistics.csv
done

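# Note: the table name is taken from the second "_"-separated field of the CSV file name,
# so the files in $PARSEDIR must be named to match the target tables.
for file in `dir -d *`;
do
/usr/bin/mysql -usysstat -psysstat123 -h localhost -D sysstat -e "LOAD DATA LOCAL INFILE '$PARSEDIR/${file}' INTO TABLE `echo $file | sed 's/\.csv$//' | awk -F_ '{print $2}'` FIELDS TERMINATED BY ',' IGNORE 1 LINES;"
done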

rm -fr $FMATDIR/*.csv
rm -fr $WORKDIR/sa*
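
The CSV files written by the parsing loop end in names such as -cpuutilization.csv, while the tables are called sarinformation, net_stat_info and so on, so the load step needs some way to pair the two. One possible way to do that is an explicit lookup, sketched below. The function name table_for_csv is hypothetical, and the pairing of file suffix to table is an assumption inferred from the table definitions and file names shown in this post.

```shell
# Map a CSV file name produced by the parsing loop to its target table.
# The suffix-to-table pairing is an assumption, not taken from the original script.
table_for_csv() {
    case "$1" in
        *-cpuutilization.csv)
            echo sarinformation ;;
        *-network_statistics_information.csv)
            echo net_stat_info ;;
        *-queue_length_and_load_avg.csv)
            echo queue_len_n_load_avg ;;
        *-memory_and_swap_space_utilization_statistics.csv)
            echo mem_n_swap_space_utilization_stat ;;
        *)
            # Unknown file name: report failure so the caller can skip it
            return 1 ;;
    esac
}

# Example: print the table for one of the generated files
table_for_csv "sa23-queue_length_and_load_avg.csv"
```

The result could then be used in place of the echo/sed/awk expression inside the LOAD DATA statement.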

queue_len_n_load_avg Table output

| hostname | datestamp | time | timeformat | runq_sz | plist_sz | ldavg_1 | ldavg_5 | ldavg_15 |
| satya | 2010-01-25 | 12:00:02 | AM | 0 | 0 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 12:10:01 | AM | 0 | 197 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 12:20:01 | AM | 2 | 199 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 12:30:01 | AM | 2 | 199 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 12:40:01 | AM | 1 | 198 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 12:50:01 | AM | 2 | 199 | 0.00 | 0.02 | 0.00 |
| satya | 2010-01-25 | 01:00:01 | AM | 2 | 199 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 01:10:02 | AM | 2 | 199 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 01:20:01 | AM | 1 | 198 | 0.03 | 0.01 | 0.00 |
| satya | 2010-01-25 | 01:30:01 | AM | 0 | 197 | 0.00 | 0.01 | 0.00 |
| satya | 2010-01-25 | 01:40:01 | AM | 0 | 197 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 01:50:01 | AM | 1 | 199 | 0.00 | 0.00 | 0.00 |
| satya | 2010-01-25 | 02:00:02 | AM | 2 | 199 | 0.00 | 0.00 | 0.00 |
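
Each row above corresponds to one line of sar -q output with the hostname and date placed in front. As a rough illustration of that conversion, here is a minimal awk-based sketch. It is an alternative to the sed pipeline in the script, not the author's method, and the function name sar_to_csv is hypothetical. It keeps the column header line, so IGNORE 1 LINES in the load step still applies.

```shell
# Convert sar text read from stdin into CSV rows prefixed with host and date.
# $1 = hostname, $2 = datestamp (YYYY-MM-DD)
sar_to_csv() {
    awk -v host="$1" -v date="$2" '
        /^Average:/ { next }   # drop the summary line
        /^Linux/    { next }   # drop the banner line printed by sar
        NF == 0     { next }   # drop blank lines
        {
            line = host "," date
            for (i = 1; i <= NF; i++) line = line "," $i
            print line
        }
    '
}

# Example usage:
#   sar -q -f sa23 | sar_to_csv "$HOSTNAME" 2010-01-25
```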

VMware – Some VMware Q&A based on discussion


This is a continuously updated post that will be enriched with more details over time. It is an initiative that builds on the hard work of one of my best friends, Prabhjit Singh.

Hats off to Prabhjit Singh for this wonderful job.

Q. What issues have you faced and resolved in your experience?

A. Not able to ping VMs after migrating them to a new host.

Resolution: Enable trunking on the ports connected to the hardware switch, or check whether the VM has been migrated to the same port group.

B. Not able to back up using a script built around the vmware-cmd command (an error message appears while removing the snapshot).

Resolution: We added #!/bin/bash at the start of the script to specify the interpreter, and also provided the complete path of the vmware-cmd command, i.e. /usr/bin/vmware-cmd.

Q.What is Vm-Support?

VMware has a service support script named /usr/bin/vm-support, which gathers up all the relevant files required to support and to debug your ESX server.

The resulting file is named esx–.tgz, and it contains a couple of hundred files that support will use to debug your system. This is almost the ESX equivalent of a support forced blue screen in Windows.

Note – ESX Server 3.x includes a version of vm-support newer than 1.14.

vm-support is a script that creates debugging information about the server. It has three main uses: gathering general debugging information, gathering performance information, and gathering information about a specific virtual machine.

Q.What is the difference between ESX 2.x and ESX 3.x?

VMFS3 format supported

64 bit Operating System Support on VM

VM format changed from VM2 to VM3 (the VM3 format is used by virtual machines created with ESX Server version 3. VM3 enhancements include improved snapshot support and support for new hardware).

NAS and iSCSI Support

VMware DRS

VMware HA

Hot-Add Virtual Disk

Q. What are the VMFS metadata files?

The following metadata files hold the VMFS cluster file system information. They store the LUN information and are used when upgrading VMFS or recovering from a crash.

.fdc.sf – file descriptor system file
.sbc.sf – sub-block system file
.fbb.sf – file block system file
.pbc.sf – pointer block system file
.vh.sf – volume header system file

Q.What is Vmnix?

VMnix is a customized linux kernel based on the Redhat 7.2 distribution. Specific kernel options are specified in VMnix that optimize the console operating system for running virtual machines. Various scripts are initialized in the startup process of the console operating system which call and load the VMkernel when the system is booted into runlevel 3. Once the system has completely loaded the VMkernel, resource control and virtual machine management is passed from the COS to the VMkernel.

Note: VMnix is the service console kernel, which interacts with the VMkernel to manage this type-1 hypervisor through a Unix-like console. That console is not present in ESXi.

Q.What is Vmware core dump?

ESX requires additional partitions on local storage in addition to the console operating system partitions. The first of these partitions is a 100MB Core Dump partition. In an instance where the VMkernel crashes, it will dump down a core dump and log file to this partition, which can be extracted and sent to support for troubleshooting.

Note: The /vmkcore partition, which stores the crash dump, is identified by the ‘fc’ filesystem type. The ‘fb’ filesystem type identifies VMFS file systems, i.e. LUNs etc.

Q.What is Vmkernel?

The VMkernel performs a number of functions, but one of the main jobs it has is to manage the interaction of virtual machine hardware and the physical hardware of the server. It acts as the “go-between” for scheduling resources for VMs on an as needed and as configured basis.

Q.What is Service console?

Service console is the management interface used to manage your ESX server. It’s a customized version of Redhat 7.2 and is run by the vmnix kernel module. The service console lets you touch and interact with it directly, and it allows access to modify configurations and manage the environment.

Q. What is the version of Linux kernel in ESX?

VMnix is a customized linux kernel based on the Redhat 7.2 distribution.

Q.What’s the location of log files on ESX server?

/var/log (but you can customize it)

The next partition that we recommend be created is /var. This mounts underneath the root file system and is the default directory for all log files. In addition, this directory is used for temp files when creating unattended install scripts via the VMware Web Interface. Creating a separate partition for log files prevents a troublesome system from filling up the entire root partition with error logs, which would prevent the system from booting. If the /var partition fills, the system will still be bootable, and the directory may be cleaned up or the logs may be reviewed to determine the cause of the errors. Generally 250MB should be sufficient for log files, as VMware regularly rotates log files to prevent the /var directory from filling, but the additional 750 MB is allocated in case you decide to use the unattended script creation process.

Q. What is memory ballooning?

Ballooning is an automatic process and is used when memory resources are low. Memory ballooning allows your guest to dynamically change its memory usage by giving up unused memory at runtime. To use ballooning you must have VMware Tools installed on your guest OS.


info balloon

balloon 400

This command requests your guest machine to change its memory allocation to the specified amount in MB, here 400MB.

Q. What is the use of memory ballooning for a VM?

It reduces the impact on your guest OS regarding memory usage on your host by giving up unused memory back to the host.

Q. What boots first, VMkernel or Service console?

The service console boots first; the VMkernel takes control later.

Q. What is the min memory requirement for Service console?

The service console reserves 272 MB of RAM by default.

Q. What is the maximum memory that can be assigned to the service console?

800 MB

Q. What is the minimum memory requirement for installing ESX Server 3.5?

1 GB

The maximum supported memory is 256 GB.

Q. What is the minimum number of processors required for an ESX 3.5 installation?


Q. What is the minimum size of a VMFS-3 file system?

1200 MB (1.2 GB) LUN size

Q. What are the partitions required for installation of ESX server?

/boot – Change from 100MB to 384MB to allow extra space for any future ESX upgrades.

Swap: Change from 544MB to 1600MB; this should be twice the amount of memory that is dedicated to the Service Console. The default amount of memory devoted to the Service Console is 272MB, which is why the default swap size is 544MB. The recommended amount of memory for the Service Console is the maximum of 800MB, which requires a 1600MB swap partition.

/var/log: Change from 2GB to 5GB to allow for extra space for log files.

Additionally you can create the following additional partitions to further segregate your drive to help protect the root directory from filling up.

/home – Create a partition of 2GB for any home directories that are created for local users on the ESX host.

/tmp – Create a partition of 2GB for the directory that is used to store temporary files.

/var – Create a partition of 4GB for the directory that is used to hold administrative log and configuration files.
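
The swap figures above follow from the rule of twice the Service Console memory. A small sketch of that arithmetic (the function name cos_swap_mb is hypothetical):

```shell
# Swap size for the Service Console in MB: twice its memory allocation.
# $1 = Service Console memory in MB
cos_swap_mb() {
    echo $(( 2 * $1 ))
}

cos_swap_mb 272   # default Service Console memory, prints 544
cos_swap_mb 800   # maximum Service Console memory, prints 1600
```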

Q. How do you enable root login on the ESX server, which is off by default?

Go to the service console on the physical server and log in.

vi /etc/ssh/sshd_config

Change the line that says PermitRootLogin from “no” to “yes”.

Run service sshd restart.

Log in to the VMware ESX server as a normal user using PuTTY (or from the Windows command prompt if you have Cygwin installed with an SSH client), or log in as root directly on the VI3 console.
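
For hosts where editing the file by hand is inconvenient, the same change can be scripted. The following is a minimal sketch, not a tested ESX procedure: the function name enable_root_login and the .bak backup copy are assumptions added for illustration.

```shell
# Set PermitRootLogin to "yes" in the sshd_config file given as $1.
# A backup copy is kept next to the file with a .bak suffix
# (an assumption of this sketch, not part of the steps above).
enable_root_login() {
    cfg="$1"
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite any existing (possibly commented-out) PermitRootLogin line
    sed 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$cfg.bak" > "$cfg"
}

# Example (run as root on the service console, then restart sshd):
#   enable_root_login /etc/ssh/sshd_config && service sshd restart
```

Note that the sketch only rewrites an existing PermitRootLogin line; if the file has none, it is left unchanged.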

Q. The service console shows high CPU utilization. How can we check that from the command line?

esxtop – display ESX Server resource utilization statistics

Q. What is snapshot?

A snapshot is a picture of your system at the time the snapshot is taken. Think of it as an image of your computer’s hard drive. Besides just the data on the hard drive, the VMware configuration for that virtual machine and the BIOS configuration are also saved when you take a snapshot.

The snapshot files that are created contain only the changes that have occurred to the virtual machine since the snapshot was taken. Thus, over time, the snapshot files will grow as the machine is used more and more.

Q. What snapshot files are created?

When a snapshot is created, a number of files are created in the directory for that virtual machine.

• -SnapshotX.vmsn (where X is the number of the snapshot taken): This file stores the state of the virtual machine when the snapshot was taken.

• -SnapshotX.vmem (where X is the number of the snapshot taken): This file stores the state of the virtual machine memory when the snapshot was taken.

• -nnnnnn.vmdk (where nnnnnn is the number of the disk image, not corresponding to the snapshot number): These are log files that store the changes made to the virtual machine since the snapshot was taken. There may be many of these files over time.

Linux – How to configure Nagios Failover with NSCA & Crontab

This Nagios failover configuration monitors services every 5 seconds and sends out alerts per service, e.g. ssh, http etc.
NSCA Installation
-Nagios should already be installed and configured
-External commands should already be enabled and configured for Nagios
-The master Nagios server and the slave Nagios server should be identical in Nagios terms. Whenever you make changes on the master Nagios server, sync them to the slave Nagios server.

Getting the source

The first step is to get the source code for NSCA. You can get this from any Sourceforge mirror. I will list the mirror that I use, but you can find one by searching the Sourceforge website. You can download the file using an internet browser, or, if you are using a machine without a GUI or text-based browser, you can retrieve it using the wget command:
[root@localhost] wget


After you have downloaded the compressed source, you need to unpack it to make it usable. Move the file to a working directory, perhaps your home directory:
[root@localhost] tar xzf nsca-version-number.tar.gz
The xzf are options for unpacking gzip files: x stands for extract, z for gzip type, and f to specify the filename.

Creating the binaries

Creating the binaries for the NSCA add-on is simple, assuming you already have a C compiler such as gcc installed. First run the configure script located in the base directory $DOWNLOADPATH$/nsca-2.6 with no arguments passed. After the script ends, assuming the computer met all requirements and the script finished without error, you can then compile with make.
[root@localhost] make all

This should proceed to create two binaries and some daemon scripts. The most important of these for us is the nsca binary.

NSCA Installing

After compiling, we will put the compiled binary and the sample nsca config file in their respective nagios locations. This is not necessary, as it can run from any location as long as the config file is correct, but for the sake of clarity we will put the nsca binary in nagios’ bin directory and the config file in nagios’ etc directory.
[root@localhost] cp $DOWNLOADPATH$/nsca-version-number/src/nsca $NAGIOSHOME$/bin
[root@localhost] cp $DOWNLOADPATH$/nsca-version-number/sample_config/nsca.cfg

Configuring NSCA

The next step is to configure the NSCA daemon. The config file is actually very self-explanatory.

If you have followed the documentation on the Nagios website to set up Nagios and external commands, the paths should all be right. Assuming you want all the default options, there is no configuration necessary here. The only options that I set are the server address and debug options.

The server address option lets you specify an IP to bind to. This is used when there is more than one network interface card. It allows the daemon to determine which interface it should monitor by choosing the IP address on that network segment. To do this, uncomment the following line and enter that interface’s IP address on the network segment you wish to monitor.
server_address=192.168.1.z # My local IP address
If you have only one network interface, then this is not necessary.

Second, the debug option is useful for logs. The NSCA daemon writes its logs to the standard syslog facility, so you can usually find messages in /var/log/messages. Enabling the “debug” option in the NSCA daemon config file causes more verbose information to be written to the logs. This is especially useful to see whether the packets are being received at all. I would enable it in any case, simply to have a log of what actually comes through even if it is not interpreted by Nagios. To enable debug, change the 0 to 1 in the following line:


Finally, we need to set the permissions for the cfg file. After saving and closing nsca.cfg, set the owner and group to the user you are running nsca as. Then add read permission for the group.
[root@localhost] chown nagios.nagcmd /usr/local/nagios/etc/nsca.cfg
[root@localhost] chmod g+r /usr/local/nagios/etc/nsca.cfg
Once you are through the testing phase, it is highly recommended that you use a password with NSCA. Remember that you must enter the same password in BOTH nsca.cfg and your client configuration (most likely send_nsca.cfg).

Running NSCA

The next step is to start up NSCA. Keep in mind that at this point, although it will be passing information to Nagios, Nagios doesn’t yet have any service to process this information, so it will simply throw out the data as irrelevant. This is where having debug information show up in our syslog is useful. So we will call the executable nsca and provide it with the location of its conf file.
[root@localhost] /usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg
NSCA should now be running. To test it, we have two options. First of all, I like to port scan, so I scan the machine to see if the port specified (5667 if left as default) is open. This indicates that the program is in fact running and ready to receive information. If the port isn’t found open, then you may have some other issue such as a firewall (iptables?) blocking it.

After determining that NSCA is running and accessible, we can try to send it some data. We can try from the same machine, or from another host, using the send_nsca binary that was compiled at the same time we compiled nsca. There are also plenty of third-party software titles that incorporate send_nsca that you can use. I’ll show an example using send_nsca from the local machine. Assuming you haven’t yet put any password on the nsca host, we don’t need to configure anything for send_nsca.

send_nsca by default uses a tab-delimited format. Since we often cannot enter tabs at a GUI command prompt, the workaround is to create a file containing our packet and pipe it to send_nsca. The format for a service check packet using NSCA is <host_name>[tab]<svc_description>[tab]<return_code>[tab]<plugin_output>[newline]. So create a normal text file named test with the following (fields separated by tabs):
localhost TestMessage 0 This is a test message. [return]
Save the file and then run send_nsca.
[root@localhost] $DOWNLOADPATH$/nsca-version-number/src/send_nsca localhost -c
Host Name: 'localhost', Service Description: 'TestMessage', Return Code: '0', Output: 'This is a test message.'
May 23 15:46:49 localhost nsca[1731]: [ID 862360 daemon.error] End of connection…
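
Because the packet is tab-delimited, it is often easier to let printf produce the tabs than to type them into a file. A minimal sketch, assuming a helper function named nsca_packet (hypothetical) and the four fields shown in the daemon's log line above:

```shell
# Build one tab-delimited NSCA service check packet.
# $1 = host name, $2 = service description, $3 = return code, $4 = plugin output
nsca_packet() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example: write the same test packet used above to a file named test
#   nsca_packet localhost TestMessage 0 "This is a test message." > test
# or pipe it straight into send_nsca ($SEND_NSCA_CFG is a placeholder path):
#   nsca_packet localhost TestMessage 0 "This is a test message." | send_nsca -H localhost -c "$SEND_NSCA_CFG"
```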

Implementation Part

We require two identically configured Nagios servers: one acts as the master Nagios server (192.168.1.x), and the second takes over if the master Nagios server has a problem.

Nagios servers.

Master Nagios Server – 192.168.1.x
Slave Nagios Server – 192.168.1.y

We have scheduled a script called ‘’ on 192.168.1.y with the following content. It runs every minute and uses an NSCA query to 192.168.1.x to check whether the master Nagios server is running properly.
#/root/scripts/ > /dev/null
/usr/bin/printf "%s\t%s\t%s\t%s\n" "$MY_HOSTNAME" "$2" "$return_code" "$4" | $SEND_NSCA -H $NAGIOS_SERVER_IP -c $SEND_NSCA_CFG
RETVAL=$?
if [ $RETVAL -ne 0 ]
then
# Master did not answer: (re)start the local Nagios
/etc/init.d/nagios stop
/etc/init.d/nagios start
else
# Master answered: make sure the local Nagios is stopped
/etc/init.d/nagios stop
fi

If Nagios on 192.168.1.x is not running, the script starts the Nagios service on 192.168.1.y. As soon as Nagios resumes service on 192.168.1.x, the next crontab run shuts down Nagios on 192.168.1.y, which then goes back to monitoring Nagios on 192.168.1.x.
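
The decision made on each crontab run can be reduced to a single test on the exit status of the send_nsca probe. The sketch below isolates that logic so it can be checked without touching a real Nagios; the function name failover_action is hypothetical.

```shell
# Decide what the slave should do with its local Nagios.
# $1 = exit status of the send_nsca probe against the master
failover_action() {
    if [ "$1" -ne 0 ]; then
        echo start    # master unreachable: slave takes over
    else
        echo stop     # master answers: slave stays passive
    fi
}

# Example of wiring it to the init script (path as used in this post):
#   /etc/init.d/nagios "$(failover_action "$RETVAL")"
```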

Crontab entry.

*/1 * * * * /root/scripts/ > /dev/null 2>&1

192.168.1.x Changes

1) Add the following entry to the start section of /etc/init.d/nagios on 192.168.1.x (see the script file below)
/usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg
2) Add the following entry to the stop section of /etc/init.d/nagios on 192.168.1.x
/bin/ps -aef | grep -i nagios | grep -v grep | awk '{print $2}' | xargs kill -9

/etc/init.d/nagios Entry

# chkconfig: 345 99 01
# description: Nagios network monitor
# File : nagios
# Author : Jorge Sanchez Aymar (
# Changelog :
# 1999-07-09 Karl DeBisschop
# – setup for autoconf
# – add reload function
# 1999-08-06 Ethan Galstad
# – Added configuration info for use with RedHat’s chkconfig tool
# per Fran Boon’s suggestion
# 1999-08-13 Jim Popovitch
# – added variable for nagios/var directory
# – cd into nagios/var directory before creating tmp files on startup
# 1999-08-16 Ethan Galstad
# – Added test for rc.d directory as suggested by Karl DeBisschop
# 2000-07-23 Karl DeBisschop
# – Clean out redhat macros and other dependencies
# Description: Starts and stops the Nagios monitor
# used to provide network services status.

status_nagios ()
{

if test ! -f $NagiosRun; then
echo "No lock file found in $NagiosRun"
return 1
fi

NagiosPID=`head -n 1 $NagiosRun`
if test -x $NagiosCGI/daemonchk.cgi; then
if $NagiosCGI/daemonchk.cgi -l $NagiosRun; then
return 0
else
return 1
fi
else
if ps -p $NagiosPID; then
return 0
else
return 1
fi
fi

return 1
}

killproc_nagios ()
{

if test ! -f $NagiosRun; then
echo "No lock file found in $NagiosRun"
return 1
fi

NagiosPID=`head -n 1 $NagiosRun`
kill $2 $NagiosPID
}

# Source function library
# Solaris doesn’t have an rc.d directory, so do a test first
if [ -f /etc/rc.d/init.d/functions ]; then
. /etc/rc.d/init.d/functions
elif [ -f /etc/init.d/functions ]; then
. /etc/init.d/functions
fi

# Check that nagios exists.
test -f $NagiosBin || exit 0

# Check that nagios.cfg exists.
test -f $NagiosCfg || exit 0

# See how we were called.
case "$1" in

start)
echo "Starting network monitor: nagios"
su -l $Nagios -c "touch $NagiosVar/nagios.log $NagiosSav"
rm -f $NagiosCmd
$NagiosBin -d $NagiosCfg

# Added for failover: start the NSCA daemon together with Nagios
/usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg

if [ -d $NagiosLckDir ]; then touch $NagiosLckDir/$NagiosLckFile; fi
sleep 1
status_nagios nagios
;;

stop)
echo "Stopping network monitor: nagios"
killproc_nagios nagios -9
# Added for failover: kill any remaining nagios/nsca processes
/bin/ps -aef | grep -i nagios | grep -v grep | awk '{print $2}' | xargs kill -9
rm -f $NagiosLog $NagiosTmp $NagiosRun $NagiosLckDir/$NagiosLckFile $NagiosCmd
;;

status)
status_nagios nagios
;;

restart)
printf "Running configuration check..."
$NagiosBin -v $NagiosCfg > /dev/null 2>&1;
if [ $? -eq 0 ]; then
echo "done"
$0 stop
$0 start
else
$NagiosBin -v $NagiosCfg
echo "failed - aborting restart."
exit 1
fi
;;

reload|force-reload)
printf "Running configuration check..."
$NagiosBin -v $NagiosCfg > /dev/null 2>&1;
if [ $? -eq 0 ]; then
echo "done"
if test ! -f $NagiosRun; then
$0 start
else
NagiosPID=`head -n 1 $NagiosRun`
if status_nagios > /dev/null; then
printf "Reloading nagios configuration..."
killproc_nagios nagios -HUP
echo "done"
else
$0 stop
$0 start
fi
fi
else
$NagiosBin -v $NagiosCfg
echo "failed - aborting reload."
exit 1
fi
;;

*)
echo "Usage: nagios {start|stop|restart|reload|force-reload|status}"
exit 1
;;

esac
# End of this script

Known Issue
Because this NSCA-based failover relies on a once-per-minute crontab entry, Nagios on 192.168.1.y may still be running for a short time after Nagios on 192.168.1.x resumes service. The Nagios processes on 192.168.1.y are killed only when the crontab reaches its next one-minute cycle.