View Full Version : iowait


qxh
Sun 14th Mar '04, 10:13am
iowait appears to be taking up more and more of the CPU. Sometimes it's only a few percent; at other times it's sixty to seventy percent. The CPU is usually maxed out. I have just rebooted the machine.

09:09:35 up 7 min, 1 user, load average: 0.67, 1.02, 0.62
242 processes: 239 sleeping, 2 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system     irq  softirq  iowait    idle
           total    2.7%    0.0%    1.1%    2.3%     0.9%    5.3%   87.2%
Mem: 1030556k av, 389472k used, 641084k free, 0k shrd, 25912k buff
283760k active, 47444k inactive
Swap: 2097136k av, 0k used, 2097136k free 163456k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
9846 nobody 15 0 7928 7928 4716 S 0.5 0.7 0:00 0 httpd
9885 nobody 15 0 7040 7040 4432 S 0.5 0.6 0:00 0 httpd
12 root 15 0 0 0 0 SW 0.3 0.0 0:02 0 kjournald
9838 nobody 15 0 8796 8796 5432 S 0.3 0.8 0:00 0 httpd
9844 nobody 15 0 8008 8008 5264 S 0.3 0.7 0:00 0 httpd
10147 nobody 15 0 6944 6944 4328 S 0.3 0.6 0:00 0 httpd
9826 root 15 0 1268 1268 828 R 0.1 0.1 0:00 0 top
9858 nobody 15 0 5876 5876 3288 S 0.1 0.5 0:00 0 httpd
9871 nobody 15 0 7368 7368 4188 S 0.1 0.7 0:00 0 httpd
10148 nobody 15 0 6216 6216 3560 S 0.1 0.6 0:00 0 httpd
10355 nobody 15 0 3520 3520 1528 S 0.1 0.3 0:00 0 httpd
10385 nobody 15 0 6000 6000 3528 S 0.1 0.5 0:00 0 httpd
1 root 15 0 472 472 420 S 0.0 0.0 0:03 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kswapd
5 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kscand
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
73 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
557 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
3776 root 15 0 544 544 468 S 0.0 0.0 0:00 0 syslogd
3780 root 15 0 428 428 372 S 0.0 0.0 0:00 0 klogd
8452 root 15 0 1116 1116 628 S 0.0 0.1 0:00 0 cupsd
8489 root 15 0 764 764 524 S 0.0 0.0 0:00 0 sshd
8503 root 15 0 604 604 488 S 0.0 0.0 0:00 0 xinetd
8512 root 15 0 904 904 696 S 0.0 0.0 0:00 0 antirelayd
8522 root 17 0 1824 1824 508 S 0.0 0.1 0:00 0 chkservd
8535 mailnull 15 0 780 780 552 S 0.0 0.0 0:00 0 exim
8539 mailnull 25 0 728 728 496 S 0.0 0.0 0:00 0 exim
8543 root 15 0 872 872 660 S 0.0 0.0 0:00 0 antirelayd
8606 root 15 0 17080 16M 548 S 0.0 1.6 0:00 0 spamd
8615 root 15 0 584 584 508 S 0.0 0.0 0:00 0 crond
8631 root 25 0 1104 1104 956 S 0.0 0.1 0:00 0 mysqld_safe
8837 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld
8869 root 15 0 3072 3072 1096 S 0.0 0.2 0:00 0 httpd
8892 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld
8893 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld
8944 named 25 0 3388 3388 1468 S 0.0 0.3 0:01 0 named
8950 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld
8953 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld
8964 mysql 15 0 18428 17M 1192 S 0.0 1.7 0:00 0 mysqld

It's a P4 3.0 GHz with 1024 MB of RAM.
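
For anyone who wants to track the iowait figure over time rather than eyeballing top, here is a minimal Python sketch that computes the same kind of percentages from two samples of the aggregate cpu line in /proc/stat. It assumes the 2.6-style field order (user, nice, system, idle, iowait, irq, softirq), which may not match every vendor 2.4 kernel; missing counters are padded with zeros, and the function names are made up for illustration.

```python
# Sketch only: CPU-state percentages from two samples of the "cpu" line
# in /proc/stat. The field order below is an assumption (2.6-style layout).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line):
    """Turn an aggregate 'cpu ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Pad with zeros if the kernel reports fewer counters than assumed.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def percentages(before, after):
    """Share of elapsed jiffies spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate cpu counters)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

Calling read_cpu_line() twice, a few seconds apart, and passing both results through parse_cpu_line() and percentages() should give numbers comparable to the "CPU states" line in top, provided the field-order assumption holds on the kernel in question.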

qxh
Sun 14th Mar '04, 10:14am
here we go:

14:16:51 up 14 min, 1 user, load average: 1.26, 0.94, 0.69
227 processes: 225 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system     irq  softirq  iowait    idle
           total   16.3%    0.0%    4.7%    1.5%     4.3%   72.9%    0.0%
Mem: 1030556k av, 586648k used, 443908k free, 0k shrd, 39768k buff
364884k active, 151160k inactive
Swap: 2097136k av, 0k used, 2097136k free 287200k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
9021 mysql 15 0 37844 36M 1232 D 4.1 3.6 0:01 0 mysqld
10868 nobody 15 0 7192 7192 4428 S 2.7 0.6 0:00 0 httpd
9869 nobody 15 0 8184 8184 4832 S 1.5 0.7 0:00 0 httpd
9879 nobody 15 0 7572 7572 4336 S 1.1 0.7 0:00 0 httpd
9884 nobody 15 0 7560 7560 4352 S 1.1 0.7 0:00 0 httpd
10767 root 15 0 5568 5568 2136 S 0.9 0.5 0:02 0 php
10547 nobody 15 0 8968 8968 5580 S 0.7 0.8 0:00 0 httpd
9871 nobody 15 0 9772 9772 6520 S 0.5 0.9 0:00 0 httpd
9876 nobody 15 0 8724 8724 6008 S 0.5 0.8 0:00 0 httpd
9887 nobody 15 0 9280 9280 6036 S 0.5 0.9 0:01 0 httpd
9889 nobody 15 0 8360 8360 5196 S 0.5 0.8 0:00 0 httpd
10134 nobody 15 0 8156 8156 4788 S 0.5 0.7 0:00 0 httpd
9133 mysql 15 0 37844 36M 1232 S 0.3 3.6 0:00 0 mysqld
9841 nobody 15 0 10936 10M 7584 S 0.3 1.0 0:00 0 httpd
9855 nobody 15 0 7928 7928 5236 S 0.3 0.7 0:00 0 httpd
9868 nobody 15 0 8416 8416 5388 S 0.3 0.8 0:00 0 httpd
9881 nobody 15 0 9464 9464 5896 S 0.3 0.9 0:00 0 httpd
9892 nobody 15 0 8588 8588 5236 S 0.3 0.8 0:00 0 httpd
9898 nobody 15 0 7076 7076 4496 S 0.3 0.6 0:00 0 httpd
10876 nobody 15 0 7940 7940 4584 S 0.3 0.7 0:00 0 httpd
8944 named 25 0 3160 3160 1240 S 0.1 0.3 0:02 0 named
8992 mysql 15 0 37844 36M 1232 S 0.1 3.6 0:00 0 mysqld
9144 mysql 15 0 37844 36M 1232 S 0.1 3.6 0:00 0 mysqld
9153 mysql 15 0 37844 36M 1232 S 0.1 3.6 0:00 0 mysqld
9163 mysql 15 0 37844 36M 1232 S 0.1 3.6 0:00 0 mysqld
9765 root 15 0 1280 1280 908 R 0.1 0.1 0:00 0 sshd
9906 nobody 15 0 8324 8324 5640 S 0.1 0.8 0:00 0 httpd
10738 nobody 15 0 8936 8936 5312 S 0.1 0.8 0:00 0 httpd
10862 nobody 15 0 3612 3612 1608 S 0.1 0.3 0:00 0 httpd
10863 nobody 15 0 3612 3612 1608 S 0.1 0.3 0:00 0 httpd
10884 root 15 0 1236 1236 828 R 0.1 0.1 0:00 0 top
1 root 15 0 472 472 420 S 0.0 0.0 0:03 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kswapd
5 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kscand
7 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
8 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
12 root 15 0 0 0 0 SW 0.0 0.0 0:05 0 kjournald
73 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
557 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
3776 root 15 0 544 544 468 S 0.0 0.0 0:00 0 syslogd
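
In the listing above, the mysqld process at the top is in state D (uninterruptible sleep, usually waiting on I/O), which is the kind of process that drives the iowait figure. A rough Python sketch for listing every process currently in that state follows. It reads /proc/<pid>/stat directly; the helper names are hypothetical, and this is one way to do it rather than anything the posters used.

```python
# Sketch only: list processes in state D by reading /proc/<pid>/stat.
import os


def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    command = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, command, state


def blocked_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, command, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the file was unreadable
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

Running blocked_processes() a few times during a spike shows which commands keep turning up; a single snapshot can easily miss short waits.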

KrON
Mon 15th Mar '04, 4:26pm
I'm running into the same problem. We're using apache 2.0.48, php 4.3.4, turck-mmcache, and thttpd for static images.

The machines are dual 2.8 GHz Xeons with 1 GB of RAM and dual SATA disks. The maxclients settings are very low (between 50 and 75), because we only do about 12-15 hits per second on dynamic content.

Our load spikes through the ROOF (to around 150), and I have to stop apache to get it to calm back down again. When it spikes, I see:

  cpu    user    nice  system     irq  softirq  iowait    idle
total    1.0%    0.0%    0.3%    0.0%     0.0%   98.6%    0.0%
cpu00    0.7%    0.0%    0.3%    0.0%     0.0%   98.8%    0.0%
cpu01    0.5%    0.0%    0.3%    0.0%     0.0%   99.0%    0.0%
cpu02    2.3%    0.0%    0.3%    0.0%     0.0%   97.2%    0.0%
cpu03    0.4%    0.0%    0.2%    0.0%     0.0%   99.4%    0.0%


I've been troubleshooting this for a few weeks now, and I believe it's either that our disks are the bottleneck, or that ram is. I lowered the maxclients way down to keep the machines from swapping, but they are still spiking randomly.

I think that the database server backs up for a second, and apache starts blocking when it hits maxclients, and that is causing the iowait, but I don't know which route to go, scsi disks or more ram. We really don't hit the disks that hard, so I find it hard to believe that SATA is the problem on these webservers.
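
One cheap way to narrow down the RAM-versus-disk question is to check whether swap is actually in use when a spike hits. The Python sketch below parses /proc/meminfo text and reports swap usage from the SwapTotal and SwapFree lines. The function names and the sample values in the usage note are illustrative assumptions, not measurements from these servers.

```python
# Sketch only: report swap usage from the text of /proc/meminfo.
def parse_meminfo(text):
    """Map 'Key: <number> kB' lines to integers; other lines are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB, from a parsed meminfo dict."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live /proc/meminfo."""
    with open(path) as meminfo_file:
        return parse_meminfo(meminfo_file.read())
```

For example, swap_used_kb(parse_meminfo("SwapTotal: 2097136 kB\nSwapFree: 2097136 kB")) evaluates to 0. If the figure stays at or near zero through a spike, memory pressure looks like a less likely cause than the disks or the kernel's I/O handling, although that is a heuristic rather than proof.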

qxh
Mon 15th Mar '04, 4:42pm
RH ES?

http://www.webhostingtalk.com/showthread.php?threadid=229306

KrON
Mon 15th Mar '04, 6:08pm
Originally posted by qxh:
RH ES?

http://www.webhostingtalk.com/showthread.php?threadid=229306

Yup. Piece of crap. I am compiling 2.6.4 as we speak. I just finished reading through that thread, and by the end I was kicking myself for relying on these crappy stock kernels (I usually roll my own). I thought to myself, well it IS enterprise, they probably have some good stuff rolled in there.

I'll let you know how it goes, but I don't envision any problems; I have several 2.6 servers and countless other 2.4 systems in production w/o problems.