PDA

View Full Version : Request for optimisation - strange queries in slow log


kmike
Thu 13th Jan '05, 8:06am
We have a fairly big board (3100K posts, 53k users). It is running almost smoothly, except for one problem which plagues our db server: there're some queries in mysql slow log which are really shouldn't be there. For example, from yesterday's slow log:


# Time: 050112 20:34:11
# User@Host: xxxx[xxxx] @ [12.13.14.15]
# Query_time: 9 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
UPDATE LOW_PRIORITY user
SET lastactivity = 1105580042
WHERE userid = 28929;
# Time: 050112 21:48:11
# User@Host: xxxx[xxxx] @ [12.13.14.15]
# Query_time: 9 Lock_time: 0 Rows_sent: 3 Rows_examined: 12
SELECT filename, filesize, visible, attachmentid, counter, postid, IF(thumbnail
= '', 0, 1) AS hasthumbnail, LENGTH(thumbnail) AS thumbnailsize
FROM attachment
WHERE postid IN (-1,1236301,1236314,1236403,1236461,1236492,1236497, 1236533,1236914,1238763,1239156,1241394,1242044,12 44250,1244321,1244668)
ORDER BY dateline;
# Time: 050112 21:49:29
# User@Host: xxxx[xxxx] @ [12.13.14.15]
# Query_time: 16 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
SELECT title, data
FROM datastore
WHERE title IN ('options', 'forumcache', 'usergroupcache', 'stylecache', 'rankphp', 'smiliecache', 'bbcodecache', 'mailqueue', 'hidprofilecache');
# Time: 050112 23:03:11
# User@Host: xxxx[xxxx] @ [12.13.14.15]
# Query_time: 9 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO session
(sessionhash, userid, host, useragent, idhash, lastactivity, location, bypass, styleid
,inforum, inthread, incalendar, badlocation
)
VALUES
('a4f9908f49230572f69aa82d8edae510', 941,
'198.81.26.43', 'Mozilla/4.0 (compatible; MSIE 6.0; AOL 9.0; Windows NT 5.1; SV1)', '8beaf3a82482022236939504e872c62b',
1105588982, '/forums/forumdisplay.php?f=11&page=2&sort=lastpost&order=&pp=25&daysprune=20', 0, 0
,11, 0, 0,0
);
# Time: 050112 23:03:12
# User@Host: xxxx[xxxx] @ [12.13.14.15]
# Query_time: 10 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
UPDATE LOW_PRIORITY user
SET lastactivity = 1105588982
WHERE userid = 44143;
# Time: 050112 23:40:14
# User@Host: rcgroups[rcgroups] @ [12.156.2.43]
# Query_time: 13 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO threadviews (threadid)
VALUES (317350);



You get the idea. These queries shouldn't lock anything, and are not locked according to "Lock_time:" value. Especially puzzling are session related queries.
These "hiccups" are sometimes causing apache processes to stack up, raising web frontend load.

There aren't any peaks of db server load or disk I/O activity associated with these queries:

sar -q output for corresponding time periods:

00:00:00 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
.....
20:30:00 1 121 0.26 0.35 0.35
20:35:00 1 124 0.58 0.42 0.38
20:40:00 1 124 0.58 0.46 0.39
....
21:35:00 2 126 0.85 0.60 0.44
21:40:00 2 127 0.38 0.44 0.41
21:45:00 1 127 0.57 0.51 0.44
....
23:00:00 1 122 0.20 0.43 0.50
23:05:00 3 123 0.35 0.45 0.48
....
23:35:00 1 139 0.92 0.52 0.41
23:40:00 0 132 0.33 0.38 0.38
23:45:00 2 132 0.53 0.43 0.39


sar -b output:

00:00:00 tps rtps wtps bread/s bwrtn/s
...
20:25:00 24.97 8.34 16.63 99.72 180.72
20:30:00 24.58 6.93 17.65 82.16 194.09
20:35:00 25.41 7.59 17.83 89.61 195.07
...
21:30:01 29.59 11.39 18.20 136.69 200.47
21:35:00 37.50 11.11 26.38 132.68 291.42
21:40:00 29.71 12.19 17.52 148.79 189.79
...
22:55:00 33.71 14.13 19.58 164.77 213.14
23:00:00 29.43 10.63 18.80 134.20 204.61
23:05:00 29.23 7.81 21.42 93.20 234.48
....
23:35:00 30.87 9.21 21.66 107.58 236.41
23:40:00 23.51 8.92 14.59 111.87 159.18
23:45:00 29.85 8.79 21.06 112.63 232.48


I believe the server is fairly tweaked, but I'm all open to suggestions - maybe I've missed something obvious?



1. Dedicated db server, only mysql running

2. Specs:
Dual 2.4GHz Xeon HT
2Gb RAM
4x7200RPM IDE drives in RAID5
Redhat9 (we plan to upgrade to FC2 or FC3 with kernel 2.6.x ASAP)
Mysql 4.0.23

3. No InnoDB tables
4. Official RPMs provided by MySQL
5. top stats:

top - 06:19:30 up 19 days, 13:02, 4 users, load average: 0.27, 0.27, 0.26
Tasks: 117 total, 1 running, 116 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3% user, 2.7% system, 0.0% nice, 97.0% idle
Cpu1 : 4.7% user, 2.7% system, 0.0% nice, 92.7% idle
Cpu2 : 0.7% user, 2.3% system, 0.0% nice, 97.0% idle
Cpu3 : 1.3% user, 0.7% system, 0.0% nice, 98.0% idle
Mem: 2064644k total, 2053736k used, 10908k free, 31204k buffers
Swap: 2040244k total, 177572k used, 1862672k free, 1630412k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25231 mysql 12 0 325m 280m 1532 S 6.0 13.9 0:22.22 mysqld
25588 mysql 12 0 325m 280m 1532 S 6.0 13.9 0:11.39 mysqld
10326 mysql 11 0 325m 280m 1532 S 2.0 13.9 4:21.81 mysqld
26564 root 16 0 884 884 668 R 2.0 0.0 0:00.02 top
1 root 8 0 456 428 396 S 0.0 0.0 0:18.35 init
2 root 9 0 0 0 0 S 0.0 0.0 0:09.55 keventd
3 root 19 19 0 0 0 S 0.0 0.0 0:03.72 ksoftirqd_CPU0
4 root 19 19 0 0 0 S 0.0 0.0 0:00.64 ksoftirqd_CPU1
5 root 19 19 0 0 0 S 0.0 0.0 0:00.40 ksoftirqd_CPU2
6 root 19 19 0 0 0 S 0.0 0.0 0:00.47 ksoftirqd_CPU3
7 root 9 0 0 0 0 S 0.0 0.0 11:08.98 kswapd
8 root 9 0 0 0 0 S 0.0 0.0 0:31.53 kscand/DMA
9 root 9 0 0 0 0 S 0.0 0.0 55:35.09 kscand/Normal
10 root 14 0 0 0 0 S 0.0 0.0 87:47.07 kscand/HighMem
11 root 9 0 0 0 0 S 0.0 0.0 0:00.49 bdflush
12 root 9 0 0 0 0 S 0.0 0.0 0:48.79 kupdated
13 root -1 -20 0 0 0 S 0.0 0.0 0:00.00 mdrecoveryd
19 root 9 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
22 root 9 0 0 0 0 S 0.0 0.0 0:08.62 kjournald
78 root 9 0 0 0 0 S 0.0 0.0 0:00.00 khubd
234 root 9 0 0 0 0 S 0.0 0.0 0:00.17 kjournald
235 root 9 0 0 0 0 S 0.0 0.0 1:27.74 kjournald
236 root 9 0 0 0 0 S 0.0 0.0 5:51.34 kjournald
237 root 9 0 0 0 0 S 0.0 0.0 0:25.89 kjournald
715 root 9 0 48 4 0 S 0.0 0.0 0:00.00 mingetty
716 root 9 0 48 4 0 S 0.0 0.0 0:00.01 mingetty
717 root 9 0 48 4 0 S 0.0 0.0 0:00.00 mingetty
718 root 9 0 48 4 0 S 0.0 0.0 0:00.00 mingetty
719 root 9 0 48 4 0 S 0.0 0.0 0:00.01 mingetty
869 root 9 0 48 4 0 S 0.0 0.0 0:00.00 mingetty
21208 supporte 10 0 1548 816 496 S 0.0 0.0 0:05.07 screen
21209 root 17 0 876 752 520 S 0.0 0.0 0:15.74 bash
21211 supporte 9 0 320 64 60 S 0.0 0.0 0:00.01 bash
21213 supporte 9 0 720 444 440 S 0.0 0.0 0:00.14 bash
2617 supporte 9 0 1212 276 272 S 0.0 0.0 0:01.20 mysql
6489 daemon 19 19 560 524 484 S 0.0 0.0 0:00.08 atd
6720 root 19 19 484 416 412 S 0.0 0.0 0:00.01 gpm
6739 root 18 19 596 548 500 S 0.0 0.0 0:01.86 crond
6784 root 19 19 556 524 472 S 0.0 0.0 0:02.51 syslogd
6788 root 19 19 424 368 364 S 0.0 0.0 0:00.01 klogd
6815 root 18 19 748 688 612 S 0.0 0.0 0:01.07 xinetd
6857 root 19 19 1712 1196 1004 S 0.0 0.1 0:13.06 sendmail
6867 smmsp 18 19 1420 972 856 S 0.0 0.0 0:00.28 sendmail
8157 root 9 0 968 812 704 S 0.0 0.0 0:15.85 sshd
10298 root 9 0 980 836 832 S 0.0 0.0 0:00.00 mysqld_safe
10318 mysql 9 0 325m 280m 1532 S 0.0 13.9 1:31.76 mysqld
10319 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:02.42 mysqld
10320 mysql 9 0 325m 280m 1532 S 0.0 13.9 1:20.43 mysqld
10321 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:02.32 mysqld
10322 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:06.27 mysqld
10325 mysql 9 0 325m 280m 1532 S 0.0 13.9 4:12.63 mysqld
23745 mysql 10 0 325m 280m 1532 S 0.0 13.9 1:27.29 mysqld
24350 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:56.74 mysqld
24529 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:51.96 mysqld
24644 mysql 10 0 325m 280m 1532 S 0.0 13.9 0:42.14 mysqld
24784 root 9 0 2008 1928 1592 S 0.0 0.1 0:00.02 sshd
24786 supporte 9 0 2464 2396 2016 S 0.0 0.1 0:00.03 sshd
24787 supporte 9 0 1024 952 804 S 0.0 0.0 0:00.07 screen
24914 mysql 9 0 325m 280m 1532 S 0.0 13.9 0:28.21 mysqld
...... (80 more mysqld's)


Preventing the question of rather high swap usage - it stabilizes on that value after a few days of uptime, sar -W shows almost zero swapping - avg 0.3 page/sec at peak time.

6. my.cnf

[mysqld]
back_log=128
tmpdir=/usr/mysql/tmp
ft_min_word_len=3
long_query_time=8
max_connections=300
key_buffer_size=224M
myisam_sort_buffer_size=64M
join_buffer_size=1M
read_buffer_size=1M
read_rnd_buffer_size=1M
sort_buffer_size=2M
table_cache=1536
thread_cache_size=64
wait_timeout=1800
connect_timeout=5
max_allowed_packet=16M
max_connect_errors=16
tmp_table_size=32M
query_prealloc_size=16384
query_cache_limit = 512K
query_cache_size = 48M
query_cache_type = 1
skip-innodb
log-slow-queries = /usr/mysql/mysqld.slow.log
server-id=1
skip-name-resolve
skip-external-locking
log-bin
binlog-ignore-db = mysql
binlog-ignore-db = test
binlog-ignore-db = phpads
max_binlog_size=120M
replicate-do-db = forums2
replicate-do-db = fogbugz

[myisamchk]
ft_min_word_len=3
tmpdir=/usr/mysql/tmp
key_buffer=128M
sort_buffer=128M
read_buffer=16M
write_buffer=16M

[mysqld_safe]
open_files_limit = 8192


This server also replicates two very low-traffic db's, but the problem were there even before replication setup.

7. mysql status

+--------------------------+------------+
| Variable_name | Value |
+--------------------------+------------+
| Aborted_clients | 124 |
| Aborted_connects | 603 |
| Bytes_received | 2005496256 |
| Bytes_sent | 655370742 |
| Com_admin_commands | 3 |
| Com_alter_table | 2 |
| Com_analyze | 0 |
| Com_backup_table | 0 |
| Com_begin | 0 |
| Com_change_db | 1580240 |
| Com_change_master | 0 |
| Com_check | 0 |
| Com_commit | 0 |
| Com_create_db | 0 |
| Com_create_function | 0 |
| Com_create_index | 0 |
| Com_create_table | 629 |
| Com_delete | 91462 |
| Com_delete_multi | 1 |
| Com_drop_db | 0 |
| Com_drop_function | 0 |
| Com_drop_index | 0 |
| Com_drop_table | 20426 |
| Com_flush | 2 |
| Com_grant | 0 |
| Com_ha_close | 0 |
| Com_ha_open | 0 |
| Com_ha_read | 0 |
| Com_insert | 1260663 |
| Com_insert_select | 56 |
| Com_kill | 0 |
| Com_load | 0 |
| Com_load_master_data | 0 |
| Com_load_master_table | 0 |
| Com_lock_tables | 0 |
| Com_optimize | 0 |
| Com_purge | 0 |
| Com_rename_table | 0 |
| Com_repair | 0 |
| Com_replace | 43908 |
| Com_replace_select | 0 |
| Com_reset | 0 |
| Com_restore_table | 0 |
| Com_revoke | 0 |
| Com_rollback | 0 |
| Com_savepoint | 0 |
| Com_select | 7053655 |
| Com_set_option | 46 |
| Com_show_binlog_events | 0 |
| Com_show_binlogs | 0 |
| Com_show_create | 48 |
| Com_show_databases | 3 |
| Com_show_fields | 471 |
| Com_show_grants | 0 |
| Com_show_keys | 1 |
| Com_show_logs | 0 |
| Com_show_master_status | 0 |
| Com_show_new_master | 0 |
| Com_show_open_tables | 0 |
| Com_show_processlist | 1 |
| Com_show_slave_hosts | 3 |
| Com_show_slave_status | 1 |
| Com_show_status | 13 |
| Com_show_innodb_status | 0 |
| Com_show_tables | 2184 |
| Com_show_variables | 13 |
| Com_slave_start | 0 |
| Com_slave_stop | 0 |
| Com_truncate | 0 |
| Com_unlock_tables | 0 |
| Com_update | 3241108 |
| Connections | 1572205 |
| Created_tmp_disk_tables | 13764 |
| Created_tmp_tables | 115884 |
| Created_tmp_files | 302 |
| Delayed_insert_threads | 0 |
| Delayed_writes | 0 |
| Delayed_errors | 0 |
| Flush_commands | 1 |
| Handler_commit | 0 |
| Handler_delete | 201375 |
| Handler_read_first | 1263542 |
| Handler_read_key | 1257946078 |
| Handler_read_next | 1627686487 |
| Handler_read_prev | 18178446 |
| Handler_read_rnd | 130484965 |
| Handler_read_rnd_next | 633030589 |
| Handler_rollback | 0 |
| Handler_update | 6190963 |
| Handler_write | 18077357 |
| Key_blocks_used | 217962 | Approx. 95% of key_buffer_size in use
| Key_read_requests | 2821061676 |
| Key_reads | 262450 |
| Key_write_requests | 6710893 |
| Key_writes | 2274141 |
| Max_used_connections | 208 |
| Not_flushed_key_blocks | 0 |
| Not_flushed_delayed_rows | 0 |
| Open_tables | 1530 | 100% of table_cache in use
| Open_files | 1697 |
| Open_streams | 0 |
| Opened_tables | 3560 |
| Questions | 24537432 |
| Qcache_queries_in_cache | 13878 |
| Qcache_inserts | 6973628 |
| Qcache_hits | 9665613 |
| Qcache_lowmem_prunes | 79216 |
| Qcache_not_cached | 80024 |
| Qcache_free_memory | 26210016 | 23556.3K (approx. 47.9%) of query_cache_size in use
| Qcache_free_blocks | 3249 |
| Qcache_total_blocks | 31466 |
| Rpl_status | NULL |
| Select_full_join | 431 |
| Select_full_range_join | 21 |
| Select_range | 1208995 |
| Select_range_check | 0 |
| Select_scan | 471712 |
| Slave_open_temp_tables | 0 |
| Slave_running | ON |
| Slow_launch_threads | 9 |
| Slow_queries | 898 | (execution time > 8 secs)
| Sort_merge_passes | 148 |
| Sort_range | 1338900 |
| Sort_rows | 492135122 |
| Sort_scan | 311571 |
| Table_locks_immediate | 22522677 |
| Table_locks_waited | 199713 |
| Threads_cached | 43 |
| Threads_created | 5435 |
| Threads_connected | 23 |
| Threads_running | 3 |
| Uptime | 179513 | 2 days 1 hr 51 mins 53 secs
+--------------------------+------------+


Key Reads/Key Read Requests = 0.000093 (Cache hit = 99.999907%)
Key Writes/Key Write Requests = 0.338873
Connections/second = 8.758 (/hour = 31529.405)
KB received/second = 10.910 (/hour = 39276.104)
KB sent/second = 3.565 (/hour = 12834.926)
Temporary Tables Created/second = 0.646 (/hour = 2323.968)
Opened Tables/second = 0.020 (/hour = 71.393)
Slow Queries/second = 0.005 (/hour = 18.009)
% of slow queries = 0.004%
Queries/second = 136.689 (/hour = 492079.990)
Query Cache Hit Rate = 9665613/16719268 (57.811221%)
Waited Query Locks Ratio = 199713/22722390 (0.878926%)
Temp Tables Created on Disk Ratio = 13764/129648 (10.616438%)



8. vB isn't the only thing accessing this db server, but other scripts' queries are almost non-existant compared with vB.

9. Forum users: average 1100 at day time, peak was about 1500 per 900secs cookie timeout

10-11. Does php and apache config matter here?

eva2000
Fri 14th Jan '05, 10:21am
How are you web + db servers connected ? What are your web server hardware specs ? what your maxclient value for web server httpd.conf ? are you using PHP 4.3.10 for web servers ? I'll get a vB developer to comment on this. One potential problem i can see is using slow IDE disks for raid 5 + using mysql binary logging for replication.

For 1,000 - 1,500 vB users online at default cookie timeout you should really beef up your database server's total memory to around 4GB of ram, from my experience it really does help ALOT especially if you retweak my.cnf to allocate more of the 4GB to memory buffers.

Using SCSI disks on db server would also help tremendously.

try this my.cnf for size also

[mysqld]
skip-innodb
skip-name-resolve
skip-external-locking
back_log=128
tmpdir=/usr/mysql/tmp
ft_min_word_len=3
long_query_time=8
max_connections = 650
key_buffer = 64M
myisam_sort_buffer_size = 64M
myisam_max_extra_sort_file_size = 268435456
myisam_max_sort_file_size = 2147483647
join_buffer_size = 1M
read_buffer_size = 1M
sort_buffer_size = 2M
table_cache = 1500
thread_cache_size = 128
wait_timeout = 3600
connect_timeout = 10
tmp_table_size = 128M
read_rnd_buffer_size = 524288
bulk_insert_buffer_size = 16M
max_allowed_packet = 16M
max_connect_errors = 10
query_cache_limit = 1M
query_cache_size = 64M
query_cache_type = 1
log-slow-queries = /usr/mysql/mysqld.slow.log
server-id=1
log-bin
binlog-ignore-db = mysql
binlog-ignore-db = test
binlog-ignore-db = phpads
max_binlog_size=120M
replicate-do-db = forums2
replicate-do-db = fogbugz

[myisamchk]
ft_min_word_len=3
tmpdir=/usr/mysql/tmp
key_buffer=128M
sort_buffer=128M
read_buffer=16M
write_buffer=16M

[mysqld_safe]
open_files_limit = 8192

kmike
Fri 14th Jan '05, 5:50pm
Web and db server are connected via switch, they're on the same subnet.
Web server specs are the same as for db server.
MaxClients is capped at 200.
We're not using php 4.3.10, as it has a nasty bug which causes major forum slowdown. We use php 4.3.2 with security patches from RHEL3.

Unfortunately, we can't turn off mysql binary log - it is used to replicate db to backup server and to another db server which runs forum fulltext searches only.

IDE vs SCSI disks - point taken. But it's not happening in the very near future. Besides, I don't think IDE drives are what causes these weird slowdowns. Did you see sar outputs? Even single 5400RPM IDE disk can handle that 200kb/sec throughput.

To further illustrate this, here're sar -d stats for another db server (same specs, only difference is RAID1) which handles fulltext searches. It takes much more I/O load without even breaking the sweat:

00:00:00 AM DEV tps rd_sec/s wr_sec/s
.....
12:30:01 PM dev8-0 308.44 2386.23 112.05
12:40:01 PM dev8-0 107.24 777.85 108.29
12:50:01 PM dev8-0 145.69 1068.51 129.04
01:00:01 PM dev8-0 117.01 841.89 131.41
01:10:01 PM dev8-0 134.67 1014.57 97.56
01:20:01 PM dev8-0 235.07 1784.06 132.35
01:30:01 PM dev8-0 108.13 778.67 120.56
01:40:02 PM dev8-0 218.54 4335.31 106.62
01:50:02 PM dev8-0 143.99 1075.13 144.34
02:00:01 PM dev8-0 125.09 900.74 134.82
02:10:01 PM dev8-0 83.44 584.74 127.57
02:20:01 PM dev8-0 55.04 317.37 161.41
02:30:01 PM dev8-0 178.54 1332.64 135.16


Now, stats for server in question (with weird slow queries):

00:00:00 DEV tps rd_sec/s wr_sec/s
.....
12:30:00 dev8-0 30.40 145.54 201.16
12:35:00 dev8-0 31.59 115.36 245.18
12:40:00 dev8-0 33.05 203.00 182.13
12:45:00 dev8-0 34.31 159.50 225.14
12:50:00 dev8-0 32.00 135.43 227.73
12:55:00 dev8-0 32.57 142.00 222.79
13:00:00 dev8-0 32.87 188.57 186.48
13:05:01 dev8-0 34.42 188.12 202.79
13:10:03 dev8-0 27.37 121.96 190.20
13:15:00 dev8-0 37.95 188.88 234.42
13:20:00 dev8-0 29.85 161.77 176.92
13:25:00 dev8-0 27.07 136.91 172.89
13:30:01 dev8-0 30.35 179.08 173.43
13:35:00 dev8-0 31.60 174.42 201.16
13:40:00 dev8-0 27.08 133.66 171.06
13:45:00 dev8-0 32.24 133.72 228.35
13:50:00 dev8-0 26.87 115.71 192.00


4G RAM - another point taken.
I'll try proposed my.cnf and see what comes out.

kmike
Sat 15th Jan '05, 2:39am
It didn't help.

AlfonsoC
Sun 16th Jan '05, 7:18am
One thing who is dragging your machine and can make some performance difference is to switch from RAID5 in 4 disks to RAID 1+0 (the best) or 0+1 (best) in a 4 disk config.

Saw the mdrecoverid daemon on your top output, so... at least you compiled on your kernel the MD driver. Your RAID config is on hardware or software?

Think about this: every user who visits your page makes slight modifications on your databases (don't know percentages, but... can we talk about a 80% reads / 20% writes). Unless you're running a true hardware RAID5 controller who offloads the XOR ops, yout CPU's need to make those ops.

Further tweaking of the software could lead you to an improvement on the DB performance, but hardware needs to be tweaked too.

SCSI disks are fine and are very desirable, but the cost of a true SCSI hardware raid controller plus the disks makes some times the cost very steep, I'd take a look for a SATA RAID controller paired with some pretty SATA disks, cost vs. performance is much better on the SATA side.

eva2000
Sun 16th Jan '05, 9:23am
It didn't help.
care to elaborate ? meaning slow queries still there ? I've asked a vB dev to look in on this thread.. so please bare with us

btw what apache version you're using ? linux kernel version ?

kmike
Sun 16th Jan '05, 2:35pm
One thing who is dragging your machine and can make some performance difference is to switch from RAID5 in 4 disks to RAID 1+0 (the best) or 0+1 (best) in a 4 disk config.

Saw the mdrecoverid daemon on your top output, so... at least you compiled on your kernel the MD driver. Your RAID config is on hardware or software?


RAID is 3ware card, which is true hardware RAID. mdrecoverid just sits there doing nothing - it is compiled in RH's built kernels.
I don't think I/O is main bottleneck here. Did you see sar I/O stats? Did you see sar load average stats? Machine is basically sitting there idle! It rarely reaches 1.0 average, which is less than 50% of load for dual HT box.


care to elaborate ? meaning slow queries still there ? I've asked a vB dev to look in on this thread.. so please bare with us
btw what apache version you're using ? linux kernel version ?


Yeah, slow queries still there, nothing has changed.
Apache 2.0.51, kernels 2.4.20-30.8.legacysmp.

kmike
Mon 17th Jan '05, 3:04am
I'd take a look for a SATA RAID controller paired with some pretty SATA disks, cost vs. performance is much better on the SATA side.

WRT SATA RAIDs, many of SATA RAID chips, and most of integrated ones (like Intel's ICH5/ICH6, Silicon Image's series) are in fact software RAIDs - the RAID functions are implemented by OS driver. So I'd be very careful when giving such advice, at least with provided caveat.

The differences between SATA RAID chips and status of Linux support for them are outlined here: http://www.linuxmafia.com/faq/Hardware/sata.html

AlfonsoC
Mon 17th Jan '05, 8:42am
WRT SATA RAIDs, many of SATA RAID chips, and most of integrated ones (like Intel's ICH5/ICH6, Silicon Image's series) are in fact software RAIDs - the RAID functions are implemented by OS driver. So I'd be very careful when giving such advice, at least with provided caveat.[/url]

I'm not talking of such RAID chips. Most of those chips only work fine for windows machines, but when you go out of MS world their performance is at least discrete.

When I'm talking about 'True hardware raid' controllers (please re-read my post...) I'm talking about 3Ware cards and something on such league. Let's start comparing comparable hardware not SCSI RAID vs. Chipset RAID controllers.

Have 3 Linux machines (Dual opterons with >2Gb RAM and true hardware RAID controllers - 2 with 3Ware 9500-8 and one with 3Ware 8506 controllers) and their perf is simply stellar.

Maybe a custom compilation of a 2.6 Kernel would alleviate this. What version of drivers are you using?

kmike
Thu 20th Jan '05, 2:38am
AlfonsoC:
Sure you meant no harm, but "hardware" in your post appertains to SCSI RAID, not SATA RAID. Given that they're present on almost every SATA enabled board, and 95% of them are not hardware RAIDs, I'd put abovementioned caveat nearby :)


On to server, after upgrading it to FC2 and inadvertently rebooting into 2.6.10_FC2, it seems things got even worse. Slow queries rate raised to 55/hour, while it was rarely over 10/hour before.
I'm looking forward for other suggestions.

kmike
Thu 20th Jan '05, 5:21am
Correcting myself - there was a huge rush of members after some forum downtime, I'd say 1500 or more online users for over an hour, no wonder queries got slower.

So now we're back to "normal" slow query rate. Bottom line: kernel 2.6 didn't help us, either (and I had a hidden hope it would...).

kmike
Mon 24th Jan '05, 5:44am
Correcting myself once more after several days of uptime - situation has certainly improved with 2.6.10 kernel, I get no more than 3 "strange" slow queries per hour now. But still not 100% fixed.

There're some spectacular differences in sar stats, too. Kernels 2.6.x are much more superior in block caching department than 2.4.x. Compare this to 2.4.x's sar output several posts above:

12:00:00 AM DEV tps rd_sec/s wr_sec/s
...
12:30:00 PM dev8-0 28.99 56.77 208.05
12:35:00 PM dev8-0 36.60 60.38 292.44
12:40:00 PM dev8-0 30.77 58.32 225.67
12:45:00 PM dev8-0 37.29 77.41 268.14
12:50:00 PM dev8-0 35.37 74.52 250.58
12:55:00 PM dev8-0 26.05 43.89 199.72
01:00:00 PM dev8-0 30.77 64.45 216.62
01:05:00 PM dev8-0 33.42 68.07 244.20
01:10:00 PM dev8-0 29.37 57.65 215.50
01:15:00 PM dev8-0 37.50 82.62 263.88
01:20:00 PM dev8-0 31.65 96.90 191.88
01:25:00 PM dev8-0 27.88 77.63 174.57
01:30:00 PM dev8-0 29.82 67.41 208.11
...


Avg 50% reduction in actual disk reads! Awesome.
Context switches numbers dropped by 1/3 across board, too.
And finally, 2.6.x is more hesitant to put things to swap than 2.4.x - I'm yet to see more than 4M swapped, compare this to some 150+M for 2.4.x. Though this behaviour is configurable via /proc/sys/vm/swappiness.

kmike
Tue 8th Feb '05, 9:48am
Continuing my monologue.

I was thinking the problem is caused by some subtle hardware issues, so idea came up to try to move our db to another server. We've got an exactly the same server freed recently (it was used as web frontend before), and tonight I moved forum db there.

It is running for 6 hours now, there were 26 slow queries with execution time > 5 secs, most of them look innocent enough - number of scanned rows greater or nearing 1000.

But this one certainly shouldn't be there:

# Query_time: 7 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO attachment
(userid, dateline, filename, filesize, visible, filehash, posthash, thumbnail)
VALUES
(12287, 1107867243, 'motor mount.jpg', 54246, 1, 'f4451a9e10e5a9f413af2f3f42f90ec3', '1f05fee312a08e12fe709281c96817f3', '');

Also, some queries with examined rows count around 800 look suspicious enough, too. And we didn't hit peak traffic time yet.

So we're back to step one. I'm out of ideas. Could it be that this specific series of servers is faulty?

*Edit* some minutes later.
here we go:

# Query_time: 8 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
INSERT INTO attachmentviews (attachmentid)
VALUES (375270);

eva2000
Tue 8th Feb '05, 8:09pm
i think you're over thinking things as it's normal in vB operational sense to have a few slow queries - if it doesn't affect your forum/server performance to end users (perceptually) it should be fine.

kmike
Wed 9th Feb '05, 4:06am
I'm not disputing that slow queries are inevitable, especially with a lot of big threads.
But do you think UPDATEs on session table should last more than 0.01 sec?
And it's sometimes taking up to 6 seconds for us!

Overall forum performance is quite good, but there're some occasional hiccups which I attribute to these slow queries.

Another strangeness: some processes linger in the "end" state in processlist output:

Id User Host db Command Time State Info
135432 tttt t.t.t.t:60257 mydb_com Query 1 end UPDATE session\n\t\tSET useragent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)', lastactivity = 1107872218, styleid = 0, bypass = 0\n\t\t ,inforum = 18, inthread = 27856, incalendar = 0, badlocation = 0\n\t\t WHERE sessionhash = '20ad637805dc2f283a990647276fd26a'

Nothing in mysql docs on this process state.