In an attempt to stay on top of our ever-busy forum, we recently enlisted the help of the mysqlperformanceblog.com guys (Peter, Vadim, and Aurimas) to come in, and help us figure out how to speed up vBulletin.
A bit of background to better help those reading this why this is necessary. Our forum (http://pbnation.com) is currently the 4th largest vBulletin forum (16th across all forums listed on big-boards.com) and is consistently in the top 10 as far as weekly posts are concerned.
We have been running vBulletin since the site started in 2001, and have always been pretty active in the community as far as performance optimizations, hacks, etc. At some point we had to start looking beyond the level of advice we could get from the community, and seek out professional help.
Enter the mysql guys. Peter was the former head of the mysql performance group (at mysql ab), and was the obvious choice. The following is some of the findings. I am sharing this information for two reasons. First, in hopes that those changes that are applicable to the majority of vB users will be considered for integration into a future release of vB, and second to help those of us who are struggling to keep our sites afloat.
So, lets get started here. Please note that a lot of this info will not pertain to most sites, as some of these problems only manifest themselves under heavy load.
First off, everyone attempting to profile slow queries in vB needs two tools to do so. If you are capable of compiling MySQL from source, grab the mysqlperformanceblog groups microslow patch, and the mysql slow query log analyzer. By setting your long_query_time to 0 with the microslow patch enabled, and letting it run for a day or so, you'll end up with one really giant slowqueries log file to parse. The slow query log tool produces (in order based on frequency) useful info like:
### 931000 Queries
### Total time: 67768.8358730002, Average time: 0.072791445620838
### Taking 0.000845 to 33 seconds to complete
### Rows analyzed 12 - 369
This is a perfect way to determine what is running slowly, how often it gets hit, and how many rows were analyzed. After we ran this setup for a day, we ended up with 13gb of sql queries that took 1 second or longer, with a few that really stood out.
1) The first:
This query gets hit quite a bit, and often does a filesort because the visible column doesn't contain an index. This is easily fixed by adding an additional index to the post table:Code:SELECT postid FROM post AS post WHERE threadid = XXX AND visible = XXX ORDER BY dateline LIMIT XXX, XXX;
2) This next query is a bit tricky, but just goes to show you how sometimes less isn't better. We have a hack that locks and archives old posts into post_ tables that are selected against using a MySQL merge table. So, even though we have 30 million posts, only about 10 million of those are active any given time in the "post" table. Even with only 10 million rows, this query is very slow:Code:ALTER TABLE post ADD KEY `th_search` (`threadid`, `visible`, `dateline`);
Why is it slow? On our server, the available memory necessary to hold all of these queries being executed simultaneously is not enough, and these queries often create temporary tables on disk to sort the results. We all know memory is faster than disk, so why not create a temporary table in memory, split the query up, and temporarily store the results there? This way, we're executing more queries, but by keeping less results in memory rather than hitting the disk. The result is less table locking, less disk I/O and the query is much faster overall. So, onto the fix:Code:SELECT post.*, post.username AS postusername, post.ipaddress AS ip, IF(post.visible = XXX, XXX, XXX) AS isdeleted, user.*, userfield.*, usertextfield.*, icon.title as icontitle, icon.iconpath, avatar.avatarpath, NOT ISNULL(customavatar.userid) AS hascustomavatar, customavatar.dateline AS avatardateline, customavatar.width AS avwidth, customavatar.height AS avheight, editlog.userid AS edit_userid, editlog.username AS edit_username, editlog.dateline AS edit_dateline, editlog.reason AS edit_reason, postparsed.pagetext_html, postparsed.hasimages, sigparsed.signatureparsed, sigparsed.hasimages AS sighasimages, sigpic.userid AS sigpic, sigpic.dateline AS sigpicdateline, sigpic.width AS sigpicwidth, sigpic.height AS sigpicheight, IF(displaygroupid=XXX, user.usergroupid, displaygroupid) AS displaygroupid, infractiongroupid, 'XXX' AS fieldXXX, 'XXX' AS fieldXXX FROM post AS post LEFT JOIN user AS user ON(user.userid = post.userid) LEFT JOIN userfield AS userfield ON(userfield.userid = user.userid) LEFT JOIN usertextfield AS usertextfield ON(usertextfield.userid = user.userid) LEFT JOIN icon AS icon ON(icon.iconid = post.iconid) LEFT JOIN avatar AS avatar ON(avatar.avatarid = user.avatarid) LEFT JOIN customavatar AS customavatar ON(customavatar.userid = user.userid) LEFT JOIN editlog AS editlog ON(editlog.postid = post.postid) LEFT JOIN postparsed AS postparsed ON(postparsed.postid = post.postid AND postparsed.styleid = XXX AND postparsed.languageid = XXX) LEFT JOIN sigparsed AS sigparsed ON(sigparsed.userid = user.userid AND sigparsed.styleid = XXX AND sigparsed.languageid = XXX) LEFT JOIN sigpic AS sigpic ON(sigpic.userid = post.userid) WHERE post.postid IN (XXX,XXX,XXX,XXX,XXX,XXX,XXX,XXX,XXX) ORDER BY post.dateline;
a) First we create a temporary HEAP table:
b) Second, we insert the postids matched with post.postid IN (XXX,...,XXX) into the tmp_sort table:Code:CREATE TEMPORARY TABLE tmp_sort ( `postid` int(10) unsigned NOT NULL, `dateline` int(10) unsigned NOT NULL ) ENGINE=HEAP;
c) Third, we join against the tmp_sort table with the other joins in the query:Code:INSERT INTO tmp_sort SELECT postid, dateline FROM post AS post WHERE post.postid IN (XXX,XXX,XXX);
3) The next query is like the last one. In this case executing three queries instead of one results in a speedup of 3-5x over the original query:Code:SELECT post.*, post.username AS postusername, post.ipaddress AS ip, IF(post.visible = XXX, XXX, XXX) AS isdeleted, user.*, userfield.*, usertextfield.*, icon.title as icontitle, icon.iconpath, avatar.avatarpath, NOT ISNULL(customavatar.userid) AS hascustomavatar, customavatar.dateline AS avatardateline,customavatar.width AS avwidth,customavatar.height AS avheight, editlog.userid AS edit_userid, editlog.username AS edit_username, editlog.dateline AS edit_dateline, editlog.reason AS edit_reason, postparsed.pagetext_html, postparsed.hasimages, sigparsed.signatureparsed, sigparsed.hasimages AS sighasimages, sigpic.userid AS sigpic, sigpic.dateline AS sigpicdateline, sigpic.width AS sigpicwidth, sigpic.height AS sigpicheight, IF(displaygroupid=XXX, user.usergroupid, displaygroupid) AS displaygroupid, infractiongroupid, 'XXX' AS fieldXXX, 'XXX' AS fieldXXX FROM tmp_sort AS tmp LEFT JOIN post AS post ON(post.postid = tmp.postid) LEFT JOIN user AS user ON(user.userid = post.userid) LEFT JOIN userfield AS userfield ON(userfield.userid = user.userid) LEFT JOIN usertextfield AS usertextfield ON(usertextfield.userid = user.userid) LEFT JOIN icon AS icon ON(icon.iconid = post.iconid) LEFT JOIN avatar AS avatar ON(avatar.avatarid = user.avatarid) LEFT JOIN customavatar AS customavatar ON(customavatar.userid = user.userid) LEFT JOIN editlog AS editlog ON(editlog.postid = post.postid) LEFT JOIN postparsed AS postparsed ON(postparsed.postid = post.postid AND postparsed.styleid = 35 AND postparsed.languageid = 1) LEFT JOIN sigparsed AS sigparsed ON(sigparsed.userid = user.userid AND sigparsed.styleid = 35 AND sigparsed.languageid = 1) LEFT JOIN sigpic AS sigpic ON(sigpic.userid = post.userid) ORDER BY tmp.dateline;
Again, this query doesn't look like much, but when you have a ton of PMs, and an active user base, this is quite the hog. The fix is three fold:Code:SELECT pm.*, pmtext.*, icon.title AS icontitle, icon.iconpath FROM pm AS pm LEFT JOIN pmtext AS pmtext ON(pmtext.pmtextid = pm.pmtextid) LEFT JOIN icon AS icon ON(icon.iconid = pmtext.iconid) WHERE pm.userid=XXX AND pm.folderid=XXX ORDER BY pmtext.dateline DESC LIMIT XXX, XXX;
a) Create a heap table:
b) Populate tmp_pm:Code:CREATE TEMPORARY TABLE tmp_pm ( pmid int(10) unsigned NOT NULL, pmtextid int(10) unsigned NOT NULL, dateline int(10) unsigned NOT NULL ) ENGINE=HEAP;
c) Perform the select against the tmp table:Code:INSERT INTO tmp_pm SELECT pm.pmid AS pmid, pm.pmtextid AS pmtextid, pmtext.dateline AS dateline FROM pm AS pm LEFT JOIN pmtext AS pmtext USING (pmtextid) WHERE pm.userid = XXX AND pm.folderid = XXX;
4) PMs are a mess when you have millions of them. A simple change to the index (adding readtime to the index that is now just userid) speeds up this query quite a bit:Code:SELECT pm.*, pmtext.*, icon.title AS icontitle, icon.iconpath FROM tmp_pm AS tmp LEFT JOIN pm AS pm ON (tmp.pmid = pm.pmid) LEFT JOIN pmtext AS pmtext ON(tmp.pmtextid = pmtext.pmtextid) LEFT JOIN icon AS icon ON(icon.iconid = pmtext.iconid) ORDER BY tmp.dateline DESC LIMIT XXX, XXX;
The fix is simple:Code:SELECT SUM(IF(readtime <> XXX, XXX, XXX)) AS confirmed, SUM(IF(readtime = XXX, XXX, XXX)) AS unconfirmed FROM pmreceipt WHERE userid = XXX;
So, that was our initial round of changes. The performance benefit of all steps was clearly evident, but wasn't quite the boost we were looking for, so we went back for round two. Prior to round two, we were running some of our tables as InnoDB so we could get around the nasty row locking issues we had. Since we use the Sphinx search engine instead of the vBulletin engine, we didn't have to worry about losing the fulltext search functionality by switching.Code:ALTER TABLE pmreceipt DROP KEY `userid`, ADD KEY `userid` (`userid`, `readtime`);
Our initial round of changes had us switching the post, thread and user tables over to InnoDB. This got rid of the row locking we were seeing, but quickly gobbled up the 4gb of memory we had our innodb_buffer_pool_size set to, and was making optimization #1 from above slower than it could be. So, we took the plunge, upgraded our server to 16gb of ram from 8gb, and bumped our innodb_buffer_pool_size up to 8gb.
In addition, it became clear that there were more tables slowing us down due to locking issues. We found that switching the avatar, customavatar, editlog, icon, pm, pmtext and userfield tables to InnoDB helped clear up the locking issues a bit.
Furthermore, we also determined that the following query could benefit from an additional modfication of the stock index.
In order to avoid more disk intensive filesorts, we found that extending the userid index to include the dateline speeds up the query which tends to get big, even with the LIMIT.Code:### 1256 Queries ### Total time: 7326.435305, Average time: 5.83314912818471 ### Taking 0.007848 to 84.328134 seconds to complete ### Rows analyzed 3 - 21497 SELECT postid FROM post AS post INNER JOIN thread AS thread ON(thread.threadid = post.threadid) WHERE post.userid = XXX AND thread.forumid IN(XXX,XXX,XXX,XXX) ORDER BY post.dateline DESC LIMIT XXX;
To remedy:
We also identified a few queries that could benefit from some tuning as well, but that didn't have easy fixes (relating to pagination & limits on things like showthread). In addition, we got some MySQL conf tips that I'm not going to post here as they are sort of specific to the size of your data, and your hardware. Anyway, I'd be interested in hearing from those of you familiar with this sort of tuning.Code:ALTER TABLE `post` DROP INDEX `userid`, ADD INDEX (userid, dateline);
Thoughts? Comments? Suggestions?





Bookmarks