Article Contributors: Mike Anders (IBxAnders), Pritesh Shah (priteshjshah.com)
Liked this post? Want to contribute vBulletin optimization / tech advice and knowledge? Please join Team Skunkworks on Twitter, http://www.twitter.com/INETSkunkworks.
WARNING: Project Difficulty Level is EXPERT
I’d like to begin this technical article with a serious word of warning for administrators who are thinking about installing this product suite. The vBulletin caching concept itself, as well as the installation and maintenance of the various required components, is highly experimental and technical in nature. High proficiency in system administration is required for this project. Please do not attempt it otherwise, as no support or warranty is provided. Always have a good, confirmed backup of your site before attempting to make changes to the site or the server.
Cache Rules Everything around Me
Caching is a very popular mechanism for allowing web applications to grow and serve high numbers of users (scalability). Cache is all around you: Facebook, LinkedIn and even vBulletin all use some form of caching. Let’s briefly explain “caching” in the context of web-application scaling, and more specifically for applications living in a LAMP environment. (Don’t know what LAMP is? It is the Linux, Apache, MySQL and PHP stack.)
Web applications usually have a limited feature set, and their overall functionality is highly predictable and scripted. This means that when a high number of visitors start using the application, very similar, very repetitive tasks begin to queue and consume the server’s computing power.
As an example, let’s take the vBulletin “Recent Threads” widget. This widget is a PHP function that checks the database for new threads and prints the output on the homepage. Every time a visitor loads the homepage, the widget executes 10 queries, requests information from the database and rebuilds the list. If 100 users reload the homepage at the same time, the server theoretically has to execute 1000 MySQL queries and rebuild the widget 100 times. What if there are 1000 users? The computing requirement is huge!
This problem is solved with a simple “caching” mechanism. Since we know that the recent threads widget is not mission-critical to the functionality of the site, we can assign a time-expiration value to it; let’s use 600 seconds (10 minutes). This means that we will build the widget only once every 10 minutes, using just 10 MySQL queries, and save the output somewhere. The “saved output” is considered the widget’s “cache”. Every request made within the 10 minute window receives the widget’s pre-saved “cache” as a response. As you can see, by using some simple caching logic in the widget we’ve theoretically decreased MySQL queries for 100 concurrent users from 1000 queries to just the 10 original queries needed to build the initial cache.
Now, before I get ostracized by experts in the field, please allow me to note that I will be using very basic, very theoretical examples to make a point and explain the general concept of caching. True performance metrics are a function of the environment in which they are measured as well as adherence to a scientific benchmark process.
This article will not focus on those topics, in the hope of being more accessible to all readers, but at the expense of very technical descriptions and methodologies. I think we’ll do fine without them, but I will gladly listen to and engage with those interested.
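To make the “Recent Threads” idea concrete, here is a minimal sketch of a time-expiring cache in Python. It is only an illustration of the concept, not vBulletin code: the `WidgetCache` class, the `build_recent_threads` function and the HTML it returns are all hypothetical stand-ins for the real widget and its 10 MySQL queries.

```python
import time

CACHE_TTL_SECONDS = 600  # the 10 minute expiration window from the example


class WidgetCache:
    """Stores one rendered widget and rebuilds it only after the TTL expires."""

    def __init__(self, build, ttl=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._build = build        # expensive function that would run the queries
        self._ttl = ttl
        self._clock = clock        # injectable so that expiry can be simulated
        self._output = None
        self._expires_at = None
        self.builds = 0            # how many times the widget was really rebuilt

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or expired: rebuild and remember the output.
            self._output = self._build()
            self._expires_at = now + self._ttl
            self.builds += 1
        return self._output


def build_recent_threads():
    # Hypothetical stand-in for the 10 MySQL queries the real widget runs.
    return "<ul><li>Newest thread</li></ul>"


widget_cache = WidgetCache(build_recent_threads)
for _ in range(100):               # 100 visitors load the homepage
    widget_cache.get()
print(widget_cache.builds)         # the widget was built once, not 100 times
```

The clock is passed in as a parameter only so that the expiration can be exercised without actually waiting 10 minutes.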
Caching vBulletin: The Basic Concept
Earlier in this article we used the “Recent Threads” widget example to illustrate the concept of caching and to introduce the concepts of time and expiration. Clearly, we observed tremendous savings in computing power when using a simple caching mechanism on the widget.
But, can the simple caching mechanism be applied to all of vBulletin?
As it turns out – Yes!
To do this we need to identify the logic for caching. In the earlier example, we decided that we did not particularly care if the content in the widget was old; accordingly, we assigned an expiration value of 10 minutes. Let’s apply the same 10 minute cache window to all parts of vBulletin. This means that visitors will see the site as it was up to 10 minutes ago, with no real-time updates. We immediately observe that this will not work for visitors who are logged into vBulletin, because they require real-time updates when they create a new thread, upload a picture or post a reply. We also observe that users who are “not logged in” cannot post content, do not expect real-time updates and have no immediate way to tell that the site they are viewing is on a 10 minute delay.
At this point we have our logic. For now, we can completely skip users that are logged-in and show them the normal vBulletin site which is updated in real-time when someone creates new content. For users that are “not logged in” – let’s create a 10 minute cache window using a super efficient memory proxy mechanism. We can determine the user login status by creating specific cookies using the “vBBoost” vBulletin product.
This now allows us to take popular pages in vBulletin and efficiently cache their entire output for non-logged-in users, without overloading the database server.
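The decision described above (logged-in users bypass the cache, guests are served from it) can be sketched roughly as follows. Please treat this as an assumption-laden illustration: the cookie name `vbboost_loggedin` and its value `1` are made up for this example, and the actual cookie set by the vBBoost product may look different. In the real setup this check lives in the Varnish configuration, not in Python.

```python
# Hypothetical cookie name and value; the real vBBoost cookie may differ.
LOGGED_IN_COOKIE = "vbboost_loggedin"


def parse_cookie_header(header):
    """Turn a raw 'Cookie:' header value into a dict of name -> value."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def should_serve_from_cache(method, cookie_header):
    """Only GET requests without the logged-in marker cookie use the cache."""
    if method != "GET":
        return False               # posting content must always reach vBulletin
    cookies = parse_cookie_header(cookie_header)
    return cookies.get(LOGGED_IN_COOKIE) != "1"


print(should_serve_from_cache("GET", ""))                                # guest
print(should_serve_from_cache("GET", "vbboost_loggedin=1; lang=en"))     # member
```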
Let’s follow the practical example and diagram below.
Practical Examples: Caching Forum Index Page
In our first example, let’s imagine that we have a big, non-cached vBulletin site with a concurrency of 1000 users. (Please note that concurrency is not 1000 users in the “who is online” widget, but rather 1000 users requesting something at the same time.) So, at this very moment, we have 1000 people loading the forum index page. It takes 15 queries to build the forum index page output for just 1 of the 1000 visitors. In theory, this means that the server needs to execute 15000 MySQL queries to build 1000 forum index pages.
In the second example, our big vBulletin site is equipped with vBulletin Boost and Varnish technology. Just like in the first example, there are 1000 concurrent users requesting the forum index page. We observe that 500 of the 1000 visitors are logged in, leaving 500 non-logged-in guests. Let’s skip all caching logic for users that are logged in and execute 7500 queries to build their 500 forum index pages. We then use 15 more queries to build the cached page output once, and all 500 guests are served that output from memory, which may be up to 10 minutes old. In total, we used 7515 queries to service 1000 users, where before we used 15000. In our theoretical example, the savings are a decrease of almost 50% in required MySQL queries.
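The arithmetic of the two examples above can be checked with a few lines of Python. The function name and parameters are mine, chosen for illustration; the numbers are the theoretical ones from the text, not measurements.

```python
def queries_needed(total_users, logged_in_users, queries_per_page, cached):
    """Theoretical MySQL query count for one burst of concurrent page loads."""
    guests = total_users - logged_in_users
    if not cached:
        # Every visitor triggers a full page build.
        return total_users * queries_per_page
    # Logged-in users always get a live page; all guests share one cached build.
    guest_queries = queries_per_page if guests > 0 else 0
    return logged_in_users * queries_per_page + guest_queries


without_cache = queries_needed(1000, 500, 15, cached=False)
with_cache = queries_needed(1000, 500, 15, cached=True)
savings = 1 - with_cache / without_cache

print(without_cache, with_cache, f"{savings:.1%}")  # 15000 7515 49.9%
```

Note that the savings grow with the share of guests: the more of your traffic is not logged in, the closer the query count gets to that of the logged-in users alone.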
How is Cached Output Generated?
The cache-write condition is simple and does not apply to all pages. In our examples, a cache write is triggered when a non-logged-in user loads a page for the very first time. Once the page is built using MySQL queries, its output is placed into memory storage for 10 minutes. After the initial page load, no MySQL queries are executed for that page for 10 minutes; instead, the page output is served directly from memory. After the 10 minute expiration window, the system repeats the process automatically on the next guest request. This method allows us to cache only the content that is popular, which is exactly the content that benefits the server’s performance most, since many non-logged-in visitors are requesting it.
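Here is a toy simulation of that cache-write condition. It is a sketch of the behaviour described above, not of how Varnish is implemented: `PageCache`, `render_page` and the `now` timestamps (in seconds) are hypothetical names introduced only for this example.

```python
class PageCache:
    """Toy in-memory page cache keyed by URL, used for guest requests only."""

    def __init__(self, backend, ttl=600):
        self._backend = backend    # callable(url) -> page body; would hit MySQL
        self._ttl = ttl
        self._store = {}           # url -> (expires_at, body)

    def fetch(self, url, logged_in, now):
        if logged_in:
            return self._backend(url)          # bypass: always a live page
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]                    # cache hit: no queries at all
        body = self._backend(url)              # cache-write condition
        self._store[url] = (now + self._ttl, body)
        return body


backend_hits = []


def render_page(url):
    # Hypothetical stand-in for vBulletin building a page with MySQL queries.
    backend_hits.append(url)
    return f"page for {url}"


page_cache = PageCache(render_page)
page_cache.fetch("/index.php", logged_in=False, now=0)    # first guest: written to cache
page_cache.fetch("/index.php", logged_in=False, now=300)  # served from memory
page_cache.fetch("/index.php", logged_in=True, now=300)   # member: bypasses the cache
page_cache.fetch("/index.php", logged_in=False, now=600)  # expired: rebuilt and re-cached
print(len(backend_hits))  # 3
```

Pages that no guest ever requests are never written to the cache, which is why only popular content ends up occupying memory.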
Why did you choose Varnish?
Admittedly, part of the appeal was Varnish’s great reviews from the community and its impressive benchmark tests. We picked it for its relative ease of configuration and installation as well as its extremely low CPU overhead.
Generic Installation Process Overview
Please note that the instructions below are a general overview of the installation process. Every server will have its own specific installation parameters, challenges and requirements. As I’ve mentioned earlier in this article – please make sure you have a backup of your site and you proceed only if you are an expert Unix administrator.
- First, you will need to install “Varnish Cache” on your web server (http://varnish-cache.org/). Please note that additional packages such as “httpd-devel” / “mod_rpaf” might be required for a successful installation, depending on your existing server configuration.
- After Varnish is installed (but not active) you will need to configure it. Attached you will find a command line shell script that will generate a configuration file for your site and server with appropriate custom settings. (To be executed from command line, php boost.php). We’ve gone ahead and pre-filled additional options to make Varnish play nice with vBulletin.
- If using Apache web-server, you will need to disable gzip/deflate rules in httpd.conf
- … and disable GZIP compression in vBulletin AdminCP.
- In Apache httpd.conf, activate the mod_rpaf module:
LoadModule rpaf_module modules/mod_rpaf-2.0.so
- Change Apache configuration to listen on Port 81 instead of 80.
- Install the vBulletin Boost Product XML.
- In MySQL, truncate the vBulletin “session” table; this will force everyone to log out and eventually set an appropriate Varnish cookie.
- Restart Apache Web Server
- ... and start Varnish
- You can verify what’s running on various ports via the lsof -i:80 and lsof -i:81 commands.
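If lsof is not available, a rough substitute for the last step is to check whether anything accepts TCP connections on the two ports. The Python sketch below does only that: it cannot tell you which process owns the port, and the assumption that Varnish ends up on port 80 and Apache on port 81 simply follows the steps above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Expected layout after the steps above: Varnish on 80, Apache on 81.
    for name, port in (("Varnish", 80), ("Apache", 81)):
        state = "listening" if is_listening("127.0.0.1", port) else "not listening"
        print(f"{name} (port {port}): {state}")
```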
Tips & Tricks
This section will be updated periodically.
- Use “varnishhist” plotter as a debug tool to monitor caching performance.