Suggestion: Write static HTML pages to improve speed
mainframe
Wed 16th May '07, 10:46am
Hi vBulletin,
Static HTML pages are served much faster than PHP pages.
When content changes all the time, PHP is the better choice, because constantly writing new HTML pages on the server would cost more server resources. But when pages don't change very often, static HTML should improve speed dramatically.
My suggestion for vBulletin is to implement a system that writes all pages into a directory in static HTML format and serves the static copy to the user as long as the page hasn't changed. If the page changes, the system should update the static page by writing the new content to the static file.
Users with extra options that shouldn't be saved to a static file should always load the PHP file. But the system can be used to serve static pages to guests and search engines.
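A minimal sketch of that idea, written in Python for brevity rather than vBulletin's PHP. Everything named here (`CACHE_DIR`, `THREADS`, `add_post`, `serve_thread`) is hypothetical and only stands in for the real database and page rendering:

```python
import html
import tempfile
from pathlib import Path

# Hypothetical cache directory; a real board would use a fixed, configured path.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="static_pages_"))

# Stand-in for the database: thread id -> list of post texts.
THREADS = {}


def render_thread(thread_id):
    """Stand-in for the expensive dynamic page build."""
    posts = "".join(f"<p>{html.escape(p)}</p>" for p in THREADS.get(thread_id, []))
    return f"<html><body>{posts}</body></html>"


def static_path(thread_id):
    return CACHE_DIR / f"thread_{thread_id}.html"


def add_post(thread_id, text):
    """Store the post, then rewrite the static copy because the page changed."""
    THREADS.setdefault(thread_id, []).append(text)
    static_path(thread_id).write_text(render_thread(thread_id), encoding="utf-8")


def serve_thread(thread_id, is_guest):
    """Guests get the static file when it exists; everyone else gets the dynamic page."""
    path = static_path(thread_id)
    if is_guest and path.exists():
        return path.read_text(encoding="utf-8")
    return render_thread(thread_id)
```

Here the static file is rewritten on every change, as proposed above, so a busy thread pays one file write per post while a quiet thread is served from disk without touching the database.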
Sincerely yours,
Danny
http://www.funfiles.ws/
Wayne Luke
Wed 16th May '07, 12:17pm
Not possible unless we remove two-thirds of the features in vBulletin, which are shown dynamically depending on a variety of factors. This includes all the current AJAX support, hidden forums, deletion notices, expiring redirects, and a lot more.
mainframe
Wed 16th May '07, 12:34pm
I know most of the pages are dynamic, but http://www.vbulletin.com/forum/showthread.php?t=230048 (this thread), for example, has been viewed 100 times without any changes. A PHP script could write a static copy of the page, show it to guests, and update it when a change is made.
Of course, moderators should always see the PHP version because of the extra code, such as inline moderation.
Hope you get what I mean...
Regards,
Danny
Wayne Luke
Wed 16th May '07, 12:52pm
vBulletin was originally created because static HTML pages didn't hold up to the demands of larger websites. This is how UBB.Classic worked. It created static pages for threads and it would often start having failures at 100,000 posts or more.
mainframe
Wed 16th May '07, 2:38pm
I wonder why it failed. It should store the data of 100,000 posts in the database, as it does now, and write out the files from the data in the MySQL table just to serve them when there's no dynamic content to load.
In many cases the dynamic PHP file should be loaded, but in the cases where it's not needed, static files would definitely save server resources.
I'm always looking for ways to improve things, even if it's only by 0.1%. So if writing static files to the server for the cases where dynamic content isn't needed uses fewer server resources than building dynamic pages every time, it's worth a look.
Maybe the system should check whether any dynamic content or change is needed and, if not, include the static HTML from the file if it is available. But it would add a directory with thousands of HTML files to the server.
Regards,
Danny
Reeve of Shinra
Wed 16th May '07, 4:49pm
If you do a search through suggestions, there is a similar topic regarding archiving old threads to a separate database or to HTML flat files.
I think the debate was settling on a separate database for efficiency since, as Wayne noted, it would take more server resources to generate and update various sets of flat files.
feldon23
Wed 16th May '07, 6:12pm
Seriously, read the history of UBB Classic and how vBulletin was created as a replacement for it. UBB used to own the Internet forum market. Now people say "UBB, who's that?". They totally lost the Internet forum market because they failed to embrace databases until it was too late.
You point to showthread.php as an example of something that would benefit from being a flat HTML file. A guest, registered user, moderator, and administrator all see different things when they view this page. So you would have to have, at minimum, 4 copies of this page and serve different copies to different users. And every time there is a change, the server would have to rewrite those 4 files. Rewriting 4 files is slower than making 1 hit to the database. And you would still lose a LOT of features.
Now, I would like to see a true Archive, where posts are served from a cache using a Perl front-end. But that would be locked posts that the public can view. No permissions, no editing, all the features turned off. The current "Archive" (not even sure why people call it an Archive since the proper name is "Printable Version") does not really save much on server resources.
Simetrical
Wed 16th May '07, 9:44pm
Well, heavier caching might be beneficial for performance, certainly. The cache could not be used for moderators or admins, of course (who would likely be few enough that you'd just not bother caching their requests), but in particular, probably at least 50% of visitors to typical boards are unregistered, and they'll all see the same thing. Caching those pages' HTML completely for unregistered users might provide significant performance gains for them, requiring a single database query for a chunk of plain text. Once you get into registered users, they can have preferences to change the display of pages (are they subscribed? PMs enabled? buddies? ignore lists? sigs/avs/images on? linear/threaded/hybrid? posts per page? time zone, DST, start of week? skin? language? etc.), so it might not be worth caching for them.
Flat files would probably not be the best implementation of caching, however. The database is likely to be at least as fast and flexible, and of course APC or Memcached or similar could be used if available. Something tailored to high-speed caching like Squid could be used for even greater efficiency, avoiding PHP altogether. It would likely be ideal to store the pages gzipped, which would save on disk/memory space and is probably how you're transmitting them anyway.
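Storing the pages gzipped could look roughly like this in Python. The in-memory dict and the function names are assumptions made for the example; the point is that the compressed bytes can be sent unchanged to clients that accept gzip:

```python
import gzip

# Hypothetical in-memory store: url -> gzip-compressed page bytes.
page_cache = {}


def store_page(url, page_html):
    """Compress once when the page is cached."""
    page_cache[url] = gzip.compress(page_html.encode("utf-8"))


def fetch_page(url, accepts_gzip):
    """Return (body, headers) for a cached page.

    Clients that accept gzip get the stored bytes as-is; others get
    a decompressed copy.
    """
    blob = page_cache[url]
    if accepts_gzip:
        return blob, {"Content-Encoding": "gzip"}
    return gzip.decompress(blob), {}
```

The compression cost is paid once per cache fill instead of once per request, and repetitive forum markup tends to compress well.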
When something forces the cache to be cleared, you don't need to regenerate the page right away. Instead, you can regenerate it on the next view, which you'd do anyway under the current system. The only issue is if a change to the interface or whatnot forces a large number of pages to be cleared from cache, in which case a very large DELETE statement might need to be run. On the other hand, you could always just TRUNCATE if it's excessively large and rebuild everything on next request.
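A small Python sketch of that lazy approach, using a plain dict in place of a cache table. The names `view_page`, `invalidate`, and `invalidate_all` are hypothetical:

```python
cache = {}   # url -> rendered HTML (stand-in for a cache table)
builds = []  # log of page builds, to show when rendering really happens


def build_page(url):
    """Stand-in for the normal dynamic page build."""
    builds.append(url)
    return f"<html><body>{url}, build {builds.count(url)}</body></html>"


def view_page(url):
    """Rebuild only when a viewer asks for a page that is not cached."""
    if url not in cache:
        cache[url] = build_page(url)
    return cache[url]


def invalidate(url):
    """Roughly a DELETE of one row: drop the entry, do not rebuild yet."""
    cache.pop(url, None)


def invalidate_all():
    """Roughly a TRUNCATE: drop everything and rebuild on demand."""
    cache.clear()
```

Invalidation itself does no rendering, so a page that is purged and never viewed again costs nothing further.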
Dynamic features don't have to suffer at all. The client side is all JavaScript, which is stored in the filesystem even now. Server-side you're just talking some URLs that won't be cached, which will obviously exist whatever caching you do, or some URLs that will have their cache purged frequently (which depending on the rates you're talking about may or may not be worth caching in the first place).
Of course, this would all take a fairly huge effort to write, and its main benefit would be to very large sites, since for small sites caching often uses up too much disk/memory relative to request rate to be worth it. I don't expect to see it in vBulletin anytime soon given its market. It's perhaps worth noting that this has been a major feature of MediaWiki since day one, given its very different target audience (which is why I know something about it). The Wikimedia server farm uses 45 Squids for reverse proxy caching, in addition to ~150 Apaches, which noticeably decreases page load time for anonymous users — try comparing page load speed when logged in versus out. But as I say, probably it's not important enough for vB to bother with. I think big boards would prefer a long list of other things first.
the geek
Thu 17th May '07, 11:37am
The big problem that comes into play with this is the flexibility and speed of the templating system.
If you have a cache of 100,000 static pages and add another style, you would double your cache size.
Change a stylevar or any line in an affected template and you would then need to rebuild the entire cache again.
A cache of 100,000 posts * number of styles * number of usergroup combinations = a lot more overhead than I personally wish to think about.
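To put rough numbers on that multiplication, here is a small Python calculation. The two styles, the four usergroup combinations (guest, registered user, moderator, administrator), and the 50 KiB average page size are assumptions picked for illustration, not measurements:

```python
def cache_entries(pages, styles, usergroup_combos):
    """Number of cached copies if every combination is stored."""
    return pages * styles * usergroup_combos


def cache_size_gib(entries, avg_page_kib):
    """Approximate uncompressed cache size in GiB."""
    return entries * avg_page_kib / (1024 * 1024)


entries = cache_entries(pages=100_000, styles=2, usergroup_combos=4)
print(entries)  # 800000
print(round(cache_size_gib(entries, avg_page_kib=50), 1))
```

Under these assumptions the full cache comes to roughly 38 GiB, and every further style adds another 400,000 copies.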
That said, there is some gain to be had by looking to cache different aspects that seldom change (navbar, header, footer, etc.). I'm not sure how beneficial it would be, but there you go :D
Simetrical
Thu 17th May '07, 7:32pm
The big problem that comes into play with this is the flexibility and speed of the templating system.
If you have a cache of 100,000 static pages and add another style, you would double your cache size.
No, because you would still have one default style, which an overwhelming majority of users (at least unregistered users, who as I noted are the main target that can benefit from this) would use. You would probably not want to cache styles used by only 1% of viewers.
That said, there is some gain to be had by looking to cache different aspects that seldom change (navbar, header, footer, etc.). I'm not sure how beneficial it would be, but there you go :D
Perhaps, yes. Parsed posts and some other things are already cached. Possibly those are too, I don't know.
Wayne Luke
Thu 17th May '07, 7:44pm
Templates are stored in a pre-parsed format. For example, all the conditionals are turned into their PHP counterparts, and variables are expanded where needed.
Reeve of Shinra
Wed 23rd May '07, 10:56pm
A co-admin had the insane idea of running the DB straight from RAM, with queries being cached and written to the HDD every couple of seconds or so. It sounds insane, but it also sounded like it might just work.
Simetrical
Thu 24th May '07, 10:47pm
A co-admin had the insane idea of running the DB straight from RAM, with queries being cached and written to the HDD every couple of seconds or so. It sounds insane, but it also sounded like it might just work.
This is pretty much a default feature on a lower level than vB. MySQL will attempt to store stuff in memory depending on its configuration (e.g., query cache, index cache, and so on). Furthermore, the operating system will cache recently-accessed files in available memory automatically. All writes will still go to the hard disk, of course, but I expect they'll be buffered pretty optimally on the MySQL/OS level. No need for the application to buffer database writes, necessarily. I don't know if MySQL has any capacity for the full tables being explicitly stored in memory, or whether that would be a significant performance improvement.
merk
Fri 25th May '07, 4:04am
Full tables can be stored in memory if you use the HEAP engine. It's not a good idea, because if MySQL crashes you'll lose that data (afaik).
I agree with Simetrical about using more caching rather than writing static pages. The cache would definitely blow out in size, but for boards that need their own servers, disk space is cheap.
sabret00the
Sat 26th May '07, 4:46pm
I've cached my shoutbox's old posts in HTML format here (http://www.ebslive.com/active/chancery.php). Given the nature of shoutboxes, it's a vast improvement in server load, but it's costly in terms of space. Plus, I feel it's a lot slower than my regular forum pages.