PDA

View Full Version : Why are templates stored in database table? *Very Bad* design.


danbeck
Sun 25th Jun '00, 5:49am
I'm curious, why are html templates stored in a mysql database table. For one page, no matter how insignificant, not only do you have to make database calls for the thread and post data, you have to make uneccessary calls just for html that would easily be stored in files. On a heavily loaded system, it seems ludicrous. The profile page itself makes a seperate call fro icq, aim and yahoo templates. Why couldn't you have just used an if/else statement in the main html template instead of seperating the data causing you to have to make seperate calls.

Did you test the efficiency of these calls before making this design decision? The current design is not well suited to large installations with a lot of hits. Your forum is nice, but it requires way to much processor and memory to do what it's trying to do. The MySQL engine is a way to store a lot of information for quick retrieval.. it is not intended to replace a filesystem or anything else and it will perform pitifuly if it's expected to.

UserName
Sun 25th Jun '00, 5:57am
I don't know enough about MySQL to argue with you, but I can tell you, for a fact, that this software uses FAR less resources on my machine than UBB did - and UBB itself is used on some pretty big sites. I don't doubt that vB could be more efficient, but I have a strong feeling that this software could handle a *huge* site. I currently do over 500 meg a day through my vB and I rarely see load averages above 1.0 (that's on a server averaging close to 50 gig a day total.)

danbeck
Sun 25th Jun '00, 6:38am
Yes, it does use far less resources than UBB.. mostly for the simple reason that it's not vanilla Perl based.. a UBB installation using mod_perl instead of vanilla Perl would be just as fast as vBulletin. The less database access you can perform, the better of your server will be and the faster it will be. There is a point at which less access makes for more programming time and less functionality, but there is also a point where you are using the database to store the time of day because you don't want to use strftime in your code.

Martin
Sun 25th Jun '00, 7:00am
you do make a very good point. If the templates were stored in flat text files, would the saving of strain be on the server as a whole be less, or just on mySQL?

If not flat text, then as PHP files? If you stored them as PHP files, wouldn't that lead to problems in and of itself?

Please excuse my ignorance here. PHP is still fairly new to me, as is mySQL.

JimF
Sun 25th Jun '00, 7:12am
I have seem some "large" boards handle vB very well, despite the templates being stored in the db tables. I don't know the reason why it is designed the way it is, so I can't tell you any technical reasons. But I have'nt noticed any decrease in performance with the current setup. If I'm not mistaken, the only downside MySQL has is the amount of simultaneous writes that it can do at a certain point in time. Since you're not "writing" to the templates on each page load, the MySQL shouldn't have too much effect on it.

But I could see the physical reasons why this could be a problem, and I wonder if it would significantly speed up the vB by having them stored as files?

danbeck
Sun 25th Jun '00, 7:30am
Well.. I'd like to see myself. For the reasons.. my guess is that it makes template editing easier on the programmers.. but it's nothing that they couldn't have done with files. The gettemplate() function they use could just as easily be a fopen() wrapper than an mysql_query() wrapper.. just not as easy.

wandrer
Sun 25th Jun '00, 8:26am
Unless you know which is faster:

opening a file and reading in its contents

- or -

connecting to mysql and issuing a select statement

any arguement about speed is for nothing.

What is the difference between 100 file accesses and 100 connections to a mysql database in microseconds ?

danbeck
Sun 25th Jun '00, 9:07am
Opening a file and reading it's contents is faster... if you know any thing about the MySQL engine, it has to do that also, unless your query can return everything it needs by hitting the index. Additionally, there is an overhead of actually locating the file in the index. It's extremely fast, but not as fast as just opening a file.

And even still.. there is an overhead of making the mysql connection unless you happend to have some sort of connection persistance setup.

It may seem stupid to you to be worried abou this, but if you ever have a large site, it's important to use your resources efficiently, not just throwing caution to the wind. If your mysql server isn't seperated from your webserver, you will see that mysql takes up more processing time than apache.. and all apache is doing is reading files from this disk. (in simple terms)

And it's not all about that 100 microseconds of speed.. it's about slamming the processor unecesarily when it could be working on a complex query or just serving more webpages...

Again.. to small sites.. big deal, use the MySQL engine as much as possible.. to large sites.. caching systems, optimized queries and smart use of the engine is paramount.

GUI
Sun 25th Jun '00, 9:17am
So would it be more efficient to have the whole board read and write to files rather than to a SQL database?

wandrer
Sun 25th Jun '00, 10:24am
Again.. to small sites.. big deal, use the MySQL engine as much as possible.. to large sites.. caching systems, optimized queries and smart use of the engine is paramount.

Assuming that you already have Zend Optimizer installed, and you dont change the 'pconnects' (persistant connections) to 'connects' in vBulletin, and mysql server is set up for the highest performance (as directed by mysql.com), and your system has enough memory to perform, and apache is tuned for high performance (as directed by apache.org) , and you have a fast disk subsystem, and you have an adequate cpu backing the website --- then you might want to look at other things in vbulletin to see if they can be optimized.

For the most part, one or more of the above could be done to enhance the website experience/performance that would give better website improvement than trying to optimize sql queries.


<edit> and PHP is compiled into apache and not run as a CGI </edit>


[Edited by wandrer on 06-25-2000 at 09:24 PM]

JimF
Sun 25th Jun '00, 10:30am
Originally posted by GUI
So would it be more efficient to have the whole board read and write to files rather than to a SQL database?

The whole board? The templates I can see being better stored as a file, since they are used on every page load. But to store all the posts and everything else in files rather than a db?? Come on now, that is so 1990's... there was a company back then called UBB that did that. I can't think of one good reason to have the board operate totally on static files, except for compatibility issues.

Shaman
Sun 25th Jun '00, 10:46am
Files are fine, but at some point you get what is called "head hysterisis".

It's when the heads on the disk cannot keep up with the files being read and written because they are all over the disk, and the heads have to roam all over the place.

At some point, the drives get to a point where they are simply unable to keep up.

Databases are much smarter about how they read and write files. Writes tend to be done sequentially and very efficiently to disk, and they tend not to write to disk during busy periods unless there is an event that forces the write (part of the reason that your DB server should be the most reliable hardware you can find, with the most reliable power).

While there is overhead associated with DB calls, generally pipelining and persistent connections can make a database nearly as fast as if you had a low load on a files-based system. PHP4, of course, makes up for a HUGE discrepancy in speed compared to CGI, which can make loads much higher - but forget that part for a minute.

Because VB makes DB persistent connections (which sold me on it the hour I found out about it) you can pipeline many requests through a single connection to the DB. And, I've looked at VB's code - it treats MySQL relatively easily.

Internally, MySQL is much faster than a file system, so really, the chances are that if you have a separate DB server or the DB server saves its data on its own separate drives, it will return results very quickly even under high loads. In fact, under extreme loads, MySQL can outperform a mildly fragmented file system by a huge factor.

Here's an example to try:

do this - "time mysqldump 30k_record_db >/tmp/txt"

and compare it to this:

"find /usr -mtime +1"

Which gets done sooner? Now run a load generator (like Bonnie) on the two systems to simulate a heavily used system. Now what? The DB degrades, but NOTHING like the file system does, because it's smarter about how it stores data. The "find" will take... well, just about forever.

Once the disk/memory cache on the machine is used up, the filesystem can get really slow, really quickly.

Anyways - that's a very, very amateur method of demonstration but it gets the point across. Getting data from DBs can be an excellent choice.

GUI
Sun 25th Jun '00, 11:11am
Originally posted by Shaman

VB makes DB persistent connections

What exactly is a persistant connection to a database? Also, what is the function to do this?

Shaman
Sun 25th Jun '00, 11:17am
Persistant means that they don't tear down and rebuild the connection to the DB server until they're all done with their queries for that process instance.

Otherwise a process that made, say, seven SQL queries would also have the overhead of opening, then tearing down a TCP/IP or socket connection for each query.

danbeck
Mon 26th Jun '00, 12:06am
GUI: If you were being sarcastic, your comment was ignorant. Please re-read my post.

Wanderer: Dual 600, 512MB RAM, SCSI Subsystem, optimized apache + PHP4. I don't have the Zend optimizer installed since it's not a final release yet. (This is a production system.) But I will when it is final. I think I have a little room to say that it's a poor choice to store templates in the database.

Shaman: You are comparing apples to oranges. The php scripts would not have to search for files.. only open a single file in the file system.. the filesystem does not use 'find' to locate a filename that I give to fopen(). And on top of that, you are comparing a mysqldump read and write to a read from find. That makes no sense. If you've ran Bonnie on a filesystem before you will know that read and write times vary quite a bit.

You also seem to center around file fragmentation of which a Linux filesystem does not have problems with. Only Windows and it's related filesystems have fragmentation issues.

I'll state my point again. Unecessarly use of the database engine is poor programming. Although there is a point that the increase in efficiency removes the functionality that the engine gives -- see GUI's comment -- I think that the file subsystem could serve templates as fast or faster than the mysql engine could in this case and the processor time is wasted on them.

wandrer
Mon 26th Jun '00, 12:40am
. I think I have a little room to say that it's a poor choice to store templates in the database.

What is the load you are experiencing on your computer with the templates in the database ?

What is your load that mysql is generating on your computer with the templates in the database?

Edit: Basically, are we talking about a .005 increase in load on your system or are we talking about a 1.0 increase in load. With the proper system optimization vBulletin, out of the box, could be 20x as fast as UBB. How much work are you willing to put into it to make it 20.2x as fast as UBB ?



[Edited by wandrer on 06-26-2000 at 11:59 AM]

Shaman
Mon 26th Jun '00, 12:55am
Actually, the comparison is valid if highly skewed.

What I am demonstrating there is the fact that there *is* fragmentation, even on a *nix system, and that when the heads are accessing files (which is not dramatically different from searching using "find"), they are subject to hysterisis.

On the other hand, databases are highly optimized for picking pieces of data out of their storage randomly, with highly selective finds. More efficient than a file system.

Now, I agree that the test is highly skewed and somewhat misleading, but on a very busy filesystem where the heads get hysterisis, you get a similar effect. When head service times go over 100ms, things can snowball down in a hurry.

Again, I'm not saying the test is definitive, I'm saying it's a fair approximation of what happens during head hysterisis.

When you do a search on UBB, for instance, you get a lot of drive activity though they've tried their best to make it as efficient as they can. Get three searches going at once. Whoopee!

One of the forums I used to frequent - http://www.carforums.com, was brutally slow. I mean 5 second lags for a post. Sometimes it used to come up with a "CGI CPU limit exceeded" message. Searching was taking too long to be worthwhile at all. Now with vbulletin on the same hardware, it's not only much faster, but searches are just as fast as any other activity - and they are using PHP3 which is 1/4 as slow as the PHP4/optimizer. They need to pretty it up though :/

danbeck
Mon 26th Jun '00, 12:57am
The system load is around .11. MySQL is using about 14% of the processor. We were previously running on a single PIII600 box with only 256MB of ram and the system load was 2 and 3 at times.. even reaching 6 and 7 on peak loads.

The box was running UBB and choking so we moved to vBulletin, but it pretty much put a bullet in the temple of the server due to the mysql load. We moved the forums themselves to a dual processor box to handle the load.

wandrer
Mon 26th Jun '00, 1:04am
: Dual 600, 512MB RAM, SCSI Subsystem, optimized apache + PHP4

mySQL is using 14% of CPU on the above system ? Ok, now I need more facts... Could you post your 'extended-status' and 'variables' of your mysql configuration ? When you reach the 14% of CPU, could you post the stats of 'top' ?

Also do you have a url that you could post here for your board (or just list info - users/topics/posts per day/hits or visits per day total) ?

What version of mysql and what options were compiled into it ?

Shaman
Mon 26th Jun '00, 1:10am
Uhm...

256MB of memory is truly nothing for a large application these days. Talk 1GB or more. At least.

Also, it sounds like you need to work on your MySQL optimization. You wouldn't believe how much work our MySQL server does without bumping the load on its system at all. During 10K record inserts while still handling all the other tasks it does, the load doesn't go beyond 10%, and selects don't seem to change in result time.

Your DB server should be given, at a minimum:
512MB memory
3 RAID-5 or RAID 0 UltraSCSI drives (10K RPM preferred)

You can partition up a system to provide that, if you have a decent system.

That's a tiny DB server. Your web server shouldn't be given something much smaller, either. DB and Web servers are all about getting the job done very quickly so requests don't start piling up.

I've built 16GB Sun DB machines with 16 processors (I'm Sun certified up to 32 processor systems) and hardware RAID that spans 96 drives with parallel paths and 8 fiber controllers on a single partition.

You can argue with me if you want, but if you're not getting instantaneous results, don't complain to me. I am. :)

[Edited by Shaman on 06-26-2000 at 12:15 PM]

danbeck
Mon 26th Jun '00, 1:25am
Shaman: What kind of disk bandwith is going through your system if your drives are acting like that? Also, you didn't demonstrate that there is fragmentation. You mentioned drive hysterisis, but that doesn't mean much to me.

Also, your comparison is still very skewed. Find has to read very single file in the path you specified until it finds a file.. a single disk access goes directly to the first inode the file is located at and begins reading.. it might have to move.. but it doesn't move through the entire filesystem like find does. I fail to see the link there, other than the fact that the both perform a "read" system call.

Also, a webserver filesystem doesn't do a lot of deleting.. it's not on var and it's not on tmp. How will it become fragmented?

Here is a real world example. One day, I (stupidly) decided to use blob columns to cache automatically generated pdf title and appraisal reports in a mysql database. I figured that it would be quick and easy retrieval. The indexing was numeric and the engine would find a file in a matter of milliseconds. The problem reared it's ugly head when mysql was spending most of it's time serving the file itself. Lookup was not problem, but sending the data was. The mysql load would spike when pdf docs were retrieved and since the server was already at a constant 25% cpu usage.. it didn't do much good. I ended up moving to a simple single directory file storage scheme that was just as fast as the mysql engine, but didn't require a fraction of the cpu time and no memory to serve the file. It was faster on the web browser side too.

wandrer
Mon 26th Jun '00, 1:31am
Also, it sounds like you need to work on your MySQL optimization.

That is the feeling I was starting to get. In and of itself, mySQL is robust and optimized - if given the appropriate hardware. Though, you could run mysql and vbulletin on a linux 386 with 16mb ram, but don't expect good results - that doesn't mean that vbulletin or mysql need to be optimized. Depending on your traffic, a server that is sized properly for what you are wanting to do is the first place to look for optimization.

But, i still would be interested in seeing the variables and status output of your mysql server.

danbeck
Mon 26th Jun '00, 1:34am
Um.. we are talking about web servers here, not massive application servers. Your setup sound impressive and I'm even impressed at how much experience you have with l33t hardware, but it's very much overkill for many websites. And vBulletin is *not* a large application.

I think that you have entirely missed my point. Sure, I could buy all that expensive hardware you just lathered all over your post so I could run one small forum that only does 500MB a day in traffic, or I could write my application as efficiently as possible considering both the low end and high end of the hardware spectrum.

My point was about efficiency and good design, not about pushing your hardware to the limit so that your it manager can have an excuse to get another raid cabinet.

Shaman
Mon 26th Jun '00, 1:36am
What you ran into is that MySQL encodes/encrypts its protocol to provide some modicum of security (otherwise just plain old 'tcpdump' and 'netcat' is all you'd need to get all the data you want out of a server, pretty much). So, the server was encrypting those large chunks of data before it sent them - big-time server load. Many DB systems do the same thing. BLOB is there but I don't suggest you use it for high volume binary data serving, myself - on any server. MySQL in particular is meant for lightweight serving.

Onwards. Want to see a UFS file system fragmentation report? Easy.

fsck /dev/some/direct/disknumber

(fsck /dev/dsk/c0t0d0s0 on Solaris, for instance)

It'll have to be unmounted, of course.

You'll see that there is some fragmentation. But that's really not the point here. The point is that files are usually all over the hard drive (especially when there are thousands like UBB), and the heads take time to travel to them. I don't mean pieces of the files (ala fragmentation), I mean the files themselves.

I don't know what you mean by not writing to the drive much. UBB writes to the drives like a demented thing. :)

Oops. You edited. :)

The setup I use is as follows:

DB server
Dual 550Mhz PIII system
1GB memory
4 RAID 1+0 drives (mirrored, with RAID0 stripes)
Solaris x86 + GNU tools (Perl, MySQL, GCC, etc)

Web server + RADIUS server + mail server
Dual 400Mhz UltraSPARC system
512MB memory
12 drives in various partitions
4 RAID 1+0 drives handling the web partition and /opt,/usr
Solaris + GNU tools (etc) + Apache + PHP4 + MySQL client

Total hardware + software cost: $25K Canadian for what is actually running our web applications and DB applications - pricing when I bought it, because it's cheaper now and I wouldn't buy what I have if I were to buy it again today - I'd have faster drives, memory and processor today. That is not 'leet hardware, dude. It's economy, as far as I'm concerned. :)

BTW, the MySQL server was last rebooted in May of 1999. Just for reference.

[Edited by Shaman on 06-26-2000 at 12:47 PM]

danbeck
Mon 26th Jun '00, 2:03am
Shaman: But I'm only talking about 100 template files.. not 1000's of UBB files. I'm not arguing that mysql access isn't fast, I'm not arguing that we should all ditch mysql. I'm not arguing that vBulletin is slow.

Did anyone read my first post?

wandrer
Mon 26th Jun '00, 2:04am
danbeck,

My point was about efficiency and good design

No use arguing unless we know how your mysql is setup. My point is that a properly setup mysql system shouldnt be using that much cpu usage.

Shaman
Mon 26th Jun '00, 2:21am
Yup.

Pilot
Mon 26th Jun '00, 4:39am
I think that the PHP files, templates and anything which is not actually part of the bulletin board discussion data should be in flat files. Easier to manage (for non DB humans), backup, copy, rename (version control) and edit them etc that way. Templates are effectively software.

Also by having some of the I/O take place on the web server it will in fact distribute the load better for those using hosted facilities where the MySQL server is separate. This will increase the chance of not getting complaints about too much DB load. Most hosts expect you to use MySQL for your "business" data only - not to store your application in. VB is going to push hosting limits anyway due to it's intensive use of the database (compared to most web sites) and it would be good not to overdo the DB usage.

I do hope the templates are cached in the application - I can't believe it would retrieve them each time. I guess if VB does not run as a daemon task then it must re-fetch them.

Shaman
Mon 26th Jun '00, 4:47am
I've watched the vBulletin stream and it's quite efficient even though it grabs lots of data from the Db.

It takes its information from a persistent connection, pipelines lots of data and generally gets the job done in a pretty good hurry.

There's probably room for improvement but it's very good as it is now.

Try this: attach an SNMP probe to your MySQL server and see how much traffic flows out to the web host. So far for me, not much. I'm averaging less than 10K per second from the Db host, with DB integration in every page on our corporate web site.

wandrer
Mon 26th Jun '00, 4:48am
push hosting limits anyway due to it's intensive use of the database (compared to most web sites)

Compaired to what ?? Phorum's intensive use of the database ?? UBB's intensive use of flat text files ?? Explain the 'intensive use of the database' ??

Shaman
Mon 26th Jun '00, 5:02am
That's the issue, wanderer.

And issue #2, MySQL is faster at it than a file system. :)

MySQL process on my box:

277 root 67 58 0 84M 2804K sleep 49:58 0.03% mysqld

A whole 0.03% CPU time for running about eight web sites, one of them with really extensive SQL calls (TWIG, vB, et al). Granted, it goes as high as a couple percent when the 12-way select on 6M records goes live for the user login tracking database.

:rolleyes:

update: scratch that, I did a search on the big DB (12 way select on 6M records) and it bumped the load to 0.37% and returned an answer in 0.07 seconds. C'est la vie.
Thank Ghad for that extra processor, or I'd have held up all those people trying to get to my website ;) ;)

[Edited by Shaman on 06-26-2000 at 04:29 PM]

wandrer
Mon 26th Jun '00, 6:05am
Shaman,

I'd be interested in seeing your mysql settings/variables/options, if you wouldn't mind posting them.

bira
Mon 26th Jun '00, 10:59am
480 root 2 0 29M 5888K sleep 204:14 20.75% 20.75% mysqld

No wonder mysqld crashes on my server all too often :(

Shaman
Mon 26th Jun '00, 12:34pm
You can see my settings in the server forum, I posted them a few days ago.