Questions about the new datastore types

  • KrON
    Senior Member
    • Jan 2002
    • 705
    • 3.6.x

    Questions about the new datastore types

    Evidently eAccelerator support has been removed, but I have a question that pertains to it (and to mmcache and other memory-based caching).

    How does this work if you have more than one webserver? I get how an instance of vbulletin can talk to a memcached server, but what about separate instances of vbulletin each running mmcache/eaccelerator?

    How do they stay in sync? Or do they not? Is memcached the only option for sites with multiple webservers?
    Kyle Christensen
    PbNation.com - one of the biggest and busiest vbulletin forums on the net!
  • KrON
    Senior Member
    • Jan 2002
    • 705
    • 3.6.x

    #2
    Correction: I see that both mmcache and eaccelerator are unsupported, so I guess my question only pertains to the filestore option.
    Kyle Christensen
    PbNation.com - one of the biggest and busiest vbulletin forums on the net!

    Comment

    • RCA
      Senior Member
      • Mar 2003
      • 133

      #3
      I'm moving to a multiple-webserver configuration and am currently looking at that problem (and at the attachments problem too).

      Comment

      • Mike Sullivan
        Former vBulletin Developer
        • Apr 2000
        • 13327
        • 3.6.x

        #4
        Originally posted by KrON
        How do they stay in sync? Or do they not? Is memcached the only option for sites with multiple webservers?
        They don't do any syncing.

        Comment

        • RCA
          Senior Member
          • Mar 2003
          • 133

          #5
          What would be the best option when using the filesystem cache?

          I'm thinking about adding an scp copy command after the build, so that datastore_cache.php gets copied to all servers on every modification.
          But I'm not sure what kind of problems this could create.
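
A rough sketch of the copy step described above, assuming passwordless SSH from the server that rebuilds the cache to each of the others. The host names and the install path are hypothetical placeholders, and the helper names are made up for illustration; only the `scp source host:destination` form is standard usage.

```python
import subprocess

# Hypothetical values: adjust to the real webserver names and vBulletin path.
WEBSERVERS = ["web2.example.com", "web3.example.com"]
CACHE_FILE = "/var/www/forum/includes/datastore_cache.php"


def build_scp_commands(path, hosts):
    """Return one scp command (as an argument list) per target host."""
    # -p preserves modification times and modes on the copied file.
    return [["scp", "-p", path, f"{host}:{path}"] for host in hosts]


def push_datastore(path=CACHE_FILE, hosts=WEBSERVERS, run=subprocess.run):
    """Copy the rebuilt cache file to every host; return the hosts that failed."""
    failed = []
    for host, cmd in zip(hosts, build_scp_commands(path, hosts)):
        result = run(cmd)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

One thing to watch: until the copy finishes, the other servers keep serving the old file, and a request that reads the file mid-copy may see a partial write. Copying to a temporary name and renaming it on the target would narrow that window.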

          Comment

          • KrON
            Senior Member
            • Jan 2002
            • 705
            • 3.6.x

            #6
            What do you mean they don't do any syncing?

            If you are using memcached, it seems to me as though it's not possible to not be in sync. Memcached is more or less like a database in memory except you are fetching objects instead of running queries.

            If I'm wrong, and there's no way to keep filesystem cache or memcached in sync, I am going to once again be thoroughly disappointed in Jelsoft. The whole point of memcached is a centralized point of quickly accessible data objects. I've read many times about how much success livejournal has had with it (after all they wrote it), and it is the #1 feature I'm looking forward to in vb 3.5.

            Us big sites NEED more functionality like memcached support. We've already hacked a bunch of site related functionality into the datastore that isn't already there, but our datastore db table gets a lot of traffic. Moving it to memcached is undoubtedly going to help us greatly.
            Kyle Christensen
            PbNation.com - one of the biggest and busiest vbulletin forums on the net!

            Comment

            • John
              Senior Member
              • Apr 2000
              • 4042

              #7
              The issue is that vBulletin is not multi-web-server aware. Therefore if some bit of data is stored on the webserver (e.g. attachments in the filesystem or a local memcached copy of the datastore) then changes will only affect one of the many webservers. One way around that would be to mount the attachments directory on an NFS share (but then each attachment must be read off the network).

              memcached allows multiple servers to be run alongside each other and to share the load. So, in a 3 memcached setup, $a is stored on server 1, $b on server 2, $c on server 3, $d on server 1, etc. So whether webserver 1 or webserver 2 updates $a, the update will always end up on the correct memcached server. The down-side is that this will involve a call across the network in (n-1)/n cases (which I believe is the bottleneck in the current setup where the datastore is kept in the database).
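
A minimal sketch of the placement idea in the paragraph above. In practice the client library picks the server by hashing the key rather than in strict rotation, and each client has its own hashing scheme; the CRC32-modulo rule and the server addresses below are stand-ins chosen for illustration, not what any particular client does.

```python
import zlib

# Hypothetical memcached server addresses.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def server_for(key, servers=SERVERS):
    """Pick a server from the key alone, so every webserver agrees on it."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def remote_fraction(n):
    """Share of keys held on another machine when each of the n webservers
    also hosts one of the n memcached servers and keys spread evenly."""
    return (n - 1) / n
```

Because the choice depends only on the key, an update to a given key from any webserver lands on the same memcached server. With three servers, `remote_fraction(3)` gives the two-in-three share of lookups that cross the network.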

              If each webserver runs its own memcached server too, then the problem becomes one of replication, i.e. if webserver 1 updates the options and updates its own memcached server, how do we tell webserver 2 that it must also update its options and its memcached server? Until vBulletin becomes aware of multiple web-servers, the solution is always going to be messy! If this problem is to be solved it will mean vBulletin creating replication features on par with database solutions, which seems rather like re-inventing the wheel.
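
To make the replication problem concrete, here is a toy model with two webservers that each keep their own cache. The class, the function, and the `bbactive` value are illustrative only; nothing here is vBulletin code.

```python
class LocalCache:
    """Stand-in for a memcached instance running on one webserver."""

    def __init__(self):
        self.data = {}


def update_options(own_cache, value):
    # The webserver only knows about its own cache, so only that copy changes.
    own_cache.data["options"] = value


web1, web2 = LocalCache(), LocalCache()
web1.data["options"] = web2.data["options"] = {"bbactive": 1}

update_options(web1, {"bbactive": 0})  # an admin request lands on webserver 1

# Webserver 2 was never told, so the two copies now disagree.
stale = web1.data["options"] != web2.data["options"]
```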

              Sorry this post may not help much. I suspect that the issue of how to get vBulletin to scale around the limits imposed by MySQL isn't going to disappear overnight, but we'll keep on working at trying to improve vBulletin's scalability.
              John Percival

              Artificial intelligence usually beats real stupidity ;)

              Comment

              • KrON
                Senior Member
                • Jan 2002
                • 705
                • 3.6.x

                #8
                John,

                Thanks for the reply. I understand how it is possible for there to be 3 different instances of the datastore in memory (for a 3 server setup with 3 instances of memcached), but I was under the impression that the entire point of memcached was to be able to make this information available for multiple machines to eliminate unneeded database calls.

                In a single server environment, memcached support in vbulletin basically acts the same as the MySQL query cache. It can help alleviate some of the redundant calls to the database between changes in the datastore's information.

                I may be misunderstanding the code examples on the memcached web pages, but if you were to store the datastore in your site's single instance of memcached from server one with a name of say "foo", couldn't you just retrieve the "foo" object from subsequent webservers? If server 3 needs to update data in the datastore, it can just "set" the "foo" object to a new value in memcached.

                This is assuming you are running one instance of memcached for ALL of your servers, as opposed to one on each webserver.
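
The single-instance arrangement described above can be modelled in a few lines. `SharedCache` is an in-process stand-in so the example runs without a memcached server; a real setup would use a memcached client whose `get` and `set` calls go over the network. The class names and the sample data are assumptions made for the sketch.

```python
class SharedCache:
    """In-process stand-in for the one memcached instance all servers use."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class Webserver:
    def __init__(self, name, cache):
        self.name = name
        self.cache = cache  # every webserver holds a handle to the same cache

    def read_datastore(self):
        return self.cache.get("foo")

    def write_datastore(self, value):
        self.cache.set("foo", value)


cache = SharedCache()
servers = [Webserver(f"web{i}", cache) for i in (1, 2, 3)]

servers[0].write_datastore({"moderators": ["alice"]})
seen_by_two = servers[1].read_datastore()   # server 2 sees server 1's write

servers[2].write_datastore({"moderators": ["alice", "bob"]})
seen_by_one = servers[0].read_datastore()   # server 1 sees server 3's update
```

Since there is only one copy of "foo", there is nothing to keep in sync; the cost is a network round trip on every read from a machine that does not host the cache.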

                Our bottleneck isn't the local network traffic, it's the sheer volume of queries that hit our database, sometimes over 1000 queries per second. MySQL query cache can only help so much based on the way that it is designed, but memcache support with multiple servers fetching cached information over the network would let us bottleneck the network between the webservers and the memcached, which to us is much cheaper than CPU cycles on a database server.

                Is it possible to have one instance of memcached and multiple webservers based on my assumptions? I'd rather find out that my network is too slow than have to deal with finding a bigger machine to run MySQL, or to farm off more queries to slave servers and have to deal with MySQL's pitiful replication.
                Kyle Christensen
                PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                Comment

                • John
                  Senior Member
                  • Apr 2000
                  • 4042

                  #9
                  It sounds like you've got the right idea about memcached.

                  What, in your experience, is the bottleneck in such a large setup: webserver or MySQL server? CPU, hard disk, network or other?

                  memcached could certainly help out, with phrases and templates as well as the datastore - basically anything that doesn't change very often.
                  John Percival

                  Artificial intelligence usually beats real stupidity ;)

                  Comment

                  • KrON
                    Senior Member
                    • Jan 2002
                    • 705
                    • 3.6.x

                    #10
                    So, if we have one memcached instance and all of our webservers pointing at it, it *should* work, right?

                    In my experience, the largest bottleneck is by far the MySQL side of things.

                    A bit about our setup:

                    We have 3 dual xeon webservers w/ 1gb of ram each. Each runs thttpd for images and apache for dynamic content.

                    In front of the webservers is an Alteon loadbalancer that loadbalances 2 groups, 1 for dynamic content and one for static content (follow me?).

                    We have an external mailserver that handles all email.

                    We have 2 database servers both dual xeons, one w/ 4gb of ram and one with 2gb.

                    The main database server takes the brunt of the load. We replicate the entire vbulletin database to the slave, and then make some changes on the slave (fulltext indexes).

                    Our vbulletin is hacked to cache the latest 4 months of posts (which is about 4 million for us) in a "postcache" table that has fulltext indexes. We use the natural language search to search against this table.

                    We have a "supporting member" userclass for members who donate who can search against this postcache which is updated hourly at any time. Normal members can only search if there are fewer than 1500 people on the site (which isn't very often). Sucks for them, and sucks for the supporting members who can't search against the whole post/thread tables, but they are just too big (we have over 14 million posts). On a dual xeon w/ 2gb of ram, a single fulltext search against our full post table can take upwards of 45 seconds.

                    We also slave some read only queries from the slave database for non mission critical information.

                    What I have found is that even w/ a highly optimized mysql server, sparing use of innodb for row-level locking (user and thread which means we can't do fulltext title searching), slaving off some queries, we still have an INSANE amount of queries hitting our database.

                    We've hacked stuff like our ad rotation software into the datacache so we can save queries per page (just removing those 2 queries per page saves us roughly 1.6 million queries per day), but at the end of the day, we still have a rough time keeping the site running when there are over 2500 users online. I'm sure we'd have well over 3000 if the site could handle it. Our max online users is over 3500 users, but the site slows down even with our aggressive optimizations.

                    We do all sorts of apache/thttpd trickery like caching the vbulletin js files by mimetype (using mod_expires) and using gzip to keep page size down, replacing many of our images with text, etc, but we still get bogged.

                    Under normal usage, our apache servers only see about 90 queries per second. The thttpd side of things hits an insane number (probably over 600 queries per second), but the second something like a vbulletin cron job for delayed thread views, or various other semi-intensive processes, hits, the apache threads that are waiting for the database will block while the traffic continues. This means apache spawns more threads, the load on the webservers goes up, and the site continues to wait for the database to chug through the remaining backlog of queries before the load can go down on the webservers.

                    In extreme cases, we hit maxclients on our apache servers and people will get connection refused messages or timeouts.

                    Hard disk activity on the webservers is minimal as we use eaccelerator to cache much of the php scripts in memory, as well as thttpd's ability to cache static files in memory. This keeps activity aside from logging from touching the disk. All of our servers are scsi based anyway so even relatively high levels of disk IO have little effect on system load.

                    My main areas of interest in regards to optimizing vbulletin for large sites is related to stuff like memcached, being able to avoid relying on mysql replication as much as possible.

                    I'm not sure if anyone on the jelsoft side of things has experience with mysql replication, but suffice it to say it sucks at best. It is not easy to get servers back in sync when they fall out of sync, there is no high availability or failover capability, and mysql cluster is laughable at best. It simply isn't easy to use, reliable or feasible (for mysql cluster anyway). Being able to cache more commonly used site wide stats (like thread view counts, moderators, etc) for things that are not regularly updated, or all that important for them to be up to date is probably the easiest way for vbulletin to scale without large amounts of work.

                    We've thought of a bunch of things (both parts of vbulletin, as well as site specific custom stuff we've written) that we'd love to toss in a memcached server for all of our servers to access. The amount of data is not that great, and the overhead of making one or more tcp connections across a 100mbit or gigabit connection is less than having to max out the IO/CPU on expensive mysql servers.

                    Our host (theplanet) while one of the biggest in the country, does not offer any real "enterprise" level servers (quad processor opterons, etc), and we can't afford to drop that kind of money on buying them ourselves, so we're forced to rely on "lesser" hardware (their largest server would be a dual processor 2.8ghz xeon). Not having the ability to scale horizontally within vbulletin kills us, and probably many other sites.

                    I do realize that we are in the minority, but it makes sense even for the smaller sites not to have to worry about vendor lock-in. They may start out small, but with more and more people using the internet, it is just a matter of time for any long-running site to run into the same sort of bottlenecks we've seen over the years.
                    Kyle Christensen
                    PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                    Comment

                    • John
                      Senior Member
                      • Apr 2000
                      • 4042

                      #11
                      Thanks for all that info. I agree that vBulletin is hard to scale horizontally, but unfortunately that's really just the nature of bulletin boards. For something like Google search, the updates are very few compared to the reads, and data that is a little stale/out of date doesn't really matter much. The tasks are also very divisible so that they can be worked on in parallel. Unfortunately that model does not work well for bulletin boards - up to date data is key and it doesn't break down well into chunks that can be run in parallel very well. But that shouldn't excuse us trying our best to get performance gains out of vBulletin. I'd be interested to see how LiveJournal has used memcached since they have a similar read/write ratio, I guess.

                      I think that there are some definite gains to be made by slotting memcached into a vBulletin system. The template, style, phrase, language, setting, datastore, etc. tables are updated very infrequently and could easily be hacked into memcached, I think. One instance of memcached could be installed for the webservers to use. It could be installed on one of the webservers initially and then moved to a separate machine if needed. I'd love to hear how you get on with this and could probably provide some help with it if needed.
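
As a sketch of what slotting a cache in front of these rarely updated tables could look like, the following reads through a cache and drops the entry whenever the underlying row is edited. It is an assumption about one reasonable approach, not vBulletin code: the class and method names are invented, and a plain dict stands in for the memcached client.

```python
class CachedTable:
    """Read-through cache in front of a slow loader (e.g. a database query)."""

    def __init__(self, cache, load_from_db):
        self.cache = cache              # dict here; a memcached client in practice
        self.load_from_db = load_from_db
        self.db_hits = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_hits += 1               # cache miss: fall back to the database
        value = self.load_from_db(key)
        self.cache[key] = value
        return value

    def invalidate(self, key):
        """Call after an admin edits the row so the next read reloads it."""
        self.cache.pop(key, None)


templates = {"header": "<div>v1</div>"}          # pretend database table
table = CachedTable({}, lambda name: templates[name])

first = table.get("header")                      # loads from the "database"
second = table.get("header")                     # served from the cache
templates["header"] = "<div>v2</div>"            # an admin edits the template
table.invalidate("header")
third = table.get("header")                      # reloads the new version
```

The fewer the writes relative to reads, the more database queries this saves, which is why templates, phrases and settings are natural candidates.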

                      You may want to actually talk to MySQL about the replication issues and whether they have any suggestions, if you haven't already. I've never tried using it at all, so I can't really offer advice in that area.

                      Another thing you might try is setting the delayed thread views to only run once every 24 hours or something like that, in a quiet(er) moment!
                      John Percival

                      Artificial intelligence usually beats real stupidity ;)

                      Comment

                      • John
                        Senior Member
                        • Apr 2000
                        • 4042

                        #12
                        Perhaps some more useful info here:
                        An anonymous reader asks: "An acquaintance of mine runs a fair sized web community. Having nearly 280,000 users in a single MySQL database with about 12 mirrors accessing it, performance has become an big issue. I was wondering what practical methods could be used to lessen the load and decrease use...
                        John Percival

                        Artificial intelligence usually beats real stupidity ;)

                        Comment

                        • KrON
                          Senior Member
                          • Jan 2002
                          • 705
                          • 3.6.x

                          #13
                          Thanks for the feedback. I agree that memcached is probably going to help a lot. Brad Fitz from LJ has given a lot of talks about memcached and how they use it/why they wrote it, but unfortunately I've only read about it in the livejournal backend community (http://www.livejournal.com/community/lj_backend/) which isn't updated that frequently anymore.

                          We've already got memcached running on our database server awaiting the arrival of vb 3.5. Tony, our other admin has already hacked the moderator cache, our banner ads, and is planning on hacking some other stuff into the datastore in anticipation. Since our webservers and db server are located on the same physical network segment (behind our loadbalancer), I think it should be speedy. If a memcached response is instantaneous, it should have about the same amount of overhead as a MySQL query minus the wait, so that is good.

                          I totally agree that trying to split up the data so it is more easily scaleable is very tough; a good bit of the hacking we've done is to hack expensive joins out of queries at the expense of functionality. For example on vb2 we hacked out the sorting of forums by date, poster, title, etc because it ended up being very costly from a performance standpoint, with very little actual user experience benefit.

                          Having options to disable specific portions of vb functionality to tailor it to the liking of the users or administrators would be great. Reducing overhead is just as valuable as the ability to scale, as they both go hand in hand.

                          Also just as a note, I think that the functionality I'd like to restore most that we've had to disable is most definitely the search. While we haven't disabled it totally, google has only indexed 120k of our threads (it goes up about 10k per day since we've upgraded to vb3), but I don't see a way to do so with such a large single post table. Without splitting up the post table into chunks, I don't really see a way to make fulltext searches any faster.

                          The oldschool word/phrase table could potentially allow us to cache data, but as you know once you get past a certain point, it just doesn't scale (unless you have disallowed tons of commonly used words). But we'll save our search woes for a different day.
                          Kyle Christensen
                          PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                          Comment

                          • Wayne Luke
                            vBulletin Technical Support Lead
                            • Aug 2000
                            • 73976

                            #14
                            Originally posted by KrON
                            Also just as a note, I think that the functionality I'd like to restore most that we've had to disable is most definitely the search. While we haven't disabled it totally, google has only indexed 120k of our threads (it goes up about 10k per day since we've upgraded to vb3), but I don't see a way to do so with such a large single post table. Without splitting up the post table into chunks, I don't really see a way to make fulltext searches any faster.
                            The MERGE table engine might be able to help with this in the long run. I am not a developer, so I'm not sure if this is possible, but hear me out on an idea.

                            What if vBulletin did have multiple post tables (post1 - postX) and the number of threads in each table was dependent on the administrator? When you get to XXX thousands of posts, it builds a new post table and automatically merges it in. This would require a new way of building postids though, since unique IDs won't be guaranteed across tables.
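
One hypothetical way around the postid concern is to keep a single global counter and derive the table from the id, so ids stay unique across tables by construction. The sketch below only shows that mapping; the table size, the function names, and the scheme itself are assumptions, not how vBulletin or the MERGE engine actually works.

```python
POSTS_PER_TABLE = 500_000  # hypothetical admin setting


def table_for_post(postid, posts_per_table=POSTS_PER_TABLE):
    """Map a globally unique postid to its table name (post1, post2, ...)."""
    if postid < 1:
        raise ValueError("postid must be positive")
    return f"post{(postid - 1) // posts_per_table + 1}"


def tables_for_range(first_postid, last_postid, posts_per_table=POSTS_PER_TABLE):
    """List the tables a search must touch to cover a range of postids."""
    start = (first_postid - 1) // posts_per_table + 1
    end = (last_postid - 1) // posts_per_table + 1
    return [f"post{i}" for i in range(start, end + 1)]
```

Since postids grow with time, a search limited to recent posts would only need the last table or two, while a full search would still have to visit all of them.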
                            Translations provided by Google.

                            Wayne Luke
                            The Rabid Badger - a vBulletin Cloud demonstration site.
                            vBulletin 5 API

                            Comment

                            • KrON
                              Senior Member
                              • Jan 2002
                              • 705
                              • 3.6.x

                              #15
                              Hm, I'm not familiar with this, I'll have to read about it. Any idea how you'd determine what table to search against? Or I guess you could single out table(s) to search against based on their timestamp.

                              I wonder if searching multiple smaller tables is faster than searching one big one?
                              Kyle Christensen
                              PbNation.com - one of the biggest and busiest vbulletin forums on the net!

                              Comment
