My guide to converting to UTF-8

  • mk132
    Senior Member
    • May 2004
    • 129

    I manually upgraded to UTF-8 a few months ago. This was my experience and these are my notes. I hope they are helpful to you or to the vB team.

    I'm not sure why everybody else does so much work to change the database. Unlike others, I upgraded the database to UTF-8 using MySQL's native ability to convert one character set (latin1) to another (UTF-8). It's as simple as an ALTER TABLE statement. For instance:
    ALTER TABLE `forum`.holiday CHARACTER SET utf8 COLLATE utf8_unicode_ci, MODIFY COLUMN recuroption varchar(6) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
    This requires a good understanding of MySQL and Excel.
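The Excel workbook isn't reproduced in this thread, but the statement-building part can be sketched in Python. This is a minimal sketch, assuming you have already fetched rows from MySQL's `information_schema.COLUMNS` (TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE) for the character columns you want to convert; `build_alter_statements` is a hypothetical helper. Because MODIFY COLUMN redefines the whole column, any DEFAULT value or comment that isn't restated is lost, so review each statement before running it. MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci` is a one-statement alternative that converts every character column in a table (it may promote TEXT columns to MEDIUMTEXT to hold the same number of characters).

```python
from collections import defaultdict

# Hypothetical helper: turns information_schema.COLUMNS rows into ALTER TABLE
# statements shaped like the holiday example. It does NOT restate DEFAULT values
# or column comments, which MODIFY COLUMN would otherwise drop.
def build_alter_statements(schema, rows, charset="utf8", collation="utf8_unicode_ci"):
    """rows: iterable of (table_name, column_name, column_type, is_nullable) tuples."""
    by_table = defaultdict(list)
    for table, column, column_type, is_nullable in rows:
        null_clause = "NULL" if is_nullable == "YES" else "NOT NULL"
        by_table[table].append(
            f"MODIFY COLUMN {column} {column_type} "
            f"CHARACTER SET {charset} COLLATE {collation} {null_clause}"
        )
    statements = []
    for table, modifies in by_table.items():
        # The table's default character set clause comes first,
        # followed by one MODIFY COLUMN clause per character column.
        clauses = [f"CHARACTER SET {charset} COLLATE {collation}"] + modifies
        statements.append(f"ALTER TABLE `{schema}`.{table} " + ", ".join(clauses) + ";")
    return statements

# Reproduces the holiday statement above.
print(build_alter_statements("forum", [("holiday", "recuroption", "varchar(6)", "NO")])[0])
```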

    The basic steps:
    0. Back up your database. Do this on a test server first. I'm not responsible for you messing up your forums.
    1. Database: Convert the database's tables and columns from latin1 to UTF-8*.
    2. HTML: Convert the database connection and the HTML character set to UTF-8.
    3. Convert old threads to UTF-8.
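Step 2 switches the connection and the HTML character set together because the bytes sent to the browser and the charset the page declares have to agree. A quick Python illustration (not part of mk132's procedure): if the connection hands out UTF-8 bytes but the page still declares ISO-8859-1, readers see the familiar "Ã¨"-style garbage; declaring UTF-8 shows the text correctly.

```python
# Illustration only: what a reader sees when UTF-8 bytes are labelled as ISO-8859-1.
text = "perché è così"                 # Italian text with accented letters
utf8_bytes = text.encode("utf-8")      # what a utf8 connection hands to PHP

as_latin1_page = utf8_bytes.decode("latin-1")  # page still declares ISO-8859-1
as_utf8_page = utf8_bytes.decode("utf-8")      # page declares UTF-8

print(as_latin1_page)  # perchÃ© Ã¨ cosÃ¬
print(as_utf8_page)    # perché è così
```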

    The zip file contains the Excel workbook and also the three PHP files that are used to convert old usernames, threads and posts to UTF-8.

    Follow the steps in each of the five sheets in the Excel workbook to complete the upgrade. Do it on a test server first.
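The three PHP files aren't reproduced in the thread, so what follows is an assumption about why old posts need a separate pass, not a description of mk132's scripts. When a form on an ISO-8859-1 page is submitted with characters outside latin1, browsers send them as decimal numeric character references such as `&#1055;`, so a latin1 forum stores the entity text rather than the character. After the conversion those references can be turned into real characters. `decode_numeric_refs` is a hypothetical helper; it only touches decimal references above 255, on the assumption that lower ones (like `&#39;`) were deliberate escaping.

```python
import re

# Matches decimal numeric character references such as &#1055;
_NUMERIC_REF = re.compile(r"&#(\d+);")

def decode_numeric_refs(text):
    """Hypothetical helper: replace &#N; with the real character when N is above the latin1 range."""
    def replace(match):
        code = int(match.group(1))
        # Keep references latin1 could already store (e.g. &#39;) and invalid code points as-is.
        if code < 256 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    return _NUMERIC_REF.sub(replace, text)

print(decode_numeric_refs("&#1055;&#1088;&#1080;&#1074;&#1077;&#1090; &#39;hi&#39;"))  # Привет &#39;hi&#39;
```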

    A few notes:
    - *You can actually stop and rest after converting the database. Every latin1 character also exists in UTF-8, and as long as the connection character set is still latin1, MySQL converts the UTF-8 data back to latin1 on the way out. So having the database in UTF-8 and the HTML in latin1 (ISO-8859-1) is not a problem here, since all characters in the database came from latin1. I ran my forums for a week without any reported problems.
    - I was worried about performance problems after the conversion. The only problem I ran into, and it may not be related, was sorting forums with more than 500k threads. I removed the headers that allow people to sort threads by username, thread title, etc., and my forums run extremely well now.
    - I am running vB 3.8.4, but these directions should work for 4.x as well.
    - I use Sphinx for search. If you do not use Sphinx, you probably want to test your forum's search functionality. It will probably work, but I haven't tested it.
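The first note relies on every latin1 character having a UTF-8 equivalent. Strictly, it is the character repertoire that is a subset, not the bytes: characters 0x80 to 0xFF take one byte in latin1 but two in UTF-8, which is why MySQL has to transcode the columns rather than just relabel them. A short Python check of both points (Python's "latin-1" codec is ISO-8859-1; MySQL's latin1 is actually based on cp1252, but the same reasoning applies):

```python
# Check that all 256 ISO-8859-1 characters survive a round trip through UTF-8.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")
utf8 = text.encode("utf-8")

assert utf8.decode("utf-8").encode("latin-1") == all_bytes  # no character is lost
print(len(all_bytes), len(utf8))  # 256 384: bytes 0x80-0xFF double in size
```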

    I hope my experience is helpful for some of you. Please report any problems that you have.

    EDIT: This thread has been moved to a forum where conversation is not allowed. Please PM me with any questions or problems.
    Sorry if I have offended vB.
    Though this is now in a vB4 forum, it applies to 3.8 as well.
    Last edited by mk132; Mon 1 Nov '10, 7:29am.
  • valerios
    Senior Member
    • Jan 2010
    • 253
    • 4.1.x

    #2
    Hi, before I start testing this, I'd like to ask a question...

    According to phpMyAdmin, my current forum's default collations are "utf8_unicode_ci" and "utf8_general_ci".
    Is it normal to have two different collations?
    Do I have to convert only the "general" one?
    Which language is your forum in?
    How can I tell which collation is best for my language (Italian)?
    Last edited by valerios; Fri 29 Apr '11, 7:28pm.
    VOTE ->Allow to rotate and tag the album pictures
    ~~~~~~~~~~~~~~~~~~~~~~~
    Official vbulletin Italian Group on vbulletin.com, Join us HERE
    ~~~~~~~~~~~~~~~~~~~~~~~
    Helping people to settle in Australia:
    www.australianboard.com
    The community of the people that dream Australia:
    www.australianboardcommunity.com

    • melbo
      Senior Member
      • Apr 2005
      • 517
      • 3.8.x

      #3
      I was just looking for a way to do this.

      I think vB should recommend that new installations set up their dbs as UTF-8 from the beginning as this seems to be the way we are evolving.

      Thanks so much for putting this information together.

      • Riasat
        Senior Member
        • Aug 2006
        • 4013

        #4
        IIRC, there are some serialization problems if you do it this way. It's not that simple, and that's why there is no official solution yet.

        • feldon23
          Senior Member
          • Nov 2001
          • 11291
          • 3.7.x

          #5
          Originally posted by CvP
          IIRC, there are some serialization problems if you do it this way. It's not that simple, and that's why there is no official solution yet.
          People have been asking for an official solution from vB to convert a MySQL DB to UTF-8 for 5 years. The response is always "it's very difficult." Time to crowdsource this thing.

          • Riasat
            Senior Member
            • Aug 2006
            • 4013

            #6
            Originally posted by feldon23
            People have been asking for an official solution from vB to convert a MySQL DB to UTF-8 for 5 years. The response is always "it's very difficult." Time to crowdsource this thing.
            Someone opened an issue about this, and a vB dev (maybe Ed) replied to it. Can't be bothered to search for it.

            They said they'd have it in 4.0; we know how it went.
            Then they said they'd aim for it in 4.1, and when they began to work on it, they said it'd be in 4.2.
            While all these events are hard to accept, I'm willing to wait till 4.2.
            The conversion is not the only problem; there are many other parts of vB that need to be changed too.

            jfyi, if they don't get it done by 4.2, I'll really be pissed off

            • mk132
              Senior Member
              • May 2004
              • 129

              #7
              Originally posted by valerios
              Hi, before I start testing this, I'd like to ask a question...

              According to phpMyAdmin, my current forum's default collations are "utf8_unicode_ci" and "utf8_general_ci".
              Is it normal to have two different collations?
              Do I have to convert only the "general" one?
              Which language is your forum in?
              How can I tell which collation is best for my language (Italian)?
              My forum is a translation forum and uses all languages. How do you choose between the two? Look at both collations to see which works best for your language. I recommend utf8_unicode_ci because it follows the Unicode Collation Algorithm more closely than utf8_general_ci, which uses simpler, faster rules, but I think it might not matter much for Italian.
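To see which tables and columns use which collation (valerios's question about having both utf8_unicode_ci and utf8_general_ci), you can query `information_schema.COLUMNS`. The Python sketch below only groups rows you have already fetched with that query; the schema name `forum`, the `group_by_collation` helper and the sample rows are illustrative assumptions, not data from anyone's forum.

```python
from collections import defaultdict

# Run this in MySQL (the schema name 'forum' is an assumption) and feed the rows to the helper.
COLLATION_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'forum' AND COLLATION_NAME IS NOT NULL
ORDER BY TABLE_NAME, COLUMN_NAME
"""

def group_by_collation(rows):
    """Hypothetical helper: map each collation to the sorted 'table.column' names that use it."""
    groups = defaultdict(list)
    for table, column, collation in rows:
        groups[collation].append(f"{table}.{column}")
    return {collation: sorted(cols) for collation, cols in groups.items()}

# Invented sample rows, shaped like the query's output.
rows = [("post", "pagetext", "utf8_unicode_ci"),
        ("user", "username", "utf8_general_ci"),
        ("post", "title", "utf8_unicode_ci")]
print(group_by_collation(rows))
```

Any collation that shows up for only a handful of columns is a good candidate for a follow-up ALTER TABLE.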

              Originally posted by melbo
              I think vB should recommend that new installations set up their dbs as UTF-8 from the beginning as this seems to be the way we are evolving.
              I agree, and suggested it to them when 4.0 was in beta. They said "there are problems", though I don't see any.
              Originally posted by CvP
              IIRC, there are some serialization problems if you do it this way. It's not that simple
              Not that my directions are simple, but I have been running my forums on UTF-8 without any problems for the last six months. Serialization problems? None that I have seen. Are there any bugs you can point at that show this?
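One likely source of the serialization problems CvP mentions: PHP's `serialize()` records string lengths in bytes, e.g. `s:4:"café";` when stored in latin1. After the columns become UTF-8, `é` takes two bytes, so once the connection is also UTF-8 the stored length no longer matches and `unserialize()` fails on that value. Whether this affects a given install depends on which columns hold serialized data and whether they are text or binary types (BLOB columns have no character set and are not transcoded). `find_bad_lengths` is a hypothetical checker, not part of vBulletin or the attached scripts, and it can be fooled by string bodies that themselves contain `s:N:"`.

```python
import re

# Matches the header of a PHP serialized string: s:<byte length>:"
_STRING_HEADER = re.compile(rb's:(\d+):"')

def find_bad_lengths(data):
    """Return (offset, declared_length) for each s:N:"..." whose N does not fit the bytes that follow."""
    bad = []
    for match in _STRING_HEADER.finditer(data):
        declared = int(match.group(1))
        start = match.end()  # first byte of the string body
        # A well-formed body is exactly `declared` bytes, then the closing quote and semicolon.
        if data[start + declared:start + declared + 2] != b'";':
            bad.append((match.start(), declared))
    return bad

latin1_value = 's:4:"café";'.encode("latin-1")  # as stored before the conversion
utf8_value = 's:4:"café";'.encode("utf-8")      # same value after MySQL transcodes it
print(find_bad_lengths(latin1_value))  # []
print(find_bad_lengths(utf8_value))    # [(0, 4)]
```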

              Personally, I think vBulletin doesn't have to do anything more than put all my instructions in PHP code and make it an upgrade option in the AdminCP. Simple as that.

              • ChaFF
                Member
                • Jun 2010
                • 90
                • 4.0.x

                #8
                Originally posted by CvP
                if they don't get it done by 4.2, I'll really be pissed off
                It'd kill my faith in vB, and I don't want that to happen.
                People ask me why I don't like rats. Sorry, I'm not giving you the answer

                • mk132
                  Senior Member
                  • May 2004
                  • 129

                  #9
                  Guys, just follow my directions and stop worrying about it.

                  Or vBulletin could decide "let's do it", convert my directions to PHP code now, release it as a beta UTF-8 converter on Friday, and have it go gold on New Year's Day 2011.

                  • Wayne Luke
                    vBulletin Technical Support Lead
                    • Aug 2000
                    • 73981

                    #10
                    Moved to a more appropriate location. This isn't company feedback nor is it a support request.

                    Wayne Luke
                    The Rabid Badger - a vBulletin Cloud demonstration site.
                    vBulletin 5 API
