utf8 database to show ăîşţ and import from phpbb

  • imedic
    Member
    • Feb 2008
    • 81
    • 3.8.x

    Hello,

    I have just purchased vBulletin and am preparing to install it. Later on I will import the old phpBB database from the forum currently running on the site.

    The thing is, I need to use utf8 encoding to support "ă î ş ţ", which are used in my language. Unfortunately, the old database is set to latin1_swedish_ci.

    I then need to import 60k messages into it from the old forum.


    If I import latin1 data into a utf8 database, will that be OK?
    Or should I import latin1 into latin1 and then convert to utf8?


    All this encoding stuff is new to me, so some tips before installation would be much appreciated. I did manage to set utf8 encoding on a clean install of vB.

    Many thanks,
    Last edited by imedic; Tue 19 Feb '08, 11:22am.
  • imedic
    Member
    • Feb 2008
    • 81
    • 3.8.x

    #2
    For the record, the server is:

    MySQL version 4.1.22-standard
    PHP version 5.2.5

    Or, the basic question is: if I set utf8 encoding on a clean install of the vB database, will this create problems when I import from phpBB with latin1_swedish_ci encoding?
    Last edited by imedic; Tue 19 Feb '08, 11:20am.

    • Andy Huang
      Senior Member
      • Feb 2004
      • 4602

      #3
      The collation (latin1_swedish_ci, utf8_general_ci, etc.) is used only for sorting and comparison. You shouldn't need to worry about it unless your site contains non-Western-European characters (e.g. Chinese, Japanese, Korean, which don't use anything close to an ABC alphabet) and you absolutely need sorting to work correctly (e.g. sorting by thread title).

      If you want to convert it, here are the recommended steps, though your mileage may vary depending on various factors.

      First, export using latin1 (or whatever the current default is):
      Code:
      mysqldump --default-character-set=latin1 -h mysql.host.com -u database_username -p database_name > backup.sql
      When prompted, enter your password.

      Then edit the dump in a Unicode-capable text editor. Don't even look at Microsoft's Notepad or WordPad; they tend to be glitchy when it comes to saving in a different encoding. Personally I use EmEditor, which is commercial (I'm not affiliated with it, nor am I endorsing it), and there are free alternatives available too. Open up your SQL dump, replace all occurrences of latin1_swedish_ci with utf8_general_ci, then replace all remaining occurrences of latin1 with utf8, and save the result as a separate file.
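
For anyone who prefers the command line to an editor, the same two replacements plus the "save as Unicode" step can be sketched roughly as below. This is only a sketch: `convert_dump` is a made-up helper name, and the third argument (the encoding the dump's bytes are really in) is something you have to work out for your own data. Note that the blanket latin1 replacement will also touch any post text that happens to contain the word "latin1".

```shell
# convert_dump is a hypothetical helper, not part of MySQL or vBulletin.
#   $1 = dump produced by mysqldump
#   $2 = file to write the converted dump to
#   $3 = encoding the dump's bytes are actually in (an assumption you must verify)
convert_dump() {
  # Re-encode the bytes as UTF-8, then rewrite the declarations:
  # collation name first, then every remaining "latin1".
  iconv -f "$3" -t UTF-8 "$1" |
    LC_ALL=C sed -e 's/latin1_swedish_ci/utf8_general_ci/g' \
                 -e 's/latin1/utf8/g' > "$2"
}
```

A call might look like `convert_dump backup.sql modified.sql ISO-8859-2`, where ISO-8859-2 is just a guess at the source encoding for Romanian text.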

      Finally, re-import the dump into a new database:
      Code:
      mysql -h mysql.host.com -u database_username -p new_database < modified.sql
      Before deleting your old database, verify that everything is working first. Then delete the old database, or keep it if you have the spare disk space. You never know when it might come in handy.
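
Before running the import command above, it can be worth checking that the modified file really is well-formed UTF-8. A minimal sketch, assuming `iconv` is installed; `is_valid_utf8` is a hypothetical helper name:

```shell
# is_valid_utf8 is a hypothetical helper.
# It succeeds only if iconv can read the whole file as UTF-8 without errors.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

For example, `is_valid_utf8 modified.sql && echo "byte sequences look OK"`. Passing this check only means the byte sequences are valid; it does not prove the characters are the ones you expect, so you still need to look at the forum after importing.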

      You can do this either before OR after the import. It shouldn't make any difference, as ImpEx just takes the data and stores it exactly as is in vBulletin's database format, without regard to its encoding. You can more or less blame MySQL's lack of strict encoding requirements for that.

      BEFORE you import, however, be sure to REMOVE the contents of Admin CP > vBulletin Options > Censorship Options > Blank Character Stripper. It doesn't work well with multi-byte characters (from memory, a character in MySQL's utf8 takes up to 3 bytes), and it WILL break characters if you leave single-byte entries in there. You will need to re-type or re-import to fix any characters it breaks.
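
To see why single-byte handling is risky here, the snippet below counts the UTF-8 bytes of the characters from the first post. It is an illustration only and not how vBulletin implements the stripper; `utf8_len` is a hypothetical helper.

```shell
# utf8_len is a hypothetical helper: it prints the number of bytes in its argument.
# Octal escapes spell out the UTF-8 bytes, so the result does not depend on
# the encoding this snippet itself is saved in.
utf8_len() { printf "$1" | wc -c | tr -d ' '; }

utf8_len 'a'          # 1  (plain ASCII)
utf8_len '\304\203'   # 2  (ă, U+0103)
utf8_len '\303\256'   # 2  (î, U+00EE)
utf8_len '\305\237'   # 2  (ş, U+015F)
utf8_len '\305\243'   # 2  (ţ, U+0163)

# Deleting one byte value, as a byte-oriented filter would,
# leaves half a character behind.
printf '\304\203' | LC_ALL=C tr -d '\203' | wc -c | tr -d ' '   # 1
```

These four characters take 2 bytes each, so anything that removes or matches individual bytes can cut a character in half.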
      Best Regards,
      Andy Huang

      • izz
        New Member
        • Sep 2002
        • 13
        • 3.8.x

        #4
        Dear Andy Huang,

        I faced the same problem and I'm trying to apply your solution.

        When I open the .sql file in EmEditor and select my local language's encoding, I still see the characters of my language corrupted.
        Should I replace the encoding and collation strings anyway, ignoring the corrupted text that I'm seeing?

        • Andy Huang
          Senior Member
          • Feb 2004
          • 4602

          #5
          This really depends on a lot of other factors, and I can't know for sure without looking at it (even then I sometimes get it wrong!). I'd recommend loading the .sql file with the different encodings available in your editor, trying to match the character set your language was stored in. Keep at it until the contents appear correctly; if they never do, your export may have been done incorrectly, so go back and try again. Finally, save as Unicode and re-import.
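
The same trial-and-error can be done from a shell, which may be easier with a large dump. A rough sketch, assuming `iconv` and a `grep` that supports `-a` and `-m`; `try_encodings` is a hypothetical helper, and the candidate encodings are guesses you supply yourself:

```shell
# try_encodings is a hypothetical helper.
#   $1 = dump file
#   $2 = plain ASCII text that appears on a line you expect to contain
#        your language's characters
#   remaining arguments = candidate source encodings to try
try_encodings() {
  file=$1; pattern=$2; shift 2
  for enc in "$@"; do
    printf '== %s ==\n' "$enc"
    # Take the first matching line as raw bytes, decode it with this
    # candidate, and show the first 200 characters.
    LC_ALL=C grep -a -m 1 -e "$pattern" "$file" |
      iconv -f "$enc" -t UTF-8 2>/dev/null | cut -c 1-200
  done
}
```

For example, `try_encodings backup.sql "INSERT INTO" ISO-8859-2 CP1250 CP1252 UTF-8`, then see which candidate prints readable text in a UTF-8 terminal. If none of them does, that points back at the export step.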

          One thing I forgot to mention in my reply last year is that you should also move any binary data (attachments, avatars, etc.) out of the database before doing this. Otherwise the encoding swap will corrupt that data and you won't be able to get it back.

          Lastly, and this is very important even though people never seem to realize it: do not ever work on your live database. Always do things on a separate deployment first, make sure everything is working properly, and never delete your backup until you are 100% sure your changes are exactly what you want. Once you've deleted your backup, or if you don't have one, there is always a chance that you can never go back.

          If things still don't work out for you, please feel free to open a support ticket addressed to my attention and provide all relevant login information. I can try to take a look at it for you, though no promises on how quickly I'll be able to turn it around :/
          Best Regards,
          Andy Huang
