Importing UTF-8 languages

  • Andy Huang
    Senior Member
    • Feb 2004
    • 4602

    More and more people are asking for Unicode UTF-8 based language packs and/or running into problems importing them. This post will hopefully provide some information for those of you who are stuck on the process and unsure how to proceed.

    Please note before we proceed:
    * Unicode UTF-8 is not a magic pill that solves all your problems. In fact, it may create more problems. If you are happy with how your forum is working, don't change it. If it's not broken, there's no need to fix it, right?
    * If you have the default installation, with just the English interface, WITHOUT any other language pack add-ons EVER installed, then your old posts in foreign languages may still display okay. You can choose to continue if you are absolutely sure about it; however, I cannot guarantee that your old posts will still display properly.
    * If you have previously installed another language pack (e.g. Traditional Chinese using Big5, Japanese using SJIS/EUC, etc.), your previous posts are likely to be in those encodings. Don't follow this guide, as it will cause them not to render properly. There are other third-party tools available to help you with that.
    * NOT all webmail clients support Unicode-encoded emails properly. In fact, most local/regional mail providers do not support it, so your members may receive emails in gibberish if you follow these steps.


    Okay, now that all the notices are out of the way, what will you need?
    * The language XML saved in Unicode UTF-8, without a BOM (I'm not sure whether the XML parser will complain if there is a BOM; I didn't test it)
    * Time, internet connectivity (if your forum is online), and some effort
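Before importing, you can check both encoding points at once with a small script. The following is a minimal Python sketch, not something shipped with vBulletin; the function name `check_language_file` is made up for illustration. It reports whether the raw bytes start with a UTF-8 BOM (the bytes EF BB BF) and whether they decode cleanly as UTF-8.

```python
import codecs
import sys


def check_language_file(data: bytes) -> dict:
    """Report whether the bytes start with a UTF-8 BOM and decode as valid UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        data.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {"has_bom": has_bom, "valid_utf8": valid_utf8}


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_lang.py language-zh.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, check_language_file(f.read()))
```

You want `has_bom` to be False and `valid_utf8` to be True before uploading.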

    You might also need a text editor to edit one PHP file (you can revert this edit after you're done) if your system doesn't import the file properly.

    First, open the language XML file to make sure it is saved correctly. You should see these lines (or something similar to them) near the top of the language file; if not, change them to look like these:
    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    Code:
    		<charset><![CDATA[UTF-8]]></charset>
    The first one tells your server which encoding to use when parsing the XML file into memory, so if it is set incorrectly, you will get gibberish. The second one tells your server which encoding to declare to browsers on every output page, so if it is set incorrectly, you will also get gibberish.
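To check both lines without scrolling through a large file, a short script can pull them out. This minimal Python sketch uses regular expressions and assumes the attribute is double-quoted and the charset is wrapped in CDATA exactly as in the lines above; `read_declarations` is a hypothetical helper, and a real XML parser would be more robust.

```python
import re
import sys

# encoding="..." inside the <?xml ... ?> declaration
XML_DECL = re.compile(r'<\?xml[^>]*encoding="([^"]+)"')
# <charset><![CDATA[...]]></charset> inside the language <settings> block
CHARSET = re.compile(r'<charset><!\[CDATA\[([^\]]*)\]\]></charset>')


def read_declarations(text: str) -> tuple:
    """Return (xml_encoding, charset); either is None if not found."""
    decl = XML_DECL.search(text)
    charset = CHARSET.search(text)
    return (decl.group(1) if decl else None,
            charset.group(1) if charset else None)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # Both declarations are plain ASCII, so replacing undecodable bytes is harmless here.
        with open(path, encoding="utf-8", errors="replace") as f:
            print(path, read_declarations(f.read()))
```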

    Once you're done editing, go to File > Save As and make sure to choose Unicode UTF-8 as the encoding. This is possible with the default Notepad application in Windows Vista (note that Notepad's UTF-8 option adds a BOM); if your text editor can't do it, search for a Unicode text editor. There are free ones as well as commercial ones; as long as it does what you need, the choice is yours.
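If you'd rather script the re-save than use an editor, here is a minimal Python sketch (the name `resave_as_utf8` is hypothetical). It assumes you know the file's current encoding; by default it reads UTF-8 and drops a leading BOM if one is present, then writes plain UTF-8, which never includes a BOM. It does not touch the XML declaration or the `<charset>` line, so edit those as described above.

```python
def resave_as_utf8(data: bytes, source_encoding: str = "utf-8-sig") -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8 without a BOM.

    The 'utf-8-sig' codec decodes UTF-8 and silently drops a leading BOM if present.
    """
    text = data.decode(source_encoding)
    # The plain 'utf-8' codec never writes a BOM (unlike 'utf-8-sig').
    return text.encode("utf-8")


# Usage sketch (file names are placeholders):
# with open("language.xml", "rb") as f:
#     raw = f.read()
# with open("language-utf8.xml", "wb") as f:
#     f.write(resave_as_utf8(raw, "iso-8859-1"))
```

For a file that is already UTF-8 but saved with a BOM, calling it with the default argument just strips the BOM.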

    Next, try to import it as usual via Admin CP > Languages & Phrases > Download / Upload Languages. Make sure to leave the setting at "(Create New Language)", as this will be a new language on your forum, and ignore the language version if needed (e.g. my XML is for 3.6.4, but I'm running 3.6.8).

    Check whether the imported language displays correctly by appending langid=x to your forum's URL (replace x with your newly imported language's ID; start with ? if the URL has no query string yet, otherwise use &). For example, http://www.vbulletin.com/forum/?langid=8 will set your language here to UTF-8. Check whether posts display correctly, etc. For the most part, this should already work fine; if not, please read on.
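Appending the parameter by hand is easy to get wrong when the URL already has a query string. This Python sketch (the helper name `with_langid` is hypothetical) adds `langid` with the right separator and replaces any existing value; `example.com` in the usage lines is a placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_langid(url: str, langid: int) -> str:
    """Return url with langid set in its query string, using ? or & as appropriate."""
    parts = urlsplit(url)
    # Keep every existing parameter except an old langid, then append the new one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "langid"]
    query.append(("langid", str(langid)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_langid("http://www.vbulletin.com/forum/", 8))
# http://www.vbulletin.com/forum/?langid=8
print(with_langid("http://example.com/forum/showthread.php?t=123", 8))
# http://example.com/forum/showthread.php?t=123&langid=8
```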

    If it does not work, the XML parser did not auto-detect your language XML's encoding and fell back to its default, ISO-8859-1. You will need to quickly edit one file, delete the broken language, and upload it again. Open /includes/class_xml.php in a text editor (Notepad will be fine) and search for:
    Code:
    	function &parse($encoding = 'ISO-8859-1', $emptydata = true)
    Replace it with:
    Code:
    	function &parse($encoding = 'UTF-8', $emptydata = true)
    Save the file and upload it to replace the one on your server. Then delete the broken language you uploaded earlier via Admin CP > Languages & Phrases > Language Manager, and upload the same language XML again.
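To see why the default matters, here is a rough Python analogue using the standard library's ElementTree rather than vBulletin's PHP class. It assumes, as described above, that the default only decides how the input bytes are decoded: `XMLParser(encoding=...)` forces a source encoding, much like the `$encoding` default in `parse()`. The helper `parse_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_text(data: bytes, encoding: str) -> str:
    """Parse a one-element XML document, forcing the given source encoding."""
    parser = ET.XMLParser(encoding=encoding)  # overrides whatever the document declares
    parser.feed(data)
    return parser.close().text


phrase = "<phrase>åäö</phrase>".encode("utf-8")
print(parse_text(phrase, "UTF-8"))       # åäö
print(parse_text(phrase, "ISO-8859-1"))  # Ã¥Ã¤Ã¶ (each UTF-8 byte read as one Latin-1 character)
```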

    Finally, test your forum to make sure that it is displaying all characters properly.

    You may also need to set the default English language's charset to UTF-8 by going to Admin CP > Languages & Phrases > Language Manager > [Edit Settings] and entering "UTF-8" (without the quotes) in the "HTML Character Set" option.

    If you notice "strange" characters appearing (they typically look like a diamond or square with a question mark in it), you will need to disable the Blank Character Stripper feature. Go to Admin CP > vBulletin Options > Censorship Options > Blank Character Stripper, remove all of its contents, and then try making the post again.
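One likely explanation, offered as an assumption about how the stripper works: if it removes listed byte values one at a time, it can cut a multi-byte UTF-8 character in half. The Python sketch below uses a hypothetical `strip_bytes` helper and treats 160 (0xA0, a non-breaking space in ISO-8859-1) as a listed value. The character "à" is encoded in UTF-8 as C3 A0, so stripping 0xA0 leaves a lone C3, which decodes to the replacement character U+FFFD, the diamond with a question mark.

```python
def strip_bytes(data: bytes, blank_bytes: set) -> bytes:
    """Hypothetical stripper: drop every byte whose value is in blank_bytes."""
    return bytes(b for b in data if b not in blank_bytes)


post = "voilà".encode("utf-8")        # 'à' is encoded as the two bytes C3 A0
stripped = strip_bytes(post, {0xA0})  # assume 160 (0xA0) is on the stripper's list
print(stripped)                       # b'voil\xc3'
print(stripped.decode("utf-8", errors="replace"))  # 'voil' followed by U+FFFD
```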

    That should be about it... If you have other problems, please post here and I'll check back every now and then to reply to them.
    Best Regards,
    Andy Huang
  • joomlajon
    Senior Member
    • Aug 2006
    • 129
    • 3.8.x

    #2
    I have the language files in UTF-8, translated via the admin panel.

    I have all tables in UTF-8, and the data stored in them looks fine.

    However, the language XML file still states ISO-8859-1. I asked about that and was told to leave it be.

    Code:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    
    <language name="Svenska utf-8" vbversion="3.6.8" product="vbulletin" type="custom">
        <settings>
            <options><![CDATA[3]]></options>
            <languagecode><![CDATA[sv]]></languagecode>
            <charset><![CDATA[UTF-8]]></charset>
    It looks like everything is as it should be, both forum-wise and in the database. At least as far as I can tell.

    I made a clean install and have not converted anything. I needed UTF-8 partly because of å ä ö and all the "&auml;" entities that were driving me mad, and partly because I will need it in the future to post material from the forum on a front page, since almost everyone besides vBulletin uses UTF-8 by default.

    Should I change the header to UTF-8 or just leave it? What difference will it make?

    thanks,
    jon


    • Andy Huang
      Senior Member
      • Feb 2004
      • 4602

      #3
      Leave it; if it's not broken, don't fix it.

      At this stage for you, changing it will just tell newer text editors that the file is written in UTF-8 encoding. Since your forum is already working fine, there's no need to change anything.
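One plausible reason why the mismatch is harmless, offered as an assumption about the importer's internals rather than a description of vBulletin's code: ISO-8859-1 maps every byte value from 0 to 255 to exactly one character, so if a file is read as ISO-8859-1 and written back as ISO-8859-1, UTF-8 bytes pass through unchanged, and the UTF-8 `<charset>` setting then labels them correctly for browsers. A minimal Python check (the helper name is hypothetical):

```python
def latin1_round_trip(data: bytes) -> bytes:
    """Read bytes as ISO-8859-1 and write them back as ISO-8859-1."""
    return data.decode("iso-8859-1").encode("iso-8859-1")


utf8_bytes = "Svenska: åäö".encode("utf-8")
print(latin1_round_trip(utf8_bytes) == utf8_bytes)  # True: the UTF-8 bytes survive unchanged
```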
      Best Regards,
      Andy Huang


      • Luciano
        Member
        • Apr 2001
        • 87
        • 3.8.x

        #4
        Thanks for the post, Andy, it's very helpful.
        But I've been playing around with UTF-8 for some time now,
        and I think I finally found a simple solution that fits my needs. I'll post what I did so maybe it will work for others too, and maybe some people will find it useful.
        ATTENTION: This worked for me, but it is NOT guaranteed to work for everybody!

        What I did, and what worked for me:

        I did NOT touch the encoding routine, because (I think) changing it assumes that your server connection is also in UTF-8.

        Of the two lines Andy mentions, I only changed the second one in the language files:
        Code:
          <charset><![CDATA[UTF-8]]></charset>
        and left the first one as:
        Code:
        <?xml version="1.0" encoding="ISO-8859-1"?>
        At first it did not work correctly, don't ask me why.

        So what I did was:


        Step 1..
        Before doing a clean install of vBulletin, I opened the install directory and, in it, the file vbulletin-language.xml.
        I opened it with a good editor and saved the file again (overwriting it) in UTF-8 format, with NO OTHER CHANGES. (Make sure your editor supports "save as UTF-8"; I use EditPlus and it works fine there.)

        I then did a clean install of the English version of vBulletin
        (no modifications).

        Step 2..
        Then, as the very first thing, go to ACP --> Language Manager --> English and open Edit Settings.
        There, enter UTF-8 as the HTML Character Set and en as the Language Code.
        Save.

        Check that your board is working.
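To check what your board actually sends, you can read the charset from the HTTP `Content-Type` header. This Python sketch uses only the standard library; `charset_of` and `served_charset` are hypothetical helpers. If your server only declares the charset in a `<meta>` tag inside the page, the header check returns None and you will need to look at the page source instead.

```python
import urllib.request
from email.message import Message


def charset_of(content_type: str):
    """Return the charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def served_charset(url: str):
    """Fetch url (needs network access) and return the charset from its Content-Type header."""
    with urllib.request.urlopen(url) as resp:
        return resp.headers.get_content_charset()


print(charset_of("text/html; charset=UTF-8"))  # utf-8
# served_charset("http://example.com/forum/")  # placeholder URL; expect 'utf-8' after Step 2
```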

        Step 3..
        Additional languages.
        All of them worked the same way for me,
        so let's take Chinese as an example.
        I used the file posted by Andy at http://www.vbulletin.com/forum/showthread.php?t=230036

        IF English is your main language, you cannot use the file as is (it is made for Chinese as the master language).

        First, open the file in your editor (EditPlus for me, as I said).
        Make sure you see the Chinese characters in your editor (if not, change editors or review your Windows settings, etc.).
        Around the 5th line it should read:
        <phrasetype name="访问权限 (Access Masks)"
        and not:
        <phrasetype name="è??é??æ?ƒé?? (Access Masks)"
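If your editor shows the garbled form, a quick test can tell whether the file is really UTF-8 being displayed with the wrong encoding. This Python sketch (hypothetical helper `repair_latin1_mojibake`) assumes the garbling came from showing UTF-8 bytes as ISO-8859-1. Editors that use Windows-1252 show some bytes differently (such as the ƒ above) and may replace others with "?", which cannot be recovered, so treat it only as a diagnostic; the real fix is still to open the file in an editor that reads it as UTF-8.

```python
def repair_latin1_mojibake(text: str):
    """If text looks like UTF-8 bytes shown as ISO-8859-1, return the repaired text, else None."""
    try:
        repaired = text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None  # real Chinese text (or plain Latin-1 text) is not this kind of mojibake
    return repaired if repaired != text else None  # pure ASCII is unaffected


garbled = "访问权限".encode("utf-8").decode("iso-8859-1")  # simulate the broken view
print(repair_latin1_mojibake(garbled))  # 访问权限
```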

        So once you see the Chinese characters, make the following edits:

        At the top, find:
        Code:
        <?xml version="1.0" encoding="UTF-8"?>
        <language name="MASTER LANGUAGE" vbversion="3.6.8" product="vbulletin" type="master">
        and replace these two lines with the following:
        Code:
        <?xml version="1.0" encoding="ISO-8859-1"?>
        <language name="中文 - Chinese Simplified" vbversion="3.6.8" product="vbulletin" type="custom">
         <settings>
          <options><![CDATA[1]]></options>
          <languagecode><![CDATA[zh]]></languagecode>
          <charset><![CDATA[UTF-8]]></charset>
          <imagesoverride />
          <dateoverride />
          <timeoverride />
          <registereddateoverride />
          <calformat1override />
          <calformat2override />
          <logdateoverride />
          <locale />
          <decimalsep />
          <thousandsep />
         </settings>
        Once this is done, save the file in UTF-8 format (make sure your editor supports saving as UTF-8).
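The header swap can also be scripted. This is a minimal Python sketch with a hypothetical `replace_header` helper: it assumes the file starts with the XML declaration immediately followed by the opening `<language ...>` tag (no BOM), as in the master file above, and replaces both with whatever header text you pass in, such as the replacement block above. Write the result back with UTF-8 encoding.

```python
import re

# The XML declaration followed by the opening <language ...> tag, at the very start of the text.
HEADER = re.compile(r'\A\s*<\?xml[^>]*\?>\s*<language\b[^>]*>')


def replace_header(text: str, new_header: str) -> str:
    """Swap the declaration and <language> opening tag for new_header; raise if not found."""
    # A function replacement keeps backslashes in new_header from being treated as escapes.
    new_text, count = HEADER.subn(lambda _m: new_header, text, count=1)
    if count != 1:
        raise ValueError("XML declaration + <language> tag not found at start of text")
    return new_text


# Usage sketch (file names are placeholders):
# with open("master-zh.xml", encoding="utf-8-sig") as f:
#     text = f.read()
# with open("custom-zh.xml", "w", encoding="utf-8") as f:
#     f.write(replace_header(text, new_header))
```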

        Then import the file into vBulletin.

        It worked fine for me.

        My first problems are gone, and I got it to work like this on 2 servers, Windows and Linux (but I don't know if it will work with all server settings and MySQL versions).

        I think the important thing, if you use UTF-8, is to have ALL your languages in UTF-8, including the master language, AND to convert the language XML file to UTF-8 without any other changes BEFORE installing vBulletin. But of course I am not really sure *LOL*

        Hope my experience helps others. Attention: this CAN work on your server, but it may not.

        That's my 2 cents on the subject,
        Luc

        PS1: You can also do it the other way round, when Chinese is your main language, and use the English file (also posted by Andy at http://www.vbulletin.com/forum/showthread.php?t=239176). In that case, make sure that the first line of your XML is identical to the first line of the vBulletin language XML in the install directory. (I think vBulletin China modified the core XML import to use UTF-8, as Andy showed above. I myself left the first line as ISO for English AND Chinese, and it works fine.)
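Comparing the first lines by eye is easy to get wrong, for example when an editor silently changes line endings. This Python sketch (hypothetical helper names) compares the raw first lines of two files, ignoring only the CRLF/LF difference; a BOM in one file but not the other still counts as a difference, which is probably what you want here.

```python
def first_line(data: bytes) -> bytes:
    """Return the raw first line, without its LF or CRLF ending."""
    return data.split(b"\n", 1)[0].rstrip(b"\r")


def same_first_line(a: bytes, b: bytes) -> bool:
    """True if both files start with byte-for-byte the same first line."""
    return first_line(a) == first_line(b)


# Usage sketch (paths are placeholders):
# with open("install/vbulletin-language.xml", "rb") as f1, open("english.xml", "rb") as f2:
#     print(same_first_line(f1.read(), f2.read()))
```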

        PS2: I had some problems with some Russian language files, problems that are not vBulletin-related. My editor did not support the Cyrillic encoding, so I only saw weird characters in the XML file. I had to use another editor (EmEditor Free) to be able to read the files correctly, but it has so many UTF-8 settings that I could not save the file correctly. So I had to copy and paste the whole file over to EditPlus and save it there. That's the only way I could get the Russian file into the correct flavor of UTF-8.
        Last edited by Luciano; Mon 8 Oct '07, 5:09am.

