More and more people are asking for Unicode UTF-8 based language packs, or running into problems importing them. This post will hopefully provide some information for those of you who are stuck on the process and unsure how to proceed.
Before we proceed, please note:
* Unicode UTF-8 is not a magic pill that solves all your problems. In fact, it may create more. If you are happy with how your forum is working, don't change it. If it's not broken, there's no need to fix it, right?
* If you have the default installation, with just the English interface and WITHOUT any other language pack add-on EVER installed, then your old posts in foreign languages may still display okay. You can choose to continue if you are absolutely sure about that, but I cannot guarantee that your old posts will still display properly.
* If you have previously installed another language pack (e.g. Chinese Traditional using Big5, Japanese using SJIS/EUC, etc.), your previous posts are likely stored in those encodings. Don't follow this guide, as it will cause them to render incorrectly. There are third-party tools available to help you with that.
* NOT all webmail clients support Unicode-encoded emails properly. In fact, most local / regional mail providers do not. Your members may receive emails in gibberish if you follow these steps.
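To make the warning about older language packs a bit more concrete, here is a small Python illustration (not anything vBulletin itself runs). The sample text and the helper name `misread_as_utf8` are made up for the example; it only shows that bytes written in a legacy encoding such as Big5 are not valid UTF-8, which is roughly why old posts stop rendering once the forum announces UTF-8.

```python
def misread_as_utf8(text: str, legacy_encoding: str) -> str:
    """Encode text the way an old forum would have stored it, then read it as UTF-8."""
    legacy_bytes = text.encode(legacy_encoding)
    # errors="replace" substitutes U+FFFD for every byte sequence that is not valid UTF-8
    return legacy_bytes.decode("utf-8", errors="replace")


# Hypothetical old post: two Traditional Chinese characters, written with escapes
sample = "\u4e2d\u6587"
garbled = misread_as_utf8(sample, "big5")

print(garbled == sample)     # False: the text no longer matches
print("\ufffd" in garbled)   # True: replacement characters appear instead
```

Converting the stored posts themselves is a separate job and is not covered here.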
Okay, now that all the notices are out of the way, what will you need?
* A language XML file saved in Unicode UTF-8, without a BOM (I'm not sure whether the XML parser will complain if there is a BOM; I didn't test it)
* Time, internet connectivity (if your forum is online), and some effort
You might also need a text editor to edit one PHP file if your system doesn't import the language properly (you can revert this edit after you're done).
First, open the language XML file to make sure it is saved correctly. You should see these lines (or something similar) near the top of the language file. If not, change them to look like this:
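Code:
<?xml version="1.0" encoding="UTF-8"?>
Code:
<charset><![CDATA[UTF-8]]></charset>
The first one tells your server what encoding to use when parsing the XML file into memory, so if it is set incorrectly, you will get gibberish. The second one tells your server what encoding to announce to browsers on all output pages, so if it is set incorrectly, you will also get gibberish.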
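If you'd rather check the file with a script than by eye, here is a rough Python sketch. The function name `check_language_xml` and the wording of its messages are my own invention, and it only looks for the two lines shown above plus a BOM; it is not a full validation of a vBulletin language file.

```python
import codecs
import re


def check_language_xml(data: bytes) -> list:
    """Return a list of problems found in the raw bytes of a language XML file."""
    problems = []

    # A UTF-8 BOM is the three bytes EF BB BF at the very start of the file
    if data.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
        data = data[len(codecs.BOM_UTF8):]

    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    # First line to check: <?xml version="1.0" encoding="UTF-8"?>
    declaration = re.match(r'<\?xml[^>]*encoding="([^"]+)"', text)
    if not declaration or declaration.group(1).upper() != "UTF-8":
        problems.append('XML declaration does not say encoding="UTF-8"')

    # Second line to check: <charset><![CDATA[UTF-8]]></charset>
    charset = re.search(r"<charset>(?:<!\[CDATA\[)?([^\]<]+)", text)
    if not charset or charset.group(1).strip().upper() != "UTF-8":
        problems.append("<charset> is not UTF-8")

    return problems
```

You would call it with the file's raw bytes, for example `check_language_xml(open("my-language.xml", "rb").read())`, where `my-language.xml` is a placeholder name. An empty list means none of these three problems were found.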
Once you're done editing, go to File > Save As, and make sure to select Unicode UTF-8 as the encoding. This is possible with Windows Vista's default Notepad application. If your text editor can't do it, search for a Unicode text editor; there are free ones as well as commercial ones. As long as it can do what you need, the choice is yours.
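One thing to watch for: Notepad of that era normally writes a BOM when you save as UTF-8. If that turns out to be a problem, the re-save can also be done with a few lines of Python. This is only a sketch; `to_utf8_without_bom` is a made-up helper name, and you have to tell it what encoding the file is currently in, because that cannot be guessed reliably.

```python
def to_utf8_without_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode raw file bytes as UTF-8 and make sure no BOM is left at the start."""
    text = data.decode(source_encoding)
    # A BOM that survives decoding shows up as the character U+FEFF at the start
    text = text.lstrip("\ufeff")
    return text.encode("utf-8")


# Example with in-memory bytes: an "e acute" stored as ISO-8859-1 (one byte, E9)
converted = to_utf8_without_bom(b"\xe9", "iso-8859-1")
print(converted)  # b'\xc3\xa9', the same character as two UTF-8 bytes
```

Note that this only changes the bytes. The `encoding="UTF-8"` and `<charset>` lines inside the file still have to be edited by hand if they say something else.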
Next, try to import it as usual via Admin CP > Languages & Phrases > Download / Upload Languages. Make sure to leave the setting at "(Create New Language)", as this will be a new language on your forums, and ignore the language version if needed (e.g. my XML is for 3.6.4 but I'm running 3.6.8).
Check that the imported language displays correctly by appending &langid=x to your forum's URL (replace x with your newly imported language's ID; use ?langid=x if the URL has no other parameters). For example, http://www.vbulletin.com/forum/?langid=8 will set your language here to UTF-8. Check that posts display correctly, etc. For the most part, this should already work fine. If not, please read on.
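If you keep mixing up the ? and & forms, the rule is just standard URL query syntax. Here is a small Python sketch of it; `with_langid` is a name I made up, and only the `langid` parameter comes from this guide.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_langid(url: str, langid: int) -> str:
    """Return url with langid set, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop an existing langid so the new value replaces it instead of duplicating it
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "langid"]
    query.append(("langid", str(langid)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_langid("http://www.vbulletin.com/forum/", 8))
# http://www.vbulletin.com/forum/?langid=8
```

A URL that already has parameters gets `&langid=8` added at the end instead.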
If it does not work, then the XML parser did not auto-detect your language file's encoding and fell back to ISO-8859-1. You will need to quickly edit one file, delete the broken language, and upload it again. Open /includes/class_xml.php in a text editor (Notepad will be fine) and search for:
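Code:
function &parse($encoding = 'ISO-8859-1', $emptydata = true)
Replace it with:
Code:
function &parse($encoding = 'UTF-8', $emptydata = true)
Save the file and upload it to replace the one on your server. Then delete the broken language that was imported earlier via Admin CP > Languages & Phrases > Language Manager, and upload the same language XML again.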
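To see why an ISO-8859-1 default turns a UTF-8 file into gibberish, here is a small Python illustration. It is only an analogy for what the PHP parser does, and `read_with_wrong_default` is a made-up name: the point is that every multi-byte UTF-8 character gets read as two or more separate ISO-8859-1 characters.

```python
def read_with_wrong_default(text: str) -> str:
    """Store text as UTF-8, then read the bytes back as if they were ISO-8859-1."""
    utf8_bytes = text.encode("utf-8")
    # ISO-8859-1 maps every single byte to one character, so this never fails;
    # it just silently produces the wrong characters
    return utf8_bytes.decode("iso-8859-1")


original = "\u00e9"  # "e acute", stored as the two bytes C3 A9 in UTF-8
garbled = read_with_wrong_default(original)

print(len(original), len(garbled))  # 1 2: one character became two
print(garbled == "\u00c3\u00a9")    # True: "A tilde" followed by a copyright sign
```

If phrases in a freshly imported language show that kind of two-characters-for-one pattern, the wrong default is a likely cause.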
Finally, test your forum to make sure that it displays all characters properly.
You may also need to set the default English language's charset to UTF-8: go to Admin CP > Languages & Phrases > Language Manager > [Edit Settings] and enter "UTF-8" (without the quotes) in the "HTML Character Set" option.
If you notice "strange" characters appearing (they typically look like a diamond or square with a question mark in it), then you will need to disable the blank character stripper feature. Go to Admin CP > vBulletin Options > Censorship Options > Blank Character Stripper and remove all of its contents, then try making the post again.
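Here is a hedged Python illustration of how a byte-level stripper can produce those question-mark characters. It assumes the stripper removes individual byte values such as 160 (the non-breaking space in ISO-8859-1); that is my assumption about how the feature behaves, not something confirmed from the vBulletin source, and `strip_bytes` is a made-up stand-in for it.

```python
def strip_bytes(data: bytes, codes) -> bytes:
    """Remove every byte whose value is listed in codes (hypothetical stripper)."""
    unwanted = set(codes)
    return bytes(b for b in data if b not in unwanted)


original = "\u00e0"                  # "a grave" is the two bytes C3 A0 in UTF-8
stored = original.encode("utf-8")
stripped = strip_bytes(stored, [160])  # 160 is 0xA0, the second byte of the pair

# The leftover C3 is an incomplete UTF-8 sequence, so it decodes to U+FFFD,
# which browsers draw as a diamond or box with a question mark
print(stripped.decode("utf-8", errors="replace") == "\ufffd")  # True
```

With the stripper's contents removed, the two bytes stay together and the character survives.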
That should be about it. If you have other problems, please post here and I'll check back every now and then to reply.