vBulletin translation and encoding issue

  • Dody
    Senior Member
    • Aug 2004
    • 1896
    • 3.8.x

    #16
    Originally posted by vishnubhatia
    Hi,

    Has anyone been able to upload the Arabic UTF-8 language pack successfully? I have installed my DB with the utf8_general_ci collation, changed the config.php file and converted the master language file to UTF-8. After all this, when I try to upload the Arabic language file that was created in UTF-8 by PHP Dev (http://www.vbulletin.com/forum/showt...guage-packages), all the phrases are converted to ?????. Can anyone help me?
    You are using the wrong vBulletin language pack; it is not correctly encoded. Use my attempt at generating the right UTF-8 encoding, which works with both UTF-8 and windows-1256: check it here: #24
    while(true){
        if($someone->needsHelp() && $i->canHelp()) $post->help();
        if($i->findBug()) $post->bug();
    }
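A minimal Python sketch of one common way the `?????` symptom appears, assuming the Arabic text is at some point converted to a character set that has no Arabic letters. ISO-8859-1 is used here only as an illustrative stand-in for a latin1 connection or column; the thread does not confirm where the conversion happened. Characters the target charset cannot represent are replaced with `?`, and the original text cannot be recovered afterwards.

```python
# Arabic text as it appears in a correctly encoded UTF-8 language pack.
phrase = "مرحبا"  # "hello", 5 letters

# Converting to a charset without Arabic letters (ISO-8859-1 here, standing in
# for a latin1 connection or column) replaces every unrepresentable character.
stored = phrase.encode("iso-8859-1", errors="replace")
print(stored)  # b'?????'

# The substitution is lossy: decoding gives back nothing but question marks.
print(stored.decode("iso-8859-1"))  # ?????
```

Re-importing the language file will not help in this situation; the fix has to happen wherever the lossy conversion takes place.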


    • vishnubhatia
      New Member
      • Oct 2009
      • 6
      • 3.8.x

      #17
      Check Where?

      Originally posted by Dody
      You are using the wrong vBulletin language pack; it is not correctly encoded. Use my attempt at generating the right UTF-8 encoding, which works with both UTF-8 and windows-1256: check it here: #24
      Hi Dody,

      Check where? What is #24?


      • Dody
        Senior Member
        • Aug 2004
        • 1896
        • 3.8.x

        #18
        Sorry, it was meant to be a link. Anyway, here you go: #24

        • vishnubhatia
          New Member
          • Oct 2009
          • 6
          • 3.8.x

          #19
          Thanks, I had already found it and replied to you in that thread.


          • Darren Gordon
            New Member
            • May 2008
            • 17

            #20
            Originally posted by sajid09
            use iconv("windows-1256", "UTF-8",$text);
            Just to point out, in this case that would be quite a bad idea. If you use ISO-8859-1 for English, you would end up with a post table that contains a mix of raw ISO-8859-1 and UTF-8. In fact, using WINDOWS-1256 and ISO-8859-1 for two separate languages is also a bad idea.

            Unfortunately, vBulletin does not track the charset used on a per-post or per-thread basis, so if you ever wanted to convert such a table to another charset at a later date (i.e. UTF-8), the conversion would be extremely difficult: when reading the content, the conversion process won't know whether the source character set is WINDOWS-1256 or ISO-8859-1.
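A small Python sketch (not vBulletin code) of why the source charset matters: the same stored byte decodes to two different, equally valid characters under Windows-1256 and ISO-8859-1, so a conversion script reading a mixed table cannot tell from the bytes alone which decoding is correct.

```python
# A single byte as it might sit in a post table column.
raw = b"\xc7"

as_arabic = raw.decode("cp1256")     # read as WINDOWS-1256
as_latin = raw.decode("iso-8859-1")  # read as ISO-8859-1

print(as_arabic, hex(ord(as_arabic)))  # ا 0x627 (ARABIC LETTER ALEF)
print(as_latin, hex(ord(as_latin)))    # Ç 0xc7 (LATIN CAPITAL LETTER C WITH CEDILLA)

# Both decodings succeed without any error, so nothing in the data
# flags the wrong one; the converter simply produces different text.
```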

            If you need to use multiple languages, the safest solution is to use UTF-8 for everything. Even using ISO-8859-1 for Arabic content (as well as English) would be a better approach, despite being incorrect, because the browser will then at least encode the Arabic characters as NCRs (the HTML entities you referred to). If you convert to UTF-8 later, the conversion can decode the NCRs into UTF-8 along with the surrounding content in the source character set.
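A minimal Python sketch of the conversion Darren describes, assuming each row's source character set is known and that NCRs are stored literally as `&#NNNN;` or `&#xHHHH;`. `legacy_to_utf8` is a hypothetical helper written for illustration; it is not part of vBulletin or any migration tool.

```python
import re

# Decimal (&#1575;) or hexadecimal (&#x627;) numeric character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def legacy_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode text stored in one known legacy charset, expand its NCRs,
    and return the result encoded as UTF-8.

    Assumption: every byte really is in `source_charset`. In a table that
    mixes WINDOWS-1256 and ISO-8859-1 rows, that cannot be guaranteed.
    """
    text = raw.decode(source_charset)

    def expand(match: "re.Match[str]") -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave invalid references (surrogates, beyond U+10FFFF) untouched.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)

    return NCR_PATTERN.sub(expand, text).encode("utf-8")


# An ISO-8859-1 row: "Café" as a raw byte, Arabic submitted by the browser as NCRs.
row = b"Caf\xe9: &#1605;&#1585;&#1581;&#1576;&#1575;"
print(legacy_to_utf8(row, "iso-8859-1").decode("utf-8"))  # Café: مرحبا
```

The NCR half of the job is mechanical; the hard part is the `source_charset` argument, which is exactly the per-row information vBulletin does not keep.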


            • Dody
              Senior Member
              • Aug 2004
              • 1896
              • 3.8.x

              #21
              Besides what Darren has said, I noticed by looking at my post table that it contains a mix of windows-1256 characters and Unicode characters stored as &#...; entities. I have always used windows-1256 and nothing else, and my content is only in Arabic, yet this still happens. So converting to UTF-8 isn't easy, especially for large forums.
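A rough Python approximation of how such a mix can arise: when a form is submitted in a legacy charset such as Windows-1256, browsers send characters the charset can encode as single bytes and replace everything else with a decimal NCR. Python's `xmlcharrefreplace` error handler produces the same `&#NNNN;` form. The Greek letter below is only an example of a character outside Windows-1256.

```python
# An Arabic word followed by a Greek letter that Windows-1256 cannot represent.
typed = "سلام Σ"

# Approximates a browser submitting the form as windows-1256: encodable
# characters become single bytes, the rest become decimal NCRs.
submitted = typed.encode("cp1256", errors="xmlcharrefreplace")

print(len(submitted))                   # 11: 4 Arabic bytes, a space, 6 NCR chars
print(submitted.endswith(b" &#931;"))   # True

# Read back as windows-1256, the stored value is a mix of real Arabic
# letters and a literal entity, as seen in the post table.
print(submitted.decode("cp1256"))       # سلام &#931;
```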

              • Merjawy
                Senior Member
                • Sep 2002
                • 2613

                #22
                I've always used windows-1256 for all the language packs I use, and it worked just fine... but not with 4.0 (email issue with vB4).
                To be or not to be... Where the hell is the question????
                My psychiatrist told me I was crazy and I said I want a second opinion. He said okay, you're ugly too

                Live vBulletin 4.2.0 Multilingual * Alpha/Beta vB 4 - vB 5 Tier 1A
                CentOS 6.2 - Apache:2.2.15(Apache2Handler) - PHP:5.3.3 - MySQL:5.1.61
                Xampp/Win-XP - Apache v2.2.21(Apache2Handler) - PHP:5.3.8 - MySQL:5.5.16


                • apply
                  New Member
                  • Jul 2009
                  • 5
                  • 3.8.x

                  #23
                  I have been running vB 3.x on a Persian forum for quite a while with no issues, but since I upgraded to vB4, searching for non-English words no longer works: it never returns any results. All my tables use the utf8 charset with utf8_general_ci collation. I do not use $config['Mysqli']['charset'] = 'utf8' in my config; if I enable it, my forum starts throwing errors.
                  I have rebuilt the index a few times, but the problem persists. It is really frustrating, and my users are really pissed off because of this issue. Any help is appreciated.
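The thread does not settle what causes this search problem. One frequently reported cause of similar symptoms, offered here only as a hypothesis, is a connection character set mismatch: if the application sends UTF-8 bytes over a connection declared as latin1, MySQL may convert those bytes to utf8 a second time before storing them ("double encoding"). A correctly encoded search term then no longer matches the stored bytes. The Python sketch below shows only the byte-level effect, using ISO-8859-1 as an approximation of MySQL's latin1 (which is actually closer to Windows-1252).

```python
word = "کتاب"  # Persian for "book"

correct = word.encode("utf-8")

# Hypothetical double encoding: the UTF-8 bytes are misread as latin1 text
# and converted to UTF-8 again before landing in a utf8 column.
double_encoded = correct.decode("iso-8859-1").encode("utf-8")

print(correct == double_encoded)          # False: a UTF-8 search term won't match
print(len(correct), len(double_encoded))  # 8 16

# Reversing the extra step recovers the original text.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(repaired == word)                   # True
```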


                  • Darren Gordon
                    New Member
                    • May 2008
                    • 17

                    #24
                    Hey apply, have a look at this bug and the solution from Strateges.


                    • Darren Gordon
                      New Member
                      • May 2008
                      • 17

                      #25
                      Originally posted by Dody
                      I noticed by looking at my post table that it contains a mix of windows-1256 characters and Unicode characters stored as &#...; entities. I have always used windows-1256 and nothing else, and my content is only in Arabic, yet this still happens. So converting to UTF-8 isn't easy
                      The &#...; entities are NCRs (numeric character references). They're submitted by your users' browsers whenever they enter characters that are outside the range of Windows-1256. An NCR is simply the Unicode code point of the character, and Unicode covers pretty much every character used in any known language. Because of this, NCRs are very simple to decode and convert to another character set; they involve no loss of data and are actually the only part of the content whose encoding we reliably know. If you convert to UTF-8 later, NCRs won't pose any problems.

                      They do take more space (not a big problem these days, but a more efficient character encoding would still be better) and can pose logical problems: &#931; (Σ) counts as 6 characters. But these aren't much different from the challenges posed by multibyte characters anyway.
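The size and length arithmetic can be checked with a short Python snippet, using the standard library's `html.unescape` to decode the NCR:

```python
import html

ncr = "&#931;"
char = html.unescape(ncr)  # decodes to the single character Σ (U+03A3)

print(len(ncr))                   # 6: characters (and bytes) as stored
print(len(char))                  # 1: character once decoded
print(len(char.encode("utf-8")))  # 2: bytes needed in UTF-8
```

String functions that count characters see six for the NCR but one for the decoded letter, while byte counts differ again once UTF-8 is involved, which is the same kind of mismatch multibyte encodings already cause.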


                      • apply
                        New Member
                        • Jul 2009
                        • 5
                        • 3.8.x

                        #26
                        Originally posted by Darren Gordon
                        Hey apply, have a look at this bug and the solution from Strateges.
                        Yes, Darren. I saw that bug and the solution a few days ago, but it didn't fix the problem. As I said in my previous post, I don't use $config['Mysqli']['charset'] = 'utf8' in my config.php. Could it be causing the problem?

