Charset problem (cp1251 -> utf8)

  • player1
    New Member
    • Aug 2009
    • 1

    Hello! I have an issue changing the charset from cp1251 to utf8.
    Let me start from the beginning. The first thing I found is that the tables are in utf8, yet everything displays correctly with cp1251 when viewing the forum. I made a dump, opened it in Notepad++, changed all the cp1251 comments to utf8, cleared the word table, then saved and uploaded it (all in utf8). Then I changed the charset in config.php to utf8 and uncommented $config['Mysqli']['charset']. Now I get these errors:
    Warning: array_keys() [function.array-keys]: The first argument should be an array in [path]/includes/functions.php on line 4243
    Warning: Invalid argument supplied for foreach() in [path]/includes/functions.php on line 4243
    What could cause this, and how can it be fixed? By the way, I can't make any changes to the MySQL server settings, so solutions that require that are not an option.
  • McLPL
    New Member
    • Feb 2006
    • 12

    #2
    I need to bump this topic.
    How do I make vBulletin work with UTF-8 output encoding?

    We mainly use Polish, whose national characters come from the ISO-8859-2 character set (also called latin2). Until now we used ISO-8859-2 as the output encoding, and now we want to change it to UTF-8.
    The operation should be easy:
    1. change the MySQL client charset to utf8;
    2. change the HTML charset to utf-8.
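For step 1 on vBulletin 3.8, the relevant switch is presumably the commented-out $config['Mysqli']['charset'] line in includes/config.php that the first post mentions. A minimal sketch of the lines involved, assuming (as the comments in that file suggest) that the Mysqli options are only read when the mysqli driver is selected; the HTML charset of step 2 is configured elsewhere and is not covered here:

```php
// includes/config.php (vBulletin 3.8): a fragment, not a whole file.
// Assumption: the Mysqli options only apply when the mysqli driver is used.
$config['Database']['dbtype'] = 'mysqli';

// Step 1: the connection declares utf8 as its client charset.
$config['Mysqli']['charset'] = 'utf8';
```

According to the first post, uncommenting that line on its own is what led to the array_keys() warnings, so this only covers the connection side.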

    Unfortunately, after changing the MySQL client charset to utf8, we get the errors listed in the previous post.
    What is also strange is that even if we don't change the MySQL client encoding and leave it as 'latin2', most forum operations work correctly. That is really odd, because in that case all national characters should always come out garbled.
    For example, saving a new post with the 'quick post' feature works fine, but when we preview the written text, the letters go crazy (is there any DB operation on the text when generating the preview?).
    As I said, it shouldn't work at all; I mention this only as a symptom.


    So, the question is how to run vB with utf8.
    Does vBulletin send a SET NAMES command to the database, or does it use other methods (native PHP functions)?
    Is the correct character set guaranteed to be set before every database operation?

    We are running vBulletin 3.8.4 on MySQL 5.0.
    The default charset of the database and tables is set to utf8 (which shouldn't matter, by the way).

    best regards


    • Zachery
      Former vBulletin Support
      • Jul 2002
      • 59097

      #3
      You only changed how the data is encoded, not how it was originally stored. You need to revert your change and then convert the text manually. We do not officially support UTF-8 in vB3/vB4 at this time.


      • McLPL
        New Member
        • Feb 2006
        • 12

        #4
        Thank you for the fast and fair answer.

        Could you explain what 'encoded' means in your post?
        I can see the correct national letters in the database using other MySQL clients/software, so the text is not mis-encoded in any way.
        So, as far as I can tell, it is enough for me to set the MySQL client to the desired character encoding to get what I need.
        Could you provide any more details?


        • Zachery
          Former vBulletin Support
          • Jul 2002
          • 59097

          #5
          The data in the tables are still being stored in the old encoding (ISO-8859-2). You just changed how new posts will have their data encoded (UTF-8). This means you'll have mixed data in your forums, which is a very bad thing.

          To get your data into the proper format it needs to be converted.


          • McLPL
            New Member
            • Feb 2006
            • 12

            #6
            No. I'm sorry to say it, but you are wrong.
            As long as the text entered into the forum is encoded in the same charset that the PHP mysqli client has declared to the DB, the data is stored correctly.
            The database converts the incoming text into whatever it needs, according to the character encoding declared by the client. It's that simple.

            Thanks to this, I can have a backend with tables in any character set (as long as it supports all the entered letters). As long as the client declares the correct character set, the data stored in the tables must be correct as well.

            Then I can connect to the database and state how the data I receive must be encoded. I can receive utf8-encoded data or latin2; it doesn't matter how the data is stored on the backend side.
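The claim in this post can be checked without a server by modelling the conversion chain in plain PHP. The sketch below is a toy model (an assumption about the behaviour, not MySQL's code): bytes are decoded with the charset the writer declared, stored in the column charset (latin2, as in the test table later in this thread), and re-encoded in the charset each reader declares. The helper names and the hand-made table, which covers only ASCII and the 18 Polish letters, are hypothetical.

```php
<?php
// Toy model of client-charset conversion around SET NAMES (illustration only).
// ISO-8859-2 (latin2) byte => Unicode code point, Polish letters only.
const LATIN2_PL = [
    0xA1 => 0x0104, 0xC6 => 0x0106, 0xCA => 0x0118, 0xA3 => 0x0141, 0xD1 => 0x0143, // Ą Ć Ę Ł Ń
    0xD3 => 0x00D3, 0xA6 => 0x015A, 0xAC => 0x0179, 0xAF => 0x017B,                 // Ó Ś Ź Ż
    0xB1 => 0x0105, 0xE6 => 0x0107, 0xEA => 0x0119, 0xB3 => 0x0142, 0xF1 => 0x0144, // ą ć ę ł ń
    0xF3 => 0x00F3, 0xB6 => 0x015B, 0xBC => 0x017A, 0xBF => 0x017C,                 // ó ś ź ż
];

// latin2 bytes -> Unicode code points
function latin2_to_cps(string $bytes): array
{
    $cps = [];
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $cps[] = $b;
        } elseif (array_key_exists($b, LATIN2_PL)) {
            $cps[] = LATIN2_PL[$b];
        } else {
            throw new InvalidArgumentException(sprintf('byte 0x%02X is not in this demo table', $b));
        }
    }
    return $cps;
}

// Unicode code points -> latin2 bytes (for a latin2 column or a latin2 reader)
function cps_to_latin2(array $cps): string
{
    $byCp = array_flip(LATIN2_PL);
    $out = '';
    foreach ($cps as $cp) {
        if ($cp < 0x80) {
            $out .= chr($cp);
        } elseif (isset($byCp[$cp])) {
            $out .= chr($byCp[$cp]);
        } else {
            throw new InvalidArgumentException(sprintf('U+%04X has no latin2 byte in this demo', $cp));
        }
    }
    return $out;
}

// Unicode code points -> UTF-8 bytes; the two-byte form is enough for
// everything below U+0800, which covers the table above.
function cps_to_utf8(array $cps): string
{
    $out = '';
    foreach ($cps as $cp) {
        $out .= $cp < 0x80
            ? chr($cp)
            : chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return $out;
}

// Writer: SET NAMES latin2, sends "zażółć" as latin2 bytes into a latin2 column.
$sent   = "za\xBF\xF3\xB3\xE6";
$stored = cps_to_latin2(latin2_to_cps($sent));

// Readers: the column is decoded and re-encoded for each connection's charset.
$forUtf8Reader   = cps_to_utf8(latin2_to_cps($stored));   // SET NAMES utf8
$forLatin2Reader = cps_to_latin2(latin2_to_cps($stored)); // SET NAMES latin2

echo bin2hex($forUtf8Reader), PHP_EOL;
```

Run as is, it prints 7a61c5bcc3b3c582c487, the UTF-8 bytes of "zażółć", although only latin2 bytes were ever written; the latin2 reader gets back exactly the bytes that were sent.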

            So again:
            1. we entered text on the web pages in ISO-8859-2, which is equivalent to latin2;
            2. we had the mysqli client set to latin2.
            As a result, the data was written correctly into the vB tables.

            The proof: I can see the data correctly using other software, for example MySQL Query Browser, which, by the way, communicates with the DB using UTF-8 encoding.

            So the only remaining problem is probably that vB doesn't always set the client charset before DB queries, or that it handles the received multibyte stream incorrectly (maybe that is why the array_keys() call failed).
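To illustrate the "multibyte stream handled the wrong way" hypothesis: PHP's byte-oriented string functions count bytes, not characters, so code written with a single-byte charset in mind can miscount or split UTF-8 text. A minimal sketch using only core functions; whether this is what actually breaks vB's array_keys() call is not established here.

```php
<?php
// The same word "zażółć" in the two encodings discussed in this thread.
$latin2 = "za\xBF\xF3\xB3\xE6";                 // ISO-8859-2: one byte per letter
$utf8   = "za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87"; // UTF-8: two bytes per Polish letter

// strlen() counts bytes, so the lengths differ although the text is the same.
$latin2Len = strlen($latin2);
$utf8Len   = strlen($utf8);

// Taking the "first 3 characters" byte-wise cuts the two-byte 'ż' in half.
$cut = substr($utf8, 0, 3);

// preg_match() with the /u modifier returns false for invalid UTF-8 input.
$cutIsValidUtf8 = preg_match('//u', $cut) === 1;

printf("latin2: %d bytes, utf8: %d bytes, cut valid UTF-8: %s\n",
    $latin2Len, $utf8Len, $cutIsValidUtf8 ? 'yes' : 'no');
```

It prints "latin2: 6 bytes, utf8: 10 bytes, cut valid UTF-8: no".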

            best regards


            • Zachery
              Former vBulletin Support
              • Jul 2002
              • 59097

              #7
              You're welcome not to believe me, but your data in the database is still in latin2 and new data will be stored in utf8. This is a problem.


              • McLPL
                New Member
                • Feb 2006
                • 12

                #8
                It is not about believing or not. It is about knowing how databases work.
                I have been working with various databases for more than 10 years, including MySQL and, of course, PHP. I have written a lot of apps (web and otherwise) that work with utf8. So I can say I'm skilled enough to know what the backend charset, client charset and UI charset are, and how they relate to each other.

                You would be right in only one case: if the text passed from the UI to the MySQL client were not in the same encoding as the one the MySQL client declared to the database. In that case the letters stored in the DB would be trashed by an incorrect conversion (if the DB doesn't reject the transaction because of the invalid encoding). But that is not our case.
                At least not for textual data. Maybe vB encodes some internal data into a byte stream that gets broken by a multibyte encoding; I don't know, but it is a possibility.

                Again, please answer this question: does vB set the client character encoding before it accesses the database (for example, by sending a SET NAMES command)? I assume it should, since there is a related setting in the vB config. But judging by its behaviour, it doesn't seem to work as expected.
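The one failure case described in this post, text whose bytes don't match the charset the connection declared, can be sketched the same way. A minimal illustration with a hypothetical helper and a three-entry ISO-8859-2 table (bytes B3, C3 and F3 only): UTF-8 text sent over a latin2 connection turns each two-byte letter into two latin2 characters, while latin2 text sent over a utf8 connection is not valid UTF-8 at all, which the server may reject.

```php
<?php
// Hand-made ISO-8859-2 subset: byte => code point (ł, Ă, ó). Demo only.
const LATIN2_SUBSET = [0xB3 => 0x0142, 0xC3 => 0x0102, 0xF3 => 0x00F3];

// Hypothetical helper: what a utf8 reader gets back for bytes the server took as latin2.
function latin2_bytes_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);
        } elseif (array_key_exists($b, LATIN2_SUBSET)) {
            $cp = LATIN2_SUBSET[$b];
            $out .= chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F)); // two-byte UTF-8
        } else {
            throw new InvalidArgumentException(sprintf('byte 0x%02X is not in this demo table', $b));
        }
    }
    return $out;
}

// Case 1: the page sends UTF-8 "ó" (C3 B3) but the connection declared latin2.
// Each byte is taken as a separate latin2 character, so "ó" becomes "Ăł".
$garbled = latin2_bytes_to_utf8("\xC3\xB3");

// Case 2: the page sends latin2 "ó" (F3) but the connection declared utf8.
// A lone F3 byte is not valid UTF-8.
$latin2Sent  = "\xF3";
$validAsUtf8 = preg_match('//u', $latin2Sent) === 1;

echo bin2hex($garbled), ' ', $validAsUtf8 ? 'valid' : 'invalid', PHP_EOL;
```

It prints "c482c582 invalid"; "Ăł" in place of "ó" is the typical symptom of this mismatch in Polish text.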


                • Zachery
                  Former vBulletin Support
                  • Jul 2002
                  • 59097

                  #9
                  You haven't changed the data ALREADY IN THE DATABASE, which is the problem.

                  http://www.vcharset.com/ (this used to be a free service)


                  • McLPL
                    New Member
                    • Feb 2006
                    • 12

                    #10
                    Ehh. I don't know what else to say. MySQL 5 can convert stored data between charsets by itself; please refer to the MySQL documentation.
                    I HAVE tables with a utf8 default charset, filled with utf8 data. I know that and I'm sure of it. I can give you a screenshot from MySQL Query Browser if you need one.

                    But OK. If you are so sure that I have latin2-encoded letters in my utf8 tables, please tell me why.
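MySQL's own conversion mentioned in this post is done with ALTER TABLE ... CONVERT TO CHARACTER SET. A sketch of a hypothetical helper that builds one such statement per table; the table names in the usage line are only examples (a real install may use a table prefix). MySQL re-encodes each text column from its declared charset, so this gives correct results only when that declaration matches the bytes actually stored, which is exactly what is disputed in this thread.

```php
<?php
// Hypothetical helper: one ALTER TABLE ... CONVERT TO CHARACTER SET statement per table.
// Correct only if each column's declared charset matches the bytes it really holds.
function convert_to_charset_sql(array $tables, string $charset = 'utf8', string $collation = 'utf8_general_ci'): array
{
    $statements = [];
    foreach ($tables as $table) {
        // backtick-quote the table name, doubling any backtick inside it
        $quoted = '`' . str_replace('`', '``', $table) . '`';
        $statements[] = "ALTER TABLE $quoted CONVERT TO CHARACTER SET $charset COLLATE $collation";
    }
    return $statements;
}

// Example table names only.
$statements = convert_to_charset_sql(['post', 'thread']);
echo implode(";\n", $statements), ";\n";
```

The statements could then be run one by one with mysqli_query(). On MySQL 5.5.3 and later, utf8mb4 would be the usual target, since MySQL's utf8 stores at most three bytes per character.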


                    • McLPL
                      New Member
                      • Feb 2006
                      • 12

                      #11
                      I'm back again after some time. Please forgive my arrogance, but I thought you would read carefully again what I had written, do some tests and improve your product.
                      It looks like I was wrong. So I wrote a test page to show how character sets in databases work and how to work with them.

                      It is a very simple script (see below). I installed it on our server to give you easy access, but I recommend installing it on your own server to get proof.

                      The script lets you select the web page character set (utf8 or iso-8859-2) and store the given text in the database. Then you can switch to the other charset and see whether anything went wrong. The two options differ in only two things:
                      - the MySQL client charset, which is set by SET NAMES;
                      - the web page charset, set in the HTML header.
                      All SQL queries are displayed on the page.


                      Note that the character set of the test table is intentionally set to latin2, to prove that the table charset doesn't matter as long as the correct client character set is declared.
                      Because of this, however, don't try to enter characters that don't belong to iso-8859-2. To get all the features of the utf8 charset, the table should be created as a utf8 one. Then you are free to enter even Chinese characters (as long as the web page is displayed in utf8).
                      If you are using Windows, you can type iso-8859-2 (Central European) letters by installing the "Polish (Programmers)" keyboard and pressing a, e, o, s, l, z, x or n while holding Right Alt.

                      Here is a link to the test page: CLICK

                      I hope that, after this experiment, no one will say that data correctly stored in the database must be converted every time I want to change the client charset.
                      Enjoy
                      with regards
                      Michal Kozusznik

                      Here are the scripts you can use on your own:

                      SQL script to create the test table:
                      Code:
                      CREATE DATABASE charsettest;
                      USE charsettest;
                      
                      CREATE TABLE testtable (
                        id INTEGER UNSIGNED NOT NULL,
                        testfield VARCHAR(100) NOT NULL,
                        PRIMARY KEY (id)
                      )
                      ENGINE = InnoDB
                      CHARACTER SET latin2 COLLATE latin2_general_ci;

                      PHP code to run with it (set the $cnf values to the correct ones):
                      PHP Code:
                      <?php
                      $cnf = array();
                      ini_set('display_errors', '1');
                      error_reporting(E_ALL);

                      /**
                       * Database connection config
                       */
                      $cnf['db_host']   = 'localhost';
                      $cnf['db_schema'] = 'charsettest';
                      $cnf['db_user']   = 'root';
                      $cnf['db_pass']   = 'root';

                      if (empty($_GET['cs'])) exit('cs param required. Use one of values: utf8 or iso');
                      if ($_GET['cs'] != 'iso' && $_GET['cs'] != 'utf8') exit('Use one of values: utf8 or iso');

                      /**
                       * HTML-escape a value for output on this page.
                       * htmlspecialchars() only rewrites the ASCII characters & " ' < >, and
                       * ISO-8859-2 is not in its list of charsets, so ISO-8859-1 is passed
                       * for the single-byte mode; bytes >= 0x80 pass through unchanged.
                       */
                      function h($s)
                      {
                          $cs = ($_GET['cs'] == 'utf8') ? 'UTF-8' : 'ISO-8859-1';
                          return htmlspecialchars((string) $s, ENT_QUOTES, $cs);
                      }

                      /**
                       * Simple wrapper over the mysqli functions
                       * (the original used the mysql_* extension, which was removed in PHP 7)
                       */
                      class Db
                      {
                          private static $resource = null;
                          private static $result = null;
                          public static $charset;
                          private static $sqls_arr = array();

                          /**
                           * Execute sql query
                           * @param string $sql
                           * @return boolean
                           */
                          public static function DoSql($sql)
                          {
                              if (self::$resource === null) self::Connect();

                              self::$result = mysqli_query(self::$resource, $sql);
                              if (!self::$result)
                              {
                                  print h(mysqli_error(self::$resource));
                                  self::$result = null;
                                  return false;
                              }

                              self::$sqls_arr[] = $sql;
                              return true;
                          }

                          /**
                           * Get value of the read record from database
                           * @param string $field Name of field to get value from
                           * @return string|null
                           */
                          public static function GetValue($field)
                          {
                              // only SELECT returns a result set; SET NAMES and INSERT return true
                              if (!(self::$result instanceof mysqli_result)) return null;
                              $a = mysqli_fetch_assoc(self::$result);

                              return isset($a[$field]) ? $a[$field] : null;
                          }

                          /**
                           * Connect to database and set chosen client charset
                           * @global array $cnf
                           */
                          private static function Connect()
                          {
                              global $cnf;

                              // report failures through return values, as the mysql_* code did
                              mysqli_report(MYSQLI_REPORT_OFF);
                              self::$resource = mysqli_connect($cnf['db_host'],
                                                               $cnf['db_user'],
                                                               $cnf['db_pass'],
                                                               $cnf['db_schema']);
                              if (!self::$resource) exit('Connect error: ' . h(mysqli_connect_error()));

                              // Declare the client charset: the only DB-side difference between
                              // the two modes. mysqli_set_charset() would also inform
                              // mysqli_real_escape_string(); SET NAMES is kept so the statement
                              // appears in the query list. Escaping only touches ASCII bytes,
                              // which latin2 and utf8 encode identically.
                              $sql = 'SET NAMES ' . self::$charset;
                              self::DoSql($sql);
                          }

                          /**
                           * Escape string using current db character set
                           * @param string $val Input value
                           * @return string escaped value
                           */
                          public static function EscapeString($val)
                          {
                              if (self::$resource === null) self::Connect();

                              return mysqli_real_escape_string(self::$resource, $val);
                          }

                          /**
                           * Get all executed queries, HTML-escaped
                           * @return string
                           */
                          public static function GetExecutedQueries()
                          {
                              return implode('<br>' . PHP_EOL, array_map('h', self::$sqls_arr));
                          }
                      }

                      /**
                       * Prepare charset definitions for mysql and html
                       */
                      $charset_html = null;
                      switch ($_GET['cs'])
                      {
                          case 'utf8':
                              Db::$charset = 'utf8';
                              $charset_html = 'utf-8';
                              break;
                          case 'iso':
                              Db::$charset = 'latin2';
                              $charset_html = 'iso-8859-2';
                              break;
                      }

                      /**
                       * Store new value to database
                       */
                      if (isset($_GET['testfield']))
                      {
                          $escaped = Db::EscapeString($_GET['testfield']);
                          $sql = "INSERT INTO testtable (id, testfield) VALUES (1, '" . $escaped . "')"
                               . " ON DUPLICATE KEY UPDATE testfield = '" . $escaped . "'";
                          Db::DoSql($sql);
                      }

                      /**
                       * Read value from database
                       */
                      $sql = 'SELECT testfield FROM testtable';
                      Db::DoSql($sql);
                      $val = Db::GetValue('testfield');

                      /**
                       * Print HTML page
                       */
                      print '<!DOCTYPE html>
                      <html>
                      <head>
                      <title>charset test</title>
                      <meta http-equiv="Pragma" content="no-cache">
                      <meta http-equiv="Content-Type" content="text/html; charset=' . $charset_html . '">
                      </head>
                      <body>
                          <b>Executed queries:</b><br>' . Db::GetExecutedQueries() . '
                          <br><br>
                          <b>Stored value:</b> ' . h($val) . '
                          <br><br>
                          <form>
                              <input type="text" name="testfield" value="' . h($val) . '">
                              <input type="submit" name="submit" value="submit">
                              <input type="hidden" name="cs" value="' . h($_GET['cs']) . '">
                          </form>
                          Change to: <a href="index.php?cs=iso">iso-8859-2</a> or <a href="index.php?cs=utf8">utf-8</a>
                      </body>
                      </html>
                      ';
                      ?>
                      Last edited by McLPL; Mon 28 Nov '11, 1:59pm.
