convert database from latin1 to UTF8

  • Emath
    Senior Member
    • Aug 2008
    • 146

    convert database from latin1 to UTF8

    Hello everyone, I want to convert my database from latin1 to UTF-8.

    I tried the following:

    1. Using the vCC mod (from vbulletin.org): it didn't succeed at all.

    2. Using database converter v2.2: it converted only my database collation (not the table data).

    3. Following tutorials, I tried iconv, sed, etc.

    Can anyone write an understandable, working guide on how to convert a database from latin1 to UTF-8?

    I really need this.

    Thanks.
    Math matriculation exams | Solutions to textbooks
  • borbole
    Senior Member
    • Feb 2010
    • 3074
    • 4.0.0

    #2
    Create a file called convert.php and place the following code in it

    PHP Code:
    <?php
    // Don't forget to enter your db info.

    define('THIS_SCRIPT', 'convert');
    require './global.php';

    //---------------

    header('Content-type: text/plain');

    // Note: the mysql_* functions were removed in PHP 7, so this script needs PHP 5.
    $dbconn = mysql_connect('servername', 'db_user', 'db_pass') or die( mysql_error() );
    mysql_select_db("db_name") or die( mysql_error() );

    // Change the default charset of the database itself.
    $sql = "ALTER DATABASE `db_name` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";
    $result = mysql_query($sql) or die( mysql_error() );
    print "Database changed to UTF-8.\n";

    // Convert every table, one at a time.
    $sql = 'SHOW TABLES';
    $result = mysql_query($sql) or die( mysql_error() );

    while ( $row = mysql_fetch_row($result) )
    {
        $table = mysql_real_escape_string($row[0]);
        $sql = "ALTER TABLE `$table` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
        mysql_query($sql) or die( mysql_error() );
        print "$table changed to UTF-8.\n";
    }

    mysql_close($dbconn);
    ?>
    Enter your db info accordingly, then upload the file to the root of your forum folder and run it from the browser. But first, back up your db.


    • Emath
      Senior Member
      • Aug 2008
      • 146

      #3

      I ran this script, and I got:

      Code:
      Database changed to UTF-8.
      latexreferencecommands changed to UTF-8.
      latexreferencegroups changed to UTF-8.
      latexreferencesettings changed to UTF-8.
      vb_aaggregate_temp_1244398200 changed to UTF-8.
      vb_aaggregate_temp_1268604600 changed to UTF-8.
      vb_access changed to UTF-8.
      vb_adminhelp changed to UTF-8.
      vb_administrator changed to UTF-8.
      vb_adminlog changed to UTF-8.
      vb_adminmessage changed to UTF-8.
      vb_adminutil changed to UTF-8.
      vb_adv_modules changed to UTF-8.
      vb_adv_pages changed to UTF-8.
      vb_adv_pages_user changed to UTF-8.
      vb_adv_setting changed to UTF-8.
      vb_adv_settinggroup changed to UTF-8.
      vb_album changed to UTF-8.
      vb_albumpicture changed to UTF-8.
      vb_albumupdate changed to UTF-8.
      vb_announcement changed to UTF-8.
      vb_announcementread changed to UTF-8.
      vb_attachment changed to UTF-8.
      vb_attachmentpermission changed to UTF-8.
      vb_attachmenttype changed to UTF-8.
      vb_attachmentviews changed to UTF-8.
      vb_avatar changed to UTF-8.
      vb_bbcode changed to UTF-8.
      etc..........
      But as I understood it, after converting to UTF-8 my forum shouldn't work until I configure the charset and the config file, and yet my forum is working just fine.

      In addition, when I change the charset to UTF-8 I see gibberish, which means the table data isn't UTF-8.

      Furthermore, I've checked twice that the conversion was done on the right database, and that the right database is selected in the config.php file.

      Any ideas?

      bah....

      • borbole
        Senior Member
        • Feb 2010
        • 3074
        • 4.0.0

        #4
        Try changing the language charset option to UTF-8 as well.


        • Emath
          Senior Member
          • Aug 2008
          • 146

          #5
          Please read my last comment again:

          In addition, when I change the charset to UTF-8 I see gibberish (in the forum), which means the table data isn't UTF-8.
          But it doesn't really matter, because the forum kept working after I converted the database, and it shouldn't have.

          • Emath
            Senior Member
            • Aug 2008
            • 146

            #6
            Any help, please?

            • Daniel.P
              Senior Member
              • Apr 2008
              • 600
              • 4.0.x

              #7
              I tried it this way, but it still has problems.

              Convert the codepage from your source codepage to UTF-8:


              iconv -c -f latin1 -t utf-8 vbulletin_db.sql > vbulletin_db_utf8.sql
              born to fish forced to work


              • AlexanderT
                Senior Member
                • Mar 2003
                • 992

                #8
                Here is how I did it:



                Make sure to back up first. And remember, not all string-related functions in vB are UTF-8-aware.


                • Emath
                  Senior Member
                  • Aug 2008
                  • 146

                  #9
                  And remember, not all string-related functions in vB are UTF-aware.
                  Which means? What do I need to do about it?

                  Thanks, I'll try what you've posted.

                  • AlexanderT
                    Senior Member
                    • Mar 2003
                    • 992

                    #10
                    Originally posted by imiviortal
                    which means? what i need to do about it?
                    Essentially, some text-handling functions in vB assume that a character is always exactly 1 byte. That is no longer necessarily true with UTF-8. So there will be issues, for instance, when you set a minimum length for user names, or with forced line breaks.
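
To make the byte-versus-character point concrete, here is a minimal PHP sketch. It is not vB's own code: the user name, the minimum length of 3, and the cut position are made-up values, and it assumes the mbstring extension is available.

```php
<?php
// Illustrative values only; none of this is taken from vBulletin itself.
$username = "\xD7\x90\xD7\x91"; // two Hebrew letters (alef, bet) encoded as UTF-8

$byteLength = strlen($username);             // counts bytes
$charLength = mb_strlen($username, 'UTF-8'); // counts characters

echo "bytes: $byteLength, characters: $charLength\n";

$minLength = 3; // hypothetical minimum user name length
var_dump($byteLength >= $minLength); // a byte-based check accepts the name
var_dump($charLength >= $minLength); // a character-based check rejects it

// Byte-based cutting (as in forced line breaks) can split a character in half.
$cut = substr($username, 0, 3);
var_dump(mb_check_encoding($cut, 'UTF-8')); // the fragment is no longer valid UTF-8
```

A byte-oriented function sees four units where a reader sees two letters, which is the kind of mismatch that shows up in length limits and line wrapping.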


                    • Emath
                      Senior Member
                      • Aug 2008
                      • 146

                      #11
                      Originally posted by borbole
                      Try to change the langage charset option to UTF-8 as well.
                      Do you have any other idea what is wrong? Using your script is much easier and safer.
                      I'm sure that a lot of people will use it.

                      • goyo
                        Senior Member
                        • Dec 2002
                        • 304
                        • 3.8.11

                        #12
                        It could work for English language boards...but definitely not for others.
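
One possible reason, as a minimal PHP sketch rather than a diagnosis of any particular board: a column labelled latin1 may really hold text that was written in another single-byte encoding. The sketch assumes the mbstring extension, and it uses ISO-8859-8 and a sample Hebrew word purely for illustration.

```php
<?php
// Assumption: the "latin1" column really contains Hebrew text in a legacy
// single-byte encoding. ISO-8859-8 is used here only as an example.
$stored = "\xF9\xEC\xE5\xED"; // the Hebrew word "shalom" as ISO-8859-8 bytes

// A latin1 -> UTF-8 conversion reads each byte as a Western European letter.
$asLatin1 = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');

// Reading the same bytes with the encoding they were written in gives Hebrew.
$asHebrew = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-8');

echo $asLatin1, "\n"; // accented Latin letters, i.e. gibberish to a Hebrew reader
echo $asHebrew, "\n"; // the intended Hebrew word

// Both results are valid UTF-8, so the wrong conversion raises no error.
var_dump(mb_check_encoding($asLatin1, 'UTF-8'));
var_dump(mb_check_encoding($asHebrew, 'UTF-8'));
```

For plain English text the two readings give the same result, because ASCII bytes mean the same thing in both encodings.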


                        • Emath
                          Senior Member
                          • Aug 2008
                          • 146

                          #13
                          Originally posted by goyo
                          It could work for English language boards...but definitely not for others.
                          Which script are you talking about?

                          The one borbole posted?

                          • goyo
                            Senior Member
                            • Dec 2002
                            • 304
                            • 3.8.11

                            #14
                            Originally posted by imiviortal
                            Which script are you talking about?

                            The one borbole posted?
                            It doesn't matter, as most likely there's no automated solution that gives you a CLEAN UTF-8 post table (or any other table) for many of the foreign languages.

                            I'll give you an example:

                            Hungarian language with charset ISO-8859-2.

                            There's a letter Á. It can be in the database as Á or &Aacute; or &#193;, OK... but in the past there were no real Hungarian keyboards, so it can also be, for example, À, Ã, and their equivalents.
                            Furthermore, in the case of the vB translations, it really depends on which charset/browser/keyboard was used in the admincp during text entry, especially when the languages were translated by several people over several vB versions.
                            You won't see these on the actual board, as modern browsers are smart enough to display the right characters, but they won't become real UTF-8 characters during the conversion (so you can end up with a badly encoded RSS feed).

                            Of course I could be wrong, but every time I've converted a larger user-created database with any automated method, I later had to replace the remaining non-UTF-8 characters manually with a text editor (which is quite fun with 2+ GB tables).
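
The Á example can be sketched in PHP as follows. The strings are invented for illustration, not real forum data, and the sketch assumes the mbstring extension; it shows that a charset conversion leaves the entity spellings alone, and one way leftover legacy bytes might be detected afterwards.

```php
<?php
// Illustrative strings only, not real forum data.
// One raw ISO-8859-2 byte for the letter, plus two entity spellings of it.
$post = "\xC1 &Aacute; &#193;";

// A charset conversion changes only the raw byte; the entities pass through as-is.
$converted = mb_convert_encoding($post, 'UTF-8', 'ISO-8859-2');
echo $converted, "\n";

// Turning the entities into real UTF-8 characters is a separate step.
$decoded = html_entity_decode($converted, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";

// A string that still carries a stray legacy byte fails a UTF-8 validity check.
$leftover = "caf\xE9";
var_dump(mb_check_encoding($leftover, 'UTF-8'));
```

Note that `html_entity_decode` also decodes entities such as `&lt;`, so running it blindly over stored posts may change text that was escaped on purpose. Characters typed as a look-alike letter (À instead of Á) are valid in any encoding and cannot be found this way at all.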


                            • Emath
                              Senior Member
                              • Aug 2008
                              • 146

                              #15
                              Ha...
                              I don't think this applies to me, since I'm using Hebrew on my board.