default UTF-8

  • -=|zami|=-
    Senior Member
    • Feb 2009
    • 190

    friends, please: in admincp>>language & phrases>>language manager>>edit & settings there is the "HTML Character Set" option with the description "This value tells your browser how to display characters on HTML Pages. Default: UTF-8"

    my questions:
    - Why is the default UTF-8? any advantage over other standards?
    - Is it possible to use ISO-8859-1 without problems or are there any restrictions in VBulletin 5?

    thanks!!!

  • Wayne Luke
    vBulletin Technical Support Lead
    • Aug 2000
    • 73976

    #2
    UTF-8 is the only standard on the modern web. vBulletin should have converted to UTF-8 in the early 2000s. Instead, the development team at the time decided to turn non-English characters into HTML entities and force them into the database. That is the root of the conversion problems non-English sites face today.

    If all you're using is the English language then you can use ISO-8859-1. Its character repertoire is a small subset of what UTF-8 can encode, and it lacks thousands of characters that UTF-8 includes. These missing characters will show as squares or black question marks in your browser. ISO-8859-1 isn't recommended for web applications.

    New vBulletin 5 installations use UTF8MB4 for the database and UTF-8 for the HTML character set. This provides the best compatibility with the devices accessing the web today.
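
To make the difference concrete, here is a minimal Python sketch (Python is used only for illustration, and the sample words are assumptions, not taken from the thread). It shows that plain ASCII text is byte-identical in ISO-8859-1 and UTF-8, that accented characters are encoded differently, and that some characters cannot be encoded in ISO-8859-1 at all.

```python
# Illustrative sketch: compare how ISO-8859-1 and UTF-8 encode the same text.
ascii_text = "forum"
accented = "ação"   # sample Portuguese word, chosen only for illustration
emoji = "😀"

# ASCII-only text produces the same bytes in both encodings.
same_for_ascii = ascii_text.encode("iso-8859-1") == ascii_text.encode("utf-8")

# Accented letters take one byte each in ISO-8859-1 but two bytes each in UTF-8.
latin1_bytes = accented.encode("iso-8859-1")
utf8_bytes = accented.encode("utf-8")


def fits_in_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(same_for_ascii, len(latin1_bytes), len(utf8_bytes), fits_in_latin1(emoji))
```

The byte counts differ only for the accented letters, which is why an English-only board rarely notices which of the two character sets is in use.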
    Translations provided by Google.

    Wayne Luke
    The Rabid Badger - a vBulletin Cloud demonstration site.
    vBulletin 5 API

    • Blackhorse
      Senior Member
      • Jul 2018
      • 298
      • 5.3.x

      #3
      Hello,

      until the support team replies:

      if your site uses only English and there is no problem with characters, then it is OK to leave it.

      if there is a possibility of having international languages on your forum, then you should change it to UTF-8. But make sure you have prepared the database with a utf8 charset and collation.

      UTF-8 is the default now and works well for international sites and English ones as well.

      • Blackhorse
        Senior Member
        • Jul 2018
        • 298
        • 5.3.x

        #4
        Sorry Wayne Luke, I sent my post before I saw your reply.

        • -=|zami|=-
          Senior Member
          • Feb 2009
          • 190

          #5
          Thank you both for the help!

          very useful information from you!
          I do not use English only, and I need to allow accented words. however, the text of the posts is only visible and perfectly accented if the HTML Character Set is equal to ISO-8859-1.
          if I use UTF-8 and the text contains accented words, the text will not be visible.

          any tips?

          • Wayne Luke
            vBulletin Technical Support Lead
            • Aug 2000
            • 73976

            #6
            "if I use UTF-8 and the text contains accented words, the text will not be visible."
            You're not completely using UTF-8. Something in the configuration is broken. The character set is only the last part of the puzzle. The database has to be UTF-8, preferably UTF8MB4. Your Locale in the language settings has to be set to a UTF-8 locale. You will also need a modern font that is UTF-8 compatible.

            If you install a brand new copy of vBulletin 5 on the same server but with a new database and directory, what happens to the same characters?
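
The symptom described here can be reproduced outside vBulletin. The following Python sketch (an illustration only; the sample word is an assumption) shows what happens when the bytes that were stored and the character set that is declared disagree, in both directions.

```python
# Illustrative sketch: what a charset mismatch does to accented text.
text = "informação"  # sample word, an assumption for illustration

# Case 1: bytes stored as ISO-8859-1 but the page is declared as UTF-8.
# The lone high bytes are not valid UTF-8, so a tolerant decoder
# substitutes the replacement character U+FFFD for them.
stored_latin1 = text.encode("iso-8859-1")
shown_as_utf8 = stored_latin1.decode("utf-8", errors="replace")

# Case 2: bytes stored as UTF-8 but read as ISO-8859-1.
# Each two-byte character turns into two unrelated characters (mojibake).
stored_utf8 = text.encode("utf-8")
shown_as_latin1 = stored_utf8.decode("iso-8859-1")

print(shown_as_utf8)
print(shown_as_latin1)
```

Case 1 matches the "black question marks" mentioned earlier in the thread. Case 2 is reversible as long as nothing has altered the bytes, which is one reason a conversion should be rehearsed on a copy of the data.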

            • Blackhorse
              Senior Member
              • Jul 2018
              • 298
              • 5.3.x

              #7
              You need to change your database charset and collation. Every table and column has to be converted to utf8mb4!
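
As a rough illustration of what "every table" means in practice, the Python sketch below only builds the MySQL statements; it does not connect to a database. `ALTER TABLE ... CONVERT TO CHARACTER SET` is standard MySQL syntax, but the table names are hypothetical placeholders, and such statements do nothing about text that was stored as HTML entities. Anything like this should be tried on a copy of the database only.

```python
# Illustrative sketch: build the MySQL statements that convert tables to utf8mb4.
# The table names below are hypothetical placeholders; on a real database the
# list would come from SHOW TABLES. Run such statements only against a copy.
import re


def convert_statement(table: str,
                      charset: str = "utf8mb4",
                      collation: str = "utf8mb4_general_ci") -> str:
    """Return an ALTER TABLE statement that converts one table and its text columns."""
    # Accept only simple identifiers so nothing unexpected ends up in the SQL.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (f"ALTER TABLE `{table}` "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};")


tables = ["node", "text", "user"]  # hypothetical names for illustration
for name in tables:
    print(convert_statement(name))
```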

              • -=|zami|=-
                Senior Member
                • Feb 2009
                • 190

                #8
                thank you!
                Following the tips, I did a new installation for testing.

                1) during the preparation for the test installation, I observed the following:
                - in core/includes/config.php, I noticed that the line "// $config['Mysqli']['charset'] = 'utf8';" is commented out. Is that a problem?

                2) when I started the installation:
                - I received the following alert:
                "Action Required
                The current database character set is 'latin1'. It is strongly recommended that you use the utf-8 character set when running vBulletin. However, if you are using this database then changing the character set can affect them.
                Do you automatically want to change your database to use utf-8? "

                I clicked YES

                3) in admincp of the new forum: language pack ==> UTF-8

                4) I posted messages in the new forum and found no problems with the accented words

                5) in the new database: collation = utf8mb4_general_ci and table types = InnoDB, in all tables.
                character_set_client utf8mb4
                character_set_connection utf8mb4
                character_set_database utf8mb4
                character_set_filesystem binary
                character_set_results utf8mb4
                character_set_server latin1
                character_set_system utf8
                character_sets_dir /usr/share/mysql/charsets/
                6) in the old database: collation = latin1_swedish_ci (all tables) and table types = InnoDB, MEMORY or MyISAM. (InnoDB and MEMORY less than 50% of the tables).
                Variable_name Value
                character_set_client latin1
                character_set_connection latin1
                character_set_database latin1
                character_set_filesystem binary
                character_set_results latin1
                character_set_server latin1
                character_set_system utf8
                character_sets_dir /usr/share/mysql/charsets/
                any tips? thanks!
                Last edited by -=|zami|=-; Sat 20 Apr '19, 7:09pm.

                • Blackhorse
                  Senior Member
                  • Jul 2018
                  • 298
                  • 5.3.x

                  #9
                  5) in the new database: collation = utf8mb4_general_ci and table types = InnoDB, in all tables.
                  character_set_client utf8mb4
                  character_set_connection utf8mb4
                  character_set_database utf8mb4
                  character_set_filesystem binary
                  character_set_results utf8mb4
                  character_set_server latin1
                  character_set_system utf8
                  character_sets_dir /usr/share/mysql/charsets/

                  As you are working on a new installation for testing, there is no harm in changing it:
                  character_set_server needs to be utf8mb4, not latin1.

                  1st: un-comment the charset line in the config file and change utf8 to utf8mb4.

                  2nd: we are going to make a global change for now. If you (or your host) have access to the server, add the following to the my.cnf configuration:

                  PHP Code:
                  [mysqld]
                  character-set-client-handshake = FALSE
                  character-set-server = utf8mb4
                  collation-server = utf8mb4_general_ci
                  init_connect = 'SET collation_connection = utf8mb4_general_ci, NAMES utf8mb4'

                  [client]
                  default-character-set = utf8mb4

                  [mysql]
                  default-character-set = utf8mb4

                  After those steps you should see the following on the test installation:
                  character_set_client utf8mb4
                  character_set_connection utf8mb4
                  character_set_database utf8mb4
                  character_set_filesystem binary
                  character_set_results utf8mb4
                  character_set_server utf8mb4
                  character_set_system utf8
                  character_sets_dir /usr/share/mysql/charsets/

                  After those steps you should see the following on the old live site:
                  Variable_name Value
                  character_set_client latin1
                  character_set_connection latin1
                  character_set_database latin1
                  character_set_filesystem binary
                  character_set_results latin1
                  character_set_server utf8mb4
                  character_set_system utf8
                  character_sets_dir /usr/share/mysql/charsets/
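
Comparing these listings by eye is error-prone, so here is a small Python sketch that parses a pasted listing and reports which settings are not yet utf8mb4. The listing is the "old live site" output above; the function names are made up for this example, and nothing here talks to MySQL.

```python
# Illustrative sketch: parse a "SHOW VARIABLES LIKE 'character_set%'" listing
# (pasted as plain text) and report which settings are not yet utf8mb4.
LISTING = """\
character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server utf8mb4
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
"""

# Variables that are not expected to be utf8mb4 and can be ignored:
# the filesystem and system character sets and the charsets directory.
IGNORED = {"character_set_filesystem", "character_set_system", "character_sets_dir"}


def parse_listing(listing: str) -> dict:
    """Turn 'name value' lines into a dictionary."""
    pairs = (line.split(None, 1) for line in listing.splitlines() if line.strip())
    return {name: value.strip() for name, value in pairs}


def not_utf8mb4(variables: dict) -> list:
    """Return the names of relevant variables whose value is not utf8mb4."""
    return sorted(name for name, value in variables.items()
                  if name not in IGNORED and value != "utf8mb4")


print(not_utf8mb4(parse_listing(LISTING)))
```

For the old live site this reports the client, connection, database and results settings, which are exactly the ones that stay latin1 until the data itself has been converted.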

                  to be continued

                  • Blackhorse
                    Senior Member
                    • Jul 2018
                    • 298
                    • 5.3.x

                    #10
                    6) in the old database: collation = latin1_swedish_ci (all tables) and table types = InnoDB, MEMORY or MyISAM. (InnoDB and MEMORY less than 50% of the tables).
                    Variable_name Value
                    character_set_client latin1
                    character_set_connection latin1
                    character_set_database latin1
                    character_set_filesystem binary
                    character_set_results latin1
                    character_set_server latin1
                    character_set_system utf8
                    character_sets_dir /usr/share/mysql/charsets/
                    any tips? thanks!
                    To change character_set_database you have to convert all tables and columns to utf8mb4. You can use the vBulletin tools to convert the database, hire a professional, or use whatever script or solution you have of your own. This is the hardest part of the work! Be careful, and keep a clean backup even after a successful conversion; don't lose it. Run the conversion on a copy of the database first and leave the original untouched!!

                    1st: TAKE a BACKUP

                    2nd: run this query on the COPY:

                    PHP Code:
                    ALTER DATABASE old_db_name
                        DEFAULT CHARACTER SET utf8mb4
                        DEFAULT COLLATE utf8mb4_general_ci;


                    3rd: use the conversion tool for columns and tables.

                    4th: if the database is converted successfully, un-comment the line // $config['Mysqli']['charset'] = 'utf8'; in the config file and change utf8 to utf8mb4, so that it reads: $config['Mysqli']['charset'] = 'utf8mb4';

                    5th: upload the language XML file with UTF-8 encoding, if it is not one of the default installed languages!

                    6th: change the language settings to UTF-8 and change the locale.

                    ======

                    You should see the following results:
                    Variable_name Value
                    character_set_client utf8mb4
                    character_set_connection utf8mb4
                    character_set_database utf8mb4
                    character_set_filesystem binary
                    character_set_results utf8mb4
                    character_set_server utf8mb4
                    character_set_system utf8
                    character_sets_dir /usr/share/mysql/charsets/
                    Last edited by Blackhorse; Sun 21 Apr '19, 10:14pm.

                    • Wayne Luke
                      vBulletin Technical Support Lead
                      • Aug 2000
                      • 73976

                      #11
                      We don't generally recommend changing the Server Character Set on the database, not unless you have run a complete conversion on the contents of every field in your database. Just changing the field to UTF8MB4 isn't enough because vBulletin 4 stores most non-English characters as HTML entities. The problem is that LATIN1 uses 1 byte per character, MySQL's UTF8 uses 1-3 bytes per character, and UTF8MB4 uses 1-4 bytes per character. By changing the character set, you've changed the number of bytes per character and now the system doesn't actually know if it is serving valid characters.

                      This is the most complicated part of the conversion. Does an & mean ampersand or is it the start of an HTML entity? What happens if a byte in a character ends up translating to an ampersand, or worse, an end-of-file marker? This is the part we're working on in our conversion scripts. The data isn't clean in vBulletin 3 and 4. We have to clean it.
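
The ambiguity can be shown in a few lines of Python. This is only a sketch: `legacy_store` is a made-up stand-in for "store non-ASCII characters as numeric HTML entities", an assumed simplification of the behaviour described in this thread, not vBulletin's actual code, and the sample strings are invented.

```python
# Illustrative sketch: why stored HTML entities are ambiguous during conversion.
import html


def legacy_store(text: str) -> str:
    """Assumed legacy behaviour: write non-ASCII characters as numeric entities."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Post A: the user typed an accented character.
stored_a = legacy_store("café")
# Post B: the user literally typed the entity, e.g. in an HTML tutorial.
stored_b = legacy_store("caf&#233;")

# Both posts end up as identical text in the database...
print(stored_a == stored_b)

# ...so a converter that decodes every entity changes post B as well.
print(html.unescape(stored_b))
```

Once both posts are stored as the same string, no script can recover which one the author meant without extra information, which is the cleaning problem described above.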

                      • -=|zami|=-
                        Senior Member
                        • Feb 2009
                        • 190

                        #12
                        friends, sorry for the delay in answering. Can we resume the discussion?

                        according to the manual, it would be an error if the character_set_database and character_set_connection values were different, right?
                        https://www.vbulletin.com/docs/html/mysqli

                        In my live forum these two values are the same: latin1.
                        at first glance, no problem, right?
                        however, the ideal is that these values are both utf8mb4, correct?

                        is there a safe way to make this change?

                        thanks!!!

                        • -=|zami|=-
                          Senior Member
                          • Feb 2009
                          • 190

                          #13
                          any tips?

                          thanks!

                          • Wayne Luke
                            vBulletin Technical Support Lead
                            • Aug 2000
                            • 73976

                            #14
                            "it would be an error if the character_set_database and character_set_connection values were different, right?"
                            The answer here is maybe. Unfortunately, there are too many variables to answer definitively.

                            In a perfect world, these would be the same. However, it really depends on how your vBulletin was communicating with the server before. Changing this could cause text stored before the change to display incorrectly. The problem is that we have no idea what the length of an individual stored character is. UTF-8 characters can take 1-4 bytes of storage, and non-ASCII characters usually take 2-3 bytes. Which is correct? We have no idea.

                            If you were using Latin1 then it is one byte per character and we've converted every UTF-8 character to HTML Entities. These display properly but aren't searchable.
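
A short Python sketch of the "displays properly but isn't searchable" point. The stored string is an invented example of entity-encoded text, and a plain substring test stands in for the forum's search.

```python
# Illustrative sketch: entity-encoded text displays fine but does not match a search.
import html

stored = "caf&#233; da manh&#227;"   # invented example of text stored as entities
query = "café"                       # what a visitor types into the search box

# Searching the raw stored text fails, because it contains "&#233;" rather than "é".
found_in_stored = query in stored

# After decoding the entities, the same search succeeds.
found_after_decoding = query in html.unescape(stored)

print(found_in_stored, found_after_decoding)
```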

                            On a new installation of vBulletin, we set everything to utf8mb4. This means that each character can take up to 4 bytes to store. This allows for every UTF-8 character, including the Emoji popular on mobile devices, at the expense of some storage space. However, we know how every character is encoded without a doubt, because everything was written as utf8mb4 from the start. This allows us to handle things a lot better.

                            Converting to utf8mb4 without knowing the actual content beforehand is the difficult part. Converting all those stored HTML Entities back to characters is a little easier if we know what the character size is. We have our convertors working for languages based on Latin characters. These are the easiest because they are heavily used in computer processing, even by nations that don't speak those languages. We're working on finalizing Arabic and other languages as quickly as possible.
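
As a point of reference on character sizes, here is a minimal Python sketch (sample characters are assumptions) that measures how many bytes individual characters occupy in UTF-8. MySQL's older `utf8` character set stores at most 3 bytes per character, while `utf8mb4` stores up to 4, which is what Emoji need.

```python
# Illustrative sketch: UTF-8 is variable-length, from 1 to 4 bytes per character.
samples = ["a", "é", "€", "😀"]

# Number of bytes each sample character occupies when encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}


def fits_in_three_bytes(text: str) -> bool:
    """True if every character needs at most 3 bytes, the limit of MySQL's older utf8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


for ch, size in byte_lengths.items():
    print(ch, size)
print(fits_in_three_bytes("café €"), fits_in_three_bytes("ok 😀"))
```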

                            • -=|zami|=-
                              Senior Member
                              • Feb 2009
                              • 190

                              #15
                              thanks Wayne!

                              when you say "it depends on how VB was communicating with the server before", you mean before the conversion to VB5?
