utf8mb4_general_ci or utf8mb4_unicode_ci for collation

Collapse
X
 
  • Time
  • Show
Clear All
new posts
  • mark.hs
    Senior Member
    • Dec 2016
    • 197
    • 5.7.0

    utf8mb4_general_ci or utf8mb4_unicode_ci for collation

    I saw in Wayne's October article on mysql best practices that he recommends a database character set of utf8mb4 and a collation of utf8mb4_general_ci. From what I read online, most people recommend utf8mb4_unicode_ci; apparently this is a newer version. Is there a reason you recommend general, or can we use either?
    Admin for hobbysquawk.com
    VB 5.74
    PHP 8.2
    Maria DB 10.6
  • Wayne Luke
    vBulletin Technical Support Lead
    • Aug 2000
    • 74118

    #2
    The collation determines how characters are sorted, nothing more. This would be used in alphabetical sorts and unstructured data lookups in MySQL. They do not determine what is stored as that is the role of the character set. Either will work for general purposes. If your site has a lot of multi-byte characters (Arabic, Cyrillic, Hebrew, or Asian languages) then utf8mb_unicode_ci will probably suit you better. It is mostly Latin language characters, you probably won't see a difference.

    You can read this thread on StackExchange for more information:
    https://stackoverflow.com/questions/...tf8-unicode-ci

    The character set of UTFMB4 is the most important part though. This provides support for the majority of characters in languages around the world as well as some speciality characters like the UTF-8 Emoji Standard.
    Translations provided by Google.

    Wayne Luke
    The Rabid Badger - a vBulletin Cloud demonstration site.
    vBulletin 5 API

    Comment

    • mark.hs
      Senior Member
      • Dec 2016
      • 197
      • 5.7.0

      #3
      Wayne - Thanks for the info.
      Regarding utf8mb8, my database is currently in latin1 which is creating random junk characters in various posts (this was discovered in my other post about ipad issues). I have no idea why the database was setup in latin1, but I know I need to change it. I plan to modify the database from latin1 to utf8mb4 this evening. Other than the standard myslq commands to modify the database char set and collation (for all tables), is there anything else that I should be mindful of when moving to utf8mb4? My biggest concern is data loss or truncation. I'm assuming I can alter the database, change the language settings in VB to UTF-8, clear the cache and everything will be good, or are there any other specific steps or cautions that I need to be aware of??
      Admin for hobbysquawk.com
      VB 5.74
      PHP 8.2
      Maria DB 10.6

      Comment

      • Wayne Luke
        vBulletin Technical Support Lead
        • Aug 2000
        • 74118

        #4
        latin1 was the default format for MySQL until 5.7.something. The default for new databases is UTF8MB4 in MySQL 8.0 (there is no MySQL 6 or 7). If vBulletin's installer doesn't create the database, it will use the MySQL Server's default. In current versions of vBulletin's installer will create the database with the UTF8MB4 character set and the collation will be utf8mb4_general_ci (hence my recommendation). The installer only affects new installations though. Upgrades do not change the character set and collation.

        Attached is a set of experimental tools to convert your database to UTF8MB4 with a utf8mb4_general_ci collation. The documentation is in the readme.md text file. These should be run on a copy of your database for testing purposes before trying to convert your live database. They are experimental but I have used them on several dozen databases without issue. However, they have always been English databases. These tools will only look at standard vB5 tables in their present configuration. If you have other tables, you'll have to manually change them.
        Attached Files
        Translations provided by Google.

        Wayne Luke
        The Rabid Badger - a vBulletin Cloud demonstration site.
        vBulletin 5 API

        Comment

        • mark.hs
          Senior Member
          • Dec 2016
          • 197
          • 5.7.0

          #5
          Wayne,
          I just finished running the scripts on a backup of our vbulletin database. A couple of questions:
          1. A few tables remain as MyISAM. They include: tagcontent, phrase, language, impexerror, userfield and mybb_attachments. Is this OK or should I attempt to change the type?
          2. See attached image. About 30 or so tables did not convert, they remain in latin1. Is this ok, or do I need to change these manually?
          Attached Files
          Last edited by mark.hs; Mon 12 Nov '18, 1:24pm.
          Admin for hobbysquawk.com
          VB 5.74
          PHP 8.2
          Maria DB 10.6

          Comment

          • Wayne Luke
            vBulletin Technical Support Lead
            • Aug 2000
            • 74118

            #6
            1. language, phrase and userfield are left as MyISAM because our minimum required version of MySQL is 5.5.8. You need to be running MySQL 5.7 to change them and we haven't tested those changes. tagcontent and mybb_attachments are not vBulletin 5 tables.

            2. The tables that weren't converted in the image do not appear to be vBulletin 5 tables. The upgrade scripts should have removed most of these if your database user has permission to delete tables.
            Translations provided by Google.

            Wayne Luke
            The Rabid Badger - a vBulletin Cloud demonstration site.
            vBulletin 5 API

            Comment

            • mark.hs
              Senior Member
              • Dec 2016
              • 197
              • 5.7.0

              #7
              I just completed the conversion. Everything went smoothly. Two quick follow-up questions:

              1. Do I need to anything with search, such as re-index, or will that sort itself out over time?

              2. I seem to have 20 or so tables in the database that don't belong, at least they didnt convert to utf8mb4 and they little or no data within them. Many appear to be related to VB4. Do you have a list of all valid tables in VB5.x so I can compare the master list to what's inside my database?

              Admin for hobbysquawk.com
              VB 5.74
              PHP 8.2
              Maria DB 10.6

              Comment

              • Wayne Luke
                vBulletin Technical Support Lead
                • Aug 2000
                • 74118

                #8
                1. Search tables should have been converted and not emptied. If you emptied the search, then you should rebuild. If you use Sphinx Search, you should rebuild the index.

                2.
                vb_ad
                vb_adcriteria
                vb_adminhelp
                vb_administrator
                vb_adminlog
                vb_adminmessage
                vb_adminutil
                vb_announcement
                vb_announcementread
                vb_apiclient
                vb_apiclient_devicetoken
                vb_apilog
                vb_attach
                vb_attachmentcategory
                vb_attachmentcategoryuser
                vb_attachmentpermission
                vb_attachmenttype
                vb_attachmentviews
                vb_autosavetext
                vb_avatar
                vb_bbcode
                vb_bbcode_video
                vb_cache
                vb_cacheevent
                vb_calendar
                vb_calendarcustomfield
                vb_calendarmoderator
                vb_calendarpermission
                vb_channel
                vb_channelprefixset
                vb_closure
                vb_contentpriority
                vb_contenttype
                vb_cpsession
                vb_cron
                vb_cronlog
                vb_customavatar
                vb_customprofile
                vb_customprofilepic
                vb_datastore
                vb_deletionlog
                vb_discussion
                vb_editlog
                vb_event
                vb_externalcache
                vb_faq
                vb_fcmessage
                vb_fcmessage_offload
                vb_fcmessage_queue
                vb_filedata
                vb_filedataresize
                vb_forumpermission
                vb_forumrunner_attachment
                vb_forumrunner_push_data
                vb_forumrunner_push_users
                vb_gallery
                vb_groupintopic
                vb_holiday
                vb_hook
                vb_humanverify
                vb_hvanswer
                vb_hvquestion
                vb_icon
                vb_imagecategory
                vb_imagecategorypermission
                vb_infraction
                vb_infractionban
                vb_infractiongroup
                vb_infractionlevel
                vb_ipaddressinfo
                vb_language
                vb_legacyevent
                vb_link
                vb_loginlibrary
                vb_mailqueue
                vb_mapiposthash
                vb_messagefolder
                vb_moderation
                vb_moderator
                vb_moderatorlog
                vb_node
                vb_nodehash
                vb_noderead
                vb_nodeview
                vb_notice
                vb_noticecriteria
                vb_noticedismissed
                vb_notification
                vb_notificationevent
                vb_notificationtype
                vb_package
                vb_page
                vb_pagetemplate
                vb_passwordhistory
                vb_paymentapi
                vb_paymentinfo
                vb_paymenttransaction
                vb_permission
                vb_photo
                vb_phrase
                vb_phrasetype
                vb_picturecomment
                vb_picturecomment_hash
                vb_poll
                vb_polloption
                vb_pollvote
                vb_postedithistory
                vb_prefix
                vb_prefixpermission
                vb_prefixset
                vb_privacyconsent
                vb_privatemessage
                vb_product
                vb_productcode
                vb_productdependency
                vb_profilefield
                vb_profilefieldcategory
                vb_profilevisitor
                vb_ranks
                vb_redirect
                vb_report
                vb_reputation
                vb_reputationlevel
                vb_routenew
                vb_rssfeed
                vb_rsslog
                vb_screenlayout
                vb_searchlog
                vb_searchtowords_a
                vb_searchtowords_b
                vb_searchtowords_c
                vb_searchtowords_d
                vb_searchtowords_e
                vb_searchtowords_f
                vb_searchtowords_g
                vb_searchtowords_h
                vb_searchtowords_i
                vb_searchtowords_j
                vb_searchtowords_k
                vb_searchtowords_l
                vb_searchtowords_m
                vb_searchtowords_n
                vb_searchtowords_o
                vb_searchtowords_other
                vb_searchtowords_p
                vb_searchtowords_q
                vb_searchtowords_r
                vb_searchtowords_s
                vb_searchtowords_t
                vb_searchtowords_u
                vb_searchtowords_v
                vb_searchtowords_w
                vb_searchtowords_x
                vb_searchtowords_y
                vb_searchtowords_z
                vb_sentto
                vb_session
                vb_sessionauth
                vb_setting
                vb_settinggroup
                vb_sigparsed
                vb_sigpic
                vb_sigpicnew
                vb_site
                vb_smilie
                vb_spamlog
                vb_stats
                vb_strikes
                vb_style
                vb_stylevar
                vb_stylevardfn
                vb_subscribediscussion
                vb_subscribeevent
                vb_subscription
                vb_subscriptionlog
                vb_subscriptionpermission
                vb_tag
                vb_tagnode
                vb_tagsearch
                vb_template
                vb_templatehistory
                vb_templatemerge
                vb_text
                vb_thread_post
                vb_trending
                vb_upgradelog
                vb_user
                vb_useractivation
                vb_userauth
                vb_userban
                vb_userchangelog
                vb_userfield
                vb_usergroup
                vb_usergroupleader
                vb_usergrouprequest
                vb_userlist
                vb_userloginmfa
                vb_usernote
                vb_userpromotion
                vb_userstylevar
                vb_usertextfield
                vb_usertitle
                vb_video
                vb_videoitem
                vb_widget
                vb_widgetchannelconfig
                vb_widgetdefinition
                vb_widgetinstance
                vb_widgetuserconfig
                vb_words
                Translations provided by Google.

                Wayne Luke
                The Rabid Badger - a vBulletin Cloud demonstration site.
                vBulletin 5 API

                Comment

                • Wayne Luke
                  vBulletin Technical Support Lead
                  • Aug 2000
                  • 74118

                  #9
                  The forumrunner tables won't have been updated and if you do not have the forumrunner package installed under Hooks & Products, you can disregard them.
                  Translations provided by Google.

                  Wayne Luke
                  The Rabid Badger - a vBulletin Cloud demonstration site.
                  vBulletin 5 API

                  Comment

                  • mark.hs
                    Senior Member
                    • Dec 2016
                    • 197
                    • 5.7.0

                    #10
                    We are using Sphinx. Can we rebuild the search index by going to Admin CP > Maintenance > General Update Tools > Rebuild Search Index, or do we need to do it via the command line with the searchindex.php file?

                    If we can go through the Admin CP, do we need to Empty the search index first or just click Rebuild the Search Index?
                    Admin for hobbysquawk.com
                    VB 5.74
                    PHP 8.2
                    Maria DB 10.6

                    Comment

                    • Wayne Luke
                      vBulletin Technical Support Lead
                      • Aug 2000
                      • 74118

                      #11
                      Using searchindex.php will be slightly quicker as you're removing the HTML intefrace. Also it won't time_out and needs less babysitting than the AdminCP rebuild.

                      Deleting the Sphinx index files and restarting the server should also trigger a reindex.
                      Translations provided by Google.

                      Wayne Luke
                      The Rabid Badger - a vBulletin Cloud demonstration site.
                      vBulletin 5 API

                      Comment

                      • WielerArchieven
                        Member
                        • Apr 2007
                        • 51
                        • 3.6.x

                        #12
                        Picking up on this rather old thread. But is this still the case?

                        I see when I create a new database on my server running MySQL 8.0.27 that the collation defaults to utf8mb4_0900_ai_ci for that new database. But I read this collation is 'is accent-insensitive' so I suspect for my forum which uses Dutch and French as languages this isn't the way to go as for example an username Rene and René would clash.

                        Is my understanding correct on this? And should I still use utf8mb4_general_ci instead?

                        Comment

                        • Wayne Luke
                          vBulletin Technical Support Lead
                          • Aug 2000
                          • 74118

                          #13
                          Your collation determines how MySQL sorts query results. That is all it does. It does not determine whether a character exists or not, the character set determines which characters are available. 0900 will sort better than general, in most cases. 0900 follows the latest unicode standards whereas general may not.

                          The ai in the collation means accent insensitive. The ci means case insensitive. These are recommended for most languages. It means that Cafe, Café, cafe and café will all be sorted together.

                          As for your username example. Both would be allowed be e and é are different characters. Again the collation doesn't limit the characters that can be used. The accent just wouldn't be accounted for in sorting a list of usernames so both names should appear together (one after another) in the list instead of having René appear after Renz.
                          Translations provided by Google.

                          Wayne Luke
                          The Rabid Badger - a vBulletin Cloud demonstration site.
                          vBulletin 5 API

                          Comment

                          • WielerArchieven
                            Member
                            • Apr 2007
                            • 51
                            • 3.6.x

                            #14
                            OK thanks for the clear clarification.

                            Comment

                            Related Topics

                            Collapse

                            Working...