#69 new
Charles Brunet

database character encoding

Reported by Charles Brunet | October 28th, 2008 @ 03:14 PM | in 5.1.0

It seems Sitellite uses the MySQL default encoding when creating the database, usually latin1 on most systems. However, it now uses UTF-8 as the encoding for all its data. As a result, viewing the data with another tool, such as phpMyAdmin, shows it in the wrong encoding.
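
The symptom can be reproduced outside MySQL. A minimal Python sketch, assuming the application writes UTF-8 bytes and the viewing tool decodes them as latin1 (the sample string is made up for illustration):

```python
# Text as the application writes it: UTF-8 bytes.
original = "café"
stored_bytes = original.encode("utf-8")

# A tool that believes the column is latin1 decodes the same bytes differently:
# each byte of a multi-byte UTF-8 sequence becomes its own latin1 character.
seen_by_tool = stored_bytes.decode("latin-1")

print(repr(original), "->", repr(seen_by_tool))
```

The accented character is stored as two bytes, so the latin1 reader shows two characters in its place.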

I converted my local database to utf8 encoding (dumping it with mysqldump, using vim to convert the dump and to replace every 'latin1' with 'utf8', then reloading the resulting file). I then ran into 2 issues:

  • I had to reduce the primary key size of sitellite_filesystem and sitellite_filesystem_download, because utf8 columns reserve 3 bytes per character.
  • I had to add the MySQL commands SET CHARACTER SET utf8 and SET NAMES utf8 just after the connection to the MySQL database is made (saf/lib/Database/Driver/MySQL.php, line 444) to get data in the right encoding.
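
The first issue comes down to simple arithmetic. A rough Python sketch, assuming a 1000-byte limit on index keys (the ticket does not say which storage engine or limit applies, so KEY_LIMIT_BYTES and the helper name are illustrative assumptions):

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Longest column, in characters, that fits in an index key of the given size."""
    return key_limit_bytes // bytes_per_char

# ASSUMPTION: 1000 bytes is used only as an example limit; the real limit
# depends on the storage engine and MySQL version.
KEY_LIMIT_BYTES = 1000

latin1_chars = max_indexable_chars(KEY_LIMIT_BYTES, 1)  # latin1: 1 byte per character
utf8_chars = max_indexable_chars(KEY_LIMIT_BYTES, 3)    # utf8: 3 bytes reserved per character

print(latin1_chars, utf8_chars)
```

Under that assumed limit, a key column that could hold 1000 latin1 characters can hold only 333 utf8 characters, which is why the key had to shrink after conversion.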

I hope it can help.

Charles.

Comments and changes to this ticket

  • lux

    lux October 28th, 2008 @ 03:51 PM

    What might be best is converting specific columns to utf8 instead of all of them, that way key columns like the file names remain in latin1 and you gain the extra filename length. It looks like MySQL from 4.1 on can set character sets for individual columns, but I didn't find specifics on 4.0 yet.

  • Charles Brunet

    Charles Brunet October 29th, 2008 @ 05:59 AM

    In that case, I would rather use ascii instead of latin1. Mixing latin1 and utf8 together could be a source of confusion, but using ascii should not cause problems, since it's a subset of utf8.

    But note that utf8 columns don't use more bytes if you only have ASCII characters; it only influences the size of index fields (like the primary key).
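
A short Python sketch of both points, using made-up sample strings: ASCII text produces the same bytes in either encoding, and only non-ASCII characters take more than one byte in UTF-8.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Hypothetical sample values: a plain ASCII file name and two non-ASCII strings.
samples = ["index.html", "café", "日本語"]
for text in samples:
    print(text, "chars:", len(text), "bytes:", utf8_len(text))

# ASCII is a subset of UTF-8: the encoded bytes are identical.
ascii_bytes = "index.html".encode("ascii")
utf8_bytes = "index.html".encode("utf-8")
print(ascii_bytes == utf8_bytes)
```

This only shows the size of the data itself. The space reserved in an index is a separate matter, as the comment above points out.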

  • Charles Brunet

    Charles Brunet August 6th, 2009 @ 03:01 PM

    • Milestone set to 5.1.0
  • lux

    lux August 28th, 2009 @ 02:36 AM

    Some tables, such as sitellite_filesystem, need all the index length they can get. In those cases, it would be best to use latin1 or ascii (I believe those are already hard-coded to latin1 anyway). Otherwise, converting the defaults to utf8 in the database shouldn't change anything for new sites.

    Existing sites would either have to run as-is with no change, or may need a conversion of the data into utf8. I lean towards upgrades keeping their existing character sets so we're not doing anything to potentially mess with data on them.

    In this way, new installs would all use utf8 by default (except certain data fields as per the above) and existing sites would carry on unchanged. I think that works best.


The Sitellite web content management system.
