#1, 字符集相关的参数名和概念
MySQL的字符集设置比较自由。可以设置很多种组合,相关的变量和参数有:
(root@localhost)[sample3]> show global variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
(root@localhost)[sample3]> show variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
其中global variables表示全局变量。也就是默认情况下,新建立的数据库如果不显式的指定字符集相关参数,将使用这些参数。也叫做全局字符集变量
其中variables没带global,表示当前session生效的的参数。所谓当前session,表示如果更改过相关的参数,离开这个session以后,就会恢复默认的参数。也叫做连接时字符集变量
各个变量的概念:
1,character_set_client:客户端字符集,即数据在client端时字符集状态。
2,character_set_connection:连接时转换字符集,即客户端和服务端连接时,字符集装换成的字符集
3,character_set_server:服务端处理时候使用的字符集
4,character_set_database:数据库层面存储默认使用的字符集
5,character_set_results:数据返回时所用的字符集
指定字符集参数有多种方式,
1,在编译时指定,主要是:
-DDEFAULT_CHARSET=utf8 \
-DDEFAULT_COLLATION=utf8-general_ci \
其中CHARSET是指字符集,COLLATION是指相关的校对规则(也称排序规则)
2,参数文件,也就是my.cnf中设定
character_set_server=utf8
collation_server=utf8_general_ci
参数人间的设定将会覆盖编译时设定的字符集和校对规则。
3,启动MySQL服务的时候指定:
--character_set_server: 指定全局粒度的默认字符集
--collation_server:指定全局粒度的默认校对规则
启动时指定的参数将覆盖参数文件以及编译时指定的字符集和校对规则。
这些参数,如果从大方向分的话,可以分为两类:
1,连接时使用的字符集,即为show variable like '%character%' 显示的那些字符集
2,存储时使用的字符集,分为4个级别
1)SERVER,全局级别
2)DATABASE,数据库级别
3)TABLE,表级别
4)column,列级别
2# MySQL查询的基本过程和乱码的形成以及如何避免乱码
1)查询的基本过程
MySQL查询基本过程如下:
1, 程序将字符转换成二进制格式
2,MySQL 客户端发出查询(client端字符集)====>到达server端连接器(connection字符集)====>
内部转换并查询(列字符集、表字符集、数据库字符集、server端字符集,转换优先级逐级递减)=====>
查询结果返回给result(result字符集)
这里先要回答一个latin字符集为何能存放汉字的问题(汉字每个字符占用2个字节长度,latin不支持双字节长度)。
实际上是因为OS/APP层已经将汉字转换为单字符串的形式。一般来说,为了正常显示和处理汉字,
OS层面和程序层面也一定设定了字符集,这个字符集就会将汉字先一步处理成二进制。比如OS和程序层面设置的是UTF8,
那么实际上输入一个汉字,我们在mysql client端实际上是获得3个单字节,而非双字节,这样latin字符集就可以处理了。
这些字符集的转换,实际上是字符长度的转换。比如如果CLIENT端时LATIN1,connection是UTF8,
那么就会发生3个字节长度转换成6个字节长度。每个单字符后面都会被填0,这样变长了自然是没事的,
顶多显示的时候再转回来,把后面填的零都切掉。但是反过来,如果是client段是utf8,connection是latin1,
那么3个字节长度的utf8就会被转成1个字节长度的latin1,后两位丢失,以后是怎么都变不回来的。
这种丢失非零位的过程是不可逆的。所以我们必须得保证设定字符集的时候
(列字符集、表字符集、数据库字符集、server段字符集)>= connection字符集 >= client字符集来避免字符编码丢失问题。
一种可能发生的状况: client(latin1)==>connection(utf8)==>服务端内部的存储时字符集(latin1),
并不会发生字符编码丢失,因为connection转换到存储时字符集时,只是切掉了client转connection时后面填的零,相当于转回来了。
3,实验:
1)client字符集为utf8,connection 为latin1,存储字符集为latin1
(root@localhost)[sample3]> show create table test2;
+-------+---------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+---------------------------------------------------------------------------------------------------------------------------+
| test2 | CREATE TABLE `test2` (
`id` int(11) DEFAULT NULL,
`name` char(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
(root@localhost)[sample3]> show variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
(root@localhost)[sample3]> insert into test2 values (1,'中国');
Query OK, 1 row affected, 1 warning (0.00 sec)
(root@localhost)[sample3]> select * from test2;
+------+------+
| id | name |
+------+------+
| 1 | ?? |
+------+------+
1 row in set (0.00 sec)
这种情况,已经丢失了字符,转换result是没有用的。
(root@localhost)[sample3]> set character_set_results=latin1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost)[sample3]> select * from test2;
+------+------+
| id | name |
+------+------+
| 1 | ?? |
+------+------+
1 row in set (0.00 sec)
2)client字符集为utf8,connection为utf8,存储字符集为latin1
(root@localhost)[sample3]> show variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
(root@localhost)[sample3]>
(root@localhost)[sample3]> insert into test2 values (1,'中国');
ERROR 1366 (HY000): Incorrect string value: '\xE4\xB8\xAD\xE5\x9B\xBD' for column 'name' at row 1
(root@localhost)[sample3]>
这里直接报错,5.6以后加强了数据库数据的安全性,因为会丢失数据,所以不允许插入。
3)client字符集为latin1,connection为utf8,存储字符集为latin1
(root@localhost)[sample3]> show variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
(root@localhost)[sample3]> select * from test2;
+------+---------------+
| id | name |
+------+---------------+
| 1 | ä¸å›½ |
+------+---------------+
1 row in set (0.00 sec)
乱码了,这里为何会乱码呢?按理说转换过程中只会切掉填充的零才对。实际上是result的问题。这个result是utf8,
而client存的时候就是3位,到connection转到6位,到存储时转回3位,这时候到result又转到6位,自然是乱码的。
只要result转回latin1,就可以了。
(root@localhost)[sample3]> set character_set_results=latin1;
Query OK, 0 rows affected (0.00 sec)
(root@localhost)[sample3]> select * from test2;
+------+--------+
| id | name |
+------+--------+
| 1 | 中国 |
+------+--------+
1 row in set (0.00 sec)
4)不丢失字符,但是字符集不同的乱码.
(root@localhost)[sample2]> show create table test;
+-------+-------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`name` char(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+-------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
(root@localhost)[sample2]> show variables like '%cha%';
+-------------------------------+----------------------------+
| Variable_name | Value |
+-------------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | all |
+-------------------------------+----------------------------+
10 rows in set (0.00 sec)
(root@localhost)[sample2]> insert into test values('中国');
Query OK, 1 row affected (0.01 sec)
(root@localhost)[sample2]>
(root@localhost)[sample2]> select * from test;
+------+
| name |
+------+
| |
+------+
1 row in set (0.00 sec)
(root@localhost)[sample2]> set character_set_results=utf8;
Query OK, 0 rows affected (0.00 sec)
(root@localhost)[sample2]> select * from test;
+--------+
| name |
+--------+
| 中国 |
+--------+
1 row in set (0.00 sec)
(root@localhost)[sample2]>
修改客户端字符集的5中方法:
1、 运行,set names <字符集>;
2、 在SQL文件中指定set names <字符集>;,用source命令导入sql文件
如: mysql> source test.sql
3、 在SQL文件中指定set names <字符集>;,然后通过重定向符,或者-e参数来执行
[root@mysql01 3307]# vi test.sql
set names utf8;
select * from mysql.user;
[root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock <test.sql
[root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock -e "set names <字符集>; select * from <database>.<table>;"
4、 通过指定mysql命令的字符集参数实现 --default-character-set=<字符集>
[root@mysql01 3307]# mysql -uroot -p<password> -S /data/3307/mysql.sock --default-chratacter-set=utf8 <database> <test.sql
5、 在配置文件里设置客户端即服务器端相关参数,此设置永久生效,
[client]
default-character-set=utf8
###对于client参数,退出重新登录,即可生效。
更改服务端的方法:
[mysqld]
default-character-set=utf8 ####5.1
character-set-server=utf8 ####5.5
[root@mysql01 3307]# mysql -uroot -p -S /data/3307/mysql.sock -e "show variables like '%chara%';"
Warning: Using a password on the command line interface can be insecure.
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/share/charsets/ |
+--------------------------+----------------------------+
[root@mysql01 3307]#
这些参数中,其中client,connection,results默认会跟随系统的字符集设置,/etc/systemconfig/i18n
迷思。。。
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
设置
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
[mysql]
default-character-set=utf8
不能锁定client端字符集
[client]
default-character-set=utf8
[mysqld]
character-set-server=utf8
skip-character-set-client-handshake ##加入忽略客户端设置,使用服务端设置
[mysql]
default-character-set=utf8
这样设置以后,mysql客户端字符集锁定为utf8