背景
当MongoDB存储海量的数据时,一台机器可能不足以存储数据,也可能不足以提供可接受的读写吞吐量。这时,我们就可以通过在多台机器上分割数据,使得数据库系统能存储和处理更多的数据。
1、MongoDB sharding简介
三种角色:
配置服务器(config):是一个独立的mongod进程,保存集群和分片的元数据,即各分片包含了哪些数据的信息。
路由服务器(mongos):起到一个路由的功能,供程序连接。本身不保存数据,在启动时从配置服务器加载集群信息.
分片服务器(sharding):是一个独立mongod进程,保存数据信息。可以是一台服务器,如果想要高可用也可以配置成副本集。
2、实验环境
两台机器的IP:
172.16.101.54 sht-sgmhadoopcm-01
172.16.101.55 sht-sgmhadoopnn-01
config server:
172.16.101.55:27017
mongos:
172.16.101.55:27018
sharding:
172.16.101.54:27017
172.16.101.54:27018
172.16.101.54:27019
2、启动config服务
修改配置服务器的配置文件,主要是参数clusterRole指定角色为configsvr
[root@sht-sgmhadoopnn-01 mongodb]# cat /etc/mongod27017.conf
systemLog:
destination: file
path: "/usr/local/mongodb/log/mongod27017.log"
logAppend: true
storage:
dbPath: /usr/local/mongodb/data/db27017
journal:
enabled: true
processManagement:
fork: true
pidFilePath: /usr/local/mongodb/data/db27017/mongod27017.pid
net:
port: 27017
bindIp: 0.0.0.0
setParameter:
enableLocalhostAuthBypass: false
sharding:
clusterRole: configsvr
archiveMovedChunks: true
[root@sht-sgmhadoopnn-01 mongodb]# bin/mongod --config /etc/mongod27017.conf
warning: bind_ip of 0.0.0.0 is unnecessary; listens on all ips by default
about to fork child process, waiting until server is ready for connections.
forked process: 31033
child process started successfully, parent exiting
3、启动mongos服务
修改路由服务器的配置文件,主要是参数configDB指定config服务器的IP和port,不需要配置有关数据文件的信息,因为路由服务器不存储数据
[root@sht-sgmhadoopnn-01 mongodb]# cat /etc/mongod27018.conf
systemLog:
destination: file
path: "/usr/local/mongodb/log/mongod27018.log"
logAppend: true
processManagement:
fork: true
pidFilePath: /usr/local/mongodb/data/db27018/mongod27018.pid
net:
port: 27018
bindIp: 0.0.0.0
setParameter:
enableLocalhostAuthBypass: false
sharding:
autoSplit: true
configDB: 172.16.101.55:27017
chunkSize: 64
[root@sht-sgmhadoopnn-01 mongodb]# bin/mongos --config /etc/mongod27018.conf
warning: bind_ip of 0.0.0.0 is unnecessary; listens on all ips by default
2018-11-10T18:57:13.705+0800 W SHARDING running with 1 config server should be done only for testing purposes and is not recommended for production
about to fork child process, waiting until server is ready for connections.
forked process: 31167
child process started successfully, parent exiting
4、启动sharding服务
就是一个普通的mongodb进程,普通的配置文件
[root@sht-sgmhadoopcm-01 mongodb]# cat /etc/mongod27017.conf
systemLog:
destination: file
path: "/usr/local/mongodb/log/mongod27017.log"
logAppend: true
storage:
dbPath: /usr/local/mongodb/data/db27017
journal:
enabled: true
processManagement:
fork: true
pidFilePath: /usr/local/mongodb/data/db27017/mongod27017.pid
net:
port: 27017
bindIp: 0.0.0.0
setParameter:
enableLocalhostAuthBypass: false
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongod --config /etc/mongod27017.conf
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongod --config /etc/mongod27018.conf
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongod --config /etc/mongod27019.conf
5、登陆mongos服务并添加sharding信息
[root@sht-sgmhadoopnn-01 mongodb]# bin/mongo --port=27018
mongos> sh.addShard("172.16.101.54:27017")
{ "shardAdded" : "shard0000", "ok" : 1 }
mongos> sh.addShard("172.16.101.54:27018")
{ "shardAdded" : "shard0001", "ok" : 1 }
mongos> sh.addShard("172.16.101.54:27019")
{ "shardAdded" : "shard0002", "ok" : 1 }
查看集群分片信息
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5be6b98a507b3e0370eb36b4")
}
shards:
{ "_id" : "shard0000", "host" : "172.16.101.54:27017" }
{ "_id" : "shard0001", "host" : "172.16.101.54:27018" }
{ "_id" : "shard0002", "host" : "172.16.101.54:27019" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
mongos> db.runCommand({listshards:1})
{
"shards" : [
{
"_id" : "shard0000",
"host" : "172.16.101.54:27017"
},
{
"_id" : "shard0001",
"host" : "172.16.101.54:27018"
},
{
"_id" : "shard0002",
"host" : "172.16.101.54:27019"
}
],
"ok" : 1
}
6、开启分片
需要执行分片的库和集合,以及分片模式,分片模式分为两种hash和range
[root@sht-sgmhadoopnn-01 mongodb]# bin/mongo --port=27018
分片库是testdb
mongos> sh.enableSharding("testdb")
{ "ok" : 1 }
(1)hash分片模式测试
分片的集合是collection1,根据id进行hash分片
mongos> sh.shardCollection("testdb.collection1",{"_id":"hashed"})
{ "collectionsharded" : "testdb.collection1", "ok" : 1 }
共插入10个测试document
mongos> use testdb
switched to db testdb
mongos> for(var i=0;i<10;i++){db.collection1.insert({name:"jack"+i});}
WriteResult({ "nInserted" : 1 })
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5be6b98a507b3e0370eb36b4")
}
shards:
{ "_id" : "shard0000", "host" : "172.16.101.54:27017" }
{ "_id" : "shard0001", "host" : "172.16.101.54:27018" }
{ "_id" : "shard0002", "host" : "172.16.101.54:27019" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
2 : Success
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "testdb", "partitioned" : true, "primary" : "shard0000" }
testdb.collection1
shard key: { "_id" : "hashed" }
chunks:
shard0000 2
shard0001 2
shard0002 2
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : shard0000 Timestamp(3, 2)
{ "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-3074457345618258602") } on : shard0000 Timestamp(3, 3)
{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : shard0001 Timestamp(3, 4)
{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("3074457345618258602") } on : shard0001 Timestamp(3, 5)
{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : shard0002 Timestamp(3, 6)
{ "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0002 Timestamp(3, 7)
查看每个sharding上的数据分布情况db.collection.stats()
mongos> db.collection1.stats()
{
"sharded" : true,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"ns" : "testdb.collection1",
"count" : 10, #共十个document数据
"numExtents" : 3,
"size" : 480,
"storageSize" : 24576,
"totalIndexSize" : 49056,
"indexSizes" : {
"_id_" : 24528,
"_id_hashed" : 24528
},
"avgObjSize" : 48,
"nindexes" : 2,
"nchunks" : 6,
"shards" : {
"shard0000" : {
"ns" : "testdb.collection1",
"count" : 0,
"size" : 0,
"numExtents" : 1,
"storageSize" : 8192,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"_id_hashed" : 8176
},
"ok" : 1
},
"shard0001" : {
"ns" : "testdb.collection1",
"count" : 6,
"size" : 288,
"avgObjSize" : 48,
"numExtents" : 1,
"storageSize" : 8192,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"_id_hashed" : 8176
},
"ok" : 1
},
"shard0002" : {
"ns" : "testdb.collection1",
"count" : 4,
"size" : 192,
"avgObjSize" : 48,
"numExtents" : 1,
"storageSize" : 8192,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"_id_hashed" : 8176
},
"ok" : 1
}
},
"ok" : 1
}
分别登陆sharding节点查看数据分布,和通过命令db.collection.stats()看到的结果一致
可以发现节点27017上没有数据,节点27018上有6个document,节点27019上有4个document,出现这种情况的原因可能是插入的数据量太小,没有分布均匀,数据量越大,分布越均匀。
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongo 172.16.101.54:27017/testdb
> db.collection1.find()
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongo 172.16.101.54:27018/testdb
> db.collection1.find()
{ "_id" : ObjectId("5be6c467e6467cc8077da816"), "name" : "jack1" }
{ "_id" : ObjectId("5be6c467e6467cc8077da817"), "name" : "jack2" }
{ "_id" : ObjectId("5be6c467e6467cc8077da818"), "name" : "jack3" }
{ "_id" : ObjectId("5be6c467e6467cc8077da81a"), "name" : "jack5" }
{ "_id" : ObjectId("5be6c467e6467cc8077da81c"), "name" : "jack7" }
{ "_id" : ObjectId("5be6c467e6467cc8077da81e"), "name" : "jack9" }
[root@sht-sgmhadoopcm-01 mongodb]# bin/mongo 172.16.101.54:27019/testdb
> db.collection1.find()
{ "_id" : ObjectId("5be6c467e6467cc8077da815"), "name" : "jack0" }
{ "_id" : ObjectId("5be6c467e6467cc8077da819"), "name" : "jack4" }
{ "_id" : ObjectId("5be6c467e6467cc8077da81b"), "name" : "jack6" }
{ "_id" : ObjectId("5be6c467e6467cc8077da81d"), "name" : "jack8" }
(2)range分片模式测试
分片的集合是collection2,根据name进行range分片
mongos> sh.shardCollection("testdb.collection2",{"name":1})
{ "collectionsharded" : "testdb.collection2", "ok" : 1 }
mongos> for(var i=0;i<1000;i++){db.collection2.insert({name:"jack"+i});}
WriteResult({ "nInserted" : 1 })
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("5be6b98a507b3e0370eb36b4")
}
shards:
{ "_id" : "shard0000", "host" : "172.16.101.54:27017" }
{ "_id" : "shard0001", "host" : "172.16.101.54:27018" }
{ "_id" : "shard0002", "host" : "172.16.101.54:27019" }
balancer:
Currently enabled: yes
Currently running: no
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
4 : Success
databases:
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
{ "_id" : "testdb", "partitioned" : true, "primary" : "shard0000" }
testdb.collection1
shard key: { "_id" : "hashed" }
chunks:
shard0000 2
shard0001 2
shard0002 2
{ "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-6148914691236517204") } on : shard0000 Timestamp(3, 2)
{ "_id" : NumberLong("-6148914691236517204") } -->> { "_id" : NumberLong("-3074457345618258602") } on : shard0000 Timestamp(3, 3)
{ "_id" : NumberLong("-3074457345618258602") } -->> { "_id" : NumberLong(0) } on : shard0001 Timestamp(3, 4)
{ "_id" : NumberLong(0) } -->> { "_id" : NumberLong("3074457345618258602") } on : shard0001 Timestamp(3, 5)
{ "_id" : NumberLong("3074457345618258602") } -->> { "_id" : NumberLong("6148914691236517204") } on : shard0002 Timestamp(3, 6)
{ "_id" : NumberLong("6148914691236517204") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0002 Timestamp(3, 7)
testdb.collection2
shard key: { "name" : 1 }
chunks:
shard0000 1
shard0001 1
shard0002 1
{ "name" : { "$minKey" : 1 } } -->> { "name" : "jack1" } on : shard0001 Timestamp(2, 0)
{ "name" : "jack1" } -->> { "name" : "jack5" } on : shard0002 Timestamp(3, 0)
{ "name" : "jack5" } -->> { "name" : { "$maxKey" : 1 } } on : shard0000 Timestamp(3, 1)
mongos> db.collection2.stats()
{
"sharded" : true,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"ns" : "testdb.collection2",
"count" : 1000,
"numExtents" : 6,
"size" : 48032,
"storageSize" : 221184,
"totalIndexSize" : 130816,
"indexSizes" : {
"_id_" : 65408,
"name_1" : 65408
},
"avgObjSize" : 48.032,
"nindexes" : 2,
"nchunks" : 3,
"shards" : {
"shard0000" : {
"ns" : "testdb.collection2",
"count" : 555,
"size" : 26656,
"avgObjSize" : 48,
"numExtents" : 3,
"storageSize" : 172032,
"lastExtentSize" : 131072,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 65408,
"indexSizes" : {
"_id_" : 32704,
"name_1" : 32704
},
"ok" : 1
},
"shard0001" : {
"ns" : "testdb.collection2",
"count" : 1,
"size" : 48,
"avgObjSize" : 48,
"numExtents" : 1,
"storageSize" : 8192,
"lastExtentSize" : 8192,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 16352,
"indexSizes" : {
"_id_" : 8176,
"name_1" : 8176
},
"ok" : 1
},
"shard0002" : {
"ns" : "testdb.collection2",
"count" : 444,
"size" : 21328,
"avgObjSize" : 48,
"numExtents" : 2,
"storageSize" : 40960,
"lastExtentSize" : 32768,
"paddingFactor" : 1,
"paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
"userFlags" : 1,
"capped" : false,
"nindexes" : 2,
"totalIndexSize" : 49056,
"indexSizes" : {
"_id_" : 24528,
"name_1" : 24528
},
"ok" : 1
}
},
"ok" : 1
}
参考链接
Sharding