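In my earlier article Redis failover, I described how to install Redis and make it highly available with sentinel. Redis has since gained native cluster support (available since Redis 3.0; I am using version 6.2.6 here). This article records how to build a Redis cluster and use it.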

We will build the Redis cluster from the following six machines:

  1. 172.19.65.196
  2. 172.19.72.108
  3. 172.19.72.112
  4. 172.19.72.203
  5. 172.19.65.228
  6. 172.19.65.136

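Download the source and compile

First, download the Redis source code on 172.19.65.196 and compile it. The redis-stable tarball tracks the latest stable release, which was 6.2.6 when I downloaded it.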

useradd -m redis
su - redis
wget https://download.redis.io/redis-stable.tar.gz
tar -zxvf redis-stable.tar.gz
cd redis-stable
make

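The compiled executables are in the src directory:

File               Purpose
redis-server       Starts the Redis server
redis-cli          Redis command-line client
redis-sentinel     Redis sentinel, covered in Redis failover
redis-benchmark    Redis benchmarking tool
redis-check-rdb    Checks the state of RDB snapshot files
redis-check-aof    Checks the state of AOF files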

For now we only need the compiled redis-server binary. Copy redis-server and the redis.conf configuration file to the redis user's home directory:

cp /home/redis/redis-stable/src/redis-server /home/redis
cp /home/redis/redis-stable/redis.conf /home/redis

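Edit the configuration and sync the files to every machine

Edit redis.conf and set the following properties. Redis does not accept a trailing comment on the same line as a directive, so each comment sits on its own line:

# Enable Redis cluster mode
cluster-enabled yes
# File where the node stores the cluster configuration; managed by Redis, do not edit by hand
cluster-config-file nodes.conf
# A node that is unreachable for longer than this many milliseconds is considered failed
cluster-node-timeout 15000
# Enable AOF persistence
appendonly yes
# Listen on all IPv4 interfaces so that other hosts can reach this node
bind 0.0.0.0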
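
The stock redis.conf is long and most cluster directives ship commented out, so it is easy to miss one. The following is a minimal sketch of a check that could be run before the file is distributed. The helper names (parse_conf, missing_settings) are hypothetical, and the parser only approximates how Redis reads the file: whole-line comments are skipped and the last occurrence of a directive wins.

```python
# Settings this article relies on; the values are the ones listed above.
REQUIRED = {
    "cluster-enabled": "yes",
    "cluster-config-file": "nodes.conf",
    "cluster-node-timeout": "15000",
    "appendonly": "yes",
    "bind": "0.0.0.0",
}


def parse_conf(text):
    """Return {directive: value} for every active (non-comment) line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Only whole-line comments and blank lines are skipped.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[parts[0].lower()] = value
    return settings


def missing_settings(text, required=REQUIRED):
    """Return the required directives that are absent or set differently."""
    found = parse_conf(text)
    return {k: v for k, v in required.items() if found.get(k) != v}


if __name__ == "__main__":
    # A made-up fragment resembling an unedited redis.conf.
    sample = "# cluster-enabled yes\nbind 127.0.0.1 -::1\nappendonly no\n"
    for name, value in missing_settings(sample).items():
        print(f"expected: {name} {value}")
```

In practice the text would come from open("/home/redis/redis.conf").read(); an empty result means all five directives are set as described.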

Next, sync these two files to the other five machines by running the following commands on each of them:

useradd -m redis
su - redis
rsync -azvhP root@172.19.65.196:/home/redis/redis-server :/home/redis/redis.conf ./

Start the Redis processes and build the cluster

Once redis-server and redis.conf have been distributed to all machines, start the Redis process on each one:

./redis-server redis.conf

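After redis-server is running on all six machines, copy the redis-cli binary we just compiled to any one of them, then use it to connect to every redis-server and create the cluster with one replica per master: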

~ ./redis-cli --cluster create 172.19.65.196:6379 172.19.72.108:6379 \
172.19.72.112:6379 172.19.72.203:6379 172.19.65.228:6379 \
172.19.65.136:6379 --cluster-replicas 1

>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.19.65.228:6379 to 172.19.65.196:6379
Adding replica 172.19.65.136:6379 to 172.19.72.108:6379
Adding replica 172.19.72.203:6379 to 172.19.72.112:6379
M: 8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379
slots:[0-5460] (5461 slots) master
M: 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379
slots:[5461-10922] (5462 slots) master
M: 2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379
slots:[10923-16383] (5461 slots) master
S: 31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379
replicates 2335076efd1d6f38eac1228d5b326380d92056f4
S: 0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379
replicates 8e172b28314aad39c31ace1229f7d1ae4cdb4973
S: f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379
replicates 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03
Can I set the above configuration? (type 'yes' to accept):

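The output above shows the layout of the cluster that is about to be created. redis-cli asks whether to apply this configuration; type yes and press Enter.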

>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join

>>> Performing Cluster Check (using node 172.19.65.196:6379)
M: 8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
S: f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379
slots: (0 slots) slave
replicates 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03
M: 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
M: 2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: 31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379
slots: (0 slots) slave
replicates 2335076efd1d6f38eac1228d5b326380d92056f4
S: 0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379
slots: (0 slots) slave
replicates 8e172b28314aad39c31ace1229f7d1ae4cdb4973
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Once the command finishes, the cluster has been created. According to the output above, the six nodes now have the following roles:

Node             Role
172.19.65.196    master, holds slots 0-5460
172.19.72.108    master, holds slots 5461-10922
172.19.72.112    master, holds slots 10923-16383
172.19.72.203    replica of 172.19.72.112:6379
172.19.65.228    replica of 172.19.65.196:6379
172.19.65.136    replica of 172.19.72.108:6379
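
The layout follows from two numbers: six nodes with one replica per master leave three masters, and the 16384 hash slots are divided among them as evenly as possible. The sketch below reproduces the ranges reported above. The function names are hypothetical, and the rounding rule is an assumption chosen to match this output rather than something taken from the redis-cli source.

```python
import math

TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis cluster


def master_count(nodes, replicas_per_master):
    """Each master needs itself plus its replicas."""
    return nodes // (replicas_per_master + 1)


def split_slots(masters, total=TOTAL_SLOTS):
    """Split slots 0..total-1 into contiguous, near-equal ranges."""
    per_master = total / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        # Round half up (assumed rule); the last master takes the remainder.
        last = math.floor(cursor + per_master - 1 + 0.5)
        if i == masters - 1 or last > total - 1:
            last = total - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    print(split_slots(master_count(6, 1)))
    # [(0, 5460), (5461, 10922), (10923, 16383)]
```

The middle master ends up with 5462 slots and the other two with 5461, which matches the slot counts printed by redis-cli.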

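Files generated after the cluster starts

Looking at the redis user's home directory, besides redis-server and redis.conf there are now also appendonly.aof, dump.rdb, and nodes.conf:

File              Purpose
appendonly.aof    AOF file; every Redis write operation is appended to it on disk
dump.rdb          RDB snapshot file; a dump of the in-memory data persisted to disk
nodes.conf        Cluster configuration saved by the Redis process; do not edit by hand

The content of nodes.conf is shown below. It stores cluster-related configuration: which nodes are masters, which are slaves, and which master each slave follows.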

f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379@16379 slave 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 0 1650355208025 2 connected
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650355209991 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650355210995 3 connected 10923-16383
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 slave 2335076efd1d6f38eac1228d5b326380d92056f4 0 1650355213006 3 connected
0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379@16379 slave 8e172b28314aad39c31ace1229f7d1ae4cdb4973 0 1650355212000 1 connected
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0
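
Each node line has the same layout: node ID, address (with the cluster bus port after the @), flags, master ID or "-", two timestamps, config epoch, link state, and then the slot ranges the node serves. As a rough illustration of that layout, here is a small parser that groups replicas under their masters. Node, parse_nodes, and replicas_by_master are hypothetical names, and the sketch ignores the special slot entries that appear while slots are being migrated.

```python
from collections import namedtuple

Node = namedtuple("Node", "node_id addr flags master_id slots")

# The nodes.conf content shown above.
SAMPLE = """\
f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379@16379 slave 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 0 1650355208025 2 connected
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650355209991 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650355210995 3 connected 10923-16383
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 slave 2335076efd1d6f38eac1228d5b326380d92056f4 0 1650355213006 3 connected
0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379@16379 slave 8e172b28314aad39c31ace1229f7d1ae4cdb4973 0 1650355212000 1 connected
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460
vars currentEpoch 6 lastVoteEpoch 0
"""


def parse_nodes(text):
    """Parse node lines; the trailing 'vars' line is skipped."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] == "vars":
            continue
        addr = fields[1].split("@")[0]  # drop the cluster bus port
        flags = fields[2].split(",")
        master_id = None if fields[3] == "-" else fields[3]
        nodes.append(Node(fields[0], addr, flags, master_id, fields[8:]))
    return nodes


def replicas_by_master(nodes):
    """Map each master's address to the addresses of its replicas."""
    by_id = {n.node_id: n for n in nodes}
    result = {n.addr: [] for n in nodes if "master" in n.flags}
    for n in nodes:
        if n.master_id in by_id:
            result.setdefault(by_id[n.master_id].addr, []).append(n.addr)
    return result


if __name__ == "__main__":
    for master, replicas in replicas_by_master(parse_nodes(SAMPLE)).items():
        print(master, "<-", ", ".join(replicas))
```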

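Triggering a failover

We can connect to redis-server with the client and run commands. The -c option enables cluster mode, in which redis-cli follows the redirections that nodes send back. Running cluster nodes shows the current nodes of the cluster, both masters and slaves: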

~ ./redis-cli -c -h 172.19.65.196 -p 6379
> cluster nodes
f76fff860057dfab9d4df63b7ee183bb0a23e7df 172.19.65.136:6379@16379 slave 5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 0 1650356492455 2 connected
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356495473 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650356491451 3 connected 10923-16383
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 slave 2335076efd1d6f38eac1228d5b326380d92056f4 0 1650356494468 3 connected
0146973e61ffe3d9f63da5dfb9e565e02b1774b6 172.19.65.228:6379@16379 slave 8e172b28314aad39c31ace1229f7d1ae4cdb4973 0 1650356493462 1 connected
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460

Run ./redis-cli -h 172.19.72.112 -p 6379 debug segfault to crash the Redis process on node 112, then check the cluster with cluster nodes again:

~ ./redis-cli -h 172.19.65.196 cluster nodes | grep master
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356951720 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master,fail - 1650356740773 1650356736743 3 disconnected
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 master - 0 1650356954752 7 connected 10923-16383
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460

Node 112 is now marked as failed, and 203 has taken over its slots (10923-16383) as the new master. The cluster is healthy again.
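
One way to confirm a failover without reading the listing by eye is to compare which node serves each slot range before and after. The sketch below does that on the master lines from the two cluster nodes listings above. slot_owners and moved_ranges are hypothetical helper names, and the approach assumes the slot ranges themselves stay the same between the two snapshots.

```python
BEFORE = """\
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356495473 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master - 0 1650356491451 3 connected 10923-16383
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460
"""

AFTER = """\
5e2aedd8c0b8ca9cc7839b3779fc34ceabfeda03 172.19.72.108:6379@16379 master - 0 1650356951720 2 connected 5461-10922
2335076efd1d6f38eac1228d5b326380d92056f4 172.19.72.112:6379@16379 master,fail - 1650356740773 1650356736743 3 disconnected
31e83cc017e9d15190b349e11c4762a0d33a3162 172.19.72.203:6379@16379 master - 0 1650356954752 7 connected 10923-16383
8e172b28314aad39c31ace1229f7d1ae4cdb4973 172.19.65.196:6379@16379 myself,master - 0 0 1 connected 0-5460
"""


def slot_owners(cluster_nodes_output):
    """Map each slot range to the address of the node that serves it."""
    owners = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # nodes without slots (replicas, failed masters)
        addr = fields[1].split("@")[0]
        for slot_range in fields[8:]:
            owners[slot_range] = addr
    return owners


def moved_ranges(before, after):
    """Return {range: (old_owner, new_owner)} for ranges that changed hands."""
    old, new = slot_owners(before), slot_owners(after)
    return {r: (old[r], new[r]) for r in old if r in new and old[r] != new[r]}


if __name__ == "__main__":
    print(moved_ranges(BEFORE, AFTER))
    # {'10923-16383': ('172.19.72.112:6379', '172.19.72.203:6379')}
```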

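Reading and writing data

For convenience, we can add daemonize yes to redis.conf so that Redis runs as a daemon. To change the configuration without taking the whole cluster down, stop, reconfigure, and restart the nodes one at a time.

Run ./redis-cli -c -h 172.19.65.196 -p 6379 to enter the Redis interactive command line: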

> set counter 100
-> Redirected to slot [6680] located at 172.19.65.136:6379
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incr counter
(integer) 103
> incr counter
(integer) 104
> incr counter
(integer) 105
> incr counter
(integer) 106
> incr counter
(integer) 107
> incr counter
(integer) 108
> incr counter
(integer) 109
> incr counter
(integer) 110
> RPUSH mylist 11
-> Redirected to slot [5282] located at 172.19.65.228:6379
(integer) 1
> RPUSH mylist 22
(integer) 2
> RPUSH mylist 33
(integer) 3
> LRANGE mylist 0 -1
1) "11"
2) "22"
3) "33"
> hmset user:1000 username antirez birthyear 1977 verified 1
OK
> hget user:1000 username
"antirez"
> hgetall user:1000
1) "username"
2) "antirez"
3) "birthyear"
4) "1977"
5) "verified"
6) "1"
> hget user:1000 birthyear
"1977"
> SADD myset 1 12 3 3 1 2 33 88 1 2 3
(integer) 6
> SMEMBERS myset
1) "1"
2) "2"
3) "3"
4) "12"
5) "33"
6) "88"
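
The "Redirected to slot" lines in the session come from how Redis Cluster places keys: each key is hashed with CRC16 and taken modulo 16384, and the client is sent to whichever node currently serves that slot. If the key contains a non-empty {tag}, only the tag is hashed, which lets related keys share a slot. A minimal sketch of that calculation follows. key_slot is a hypothetical name, and the sketch assumes that Python's binascii.crc_hqx with an initial value of 0 is the same CRC16 variant Redis uses.

```python
import binascii

SLOTS = 16384


def key_slot(key):
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag between the braces
            data = data[start + 1:end]
    # Assumption: crc_hqx(data, 0) matches the CRC16 used by Redis Cluster.
    return binascii.crc_hqx(data, 0) % SLOTS


if __name__ == "__main__":
    for key in ("counter", "mylist", "user:1000", "myset"):
        print(key, key_slot(key))
    # Keys sharing a hash tag land in the same slot.
    print(key_slot("{user:1000}.following") == key_slot("user:1000"))
```

The session reports slot 6680 for counter and 5282 for mylist, which gives something to compare the printed values against. On a live cluster, the command cluster keyslot counter returns the slot as computed by the server.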
