Rook-ceph

Rook Overview

Rook (https://rook.io/) is a cloud-native storage orchestration system hosted by the CNCF. Rook is not a storage system itself; rather, it automates the deployment and operation of storage software through Kubernetes. The rook-ceph project, for example, defines an operator and CRD resource objects in Kubernetes for managing a Ceph cluster.

Storage backends currently supported by Rook:

  • Ceph
  • EdgeFS
  • CockroachDB
  • Cassandra
  • NFS
  • Yugabyte DB

The most mature of these is rook-ceph. With rook-ceph, Ceph can be deployed onto Kubernetes very easily, and the Ceph cluster can then be controlled through Kubernetes resource objects.

Rook Architecture

Deploying Ceph with Rook

Environment Overview

Software       Version
CentOS         7.7
Kubernetes     1.17.4
Rook           v1.3

Each node has a spare 50 GB disk reserved for use as a Ceph OSD.

Node disk layout:

```shell
lsblk -f
NAME   FSTYPE LABEL UUID                                 MOUNTPOINT
vda
└─vda1 ext4         995d4542-f0dd-47e6-90eb-690de3b64430 /
vdb
```

The vdb disk on each node will be used as a Ceph OSD device. First, clone the Rook repository:

```shell
git clone https://github.com/rook/rook.git -b release-1.3
```

If the clone is slow, a mirror can be used instead:

```shell
git clone https://gitee.com/wanshaoyuan/rook.git -b release-1.3
```

Deployment

```shell
cd rook/cluster/examples/kubernetes/ceph
```

```shell
kubectl create -f common.yaml
kubectl create -f operator.yaml
```

The cluster.yaml file contains the Ceph cluster's initialization configuration.

```shell
kubectl create -f cluster.yaml
```

Parameters:

By default, rook-ceph uses every node in the cluster, and every spare disk on those nodes, as OSDs. This is not recommended for production; instead, specify the nodes and devices explicitly.

```yaml
useAllDevices: true   # use every spare device on each host as a ceph-osd disk
useAllNodes: true     # use every node in the cluster as a ceph node
```

To use specific devices on specific nodes:

Set useAllNodes and useAllDevices to false, then add:

```yaml
nodes:
- name: "172.24.234.128"
  devices: # specific devices to use for storage can be specified for each node
  - name: "vdb"
- name: "172.24.234.147"
  devices:
  - name: "vdb"
- name: "172.24.234.156"
  devices:
  - name: "vdb"
```

Note: each nodes.name value must match the node name shown by kubectl get node exactly — use the IP if nodes are listed by IP, or the hostname if they are listed by hostname.
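A quick pure-shell sanity check of this rule. The node list below is sample data standing in for the real output of `kubectl get node`; the helper name is made up for this sketch:

```shell
# Sample data; in a real check, populate k8s_nodes with:
#   kubectl get nodes -o custom-columns=:metadata.name --no-headers
k8s_nodes="172.24.234.128
172.24.234.147
172.24.234.156"

# Prints "ok" if the given cluster.yaml node name matches a Kubernetes
# node name exactly (grep -x requires a whole-line match), else "missing".
check_node() {
  if printf '%s\n' "$k8s_nodes" | grep -qx "$1"; then
    echo "ok"
  else
    echo "missing"
  fi
}

check_node "172.24.234.128"   # exact match -> prints "ok"
check_node "rke-node5"        # hostname in an IP-named cluster -> prints "missing"
```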

Accessing the Ceph dashboard

```shell
kubectl apply -f dashboard-external-https.yaml
```

Get the access port:

```shell
kubectl get svc/rook-ceph-mgr-dashboard-external-https -n rook-ceph
NAME                                     TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr-dashboard-external-https   NodePort   10.43.117.2   <none>        8443:30519/TCP   53s
```

Get the access password (for the default admin user):

```shell
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath='{.data.password}' | base64 --decode
```

Check cluster health:

```shell
kubectl get CephCluster -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     3          20m   Ready   Cluster updated successfully   HEALTH_OK
```
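For a more detailed view than the CephCluster status, the toolbox pod shipped in the same examples directory can run native ceph commands. A sketch, assuming the default names from the release-1.3 manifests (toolbox.yaml, label app=rook-ceph-tools):

```shell
# Deploy the toolbox from rook/cluster/examples/kubernetes/ceph/
kubectl create -f toolbox.yaml

# Find the tools pod and run ceph commands inside it
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph status
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- ceph osd tree
```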

Creating a storage pool and StorageClass

```yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  # clusterID is the namespace where the rook cluster is running
  clusterID: rook-ceph
  # Ceph pool into which the RBD image shall be created
  pool: replicapool
  # RBD image format. Defaults to "2".
  imageFormat: "2"
  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering
  # The secrets contain Ceph admin credentials.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`.
  csi.storage.k8s.io/fstype: xfs
# Delete the rbd volume when a PVC is deleted
reclaimPolicy: Delete
```
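To consume the StorageClass, a PVC simply references it by name. A minimal sketch (the PVC name and size here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc   # illustrative name
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

The CSI provisioner creates an RBD image in replicapool and binds it to the claim; with reclaimPolicy: Delete, the image is removed when the PVC is deleted.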

Deploying a test application

```shell
kubectl apply -f /root/rook/cluster/examples/kubernetes/mysql.yaml -f /root/rook/cluster/examples/kubernetes/wordpress.yaml
```

Adding storage nodes

The disk must carry a GPT partition table, so initialize it first (replace /dev/xxx with the actual device):

```shell
sgdisk --zap-all /dev/xxx
parted -s /dev/xxx mklabel gpt
```

Edit the cluster resource:

```shell
kubectl edit CephCluster/rook-ceph -n rook-ceph
```

Add the corresponding node and device entries:

```yaml
- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node5
  resources: {}
- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node6
  resources: {}
- config: null
  devices:
  - config: null
    name: vdb
  name: rke-node7
  resources: {}
```

Common issues

1. RKE deployments

In Kubernetes clusters deployed with Rancher RKE, the kubelet runs inside a container, so the flexvolume plugin directory must be mounted into the kubelet container; otherwise PVCs cannot be mounted into workloads.

Add the following parameters to the kubelet:

```yaml
extra_args:
  volume-plugin-dir: /usr/libexec/kubernetes/kubelet-plugins/volume/exec
extra_binds:
- /usr/libexec/kubernetes/kubelet-plugins/volume/exec:/usr/libexec/kubernetes/kubelet-plugins/volume/exec
```

2. Ubuntu 16.04 deployments

The default 4.4 kernel on Ubuntu 16.04 cannot map RBD volumes into workloads (the kernel reports missing features), so the kernel must be upgraded to 4.15.

Check the current kernel:

```shell
uname -a
Linux kworker2 4.4.0-142-generic #168-Ubuntu SMP Wed Jan 16 21:00:45 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```

Install the HWE kernel and reboot:

```shell
apt-get install --install-recommends linux-generic-hwe-16.04
reboot
```

Verify after the reboot:

```shell
uname -a
Linux kworker2 4.15.0-60-generic #67~16.04.1-Ubuntu SMP Mon Aug 26 08:57:33 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
```
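A small pre-flight sketch for this check: it compares a kernel version against the 4.15 minimum using version sort, with no cluster access needed (the helper name is made up for this sketch):

```shell
# Prints "ok" when $1 (running kernel version) is >= $2 (required version).
# sort -V orders version strings; if the required version sorts first
# (or the two are equal), the running kernel is new enough.
kernel_at_least() {
  if [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]; then
    echo "ok"
  else
    echo "old"
  fi
}

kernel_at_least "$(uname -r | cut -d- -f1)" "4.15"
kernel_at_least "4.4.0" "4.15"    # stock Ubuntu 16.04 kernel -> prints "old"
```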

Cleaning up the cluster

```shell
kubectl delete -f cluster.yaml
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
```

Then clean up the directories and devices on each host:

```bash
#!/usr/bin/env bash
DISK="/dev/sdb"
# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)
# You will have to run this step for all disks.
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# These steps only have to be run once on each node
# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)
rm -rf /dev/ceph-*
rm -rf /var/lib/rook
rm -rf /var/lib/kubelet/plugins
rm -rf /var/lib/kubelet/plugins_registry
```