Setting Up a Rook Ceph Storage Cluster

I. Installing the Rook Cluster

https://www.rook.io/docs/rook/v1.8/quickstart.html

  1. Download the code

    git clone --single-branch --branch v1.8.3 https://github.com/rook/rook.git
    cd rook/deploy/examples
  2. Edit operator.yaml (mainly to change the image download addresses: the k8s.gcr.io images cannot be pulled directly from inside mainland China, so it is best to pull them on a machine with unrestricted access and push them to a registry your production servers can reach; a sketch of that mirroring step follows the settings below). The relevant entries look like this:

    ROOK_CSI_CEPH_IMAGE: "testharbor.zuoyejia.com/k8s_image/cephcsi@sha256:19634b6ef9fc6df2902cf6ff0b3dbccc56a6663d0cbfd065da44ecd2f955d848"
    ROOK_CSI_REGISTRAR_IMAGE: "testharbor.zuoyejia.com/k8s_image/csi-node-driver-registrar@sha256:01b341312ea19cefc29f46fa0dd54255530b9039dd80834f50d582ecd93cc3ca"
    ROOK_CSI_RESIZER_IMAGE: "testharbor.zuoyejia.com/k8s_image/csi-resizer@sha256:d2d2e429a0a87190ee73462698a02a08e555055246ad87ad979b464b999fedae"
    ROOK_CSI_PROVISIONER_IMAGE: "testharbor.zuoyejia.com/k8s_image/csi-provisioner@sha256:bbae7cde811054f6a51060ba7a42d8bf2469b8c574abb50fec8b46c13e32541e"
    ROOK_CSI_SNAPSHOTTER_IMAGE: "testharbor.zuoyejia.com/k8s_image/csi-snapshotter@sha256:551b9692943f915b5ee4b7274e3a918692a6175bb028f1f0236a38596c46cbe0"
    ROOK_CSI_ATTACHER_IMAGE: "testharbor.zuoyejia.com/k8s_image/csi-attacher@sha256:221c1c6930fb1cb93b57762a74ccb59194c4c74a63c0fd49309d1158d4f8c72c"
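
    As a reference, the mirroring step might look like the sketch below. The registry path testharbor.zuoyejia.com/k8s_image comes from the settings above; the upstream image names and tags are only placeholders, so substitute the defaults commented in your operator.yaml:

    # Run on a machine that can reach quay.io / k8s.gcr.io, then push to the
    # private registry that the production cluster pulls from.
    REGISTRY=testharbor.zuoyejia.com/k8s_image
    # placeholder upstream images -- replace with the defaults commented in operator.yaml
    for IMG in \
      quay.io/cephcsi/cephcsi:v3.4.0 \
      k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
    do
      docker pull "$IMG"
      docker tag  "$IMG" "$REGISTRY/$(basename "$IMG")"
      docker push "$REGISTRY/$(basename "$IMG")"
    done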

  3. Edit cluster.yaml (mainly the image address, the mon count, the mgr count, and the dashboard SSL setting).

    • mgr count: the Ceph cluster needs to stay available, so it is best to set this to 2 (note the pitfall here: this high availability is active/standby, which matters later when configuring the dashboard).
    • dashboard SSL: we change this because we expose the dashboard through an Ingress, and in front of the Ingress sits a Huawei Cloud ELB (the ELB does not support TCP as a backend protocol, so the dashboard's SSL has to be turned off and HTTPS terminated at the ELB layer).

      The relevant settings look like this:

      mon:
        # Set the number of mons to be started. Generally recommended to be 3.
        # For highest availability, an odd number of mons should be specified.
        count: 3
        # The mons should be on unique nodes. For production, at least 3 nodes are recommended for this reason.
        # Mons should only be allowed on the same node for test environments where data loss is acceptable.
        allowMultiplePerNode: false
      mgr:
        # When higher availability of the mgr is needed, increase the count to 2.
        # In that case, one mgr will be active and one in standby. When Ceph updates which
        # mgr is active, Rook will update the mgr services to match the active mgr.
        count: 2
        modules:
          # Several modules should not need to be included in this list. The "dashboard" and "monitoring" modules
          # are already enabled by other settings in the cluster CR.
          - name: pg_autoscaler
            enabled: true
      # enable the ceph dashboard for viewing cluster status
      dashboard:
        enabled: true
        # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
        # urlPrefix: /ceph-dashboard
        # serve the dashboard at the given port.
        # port: 8443
        # disable SSL on the dashboard; HTTPS is terminated at the Huawei Cloud ELB instead
        ssl: false
      # enable prometheus alerting for cluster
      monitoring:
        # requires Prometheus to be pre-installed
        enabled: false
        # namespace to deploy prometheusRule in. If empty, namespace of the cluster will be used.
        # Recommended:
        # If you have a single rook-ceph cluster, set the rulesNamespace to the same namespace as the cluster or keep it empty.
        # If you have multiple rook-ceph clusters in the same k8s cluster, choose the same namespace (ideally, namespace with prometheus
        # deployed) to set rulesNamespace for all the clusters. Otherwise, you will get duplicate alerts with multiple alert definitions.
        rulesNamespace: rook-ceph

  4. Deploy the Rook operator (a readiness check is sketched after the commands below)

    cd rook/deploy/examples
    kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
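
    Before creating the cluster, it helps to wait until the operator pod is Running; a quick check, assuming the default rook-ceph namespace and the labels from the example manifests:

    kubectl -n rook-ceph get pod -l app=rook-ceph-operator
    # or block until it is ready
    kubectl -n rook-ceph wait pod -l app=rook-ceph-operator --for=condition=Ready --timeout=300s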
  5. Create the Ceph cluster

    kubectl apply -f cluster.yaml
  6. Check that the Ceph cluster is healthy (a deeper health check is sketched after the command below):

    kubectl -n rook-ceph get pod
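
    Beyond checking that the pods are Running, the cluster health can be inspected with the ceph CLI from the toolbox pod shipped in the same examples directory (a sketch; toolbox.yaml and the rook-ceph-tools deployment name are the Rook v1.8 defaults):

    kubectl -n rook-ceph get cephcluster          # PHASE and HEALTH columns
    kubectl create -f toolbox.yaml
    kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status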

II. Setting Up the Ceph Dashboard

  1. Expose the Dashboard outside the cluster. Here we use the HTTP Service in NodePort mode. At this point you will find that the Dashboard cannot be reached; the fix is described in step 2. (A check of the Service and its NodePort follows the commands below.)

    cd rook/deploy/examples
    kubectl create -f dashboard-external-http.yaml
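
    You can confirm the Service and the NodePort it was assigned before trying to reach it (the Service name comes from the example manifest):

    kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-http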
  2. Find out which mgr pod actually works in the active/standby high-availability setup.
    In active/standby mode one of the two pods does not serve requests, so the Service may end up proxying to the unusable mgr pod, which is why the Dashboard cannot be reached. (A sketch that automates this check follows the bullets below.)

    • Look at the Service created in step 1 above. We can see the pod port (targetPort) is 7000

      apiVersion: v1
      kind: Service
      metadata:
        name: rook-ceph-mgr-dashboard-external-http
        namespace: rook-ceph # namespace:cluster
        labels:
          app: rook-ceph-mgr
          rook_cluster: rook-ceph # namespace:cluster
      spec:
        ports:
          - name: dashboard
            port: 7000
            protocol: TCP
            targetPort: 7000
        selector:
          app: rook-ceph-mgr
          ceph_daemon_id: a
          rook_cluster: rook-ceph
        sessionAffinity: None
        type: NodePort
    • Find the IP addresses of the two mgr pods

      kubectl -n rook-ceph get pod -owide

    • curl port 7000 on each pod IP to see which one responds normally

      curl 10.244.8.83:7000
      curl 10.244.68.188:7000

    • Check the labels of the two pods and note the ceph_daemon_id values

      kubectl get pod -n rook-ceph --show-labels

    • Edit dashboard-external-http.yaml and add the working pod's ceph_daemon_id to the selector

      selector:
        app: rook-ceph-mgr
        ceph_daemon_id: a
        rook_cluster: rook-ceph
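
    Instead of curling pod IPs by hand, you can also ask Ceph which mgr daemon is active and match it against the pod labels; a sketch, assuming the toolbox deployment from section I is installed:

      # Prints the active mgr name (the "active_name" field), e.g. "a"
      kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mgr stat
      # List the mgr pods together with their ceph_daemon_id labels
      kubectl -n rook-ceph get pod -l app=rook-ceph-mgr -L ceph_daemon_id -o wide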

  3. Reapply the dashboard-external-http.yaml Service. This time the Dashboard can be reached normally via public IP + NodePort

    kubectl apply -f dashboard-external-http.yaml
  4. Create the Ingress file dashboard-ingress-http.yaml

    #
    # This example is for Kubernetes running an nginx-ingress
    # and an ACME (e.g. Let's Encrypt) certificate service
    #
    # The nginx-ingress annotations support the dashboard
    # running using HTTPS with a self-signed certificate
    #
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: rook-ceph-mgr-dashboard
      namespace: rook-ceph # namespace:cluster
    spec:
      ingressClassName: nginx
      rules:
        - host: rookceph.zuoyejia.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: rook-ceph-mgr-dashboard-external-http
                    port:
                      number: 7000
  5. Apply the Ingress

    kubectl apply -f dashboard-ingress-http.yaml
  6. Note: steps 4 and 5 above require ingress-nginx and a Huawei Cloud ELB. If you do not have them, just use the files from the official documentation. As a reminder, the Service must carry a selector that matches the working mgr pod. The default admin password for the Dashboard login can be retrieved as shown below.
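
    The password is stored in a secret generated by Rook; the command below is the one given in the Rook dashboard docs:

    kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
      -o jsonpath="{['data']['password']}" | base64 --decode && echo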

III. Ceph Storage

1.1 Block Storage (RBD)

RBD: RADOS Block Device

RADOS: Reliable, Autonomic Distributed Object Store

  1. Configuration
    RWO (ReadWriteOnce)
    Commonly used for block storage in RWO mode. When a StatefulSet is deleted, its PVCs are not deleted; you have to clean them up yourself. (A sample PVC is sketched after the command below.)

    https://www.rook.io/docs/rook/v1.8/ceph-block.html

    kubectl create -f deploy/examples/csi/rbd/storageclass.yaml
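
    As a usage sketch, a PVC that consumes this class could look like the following; rook-ceph-block is the StorageClass name created by the example manifest, while the PVC name and size are made up:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: rbd-demo-pvc            # hypothetical name
    spec:
      storageClassName: rook-ceph-block
      accessModes:
        - ReadWriteOnce             # RWO: one node mounts the volume read-write
      resources:
        requests:
          storage: 1Gi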

1.2 Shared Filesystem (CephFS)

  1. Configuration
    Commonly used for file storage in RWX mode, e.g. ten Pods all reading and writing the same location. (A sample PVC is sketched after the commands below.)
    https://rook.io/docs/rook/v1.8/ceph-filesystem.html

    cd rook
    kubectl apply -f deploy/examples/filesystem.yaml
    kubectl create -f deploy/examples/csi/cephfs/storageclass.yaml
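
    A PVC against this class requests ReadWriteMany so that several Pods can mount the same volume; a sketch, where rook-cephfs is the StorageClass name from the example manifest and the PVC name is made up:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: cephfs-demo-pvc         # hypothetical name
    spec:
      storageClassName: rook-cephfs
      accessModes:
        - ReadWriteMany             # RWX: many Pods on many nodes can mount it
      resources:
        requests:
          storage: 1Gi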

IV. Uninstalling Rook Ceph

Reference: https://rook.io/docs/rook/v1.8/ceph-teardown.html

1. Clean up the cluster data directory (on every node)

rm -rf /var/lib/rook

2. Delete the block pool and the deployed manifests (a check for leftover volumes follows the commands below)

kubectl delete -f crds.yaml -f common.yaml -f operator.yaml
kubectl delete -f cluster.yaml
kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
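
Before going further, it is worth confirming that no PVCs or PVs still reference the Rook storage classes; the official teardown guide recommends removing everything that consumes Rook storage before deleting the cluster. A quick check:

kubectl get pvc --all-namespaces | grep -E 'rook-ceph-block|rook-cephfs'
kubectl get pv | grep -E 'rook-ceph-block|rook-cephfs'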

3. Delete the CephCluster CR

# 1. Edit the CephCluster and add the cleanupPolicy
# 2. Delete the CephCluster CR
# 3. Confirm the cluster CR has been deleted
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
kubectl -n rook-ceph delete cephcluster rook-ceph
kubectl -n rook-ceph get cephcluster

4. Delete the operator and related resources

kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

5. Delete the data on the hosts (a quick check of the disk follows the commands below)

DISK="/dev/vdb"
sgdisk --zap-all $DISK
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# for SSDs, use blkdiscard instead of dd
# blkdiscard $DISK
partprobe $DISK

ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*
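
After zapping, a quick way to confirm the disk carries no leftover filesystem signatures or ceph device-mapper entries (a sketch using the same $DISK variable):

lsblk -f "$DISK"
ls /dev/mapper/ | grep ceph || echo "no ceph device-mapper entries left"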

6. Troubleshooting

  • Check the pods

    kubectl -n rook-ceph get pod
  • Check the cluster CR

    kubectl -n rook-ceph get cephcluster
  • Remove the remaining ceph.rook.io custom resources (by clearing their finalizers)

    # Clear the finalizers on every ceph.rook.io custom resource
    for CRD in $(kubectl get crd -n rook-ceph | awk '/ceph.rook.io/ {print $1}'); do
      kubectl get -n rook-ceph "$CRD" -o name | \
        xargs -I {} kubectl patch -n rook-ceph {} --type merge -p '{"metadata":{"finalizers": [null]}}'
    done
  • If the namespace is still stuck in the Terminating state, check which resources are blocking the deletion and remove their finalizers:

    kubectl api-resources --verbs=list --namespaced -o name \
    | xargs -n 1 kubectl get --show-kind --ignore-not-found -n rook-ceph
  • Remove the finalizers from those resources

    kubectl -n rook-ceph patch configmap rook-ceph-mon-endpoints --type merge -p '{"metadata":{"finalizers": []}}'
    kubectl -n rook-ceph patch secrets rook-ceph-mon --type merge -p '{"metadata":{"finalizers": []}}'

    # if the cluster and replicapool still exist, run the commands below
    kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge
    kubectl -n rook-ceph patch cephblockpool.ceph.rook.io replicapool -p '{"metadata":{"finalizers": []}}' --type=merge