Author: Li Xiaohui
WeChat: lxh_chat
Email: 939958092@qq.com
Still running Kubernetes on Docker? Time to upgrade. This article walks you through deploying a brand-new Kubernetes 1.33.1 cluster on Rocky Linux 9.4, using the more native container runtime containerd 2.1.1 together with the command-line tool nerdctl 2.1.2. We skip the traditional route entirely: no Docker anywhere, just a pure containerd adventure.
The tutorial starts from scratch and covers environment preparation, containerd configuration, nerdctl testing, and Kubernetes installation and initialization. Follow it step by step and you will end up with a solid cluster that can run real workloads. If you are curious how well nerdctl works, or want to deploy K8s without depending on Docker, this article is for you.
Item                 Details
K8s version          1.33.1
Operating system     Rocky Linux 9.4
Master hostname      k8s-master
Worker hostname      k8s-worker1
containerd version   v2.1.1
Prepare name resolution
This step must be completed on all machines.
cat >> /etc/hosts <<EOF
192.168.8.200 k8s-master
192.168.8.201 k8s-worker1
EOF
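If you want to confirm the entries are picked up before moving on, a quick check on each machine could look like this (just a sanity check, not part of the original steps):

getent hosts k8s-master k8s-worker1
# each hostname should resolve to the IP you just added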
First, we download a tool called nerdctl. It uses containerd as its underlying runtime while keeping a command line that is highly compatible with Docker's, which makes it our tool of choice. After downloading, extract the archive into /usr/local so the binaries are available for the later steps.
wget https://github.com/containerd/nerdctl/releases/download/v2.1.2/nerdctl-full-2.1.2-linux-amd64.tar.gz
tar Cxzvvf /usr/local nerdctl-full-2.1.2-linux-amd64.tar.gz
Check the version of containerd that was installed:
[root@k8s-master ~]# containerd -v
containerd github.com/containerd/containerd/v2 v2.1.1 cb1076646aa3740577fafbf3d914198b7fe8e3f7
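It also doesn't hurt to confirm the nerdctl binary from the same bundle is on the PATH; a minimal check (the exact output wording may differ slightly) could be:

nerdctl --version
# expected to print something like: nerdctl version 2.1.2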
Next, we give containerd some configuration so it works the way we want. First create the configuration directory /etc/containerd, then generate the default configuration file.
mkdir /etc/containerd
containerd config default > /etc/containerd/config.toml
Then edit the configuration file and replace the default pause image with a mirror hosted in China, which is much faster to reach from a Chinese network.
sed -i "s|sandbox = 'registry.k8s.io/pause:3.10'|sandbox = 'registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.10'|" /etc/containerd/config.toml
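To confirm the substitution took effect, you can grep the config; this assumes the default config uses the single-quoted sandbox key shown above:

grep "sandbox = " /etc/containerd/config.toml
# should now print: sandbox = 'registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.10'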
Next, set the registry configuration path so containerd knows where to look for per-registry settings such as mirrors; if you have an image accelerator, this is the directory where it should be configured.
Note that this setting lives in a different section in containerd 1.x and containerd 2.x, so the sed expression below targets the 2.x layout.
sed -i '/^\s*\[plugins.'"'"'io.containerd.cri.v1.images'"'"'.registry\]/{n;s|^\(\s*\)config_path = .*$|\1config_path = '"'"'/etc/containerd/certs.d'"'"'|}' /etc/containerd/config.toml
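If you want to double-check the result (a quick check; in the containerd 2.x default config there is a single config_path entry, under the registry section we just targeted):

grep -n "config_path" /etc/containerd/config.toml
# should show: config_path = '/etc/containerd/certs.d'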
To let containerd pull images smoothly, we also configure the registry host settings. First create a directory for the registry configuration, then write a hosts.toml that points containerd at a mirror endpoint in China, which speeds up image pulls considerably.
On a network inside China, pulling container images directly often fails. You can look for a free image mirror on the internet, or get an image accelerator from my site:
https://www.linuxcenter.cn/k8s/docker-image-speed.html
Here I configure acceleration for docker.io, i.e. the Docker Hub images everyone is used to.
mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml <<-'EOF'
server = "https://registry.myk8s.cn"
[host."https://registry.myk8s.cn"]
  capabilities = ["pull", "resolve", "push"]
EOF
With the configuration in place, start the containerd and buildkit services so they can get to work.
systemctl daemon-reload
systemctl enable --now containerd
systemctl enable --now buildkit
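A quick way to make sure both services actually came up (just a sanity check):

systemctl is-active containerd buildkit
# both lines should print "active"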
Finally, to make nerdctl more convenient to use, set up shell completion so you can press Tab to complete commands.
nerdctl completion bash > /etc/bash_completion.d/nerdctl
source /etc/bash_completion.d/nerdctl
Create the first container
The point of creating a container here is simply to verify that containerd works correctly.
Run a container
nerdctl run -d -p 8000:80 --name container1 registry.myk8s.cn/library/nginx
nerdctl ps
Output
CONTAINER ID    IMAGE                                     COMMAND                   CREATED           STATUS    PORTS                   NAMES
1353d09a9df3    registry.myk8s.cn/library/nginx:latest    "/docker-entrypoint.…"    21 seconds ago    Up        0.0.0.0:8000->80/tcp    container1
-d runs the container in the background
-p maps ports; here the host's port 8000 is mapped to port 80 inside the container
--name sets the container's name
nginx at the end is the name of the image used for this run
Enter the container
root@k8s-master:~# nerdctl exec -it container1 /bin/bash
root@1353d09a9df3:/# echo hello lixiaohui > /usr/share/nginx/html/index.html
root@1353d09a9df3:/# exit
exec -it attaches an interactive terminal inside the container
Access the container's content
curl http://127.0.0.1:8000
hello lixiaohui
Deploy Kubernetes
All the preparation steps before the actual cluster deployment must be completed on every node.
Turn off the swap partition
swapoff -a
sed -i 's/.*swap.*/#&/' /etc/fstab
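To confirm swap really is off (a quick check):

swapon --show
free -h | grep -i swap
# swapon should print nothing, and the Swap line should show 0B total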
Allow iptables to see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
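To verify that the module is loaded and the kernel parameters are applied (a simple check):

lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
# all three sysctl values should be 1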
Install kubeadm
We use the Nanjing University mirror to speed up the download; the Kubernetes version installed here is 1.33.
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.nju.edu.cn/kubernetes/core%3A/stable%3A/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.nju.edu.cn/kubernetes/core%3A/stable%3A/v1.33/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
Install the kubelet, kubeadm and kubectl packages, then enable and start the kubelet service.
sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
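You can confirm that the expected 1.33.x packages were installed; the exact patch version depends on what the mirror serves:

kubeadm version -o short
kubectl version --client
kubelet --version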
Add command auto-completion
kubectl completion bash > /etc/bash_completion.d/kubectl
kubeadm completion bash > /etc/bash_completion.d/kubeadm
source /etc/bash_completion.d/kubectl
source /etc/bash_completion.d/kubeadm
Integrate with containerd
crictl is a command-line tool for managing CRI-compatible container runtimes; it acts as the go-between that lets Kubernetes tooling talk to the runtime (containerd in our case).
runtime-endpoint tells crictl where the runtime interface of containerd lives. The value unix:///run/containerd/containerd.sock means crictl connects to containerd over the Unix socket at /run/containerd/containerd.sock; in short, this is how crictl knows how to reach containerd.
image-endpoint does the same thing for the image service. Here it is the same address, because both the runtime service and the image service are provided by containerd.
cat > /etc/crictl.yaml <<-'EOF'
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false
EOF
You can run the command below as a quick check of the integration; if it prints no error, it worked.
[root@k8s-master ~]# crictl images
IMAGE               TAG                 IMAGE ID            SIZE
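crictl version is another handy check, since it queries the runtime over the socket; the output below is only what I would expect to see:

crictl version
# should report both the crictl client version and the containerd runtime version (v2.1.1)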
While we are at it, set up command completion for crictl as well.
crictl completion bash > /etc/bash_completion.d/crictl
source /etc/bash_completion.d/crictl
Open the firewall
sudo firewall-cmd --zone=public --add-protocol=ipip --permanent
sudo firewall-cmd --zone=public --add-port=179/tcp --permanent
sudo firewall-cmd --zone=public --add-port=4789/udp --permanent
sudo firewall-cmd --zone=public --add-port=51820/udp --permanent
sudo firewall-cmd --zone=public --add-port=51821/udp --permanent
sudo firewall-cmd --zone=public --add-port=5473/tcp --permanent
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=8080/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9090/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9081/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9900/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9200/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5601/tcp --permanent
sudo firewall-cmd --zone=public --add-port=8444/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9449/tcp --permanent
sudo firewall-cmd --zone=public --add-port=4790/udp --permanent
sudo firewall-cmd --reload
Everything above must be completed on all nodes. The cluster deployment below is done only on the master node.
Cluster deployment
The name field in the kubeadm.yaml below must be resolvable on the network; you can also add the record to /etc/hosts on every machine in the cluster.
The cluster initialization below must be executed on the master only.
Here we generate the configuration file needed to install K8s, change the API server address in it to the local IP, set the node name to k8s-master, use the Alibaba Cloud registry for pulling the control-plane images, and reserve 172.16.0.0/16 for the future pod network.
kubeadm config print init-defaults > kubeadm.yaml
sed -i 's/.*advert.*/  advertiseAddress: 192.168.8.200/g' kubeadm.yaml
sed -i 's/.*name.*/  name: k8s-master/g' kubeadm.yaml
sed -i 's|imageRepo.*|imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers|g' kubeadm.yaml
sed -i "/^\s*networking:/a\  podSubnet: 172.16.0.0/16" kubeadm.yaml
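Before running kubeadm init, it is worth eyeballing the fields we just changed (a quick check, not strictly required):

grep -E 'advertiseAddress|name: k8s-master|imageRepository|podSubnet' kubeadm.yaml
# should show the local IP, the node name, the Alibaba Cloud repository and the 172.16.0.0/16 pod subnet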
modprobe br_netfilter
kubeadm init --config kubeadm.yaml
If you see output like the following, the initialization succeeded. Save the kubeadm join command at the end.
Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.8.200:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:c23e7b2f6db14c9d8d5e7adaed6ec60e08778ef3f421e38cd6c22615d95d50c2
Grant yourself admin access
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
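At this point kubectl should be able to reach the API server; a quick smoke test (the node will still show NotReady until the network plugin is installed):

kubectl cluster-info
kubectl get nodes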
Deploy the Calico network plugin
This Calico deployment step is executed on the master only.
Before you start, open the official Calico page below and pull all of the required images onto your machines first; otherwise the deployment will not succeed.
At the time of writing, the latest stable Calico release was 3.30.0; 3.31 was still in beta.
https://docs.tigera.io/calico/3.30/operations/image-options/alternate-registry#push-calico-images-to-your-registry
On a network inside China, pulling container images directly often fails. You can look for a free image mirror on the internet, or get an image accelerator from my site:
https://www.linuxcenter.cn/k8s/docker-image-speed.html
Once the images are downloaded, install the Calico components with the operator below.
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/refs/tags/v3.30.0/manifests/tigera-operator.yaml
Then use the custom resource below to set the pod network CIDR that Calico will use in this cluster.
wget https://raw.githubusercontent.com/projectcalico/calico/refs/tags/v3.30.0/manifests/custom-resources.yaml
vim custom-resources.yaml
apiVersion: operator.tigera.io/v1
kind: Installation
spec:
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      cidr: 192.168.0.0/16   # change this to the 172.16.0.0/16 we chose above
Apply the custom resource to the cluster.
kubectl apply -f custom-resources.yaml
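Calico usually takes a few minutes to come up; if you like, you can watch the operator roll everything out (press Ctrl-C to stop watching):

kubectl get pods -n calico-system -w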
Check whether the cluster components are healthy; everything should be in the Running state.
[root@k8s-master ~]# kubectl get pod -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE
calico-apiserver   calico-apiserver-b8b9d6bb7-2brsr           1/1     Running   0               14m
calico-apiserver   calico-apiserver-b8b9d6bb7-zs77c           1/1     Running   0               14m
calico-system      calico-kube-controllers-657877cd7d-g2cgl   1/1     Running   0               14m
calico-system      calico-node-nv9qj                          1/1     Running   0               14m
calico-system      calico-typha-6b8cb96766-p5k9l              1/1     Running   0               14m
calico-system      csi-node-driver-xkdnt                      2/2     Running   0               14m
calico-system      goldmane-7b5b4cd5d9-b9lxx                  1/1     Running   0               14m
calico-system      whisker-cf7ccfdc4-j5vkd                    2/2     Running   0               31s
kube-system        coredns-746c97786-j9bsw                    1/1     Running   2 (5m52s ago)   35m
kube-system        coredns-746c97786-qhg9n                    1/1     Running   2 (5m52s ago)   35m
kube-system        etcd-k8s-master                            1/1     Running   2 (5m52s ago)   36m
kube-system        kube-apiserver-k8s-master                  1/1     Running   2 (5m52s ago)   36m
kube-system        kube-controller-manager-k8s-master         1/1     Running   2 (5m52s ago)   36m
kube-system        kube-proxy-lg9x5                           1/1     Running   2 (5m52s ago)   35m
kube-system        kube-scheduler-k8s-master                  1/1     Running   2 (5m52s ago)   36m
tigera-operator    tigera-operator-844669ff44-x8g6k           1/1     Running   2 (5m52s ago)   16m
Join the worker nodes
The join operation is performed on all worker nodes. Note that a worker node must meet the following prerequisites before you run kubeadm join:
- hosts name resolution
- containerd deployed
- swap partition turned off
- iptables bridged traffic allowed
- kubeadm and the related packages installed
- crictl integrated with containerd
- Calico images downloaded
If you have forgotten the join parameters in the meantime, you can regenerate them on the master node as follows.
[root@k8s-master ~]# kubeadm token create --print-join-command
kubeadm join 192.168.8.200:6443 --token 8quh20.u8vz730xamy2e2gv --discovery-token-ca-cert-hash sha256:c23e7b2f6db14c9d8d5e7adaed6ec60e08778ef3f421e38cd6c22615d95d50c2
If a node has more than one CRI endpoint, specify the CRI socket explicitly when running kubeadm join on the worker node, for example:
kubeadm join 192.168.8.200:6443 --token 8quh20.u8vz730xamy2e2gv \
    --discovery-token-ca-cert-hash sha256:c23e7b2f6db14c9d8d5e7adaed6ec60e08778ef3f421e38cd6c22615d95d50c2 \
    --cri-socket=unix:///run/containerd/containerd.sock
The output looks like this:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
[preflight] Use 'kubeadm init phase upload-config --config your-config-file' to re-upload it.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-check] Waiting for a healthy kubelet at http://127.0.0.1:10248/healthz. This can take up to 4m0s
[kubelet-check] The kubelet is healthy after 501.800008ms
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Note the --cri-socket parameter at the end of the command above. When both docker and cri-dockerd are installed on a node, you must pass this parameter explicitly and point it at cri-dockerd, otherwise kubeadm reports an error about finding two CRI endpoints. Since we deployed only containerd, we do not have that problem.
Run the following on the master to give the node the worker role label:
kubectl label nodes k8s-worker1 node-role.kubernetes.io/worker=
kubectl get nodes
[root@k8s-master ~]# kubectl get nodes
NAME          STATUS   ROLES           AGE    VERSION
k8s-master    Ready    control-plane   11m    v1.33.1
k8s-worker1   Ready    worker          2m3s   v1.33.1
Open the firewall on the worker nodes
sudo firewall-cmd --zone=public --add-protocol=ipip --permanent
sudo firewall-cmd --zone=public --add-port=179/tcp --permanent
sudo firewall-cmd --zone=public --add-port=4789/udp --permanent
sudo firewall-cmd --zone=public --add-port=51820/udp --permanent
sudo firewall-cmd --zone=public --add-port=51821/udp --permanent
sudo firewall-cmd --zone=public --add-port=5473/tcp --permanent
sudo firewall-cmd --zone=public --add-port=443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=8080/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9090/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9081/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9900/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9200/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5444/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5601/tcp --permanent
sudo firewall-cmd --zone=public --add-port=8444/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9443/tcp --permanent
sudo firewall-cmd --zone=public --add-port=9449/tcp --permanent
sudo firewall-cmd --zone=public --add-port=4790/udp --permanent
sudo firewall-cmd --reload
You must open these ports; otherwise the calico-node pods will keep logging errors like the following:
2025-05-23 09:26:26.390 [INFO][4264] tunnel-ip-allocator/discovery.go 188: (Re)discovering Typha endpoints using the Kubernetes API...
W0523 09:26:26.393216    4264 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
2025-05-23 09:26:26.393 [INFO][4264] tunnel-ip-allocator/discovery.go 243: Found ready Typha addresses. addresses=[]discovery.Typha{discovery.Typha{Addr:"192.168.8.200:5473", IP:"192.168.8.200", NodeName:(*string)(0xc000b31790)}}
2025-05-23 09:26:26.393 [FATAL][4264] tunnel-ip-allocator/startsyncerclient.go 86: Failed to connect to Typha error=failed to load next Typha address to try: tried all available discovered addresses
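If you hit this, listing what firewalld actually allows on the node usually pinpoints the problem; 5473/tcp (Typha) and the ipip protocol are the usual suspects:

sudo firewall-cmd --zone=public --list-ports
sudo firewall-cmd --zone=public --list-protocols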
Confirm the cluster status
[root@k8s-master ~]# kubectl get nodes
NAME          STATUS   ROLES           AGE   VERSION
k8s-master    Ready    control-plane   25m   v1.33.1
k8s-worker1   Ready    worker          15m   v1.33.1
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod -A
NAMESPACE          NAME                                       READY   STATUS    RESTARTS        AGE
calico-apiserver   calico-apiserver-b8b9d6bb7-2brsr           1/1     Running   0               14m
calico-apiserver   calico-apiserver-b8b9d6bb7-zs77c           1/1     Running   0               14m
calico-system      calico-kube-controllers-657877cd7d-g2cgl   1/1     Running   0               14m
calico-system      calico-node-nv9qj                          1/1     Running   0               14m
calico-system      calico-node-asuqj                          1/1     Running   0               14m
calico-system      calico-typha-6b8cb96766-p5k9l              1/1     Running   0               14m
calico-system      csi-node-driver-xkdnt                      2/2     Running   0               14m
calico-system      csi-node-driver-uffnt                      2/2     Running   0               14m
calico-system      goldmane-7b5b4cd5d9-b9lxx                  1/1     Running   0               14m
calico-system      whisker-cf7ccfdc4-j5vkd                    2/2     Running   0               31s
kube-system        coredns-746c97786-j9bsw                    1/1     Running   2 (5m52s ago)   35m
kube-system        coredns-746c97786-qhg9n                    1/1     Running   2 (5m52s ago)   35m
kube-system        etcd-k8s-master                            1/1     Running   2 (5m52s ago)   36m
kube-system        kube-apiserver-k8s-master                  1/1     Running   2 (5m52s ago)   36m
kube-system        kube-controller-manager-k8s-master         1/1     Running   2 (5m52s ago)   36m
kube-system        kube-proxy-lg9x5                           1/1     Running   2 (5m52s ago)   35m
kube-system        kube-scheduler-k8s-master                  1/1     Running   2 (5m52s ago)   36m
tigera-operator    tigera-operator-844669ff44-x8g6k           1/1     Running   2 (5m52s ago)   16m
All pods here must be in the Running state.
And with that, we have successfully deployed a K8s cluster. Congratulations!
Reset the cluster
If you already have a working cluster and want to practice the initialization again, or the initialization failed and you want to start over for any other reason, you can reset the cluster with the command below. The reset wipes the cluster so it can be redeployed. Normally this command is only run on the k8s-master node.
root@k8s-master:~# kubeadm reset --cri-socket=unix:///run/containerd/containerd.sock
...
[reset] Are you sure you want to proceed? [y/N]: y
...
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
Following the hints in the output, clean up the remaining files and rules by hand.
root@k8s-master:~# rm -rf /etc/cni/net.d
root@k8s-master:~# iptables -F
root@k8s-master:~# rm -rf $HOME/.kube/config
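For a more thorough cleanup you can also flush the nat and mangle tables, and clear IPVS only if your cluster actually ran kube-proxy in IPVS mode; both steps are optional and ipvsadm may not even be installed:

iptables -t nat -F
iptables -t mangle -F
iptables -X
# only if kube-proxy ran in IPVS mode:
# ipvsadm --clear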
After the cleanup, you are ready to deploy the cluster again.