Docker守护进程连接错误
Reverse Lv4

问题描述

在使用 docker 时,发现 docker 守护进程出现连接错误,如下:

1
2
3
[root@localhost ~]# docker restart -t1 tools-nginx-1
Error response from daemon: Cannot restart container tools-nginx-1: connection error: desc = "transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable
[root@localhost ~]#

尝试解决

先检查了一下containerd的状态,看了下是异常了,目前退出代码为2(status=2)

1
2
3
4
5
6
7
8
9
10
11
12
13
[root@localhost ~]# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/local/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-12-22 18:36:42 CST; 3s ago
Docs: https://containerd.io
Process: 48729 ExecStart=/usr/local/bin/containerd (code=exited, status=2)
Process: 48727 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 48729 (code=exited, status=2)

Dec 22 18:36:42 localhost systemd[1]: containerd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 22 18:36:42 localhost systemd[1]: Unit containerd.service entered failed state.
Dec 22 18:36:42 localhost systemd[1]: containerd.service failed.
[root@localhost ~]#

尝试重启一下containerd服务:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[root@localhost ~]# systemctl restart containerd
[root@localhost ~]# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/local/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Sun 2024-12-22 18:37:32 CST; 1s ago
Docs: https://containerd.io
Process: 49241 ExecStart=/usr/local/bin/containerd (code=exited, status=2)
Process: 49239 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 49241 (code=exited, status=2)

Dec 22 18:37:32 localhost containerd[49241]: github.com/containerd/containerd/gc/scheduler.(*gcScheduler).run(0xc00078ec00, 0x5636f1cb28b0, 0xc000040038)
Dec 22 18:37:32 localhost containerd[49241]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/g...0 +0x4bf
Dec 22 18:37:32 localhost containerd[49241]: created by github.com/containerd/containerd/gc/scheduler.init.0.func1
Dec 22 18:37:32 localhost containerd[49241]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/g...2 +0x445
Dec 22 18:37:32 localhost systemd[1]: Unit containerd.service entered failed state.
Dec 22 18:37:32 localhost systemd[1]: containerd.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost ~]#

但是并没有卵用,还是没起来,尝试下其他办法,查看错误日志,如下:

1
journalctl -u containerd -n 50 --no-pager
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
-- Logs begin at Sun 2024-12-22 17:12:15 CST, end at Sun 2024-12-22 18:40:57 CST. --
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.241193631+08:00" level=info msg="loading plugin \"io.containerd.runtime.v1.linux\"..." type=io.containerd.runtime.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.241259832+08:00" level=info msg="loading plugin \"io.containerd.runtime.v2.task\"..." type=io.containerd.runtime.v2
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.241334756+08:00" level=info msg="loading plugin \"io.containerd.monitor.v1.cgroups\"..." type=io.containerd.monitor.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242006839+08:00" level=info msg="loading plugin \"io.containerd.service.v1.tasks-service\"..." type=io.containerd.service.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242068886+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.restart\"..." type=io.containerd.internal.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242163233+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.containers\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242191794+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.content\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242212102+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.diff\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242231735+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.events\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242253326+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.healthcheck\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242273932+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.images\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242296738+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.leases\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242321583+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.namespaces\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242340220+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.opt\"..." type=io.containerd.internal.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242390209+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.snapshots\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242431370+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.tasks\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242456434+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242489898+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.introspection\"..." type=io.containerd.grpc.v1
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242827737+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.242932534+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Dec 22 18:40:57 localhost containerd[50977]: time="2024-12-22T18:40:57.243027297+08:00" level=info msg="containerd successfully booted in 0.043640s"
Dec 22 18:40:57 localhost containerd[50977]: panic: invalid page type: 2: 10 // [!code highlight]
Dec 22 18:40:57 localhost containerd[50977]: goroutine 114 [running]:
Dec 22 18:40:57 localhost containerd[50977]: go.etcd.io/bbolt.(*Cursor).search(0xc00026b080, 0x561b6f733918, 0x6, 0x6, 0x2)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:250 +0x345
Dec 22 18:40:57 localhost containerd[50977]: go.etcd.io/bbolt.(*Cursor).seek(0xc00026b080, 0x561b6f733918, 0x6, 0x6, 0xc00033c200, 0x4, 0x4, 0x7fd8a035d034, 0x10, 0x10, ...)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/cursor.go:159 +0x7f
Dec 22 18:40:57 localhost containerd[50977]: go.etcd.io/bbolt.(*Bucket).Bucket(0xc00033c200, 0x561b6f733918, 0x6, 0x6, 0xc00033c200)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/bucket.go:105 +0xda
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/metadata.scanRoots.func2(0x7fd8a035d030, 0x4, 0x4, 0x0, 0x0, 0x0, 0x0, 0x561b6d685131)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/metadata/gc.go:101 +0xf1
Dec 22 18:40:57 localhost containerd[50977]: go.etcd.io/bbolt.(*Bucket).ForEach(0xc00033c180, 0xc00026b6e8, 0x6, 0x6)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/bucket.go:390 +0x104
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/metadata.scanRoots(0x561b6edc8878, 0xc00033c000, 0xc00043c000, 0xc00013c840, 0xc0005ac000, 0x0)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/metadata/gc.go:94 +0x875
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/metadata.(*DB).getMarked.func1(0xc00043c000, 0x0, 0x0)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/metadata/db.go:384 +0x16e
Dec 22 18:40:57 localhost containerd[50977]: go.etcd.io/bbolt.(*DB).View(0xc00020e400, 0xc0001d2890, 0x0, 0x0)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/vendor/go.etcd.io/bbolt/db.go:725 +0x93
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/metadata.(*DB).getMarked(0xc0003170a0, 0x561b6edc88b0, 0xc000040038, 0x203000, 0x203000, 0x0)
Dec 22 18:40:57 localhost systemd[1]: containerd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/metadata/db.go:367 +0x7e
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/metadata.(*DB).GarbageCollect(0xc0003170a0, 0x561b6edc88b0, 0xc000040038, 0x0, 0x3, 0x1, 0x2)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/metadata/db.go:284 +0xa5
Dec 22 18:40:57 localhost containerd[50977]: github.com/containerd/containerd/gc/scheduler.(*gcScheduler).run(0xc000131020, 0x561b6edc88b0, 0xc000040038)
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:310 +0x4bf
Dec 22 18:40:57 localhost containerd[50977]: created by github.com/containerd/containerd/gc/scheduler.init.0.func1
Dec 22 18:40:57 localhost containerd[50977]: /home/runner/work/containerd-nightlies/containerd-nightlies/src/github.com/containerd/containerd/gc/scheduler/scheduler.go:132 +0x445
Dec 22 18:40:57 localhost systemd[1]: Unit containerd.service entered failed state.
Dec 22 18:40:57 localhost systemd[1]: containerd.service failed.
You have mail in /var/spool/mail/root

确定故障原因

上面看到有一个错误,panic: invalid page type: 2: 10应该是由于数据库文件损坏导致的。

那么可以尝试一下删除数据库文件并且重建后再启动containerd试试。

:::danger
请注意,删除之前建议先做个备份
:::

1
cp -r /var/lib/containerd /var/lib/containerd.bak

然后直接删除数据库文件:

1
rm -rf /var/lib/containerd/*

然后我们重启下containerd服务:

1
systemctl restart containerd

重启完成后,再查看一下containerd服务状态:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
[root@localhost ~]# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/usr/local/lib/systemd/system/containerd.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2024-12-22 18:47:08 CST; 4s ago
Docs: https://containerd.io
Process: 54168 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 54172 (containerd)
Tasks: 22
Memory: 25.0M
CGroup: /system.slice/containerd.service
└─54172 /usr/local/bin/containerd

Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314169308+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314183237+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314197462+08:00" level=info msg="loading plugin \"io.containerd.internal...ernal.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314239313+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314260981+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314276778+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314292198+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.....grpc.v1
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314520493+08:00" level=info msg=serving... address=/run/containerd/conta...ck.ttrpc
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314560935+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Dec 22 18:47:08 localhost containerd[54172]: time="2024-12-22T18:47:08.314621503+08:00" level=info msg="containerd successfully booted in 0.039119s"
Hint: Some lines were ellipsized, use -l to show in full.
[root@localhost ~]#

可以看到containerd服务已经正常启动了。

验证解决方案

我们再试试重启容器:

1
2
3
4
5
6
[root@localhost ~]# docker restart -t1 tools-nginx-1
tools-nginx-1
[root@localhost ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0b68949ab785 xxxxxxxxxxxxxx "tini -- sh -c 'bash…" 11 minutes ago Up 8 seconds tools-nginx-1
[root@localhost ~]# docker ps -a

可以看到容器已经重启成功,那么至此 containerd 服务异常的问题就解决了,得出结论,数据库文件损坏导致的containerd服务异常,可以尝试删除数据库文件,重建后再启动containerd服务。