I created 3 EC2 instances on AWS:
- public ip: 18.196.82.2, private ip: 172.31.18.164
- public ip: 18.195.160.26, private ip: 172.31.29.158
- public ip: 3.64.10.160, private ip: 172.31.20.230
Following the documentation at https://github.com/kubernetes-sigs/kubespray, I created the file inventory/mycluster/hosts.yaml with this configuration:
all:
  hosts:
    node1:
      ansible_host: 18.196.82.2
      ip: 172.31.18.164
      access_ip: 18.196.82.2
    node2:
      ansible_host: 18.195.160.26
      ip: 172.31.29.158
      access_ip: 18.195.160.26
    node3:
      ansible_host: 3.64.10.160
      ip: 172.31.20.230
      access_ip: 3.64.10.160
  children:
    kube-master:
      hosts:
        node1:
    kube-node:
      hosts:
        node1:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s-cluster:
      children:
        kube-master:
        kube-node:
    calico-rr:
      hosts: {}
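To double-check the addresses in this inventory, here is a small Python sketch that lists the hosts whose `access_ip` differs from `ip`. The dictionary is copied by hand from the hosts section above, and the helper name `find_ip_mismatches` is hypothetical (it is not part of kubespray or Ansible).

```python
# Host variables copied by hand from inventory/mycluster/hosts.yaml.
HOSTS = {
    "node1": {
        "ansible_host": "18.196.82.2",
        "ip": "172.31.18.164",
        "access_ip": "18.196.82.2",
    },
    "node2": {
        "ansible_host": "18.195.160.26",
        "ip": "172.31.29.158",
        "access_ip": "18.195.160.26",
    },
    "node3": {
        "ansible_host": "3.64.10.160",
        "ip": "172.31.20.230",
        "access_ip": "3.64.10.160",
    },
}


def find_ip_mismatches(hosts):
    """Return {host: (ip, access_ip)} for hosts whose access_ip differs from ip.

    Hypothetical helper for illustration only.
    """
    mismatches = {}
    for name, hostvars in hosts.items():
        ip = hostvars.get("ip")
        access_ip = hostvars.get("access_ip")
        # Hosts without access_ip are skipped: there is nothing to compare.
        if access_ip is not None and access_ip != ip:
            mismatches[name] = (ip, access_ip)
    return mismatches


if __name__ == "__main__":
    for name, (ip, access_ip) in find_ip_mismatches(HOSTS).items():
        print(f"{name}: ip={ip} access_ip={access_ip}")
```

In this inventory every node is reported, because `ip` holds the private address and `access_ip` holds the public one.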
I did not modify the ```cluster.yml``` file.
I ran the command:
```ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -e ansible_user=ubuntu -e bootstrap_os=ubuntu -e ansible_ssh_private_key_file=/home/mk/prkey.pem -e cloud_provider=aws -b --become-user=root --flush-cache```
Unfortunately, cluster creation fails at the etcd health check.
The error I get:
```
TASK [etcd : Configure | Wait for etcd cluster to be healthy] ***************************************************************************************************************************
fatal: [node1]: FAILED! => {"attempts": 4, "changed": false, "cmd": "set -o pipefail && /usr/local/bin/etcdctl endpoint --cluster status && /usr/local/bin/etcdctl endpoint --cluster health 2>&1 | grep -v 'Error: unhealthy cluster' >/dev/null", "delta": "0:00:05.017265", "end": "2020-12-11 01:01:43.378016", "msg": "non-zero return code", "rc": 1, "start": "2020-12-11 01:01:38.360751", "stderr": "{\"level\":\"warn\",\"ts\":\"2020-12-11T01:01:43.376+0100\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-5625f710-d0d6-4953-bcd2-4d1eab01f59b/172.31.18.164:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 172.31.18.164:2379: connect: connection refused\\\"\"}\nError: failed to fetch endpoints from etcd cluster member list: context deadline exceeded", "stderr_lines": ["{\"level\":\"warn\",\"ts\":\"2020-12-11T01:01:43.376+0100\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-5625f710-d0d6-4953-bcd2-4d1eab01f59b/172.31.18.164:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp 172.31.18.164:2379: connect: connection refused\\\"\"}", "Error: failed to fetch endpoints from etcd cluster member list: context deadline exceeded"], "stdout": "", "stdout_lines": []}
```
Logs from the machine where etcd was supposed to run (journalctl -u etcd):
Dec 11 00:27:34 node1 etcd[21795]: 2020-12-10 23:27:34.524977 W | pkg/fileutil: check file permission: directory "/var/lib/etcd" exist, but the permission is "drwxr-xr-x". The recommended permission is "-rwx------" to prevent possible unprivileged access to the data.
Dec 11 00:27:34 node1 etcd[21795]: 2020-12-10 23:27:34.529463 C | etcdmain: --initial-cluster has etcd1=https://172.31.18.164:2380 but missing from --initial-advertise-peer-urls=https://18.196.82.2:2380 ("https://18.196.82.2:2380"(resolved from "https://18.196.82.2:2380") != "https://172.31.18.164:2380"(resolved from "https://172.31.18.164:2380"))
Dec 11 00:27:34 node1 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Dec 11 00:27:34 node1 systemd[1]: etcd.service: Failed with result 'exit-code'.
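The critical etcdmain line is easier to read once the two URLs are pulled apart. Below is a minimal Python sketch, assuming the message format shown in my log; the regular expression and the helper name `parse_peer_url_mismatch` are my own and are not provided by etcd or kubespray.

```python
import re

# The critical line from `journalctl -u etcd`, shortened to the relevant part.
LOG_LINE = (
    "etcdmain: --initial-cluster has etcd1=https://172.31.18.164:2380 "
    "but missing from --initial-advertise-peer-urls=https://18.196.82.2:2380"
)

# Assumed message layout, derived only from the log line above.
PATTERN = re.compile(
    r"--initial-cluster has (?P<member>[^=\s]+)=(?P<cluster_url>\S+) "
    r"but missing from --initial-advertise-peer-urls=(?P<advertise_url>\S+)"
)


def parse_peer_url_mismatch(line):
    """Return the member name and both peer URLs, or None if the line does not match.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    info = parse_peer_url_mismatch(LOG_LINE)
    if info is not None:
        print("member:                     ", info["member"])
        print("URL in --initial-cluster:   ", info["cluster_url"])
        print("URL in advertise-peer-urls: ", info["advertise_url"])
```

For my log this gives member etcd1 with the private address 172.31.18.164 in --initial-cluster and the public address 18.196.82.2 in --initial-advertise-peer-urls, which are the `ip` and `access_ip` values of node1 in the inventory.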