Run like Hell: Docker-Swarm: One manager, two nodes with Alpine Linux

After creating an Alpine Linux VM in VirtualBox and adding Docker (I chose Alpine because of its small disk footprint: 170 MB for Alpine Linux, 280 MB with Docker), I performed the following steps to create a Docker swarm:

clone the VM twice
assign a static IP to the manager node
create new MAC addresses for the network interface cards on the nodes
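
The cloning and MAC steps can also be scripted with VBoxManage instead of the VirtualBox GUI. This is only a sketch under assumptions: the VM names "alpine", "node01" and "node02" are hypothetical VirtualBox names, and the helper clone_node is not part of any tool. Recent VBoxManage versions may already regenerate MACs when cloning; the second call just makes that explicit.

```shell
# Clone a VM and give the clone's first network adapter a fresh MAC address.
# $1 = name of the source VM (assumed), $2 = name of the new clone (assumed).
clone_node() {
    src="$1"
    name="$2"
    # Create the clone and register it with VirtualBox
    VBoxManage clonevm "$src" --name "$name" --register
    # Let VirtualBox generate a new MAC for network adapter 1
    VBoxManage modifyvm "$name" --macaddress1 auto
}

# Usage (run on the VirtualBox host while the source VM is powered off):
#   clone_node alpine node01
#   clone_node alpine node02
```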

Then I followed the tutorial https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/ but skipped the docker-machine commands, because I already have 3 VMs and do not want to provision the nodes through docker-machine.

manager:

alpine:~# docker swarm init --advertise-addr 192.168.178.46
Swarm initialized: current node (wy1z8jxmr1cyupdqgkm6lxhe2) is now a manager.

To add a worker to this swarm, run the following command:

docker swarm join --token SWMTKN-1-3b7f69d3wgty0u68oab8724z07fkyvgc0w8j37ng1l7jsmbghl-0yfr1eu5u66z8pinweisltmci 192.168.178.46:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

nodes:

# docker swarm join --token SWMTKN-1-3b7f69d3wgty0u68oab8724z07fkyvgc0w8j37ng1l7jsmbghl-0yfr1eu5u66z8pinweisltmci 192.168.178.46:2377
This node joined a swarm as a worker.
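
If the join command from "docker swarm init" has scrolled away, the worker token can be read again on the manager with "docker swarm join-token". A small sketch: the helper name worker_join_command is made up for illustration, and MANAGER_ADDR is simply the address used above.

```shell
# Address the manager advertises (taken from the init command above)
MANAGER_ADDR="192.168.178.46:2377"

# Print a ready-to-paste join command for a new worker.
# -q makes docker print only the token.
worker_join_command() {
    token=$(docker swarm join-token -q worker) || return 1
    echo "docker swarm join --token $token $MANAGER_ADDR"
}

# Usage (on the manager):
#   worker_join_command
```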

And then a check on the manager:

alpine:~# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
wy1z8jxmr1cyupdqgkm6lxhe2 *   alpine              Ready               Active              Leader
pusf5o5buetjqrsmx3kzusbyt     node01              Ready               Active
io3z3b6nf8xbzkyzjq6sa7cuc     node02              Ready               Active
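
For scripting, the same check can be reduced to a number with the --format option of "docker node ls". A minimal sketch; the function name ready_nodes is an invention for this example.

```shell
# Count the nodes whose STATUS column is "Ready".
ready_nodes() {
    docker node ls --format '{{.Hostname}} {{.Status}}' |
        awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Usage (on the manager):
#   [ "$(ready_nodes)" -eq 3 ] && echo "all three nodes are ready"
```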

Run a first job:

alpine:~# docker service create --replicas 1 --name helloworld alpine ping 192.168.178.1
rsn6igby4f6d7uuy8eny7sbfb
overall progress: 1 out of 1 tasks
1/1: running
verify: Service converged

On the manager, "docker ps" shows no containers. That is because the task is not running there:

alpine:~# docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
wrrobalt4oe7        helloworld.1        alpine:latest       node01              Running             Running 2 minutes ago

Node 1 shows:

node01:~# docker ps
CONTAINER ID        IMAGE               COMMAND                 CREATED             STATUS              PORTS               NAMES
40c5e9b2ffbc        alpine:latest       "ping 192.168.178.1"    3 minutes ago       Up 3 minutes                            helloworld.1.wrrobalt4oe7mrbhxjlweuxgk

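Next I wanted to check that a killed ping process is restarted. In the transcript below, however, the PID passed to kill (2597) is that of the already finished grep, so ping keeps running untouched under PID 2457; the restart itself is therefore not shown here: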

node01:~# ps aux|grep ping
2457 root       0:00 ping 192.168.178.1
2597 root       0:00 grep ping
node01:~# kill 2597
node01:~# ps aux|grep ping
2457 root       0:00 ping 192.168.178.1
2600 root       0:00 grep ping
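
To hit the ping process itself, the PID can be picked by command name instead of being copied by hand. A minimal sketch, assuming the BusyBox "ps aux" column layout seen above (PID, USER, TIME, COMMAND); the helper ping_pids is hypothetical.

```shell
# Read "ps aux" output on stdin and print the PIDs of processes
# whose command is exactly "ping" (the grep line is skipped).
ping_pids() {
    awk '$4 == "ping" { print $1 }'
}

# Usage (on the node that runs the task):
#   ps aux | ping_pids
#   kill "$(ps aux | ping_pids)"
```

If the kill succeeds, the container exits and "docker service ps helloworld" on the manager should list a new task above the old one.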

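Running two replicas is no problem either. Note that this creates the service anew with --replicas 2 (the service ID differs from the first run, so the old service was removed beforehand) rather than scaling the existing one: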

alpine:~# docker service create --replicas 2 --name helloworld alpine ping 192.168.178.1
3lrdqdpjuqml6creswdcqpn2p
overall progress: 2 out of 2 tasks
1/2: running   [==================================================>]
2/2: running   [==================================================>]
verify: Service converged
alpine:~# docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
616scw68s8bv        helloworld.1        alpine:latest       node02              Running             Running 8 seconds ago
n8ovvsw0m4id        helloworld.2        alpine:latest       node01              Running             Running 8 seconds ago
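
An existing service can also be scaled in place with "docker service scale", without removing it first. The sketch below only scales when the replica count differs; ensure_replicas is a made-up helper name, and the inspect template assumes a replicated (not global) service.

```shell
# Scale a replicated service to the wanted number of replicas,
# but only if it is not already at that number.
# $1 = service name, $2 = wanted replica count
ensure_replicas() {
    service="$1"
    wanted="$2"
    current=$(docker service inspect \
        --format '{{.Spec.Mode.Replicated.Replicas}}' "$service") || return 1
    if [ "$current" != "$wanted" ]; then
        docker service scale "$service=$wanted"
    fi
}

# Usage (on the manager):
#   ensure_replicas helloworld 2
```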

Shutting down node02 is no problem either: its task is rescheduled, in this case onto the manager:

alpine:~# docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
bne2enbkabfo        helloworld.1        alpine:latest       alpine              Ready               Ready 2 seconds ago
616scw68s8bv         \_ helloworld.1    alpine:latest       node02              Shutdown            Running 17 seconds ago
n8ovvsw0m4id        helloworld.2        alpine:latest       node01              Running             Running about a minute ago

After switching off node01 as well, both tasks run on the remaining manager:

alpine:~# docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE                ERROR               PORTS
bne2enbkabfo        helloworld.1        alpine:latest       alpine              Running             Running about a minute ago
616scw68s8bv         \_ helloworld.1    alpine:latest       node02              Shutdown            Running about a minute ago
pd8dfp4133yw        helloworld.2        alpine:latest       alpine              Running             Running 2 seconds ago
n8ovvsw0m4id         \_ helloworld.2    alpine:latest       node01              Shutdown            Running 2 minutes ago

So failover works, but failback does not occur: after switching node01 on again, the tasks remain on the manager:

alpine:~# docker service ps helloworld
ID                  NAME                IMAGE               NODE                DESIRED STATE       CURRENT STATE               ERROR                         PORTS
bne2enbkabfo        helloworld.1        alpine:latest       alpine              Running             Running 4 minutes ago
616scw68s8bv         \_ helloworld.1    alpine:latest       node02              Shutdown            Running 4 minutes ago
pd8dfp4133yw        helloworld.2        alpine:latest       alpine              Running             Running 2 minutes ago
n8ovvsw0m4id         \_ helloworld.2    alpine:latest       node01              Shutdown            Failed about a minute ago   "task: non-zero exit (255)"
alpine:~# docker node ls
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
wy1z8jxmr1cyupdqgkm6lxhe2 *   alpine              Ready               Active              Leader
pusf5o5buetjqrsmx3kzusbyt     node01              Ready               Active
io3z3b6nf8xbzkyzjq6sa7cuc     node02              Down                Active
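
As far as I know, swarm does not move running tasks back on its own, but a redistribution can be triggered by hand. Two options, sketched with hypothetical helper names: forcing a service update restarts the tasks so the scheduler can place them on the nodes that are Ready again, and draining the manager moves all tasks off it.

```shell
# Restart the tasks of a service so they are scheduled again
# across the currently available nodes.
# $1 = service name
rebalance_service() {
    docker service update --force "$1"
}

# Stop a node from running tasks; existing tasks are moved elsewhere.
# $1 = node hostname as shown by "docker node ls"
drain_node() {
    docker node update --availability drain "$1"
}

# Usage (on the manager):
#   rebalance_service helloworld
#   drain_node alpine
```

Note that a forced update restarts the tasks, so the service is briefly interrupted on each task.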

Last thing: How to stop the service?

alpine:~# docker service rm helloworld
helloworld
alpine:~# docker service ps helloworld
no such service: helloworld

Remaining open points:

Is it possible to fail back, or to limit the number of tasks of a service per node?
How to do this with a server application (is a load balancer needed)?
What happens if the manager fails or is shut down?
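
A possible starting point for the first two questions, not tested on this setup: newer Docker releases (19.03 and later, so newer than the version used here) accept --replicas-max-per-node, and --publish exposes a port on every node through swarm's routing mesh. The service name "web", the nginx image and port 8080 are assumptions chosen only for illustration.

```shell
# Create a two-replica web service with at most one task per node,
# reachable on port 8080 of every swarm node.
create_web_service() {
    docker service create \
        --name web \
        --replicas 2 \
        --replicas-max-per-node 1 \
        --publish published=8080,target=80 \
        nginx
}

# Usage (on the manager):
#   create_web_service
```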

Dec 5, 2017
