How to Gracefully Shut Down Servers with Ansible Without Breaking Your Playbook


One of the most confusing experiences for Ansible users is writing a simple task to shut down a server, only to see the playbook fail with a sea of red error messages.

You issue a shutdown command, the server actually shuts down, but Ansible reports failure. Why?

This article explains the mechanics behind this “successful failure” and walks through proven patterns for handling shutdowns gracefully.

The Problem: “Sawing Off the Branch You Sit On”

By default, Ansible runs tasks synchronously. The flow looks like this:

  1. Ansible SSHs into the server.
  2. Ansible runs the command (e.g., sudo shutdown -h now).
  3. Ansible waits for the command to finish and return an “Exit Code 0” (Success).

The Catch: The shutdown -h now command tears down networking and the SSH service almost instantly. The connection drops before the command has a chance to send that “Exit Code 0” back to your Ansible controller.

Ansible interprets this sudden loss of connection as a network failure, resulting in the dreaded UNREACHABLE or Shared connection closed error.
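For reference, the naive task that triggers this behavior might look like the sketch below. The task name is illustrative, and become: yes is assumed here in place of an inline sudo.

```yaml
# Naive, synchronous version: Ansible waits on the SSH session
# for an exit code that never arrives.
- name: Shutdown the host (expect UNREACHABLE or a connection error)
  command: shutdown -h now
  become: yes
```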


Method 1: The “Fire and Forget” (Recommended for Immediate Shutdown)

To fix this, we need to instruct Ansible to send the command and disconnect immediately, without waiting to see if it finished. We do this using async and poll.

The Code

- name: Shutdown the host immediately
  shell: sleep 2 && sudo shutdown -h now
  async: 20
  poll: 0
  ignore_errors: yes

Breakdown of Parameters

  1. shell: sleep 2 && ...:
    • We add a tiny delay (sleep 2) before the shutdown command runs. This gives Ansible time to start the background job, collect its status, and close the SSH session before the shutdown sequence begins.
  2. async: 20:
    • This sets a maximum runtime for the task (20 seconds) and tells Ansible: “Run this in the background.” Keep the value comfortably larger than the sleep so the job is not cut off before shutdown runs.
  3. poll: 0:
    • This is the most important line. Usually, Ansible checks back periodically to see if an async task is done. Setting this to 0 tells Ansible: “Don’t check back. Just assume it started and move on.”
  4. ignore_errors: yes:
    • A safety net in case the task reports a failure during that split second. Note that it only covers failed tasks; it does not suppress UNREACHABLE results.
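To see what poll: 0 actually hands back, one option is to register the result and print the background job ID. This is a minimal sketch: the variable name shutdown_job is made up for illustration, and become: yes is assumed instead of sudo inside the shell string.

```yaml
- name: Shutdown the host immediately
  shell: sleep 2 && shutdown -h now
  become: yes
  async: 20
  poll: 0
  register: shutdown_job   # hypothetical variable name

# debug runs on the controller, so it does not need the host to be alive.
- name: Show that Ansible only recorded a background job ID
  debug:
    msg: "Shutdown queued as async job {{ shutdown_job.ansible_job_id }}"
```

The registered result describes the launched job, not the outcome of the shutdown itself, which is why the task comes back green even though the host is about to disappear.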

Method 2: The “Scheduled” Shutdown (The Cleanest Way)

If you don’t need the machine to go dark this exact second, the standard Linux shutdown timer is much cleaner. It doesn’t require async or poll because the command sets a timer and exits successfully while the server is still running.

The Code

- name: Schedule shutdown in 1 minute
  shell: sudo shutdown -h +1 "System is going down for maintenance"

Why use this?

  • Stability: The shutdown command returns “Success” immediately. SSH remains active. Ansible marks the task as OK and moves on.
  • Safety: The server will shut down 1 minute later. This gives time for logs to write and connections to close cleanly.
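A slightly more flexible variant of the scheduled approach could make the delay configurable and offer a way to cancel. In the sketch below, shutdown_delay_minutes and cancel_shutdown are hypothetical variables introduced for illustration, and become: yes is assumed in place of sudo. The shutdown -c call is the standard way to cancel a pending shutdown.

```yaml
- name: Schedule shutdown after a configurable delay
  command: shutdown -h +{{ shutdown_delay_minutes | default(1) }} "System is going down for maintenance"
  become: yes

# Only runs when explicitly requested, e.g. -e cancel_shutdown=true
- name: Cancel the pending shutdown
  command: shutdown -c
  become: yes
  when: cancel_shutdown | default(false) | bool
```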

Method 3: Handling Already Dead Hosts

Sometimes you run a shutdown playbook against a group of hosts, but some are already offline. By default, Ansible will stop execution for those hosts with an UNREACHABLE error.

To skip hosts that are already down without failing the whole playbook, use ignore_unreachable.

The Code

- name: Shutdown host if it is reachable
  shell: sleep 2 && sudo shutdown -h now
  async: 20  # maximum runtime; keep it longer than the sleep
  poll: 0
  ignore_unreachable: yes

Note: ignore_unreachable requires Ansible 2.7 or higher.
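One caveat: if the play gathers facts (the default), an offline host is already marked unreachable during fact gathering, before the shutdown task ever runs. A minimal play-level sketch that avoids this is shown below. The group name maintenance_targets is a placeholder, and become: yes is assumed instead of inline sudo.

```yaml
- hosts: maintenance_targets   # hypothetical group name
  gather_facts: no             # skip the implicit setup task, which would fail on offline hosts
  become: yes
  tasks:
    - name: Shutdown host if it is reachable
      shell: sleep 2 && shutdown -h now
      async: 20
      poll: 0
      ignore_unreachable: yes
```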


Advanced: Verifying the Shutdown

If you are automating a complex workflow (e.g., shutdown -> resize VM -> startup), simply firing the shutdown command isn’t enough. You need to wait until the SSH port (22) actually closes to confirm the machine is down.

You can use local_action to check the port from the Ansible controller’s perspective.

The Complete Workflow

- hosts: all
  become: yes
  tasks:
    - name: Trigger shutdown
      shell: sleep 2 && shutdown -h now
      async: 20  # maximum runtime; keep it longer than the sleep
      poll: 0

    - name: Wait for the host to actually go down
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        port: 22
        state: stopped
        timeout: 300
      become: no

How this works:

  1. Trigger: Sends the shutdown command and disconnects (poll: 0).
  2. Wait: The controller (local machine) begins polling port 22 of the target host.
  3. Success: The task only completes when port 22 stops responding (state: stopped). Strictly speaking this confirms that SSH is down, which is a strong signal, though not proof, that the server has powered off.
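The same wait step can also be written with delegate_to: localhost, which is the long form of local_action. The sketch below assumes the address to probe is in ansible_host (falling back to inventory_hostname) and adds a short delay before the first check. Both choices are illustrative and not part of the original workflow.

```yaml
- name: Wait for SSH to stop answering (delegate_to form)
  wait_for:
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: 22
    state: stopped
    delay: 5        # give the sleep 2 + shutdown a moment before the first probe
    timeout: 300
  delegate_to: localhost
  become: no
```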

Summary

Requirement              Recommended Method
Instant shutdown         Use shell with sleep 2, async: 20, and poll: 0.
Safe/clean shutdown      Use shutdown -h +1 (no async needed).
Host might be offline    Add ignore_unreachable: yes.
Need confirmation        Follow up with wait_for (state: stopped).
David Cao