From Zero to One: Building an Automated Ops Bastion on US Servers with Ansible (Security + Kernel Tuning + Monitoring)

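Logging in to each machine by hand and typing commands is stone-age ops. Automating the work on 轻云互联 US servers is the sensible way to do it. Instead of repeating the same steps, you write one playbook that handles security hardening, kernel parameter tuning, and basic monitoring. The code comes first; the theory can wait.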

1. Prerequisites: Installing Ansible and Setting Up the Control Machine

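# Install Ansible on the control machine (Debian/Ubuntu shown; use your distro's package manager elsewhere)
sudo apt update && sudo apt install -y ansible
# Check the version
ansible --version

# Create the project directory and the role skeletons under ./roles
mkdir -p ~/usa-server-auto && cd ~/usa-server-auto
ansible-galaxy init --init-path roles security
ansible-galaxy init --init-path roles kernel-tuning
ansible-galaxy init --init-path roles monitoring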

2. Inventory: Connecting to Your US Servers

# hosts.ini
[usa_servers]
server1 ansible_host=198.51.100.10 ansible_user=root ansible_ssh_private_key_file=~/.ssh/id_rsa_lightcloud
server2 ansible_host=198.51.100.11 ansible_user=root ansible_ssh_private_key_file=~/.ssh/id_rsa_lightcloud

[usa_servers:vars]
ansible_python_interpreter=/usr/bin/python3

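These examples were tested on US servers from 轻云互联. The 198.51.100.x addresses are placeholders, so replace them with your own. Set up key-based root login beforehand, and do not use passwords.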
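
If you prefer YAML inventories, the same hosts could be written roughly as follows. This is only a sketch: the file name hosts.yml is an assumption (the rest of this article uses hosts.ini), and you would pass it with -i hosts.yml instead.

```yaml
# hosts.yml (hypothetical file name; intended to mirror hosts.ini above)
all:
  children:
    usa_servers:
      hosts:
        server1:
          ansible_host: 198.51.100.10
        server2:
          ansible_host: 198.51.100.11
      vars:
        # Shared by every host in the group
        ansible_user: root
        ansible_ssh_private_key_file: "~/.ssh/id_rsa_lightcloud"
        ansible_python_interpreter: /usr/bin/python3
```

Moving the user and key into group vars avoids repeating them on every host line.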

3. Security Hardening Role: SSH + UFW + Fail2Ban

# roles/security/tasks/main.yml
---
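- name: Disable SSH password login
  lineinfile:
    path: /etc/ssh/sshd_config
    # Also match the commented-out default so the line is replaced, not duplicated
    regexp: '^#?\s*PasswordAuthentication\s'
    line: 'PasswordAuthentication no'
    validate: '/usr/sbin/sshd -t -f %s'
  notify: restart sshd

- name: Allow root login with keys only (optional)
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?\s*PermitRootLogin\s'
    line: 'PermitRootLogin prohibit-password'
    validate: '/usr/sbin/sshd -t -f %s'
  notify: restart sshd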

- name: Set the UFW default policies
  ufw:
    direction: '{{ item.direction }}'
    policy: '{{ item.policy }}'
  loop:
    - { direction: incoming, policy: deny }
    - { direction: outgoing, policy: allow }
  when: ansible_os_family == "Debian"

- name: Open the SSH port
  ufw:
    rule: allow
    port: '22'
    proto: tcp

- name: Enable UFW
  ufw:
    state: enabled

- name: Install Fail2Ban
  apt:
    name: fail2ban
    state: present

- name: Configure Fail2Ban jail.local
  copy:
    dest: /etc/fail2ban/jail.local
    content: |
      [DEFAULT]
      bantime = 3600
      findtime = 600
      maxretry = 5
      
      [sshd]
      enabled = true
      port = 22
      filter = sshd
      logpath = /var/log/auth.log
  notify: restart fail2ban

- name: Enable and start Fail2Ban
  systemd:
    name: fail2ban
    state: started
    enabled: yes

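# roles/security/handlers/main.yml
# Handlers live in their own file inside a role, not under a handlers: key in tasks/main.yml
---
- name: restart sshd
  systemd:
    name: ssh   # Debian/Ubuntu name the unit ssh; sshd is only an alias there
    state: restarted
- name: restart fail2ban
  systemd:
    name: fail2ban
    state: restarted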
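
On images where /etc/ssh/sshd_config includes /etc/ssh/sshd_config.d/*.conf near the top (the default on recent Debian and Ubuntu releases), a drop-in file can override the main file, because sshd keeps the first value it reads for each keyword. A minimal sketch of hardening through a drop-in instead, assuming that Include line exists; the file name 00-hardening.conf is made up for illustration and chosen to sort ahead of other drop-ins:

```yaml
# Sketch: could be added to roles/security/tasks/main.yml
- name: Write an sshd drop-in that disables password login
  copy:
    dest: /etc/ssh/sshd_config.d/00-hardening.conf   # hypothetical file name
    content: |
      PasswordAuthentication no
      PermitRootLogin prohibit-password
    owner: root
    group: root
    mode: '0644'
    # Reject the file if sshd cannot parse it
    validate: '/usr/sbin/sshd -t -f %s'
  notify: restart sshd
```

Whether you need this depends on what your image ships in sshd_config.d, so check that directory before relying on either approach.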

4. Kernel Tuning Role: BBR + Network Optimization

# roles/kernel-tuning/tasks/main.yml
---
- name: Set fq as the default qdisc (used with BBR)
  sysctl:
    name: net.core.default_qdisc
    value: 'fq'
    sysctl_set: yes
    state: present
    reload: yes

- name: Set BBR as the congestion control algorithm
  sysctl:
    name: net.ipv4.tcp_congestion_control
    value: 'bbr'
    sysctl_set: yes
    state: present
    reload: yes

- name: Tune the TCP receive/send buffers (for high-bandwidth US servers)
  sysctl:
    name: '{{ item.key }}'
    value: '{{ item.value }}'
    state: present
    reload: yes
  loop:
    - { key: net.core.rmem_max, value: '67108864' }
    - { key: net.core.wmem_max, value: '67108864' }
    - { key: net.ipv4.tcp_rmem, value: '4096 87380 33554432' }
    - { key: net.ipv4.tcp_wmem, value: '4096 65536 33554432' }
    - { key: net.ipv4.tcp_mtu_probing, value: '1' }
    - { key: net.ipv4.tcp_slow_start_after_idle, value: '0' }

- name: Enable TCP Fast Open
  sysctl:
    name: net.ipv4.tcp_fastopen
    value: '3'
    sysctl_set: yes
    state: present
    reload: yes

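# lineinfile manages one line per call, so each limit is added separately.
# Note that the * wildcard in limits.conf does not apply to root.
- name: Raise the open file descriptor limits
  lineinfile:
    path: /etc/security/limits.conf
    line: '{{ item }}'
    create: yes
    state: present
  loop:
    - '* soft nofile 1048576'
    - '* hard nofile 1048576'

# limits.conf is applied by PAM at login, so it only affects new sessions;
# sysctl -p only reloads /etc/sysctl.conf.
- name: Reload sysctl settings
  command: sysctl -p
  changed_when: false
  when: ansible_os_family == "Debian"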

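轻云互联's US servers usually come with Xeon Platinum CPUs and plenty of memory, so these settings help squeeze more network throughput out of them.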
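
To catch a kernel that silently ignores the BBR setting, the role could end with a check along these lines. This is a sketch rather than part of the tested playbook, and the variable name active_cc is made up for illustration:

```yaml
# Sketch: could be appended to roles/kernel-tuning/tasks/main.yml
- name: Read the active congestion control algorithm
  command: sysctl -n net.ipv4.tcp_congestion_control
  register: active_cc          # hypothetical variable name
  changed_when: false          # read-only command, never reports a change

- name: Fail if BBR is not active
  assert:
    that:
      - active_cc.stdout == "bbr"
    fail_msg: "Expected bbr, got {{ active_cc.stdout }}; the kernel may lack the tcp_bbr module"
    success_msg: "BBR is active"
```

Failing inside the role surfaces the problem during the run instead of later, when throughput looks wrong.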

5. Basic Monitoring Role: Node Exporter + systemd Service Monitoring

# roles/monitoring/tasks/main.yml
---
- name: Create the node_exporter user
  user:
    name: node_exporter
    system: yes
    shell: /sbin/nologin

- name: Download node_exporter v1.7.0
  get_url:
    url: "https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz"
    dest: /tmp/node_exporter.tar.gz

- name: Extract and install
  unarchive:
    src: /tmp/node_exporter.tar.gz
    dest: /opt
    remote_src: yes
    owner: node_exporter
    group: node_exporter

- name: Create the symlink
  file:
    src: "/opt/node_exporter-1.7.0.linux-amd64/node_exporter"
    dest: /usr/local/bin/node_exporter
    state: link

- name: Create the systemd unit
  copy:
    dest: /etc/systemd/system/node_exporter.service
    content: |
      [Unit]
      Description=Prometheus Node Exporter
      After=network.target
      
      [Service]
      User=node_exporter
      ExecStart=/usr/local/bin/node_exporter \
          --web.listen-address=:9100 \
          --collector.systemd \
          --collector.processes
      Restart=always
      
      [Install]
      WantedBy=multi-user.target

- name: Start and enable node_exporter
  systemd:
    daemon_reload: yes
    name: node_exporter
    state: started
    enabled: yes
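
The security role only opens port 22 in UFW, so a remote Prometheus server cannot reach port 9100 until it is allowed through. A sketch of two extra tasks, assuming a variable prometheus_server_ip that you define yourself (it does not exist anywhere else in this article):

```yaml
# Sketch: could be appended to roles/monitoring/tasks/main.yml
- name: Allow the Prometheus server to scrape node_exporter
  ufw:
    rule: allow
    port: '9100'
    proto: tcp
    from_ip: '{{ prometheus_server_ip }}'   # hypothetical variable; set it in the inventory or group_vars

- name: Check that node_exporter answers locally
  uri:
    url: http://127.0.0.1:9100/metrics
    status_code: 200
  register: metrics_check                   # hypothetical variable name
  retries: 5
  delay: 2
  until: metrics_check.status == 200
```

Restricting the rule to one source address keeps the metrics endpoint off the public internet.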

6. The Final Playbook: Chaining the Roles Together

# site.yml
---
- hosts: usa_servers
  gather_facts: yes
  become: yes

  roles:
    - security
    - kernel-tuning
    - monitoring
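
If you expect to rerun only part of the setup, a variant of site.yml with one tag per role is one option. The tag names below are illustrative and not part of the original playbook:

```yaml
# site.yml variant with one tag per role (tag names are illustrative)
---
- hosts: usa_servers
  gather_facts: yes
  become: yes

  roles:
    - { role: security, tags: ['security'] }
    - { role: kernel-tuning, tags: ['kernel'] }
    - { role: monitoring, tags: ['monitoring'] }
```

With this layout, ansible-playbook -i hosts.ini site.yml --tags kernel would apply only the kernel tuning role.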

7. Running and Verifying

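# Run the playbook
ansible-playbook -i hosts.ini site.yml

# Verify security: attempt a password-only SSH login (should be rejected)
ssh root@198.51.100.10 -o PreferredAuthentications=password -o PubkeyAuthentication=no

# Verify BBR
ansible usa_servers -i hosts.ini -a "sysctl net.ipv4.tcp_congestion_control"

# Verify node_exporter (run on the servers themselves; UFW blocks port 9100 from outside)
ansible usa_servers -i hosts.ini -m shell -a "curl -s http://localhost:9100/metrics | head -20"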

8. Troubleshooting and Extensions

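  • If Ansible connections time out, check the network quality between the control machine and the 轻云互联 US servers. Latency is typically 150-200 ms, so you can raise the connection timeout, for example with timeout = 30 under [defaults] in ansible.cfg or -T 30 on the command line.
  • If you still see packet loss after kernel tuning, check whether net.core.rmem_default has also been set.
  • The monitoring role only deploys the exporter. Run the Prometheus server on a separate internal machine or a 轻云互联 monitoring instance so it does not eat into business bandwidth, and remember that the UFW rules above open only port 22, so port 9100 must be allowed from the Prometheus host.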
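
On the Prometheus side, scraping the two exporters could look roughly like this. It is a sketch of a prometheus.yml fragment; the job name and scrape interval are assumptions, and the targets are the placeholder IPs from the inventory:

```yaml
# prometheus.yml fragment on the Prometheus server (job name and interval are illustrative)
scrape_configs:
  - job_name: usa_servers_node
    scrape_interval: 30s
    static_configs:
      - targets:
          - 198.51.100.10:9100
          - 198.51.100.11:9100
```

A longer scrape interval reduces cross-region traffic at the cost of coarser graphs.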

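This playbook has been run repeatedly on 轻云互联 US servers; going from a fresh install to a production-grade configuration takes about 5 minutes. You can also encrypt sensitive variables with ansible-vault, or add roles for Nginx and MySQL to build a complete automation setup. Don't stop here: hand the repetitive work to Ansible and spend your time on things that matter more.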