StackStorm - Event-Driven Automation Platform
Overview
StackStorm (ST2) is an open-source, event-driven automation platform developed under Extreme Networks and often described as "IFTTT for Ops".
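Core Concepts
1. Sensors
- Detect external events (SNMP traps, syslog, webhooks)
- Emit triggers into the system
2. Triggers
- Represent an event that has occurred; rules match against them
3. Rules
- IF (trigger matches the criteria) THEN (run an action)
4. Actions
- The work to execute
5. Workflows
- Combine multiple actions, with conditional transitions between them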
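To make the relationship between these concepts concrete, here is a minimal pure-Python sketch of the sensor → trigger → rule → action flow. It does not use StackStorm itself: the `Rule` dataclass and the `handle_event` function are hypothetical names invented for illustration, and the real rules engine does considerably more.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical stand-in for a rule: IF the trigger matches THEN run the action."""
    trigger_type: str
    criteria: Callable[[Dict[str, Any]], bool]
    action: Callable[[Dict[str, Any]], Any]


def handle_event(rules: List[Rule], trigger_type: str, payload: Dict[str, Any]) -> List[Any]:
    """Run the action of every rule whose trigger type and criteria both match."""
    results = []
    for rule in rules:
        if rule.trigger_type == trigger_type and rule.criteria(payload):
            results.append(rule.action(payload))
    return results


rules = [
    Rule(
        trigger_type="network.snmp_trap",
        criteria=lambda p: p.get("status") == "down",
        action=lambda p: f"recover {p['interface']} on {p['device']}",
    )
]

# A sensor would emit this trigger after observing an SNMP trap.
fired = handle_event(
    rules,
    "network.snmp_trap",
    {"device": "192.168.1.1", "interface": "GigabitEthernet0/1", "status": "down"},
)
print(fired)
```

A workflow would take the place of the single `action` callable when several steps need to run in order.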
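Installation
# Ubuntu 20.04 (replace the example password before running)
curl -sSL https://stackstorm.com/packages/install.sh | bash -s -- --user=st2admin --password=password
# Docker (single-container image; the Docker Compose setup in the StackStorm/st2-docker repository is the maintained alternative)
docker run -it -d --name st2 stackstorm/stackstorm:latest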
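Network Automation Examples
1. SNMP Trap Handling
Sensor configuration
# /opt/stackstorm/packs/network/sensors/snmp_trap_sensor.yaml
class_name: SNMPTrapSensor
entry_point: snmp_trap_sensor.py
description: Listen for SNMP traps
# poll_interval applies only to PollingSensor subclasses, so it is omitted for this passive sensor
trigger_types:
  - name: snmp_trap
    description: Emitted when an SNMP trap is received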
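Python Sensor
# /opt/stackstorm/packs/network/sensors/snmp_trap_sensor.py
from st2reactor.sensor.base import Sensor


class SNMPTrapSensor(Sensor):
    def __init__(self, sensor_service, config=None):
        super(SNMPTrapSensor, self).__init__(sensor_service=sensor_service, config=config)
        self._logger = self.sensor_service.get_logger(__name__)

    def setup(self):
        # One-time initialization, for example opening the listening socket
        pass

    def run(self):
        # SNMP trap listening logic; emit events with
        # self.sensor_service.dispatch(trigger="network.snmp_trap", payload={...})
        pass

    def cleanup(self):
        # Called on shutdown; release sockets and other resources
        pass

    def add_trigger(self, trigger):
        pass

    def update_trigger(self, trigger):
        pass

    def remove_trigger(self, trigger):
        pass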
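The `run()` method above is left empty, so the following sketch shows one way the payload for a received trap could be built and handed to `dispatch()`. It is a sketch under assumptions: `build_trap_payload` and `FakeSensorService` are hypothetical names, the fake service only records calls so the example runs without StackStorm installed, and the payload keys (`device`, `interface`, `status`) are chosen to match the fields the rule in this document reads.

```python
from typing import Any, Dict, List, Optional, Tuple

# Subset of the ifOperStatus values defined in IF-MIB.
IF_OPER_STATUS = {1: "up", 2: "down"}


def build_trap_payload(device: str, interface: str, oper_status: int) -> Dict[str, Any]:
    """Translate raw trap fields into the payload the rule criteria expect."""
    return {
        "device": device,
        "interface": interface,
        "status": IF_OPER_STATUS.get(oper_status, "unknown"),
    }


class FakeSensorService:
    """Test double that records dispatch() calls instead of emitting triggers."""

    def __init__(self) -> None:
        self.dispatched: List[Tuple[str, Optional[Dict[str, Any]]]] = []

    def dispatch(self, trigger: str, payload: Optional[Dict[str, Any]] = None,
                 trace_tag: Optional[str] = None) -> None:
        self.dispatched.append((trigger, payload))


service = FakeSensorService()
service.dispatch(
    trigger="network.snmp_trap",
    payload=build_trap_payload("192.168.1.1", "GigabitEthernet0/1", 2),
)
print(service.dispatched)
```

Inside a real sensor the same call would be made on `self.sensor_service` from within `run()`.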
2. Rule Definition
# /opt/stackstorm/packs/network/rules/interface_down.yaml
name: interface_down_alert
pack: network
description: Attempt automatic recovery when an interface goes down
enabled: true
trigger:
  type: network.snmp_trap
  parameters:
    oid: 1.3.6.1.2.1.2.2.1.8  # ifOperStatus
criteria:
  trigger.status:
    type: equals
    pattern: down
action:
  ref: network.recover_interface
  parameters:
    device: "{{ trigger.device }}"
    interface: "{{ trigger.interface }}"
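As a rough illustration of what this rule asks the engine to do, the sketch below evaluates an `equals` criterion against a trigger payload and fills the `{{ trigger.* }}` placeholders in the action parameters. It is a simplification under stated assumptions: `criteria_match` and `render_parameters` are hypothetical helpers, only the `equals` operator is handled, and the real engine renders parameters with Jinja rather than a regular expression.

```python
import re
from typing import Any, Dict

# Matches placeholders such as "{{ trigger.device }}".
_TEMPLATE = re.compile(r"\{\{\s*trigger\.(\w+)\s*\}\}")


def criteria_match(criteria: Dict[str, Dict[str, Any]], payload: Dict[str, Any]) -> bool:
    """Return True when every criterion holds for the payload."""
    for key, spec in criteria.items():
        field = key.split(".", 1)[1]  # "trigger.status" -> "status"
        if spec["type"] != "equals":
            raise NotImplementedError(f"operator not covered by this sketch: {spec['type']}")
        if payload.get(field) != spec["pattern"]:
            return False
    return True


def render_parameters(parameters: Dict[str, str], payload: Dict[str, Any]) -> Dict[str, str]:
    """Substitute {{ trigger.<field> }} placeholders with payload values."""
    return {
        name: _TEMPLATE.sub(lambda m: str(payload[m.group(1)]), value)
        for name, value in parameters.items()
    }


criteria = {"trigger.status": {"type": "equals", "pattern": "down"}}
parameters = {"device": "{{ trigger.device }}", "interface": "{{ trigger.interface }}"}
payload = {"device": "192.168.1.1", "interface": "GigabitEthernet0/1", "status": "down"}

if criteria_match(criteria, payload):
    print(render_parameters(parameters, payload))
```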
3. Action Definition
# /opt/stackstorm/packs/network/actions/recover_interface.yaml
name: recover_interface
pack: network
runner_type: python-script
description: Attempt to recover an interface
enabled: true
# entry_point is resolved relative to the pack's actions/ directory
entry_point: recover_interface.py
parameters:
  device:
    type: string
    description: Device IP address
    required: true
  interface:
    type: string
    description: Interface name
    required: true
# /opt/stackstorm/packs/network/actions/recover_interface.py
from netmiko import ConnectHandler
from st2common.runners.base_action import Action


class RecoverInterface(Action):
    def run(self, device, interface):
        # Credentials come from the pack configuration
        device_params = {
            'device_type': 'cisco_ios',
            'host': device,
            'username': self.config['username'],
            'password': self.config['password'],
        }
        conn = None
        try:
            conn = ConnectHandler(**device_params)
            # Attempt to bring the interface back up
            commands = [
                f'interface {interface}',
                'no shutdown',
            ]
            output = conn.send_config_set(commands)
            conn.save_config()
            # The action service exposes datastore helpers, not a way to run other
            # actions, so send the Slack notification (slack.post_message to
            # #network-ops) from a workflow or a follow-up rule instead.
            self.logger.info('Interface %s on %s recovered', interface, device)
            return (True, output)
        except Exception as e:
            return (False, str(e))
        finally:
            # Always close the SSH session, even when recovery fails
            if conn is not None:
                conn.disconnect()
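Because the interface name arrives from a trigger payload and is interpolated straight into device configuration commands, it may be worth validating it first. The helper below is a hypothetical sketch, not part of StackStorm or netmiko; the accepted pattern is an assumption that covers names such as `GigabitEthernet0/1`, `Port-channel1`, or `Vlan10` and would need adjusting for other platforms.

```python
import re
from typing import List

# Assumed naming pattern: letters (and hyphens), then slash-separated numbers,
# with an optional ".subinterface" suffix.
_INTERFACE_RE = re.compile(r"[A-Za-z][A-Za-z-]*\d+(/\d+)*(\.\d+)?")


def build_recovery_commands(interface: str) -> List[str]:
    """Return the config commands for one interface, rejecting suspicious names."""
    if not _INTERFACE_RE.fullmatch(interface):
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [f"interface {interface}", "no shutdown"]


print(build_recovery_commands("GigabitEthernet0/1"))
```

Calling this helper in place of the inline `commands` list keeps a malformed payload, for example one containing a newline followed by another command, from reaching the device.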
4. Workflow (Complex Automation)
# /opt/stackstorm/packs/network/actions/workflows/bgp_failover.yaml
# Orquesta workflow; it is registered through an action metadata file with runner_type: orquesta
version: 1.0
description: Automatic failover when a BGP peer goes down
input:
  - device
  - peer_ip
tasks:
  check_bgp_status:
    action: network.check_bgp
    input:
      device: <% ctx().device %>
      peer: <% ctx().peer_ip %>
    next:
      - when: <% succeeded() and result().output.state = 'down' %>
        do: send_alert
  send_alert:
    action: slack.post_message
    input:
      channel: "#network-ops"
      message: "BGP peer <% ctx().peer_ip %> is down on <% ctx().device %>"
    next:
      - do: activate_backup_link
  activate_backup_link:
    action: network.configure_interface
    input:
      device: <% ctx().device %>
      interface: "GigabitEthernet0/2"
      commands:
        - no shutdown
        - description Backup link activated
    next:
      - when: <% succeeded() %>
        do: verify_failover
  verify_failover:
    action: network.check_bgp
    input:
      device: <% ctx().device %>
      peer: <% ctx().peer_ip %>
    next:
      - when: <% succeeded() and result().output.state = 'up' %>
        publish:
          - result: "Failover successful"
      - when: <% failed() or result().output.state = 'down' %>
        do: escalate
  escalate:
    action: pagerduty.create_incident
    input:
      title: "BGP failover failed on <% ctx().device %>"
      severity: critical
      description: "Manual intervention required"
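The transitions in this workflow can be hard to follow from the YAML alone, so here is a small pure-Python sketch that traces which tasks run for a given sequence of BGP states. It is only a model of the control flow: `simulate_bgp_failover` is a hypothetical function, every action is assumed to succeed, and the list passed in stands for the `state` values returned by the two `network.check_bgp` calls.

```python
from typing import List


def simulate_bgp_failover(check_states: List[str]) -> List[str]:
    """Return the ordered task names visited for the given check_bgp states."""
    states = iter(check_states)
    visited = ["check_bgp_status"]

    # check_bgp_status only continues when the peer is reported down.
    if next(states) != "down":
        return visited

    # send_alert -> activate_backup_link -> verify_failover run in sequence.
    visited += ["send_alert", "activate_backup_link", "verify_failover"]

    # verify_failover ends the workflow when the peer is up, otherwise escalates.
    if next(states) != "up":
        visited.append("escalate")
    return visited


print(simulate_bgp_failover(["down", "up"]))
print(simulate_bgp_failover(["down", "down"]))
```

The first call models a successful failover and the second a failed one that ends in `escalate`.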
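ChatOps Integration
# Run a command from Slack (the exact chat syntax depends on the action aliases defined in the pack)
!st2 run network.show_version device=router1
# Change configuration from Slack
!st2 run network.configure_interface device=router1 interface=GigabitEthernet0/1 description="New description"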
API Usage (via the st2 CLI)
# Authenticate
st2 auth st2admin -p password
# Run an action
st2 run network.show_version device=192.168.1.1
# Run a workflow
st2 run network.bgp_failover device=router1 peer_ip=10.0.0.2
# List execution history
st2 execution list
# Show the result of one execution
st2 execution get <execution-id>
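The CLI commands above are wrappers around StackStorm's REST API, so the same operations can be scripted directly. The sketch below builds the two HTTP requests involved (obtaining a token, then starting an execution) using only the standard library. The host name `st2.example.com` and the token value are placeholders, and the paths assume the default layout where the auth service is served under `/auth/v1` and the API under `/api/v1`; the requests are constructed but not sent.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

ST2_HOST = "https://st2.example.com"  # hypothetical host; replace with your own


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """POST /auth/v1/tokens with HTTP Basic credentials to obtain an auth token."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{ST2_HOST}/auth/v1/tokens",
        data=b"",
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


def build_execution_request(token: str, action: str,
                            parameters: Dict[str, Any]) -> urllib.request.Request:
    """POST /api/v1/executions to run an action, authenticated via X-Auth-Token."""
    body = json.dumps({"action": action, "parameters": parameters}).encode()
    return urllib.request.Request(
        f"{ST2_HOST}/api/v1/executions",
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


request = build_execution_request(
    "example-token", "network.show_version", {"device": "192.168.1.1"}
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen() and JSON-decode the body.
```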
Pack Structure
network/
├── actions/
│   ├── show_version.yaml
│   ├── show_version.py
│   ├── configure_interface.yaml
│   └── workflows/
│       └── bgp_failover.yaml
├── rules/
│   ├── interface_down.yaml
│   └── bgp_down.yaml
├── sensors/
│   ├── snmp_trap_sensor.yaml
│   └── snmp_trap_sensor.py
├── config.schema.yaml
└── pack.yaml
Real-World Use Cases
1. Automated Configuration Backup
# Rule: back up configurations every day at midnight
name: daily_backup
trigger:
  type: core.st2.CronTimer
  parameters:
    timezone: Asia/Seoul
    hour: 0
    minute: 0
action:
  ref: network.backup_all_devices
2. Automatic Fault Recovery
# Rule: attempt automatic recovery when a link goes down
name: auto_recovery
trigger:
  type: network.link_down
action:
  ref: network.troubleshoot_link
  parameters:
    device: "{{ trigger.device }}"
    interface: "{{ trigger.interface }}"
    auto_fix: true
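3. Automated Compliance Checks
# Rule: weekly security audit, every Monday at 09:00
# (workflows have no schedule field, so a CronTimer rule starts the workflow)
name: weekly_security_audit
pack: network
enabled: true
trigger:
  type: core.st2.CronTimer
  parameters:
    day_of_week: mon
    hour: 9
    minute: 0
action:
  # Hypothetical workflow that runs, in order: check_ssh_version,
  # check_snmp_communities, check_unused_ports, generate_report, send_to_slack
  ref: network.weekly_security_audit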
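Pros
✓ Event-driven automated response
✓ Powerful workflow engine
✓ ChatOps integration (Slack, MS Teams)
✓ More than 160 integration packs available
✓ Web UI included
Cons
✗ Complex configuration
✗ High resource usage
✗ Few network-specific features
✗ Steep learning curve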
Links
- Official site: https://stackstorm.com
- GitHub: https://github.com/StackStorm/st2
- Documentation: https://docs.stackstorm.com/