Baffle Hub - Rule Architecture
Overview
Baffle Hub uses a distributed rule system where the Hub generates and manages rules, and Agents download and enforce them locally using optimized SQLite queries. This architecture provides sub-millisecond rule evaluation while maintaining centralized intelligence and control.
Core Principles
- Hub-side Intelligence: Pattern detection and rule generation happen on the Hub
- Agent-side Enforcement: Rule evaluation happens locally on Agents for speed
- Incremental Sync: Agents poll for rule updates using timestamp-based cursors
- Dynamic Backpressure: Hub controls event sampling based on load
- Temporal Rules: Rules can expire automatically (e.g., 24-hour bans)
- Soft Deletes: Rules are disabled, not deleted, for proper sync and audit trail
Rule Types
1. Network Rules (network_v4, network_v6)
Block or allow traffic based on IP address or CIDR ranges.
Use Cases:
- Block scanner IPs (temporary or permanent)
- Block datacenter/VPN/proxy ranges
- Allow trusted IP ranges
- Geographic blocking via IP ranges
Evaluation:
- Most specific CIDR wins (longest prefix)
- /32 beats /24 beats /16 beats /8
- Agent uses optimized range queries on ipv4_ranges/ipv6_ranges tables
Example:
{
"id": 12341,
"rule_type": "network_v4",
"action": "deny",
"conditions": { "cidr": "185.220.100.0/22" },
"priority": 22,
"expires_at": "2024-11-04T12:00:00Z",
"enabled": true,
"source": "auto:scanner_detected",
"metadata": {
"reason": "Tor exit node hitting /.env",
"auto_generated": true
}
}
2. Rate Limit Rules (rate_limit)
Control request rate per IP or per CIDR range.
Scopes (Phase 1):
- Global per-IP: Limit requests per IP across all paths
- Per-CIDR: Different limits for different network ranges
Scopes (Phase 2+):
- Per-path per-IP: Different limits for /api/*, /login, etc.
Evaluation:
- Agent maintains in-memory counters per IP
- Finds most specific CIDR rule for the IP
- Applies that rule's rate limit configuration
- Optional: Persist counters to SQLite for restart resilience
Example (Phase 1):
{
"id": 12342,
"rule_type": "rate_limit",
"action": "rate_limit",
"conditions": {
"cidr": "0.0.0.0/0",
"scope": "global"
},
"priority": 0,
"enabled": true,
"source": "manual",
"metadata": {
"limit": 100,
"window": 60,
"per_ip": true
}
}
Example (Phase 2+):
{
"id": 12343,
"rule_type": "rate_limit",
"action": "rate_limit",
"conditions": {
"cidr": "0.0.0.0/0",
"scope": "per_path",
"path_pattern": "/api/login"
},
"metadata": {
"limit": 5,
"window": 60,
"per_ip": true
}
}
3. Path Pattern Rules (path_pattern)
Detect suspicious path access patterns (mainly for Hub analytics).
Use Cases:
- Detect scanners hitting /.env, /.git, /wp-admin
- Identify bots with suspicious path traversal
- Trigger automatic IP bans when patterns match
Evaluation:
- Agent does lightweight pattern matching
- When matched, sends event to Hub with matched_pattern: true
- Hub analyzes and creates IP block rules if needed
- Agent picks up new IP block rule in next sync (~10s)
Example:
{
"id": 12344,
"rule_type": "path_pattern",
"action": "log",
"conditions": {
"patterns": ["/.env", "/.git/*", "/wp-admin/*", "/.aws/*", "/phpMyAdmin/*"]
},
"enabled": true,
"source": "default:scanner_detection",
"metadata": {
"auto_ban_ip": true,
"ban_duration_hours": 24,
"description": "Common scanner paths"
}
}
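The patterns in this example rule look like shell-style globs. As a minimal sketch of how an Agent might evaluate them, the snippet below uses Ruby's `File.fnmatch?`; the `matched_patterns` helper is hypothetical, and the source does not say which matching engine the Agent actually uses.

```ruby
# Patterns copied from the example rule's conditions.
PATTERNS = ["/.env", "/.git/*", "/wp-admin/*", "/.aws/*", "/phpMyAdmin/*"].freeze

# Return every pattern the path matches (all patterns are checked, not just the first).
# Without File::FNM_PATHNAME, "*" also matches "/", so "/.git/*" covers nested paths.
def matched_patterns(path, patterns = PATTERNS)
  patterns.select { |pattern| File.fnmatch?(pattern, path) }
end

matched_patterns("/.git/config")   # => ["/.git/*"]
matched_patterns("/index.html")    # => []
```

Returning all matches rather than a boolean keeps the count available to the Hub, which treats multiple matches as a stronger signal.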
Rule Actions
| Action | Description | HTTP Response |
|---|---|---|
| allow | Pass request through | Continue to app |
| deny | Block request | 403 Forbidden |
| rate_limit | Enforce rate limit | 429 Too Many Requests |
| redirect | Redirect to URL | 301/302 + Location header |
| challenge | Show CAPTCHA (Phase 2+) | 403 with challenge |
| log | Log only, don't block | Continue to app |
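One way to read the action table is as a lookup from action name to response. The sketch below is illustrative only: `ACTION_RESPONSES` and `response_for` are hypothetical names, and a `nil` status is used here to mean "continue to the application".

```ruby
# Hypothetical mapping from rule action to the response an Agent might produce.
# A nil status means "pass the request on to the application".
ACTION_RESPONSES = {
  "allow"      => { status: nil },
  "deny"       => { status: 403 },
  "rate_limit" => { status: 429 },
  "challenge"  => { status: 403 },
  "log"        => { status: nil }
}.freeze

# Redirects need per-rule data (target URL and 301/302), so they are built separately.
def response_for(action, redirect_url: nil, redirect_status: 302)
  if action == "redirect"
    return { status: redirect_status, headers: { "Location" => redirect_url } }
  end
  ACTION_RESPONSES.fetch(action) { raise ArgumentError, "unknown action: #{action}" }
end

response_for("deny")   # => { status: 403 }
```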
Rule Priority & Specificity
Network Rules
- Priority is determined by CIDR prefix length
- Longer prefix (more specific) = higher priority
- /32 (single IP) beats /24 (256 IPs) beats /8 (16M IPs)
- Example: Block 10.0.0.0/8 but allow 10.0.1.0/24
  - Request from 10.0.1.5 → matches /24 (and /8); /24 wins → allowed
  - Request from 10.0.2.5 → matches /8 only → blocked
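The 10.0.0.0/8 versus 10.0.1.0/24 example can be reproduced with Ruby's standard `IPAddr` class. This is a minimal in-memory sketch of longest-prefix matching; the `RULES` list and `most_specific_rule` helper are hypothetical, and the real Agent is described as doing this with a SQLite range query instead.

```ruby
require "ipaddr"

# Hypothetical in-memory rule list; the Agent itself queries SQLite.
RULES = [
  { cidr: "10.0.0.0/8",  action: "deny"  },
  { cidr: "10.0.1.0/24", action: "allow" }
].freeze

# Return the matching rule with the longest prefix, or nil if none match.
def most_specific_rule(ip, rules = RULES)
  addr = IPAddr.new(ip)
  rules
    .select { |rule| IPAddr.new(rule[:cidr]).include?(addr) }
    .max_by { |rule| IPAddr.new(rule[:cidr]).prefix }
end

most_specific_rule("10.0.1.5")[:action]   # => "allow"
most_specific_rule("10.0.2.5")[:action]   # => "deny"
most_specific_rule("192.168.1.1")         # => nil
```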
Rate Limit Rules
- Most specific CIDR match wins
- Per-path rules take precedence over global (Phase 2+)
Path Pattern Rules
- All patterns are evaluated (not exclusive)
- Used for detection, not blocking
- Multiple pattern matches = stronger signal for ban
Rule Synchronization
Timestamp-Based Cursor
Agents use updated_at timestamps as sync cursors to handle rule updates and deletions.
Why updated_at instead of id?
- Handles rule updates (e.g., disabling a rule updates updated_at)
- Handles rule deletions via enabled=false flag
- Simple for agents: "give me everything that changed since X"
Agent Sync Flow:
1. Agent starts: last_sync = nil
2. GET /api/:key/rules → Full sync, store latest updated_at
3. Every 10s or 1000 events: GET /api/:key/rules?since=<last_sync>
4. Process rules: add new, update existing, remove disabled
5. Update last_sync to latest updated_at from response
Query Overlap: Hub queries updated_at >= since - 0.5s to handle clock skew and millisecond duplicates.
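The sync flow and the 0.5 s overlap can be sketched together. The code below is an in-memory approximation under stated assumptions: rules are plain hashes, `updated_at_us` is a hypothetical field holding `updated_at` in microseconds, and `rules_since` / `apply_rules` are illustrative names rather than the Hub's or Agent's actual methods.

```ruby
OVERLAP_US = 500_000 # 0.5 s overlap, expressed in microseconds

# Hub side: select rules changed at or after (since - overlap).
# A nil cursor means full sync, which returns only enabled rules.
def rules_since(all_rules, since_us)
  return all_rules.select { |rule| rule[:enabled] } if since_us.nil?
  all_rules.select { |rule| rule[:updated_at_us] >= since_us - OVERLAP_US }
end

# Agent side: apply a batch to a local store keyed by rule id and
# return the new cursor (the latest updated_at seen so far).
def apply_rules(store, rules, last_sync)
  rules.each do |rule|
    if rule[:enabled]
      store[rule[:id]] = rule   # add new or update existing
    else
      store.delete(rule[:id])   # disabled on the Hub: remove locally
    end
  end
  [last_sync, *rules.map { |rule| rule[:updated_at_us] }].compact.max
end
```

Because the store is keyed by rule id, receiving the same rule twice is harmless, which is what makes the overlapping query safe.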
API Endpoints
1. Version Check (Lightweight)
GET /api/:public_key/rules/version
Response:
{
"version": 1730646645123000,
"count": 150,
"sampling": {
"allowed_requests": 0.5,
"blocked_requests": 1.0,
"rate_limited_requests": 1.0,
"effective_until": "2024-11-03T12:30:55.123Z"
}
}
Timestamp Format: The version field is a Unix timestamp in microseconds (e.g., 1730646645123000), which can be compared as a plain integer. For backward compatibility, the since parameter also accepts ISO8601 timestamps.
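Converting between the two timestamp forms is straightforward with Ruby's standard library. The helpers below are a sketch; `time_from_version` and `parse_since` are hypothetical names, not part of the documented API.

```ruby
require "time"

# Convert a microsecond Unix timestamp to a UTC Time.
def time_from_version(version_us)
  Time.at(version_us / 1_000_000, version_us % 1_000_000, :usec).utc
end

# Accept either a microsecond integer or an ISO8601 string; return microseconds.
def parse_since(value)
  return Integer(value) if value.to_s.match?(/\A\d+\z/)
  time = Time.iso8601(value)
  time.to_i * 1_000_000 + time.usec
end

time_from_version(1730646645123000).iso8601(3)   # => "2024-11-03T15:10:45.123Z"
parse_since("2024-11-03T12:00:00Z")              # => 1730635200000000
```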
2. Incremental Sync
GET /api/:public_key/rules?since=1730646000000000
Response:
{
"version": 1730646645123000,
"sampling": { ... },
"rules": [
{
"id": 12341,
"rule_type": "network_v4",
"action": "deny",
"conditions": { "cidr": "1.2.3.4/32" },
"priority": 32,
"expires_at": "2024-11-04T12:00:00Z",
"enabled": true,
"source": "auto:scanner_detected",
"metadata": { "reason": "Hitting /.env" },
"created_at": "2024-11-03T12:00:00Z",
"updated_at": "2024-11-03T12:00:00Z"
},
{
"id": 12340,
"rule_type": "network_v4",
"action": "deny",
"conditions": { "cidr": "5.6.7.8/32" },
"priority": 32,
"enabled": false,
"source": "manual",
"metadata": { "reason": "False positive" },
"created_at": "2024-11-02T10:00:00Z",
"updated_at": "2024-11-03T12:25:00Z"
}
]
}
3. Full Sync
GET /api/:public_key/rules
Response:
{
"version": 1730646645123000,
"sampling": { ... },
"rules": [ ...all enabled rules... ]
}
Dynamic Event Sampling
Hub controls how many events Agents send based on load.
Sampling Strategy
Hub monitors:
- SolidQueue job depth
- Events/second rate
- Database write latency
Sampling rates:
Queue Depth | Allowed | Blocked | Rate Limited
----------------|---------|---------|-------------
0-1,000 | 100% | 100% | 100%
1,001-5,000 | 50% | 100% | 100%
5,001-10,000 | 20% | 100% | 100%
10,001+ | 5% | 100% | 100%
Phase 2+: Path-based sampling:
{
"sampling": {
"allowed_requests": 0.1,
"blocked_requests": 1.0,
"paths": {
"block": ["/.env", "/.git/*"],
"allow": ["/health", "/metrics"]
}
}
}
Agent respects sampling:
- Always sends blocked/rate-limited events
- Samples allowed events based on rate
- Can prioritize suspicious paths over routine traffic
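The queue-depth table and the Agent's sampling behaviour can be sketched as two small functions. The thresholds come from the table above; `sampling_for` and `send_event?` are hypothetical names, and the injectable `rng` is an assumption made so the decision can be tested deterministically.

```ruby
# Hub side: derive sampling rates from queue depth.
# Blocked and rate-limited events are always sent.
def sampling_for(queue_depth)
  allowed =
    if    queue_depth <= 1_000  then 1.0
    elsif queue_depth <= 5_000  then 0.5
    elsif queue_depth <= 10_000 then 0.2
    else                             0.05
    end
  { allowed_requests: allowed, blocked_requests: 1.0, rate_limited_requests: 1.0 }
end

# Agent side: decide whether to forward one event of the given kind
# ("allowed", "blocked" or "rate_limited"). Unknown kinds are always sent.
def send_event?(kind, sampling, rng: Random.new)
  rate = sampling.fetch(:"#{kind}_requests", 1.0)
  rng.rand < rate
end

sampling = sampling_for(3_000)
sampling[:allowed_requests]        # => 0.5
send_event?("blocked", sampling)   # => true
```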
Temporal Rules (Expiration)
Rules can have an expires_at timestamp for automatic expiration.
Use Cases:
- 24-hour scanner bans
- Temporary rate limit adjustments
- Time-boxed maintenance blocks
Cleanup:
- ExpiredRulesCleanupJob runs hourly
- Disables rules where expires_at < now
- Agent picks up disabled rules in next sync
Example:
# Hub auto-generates rule when scanner detected:
Rule.create!(
rule_type: "network_v4",
action: "deny",
conditions: { cidr: "1.2.3.4/32" },
expires_at: 24.hours.from_now,
source: "auto:scanner_detected",
metadata: { reason: "Hit /.env 5 times in 10 seconds" }
)
# 24 hours later: ExpiredRulesCleanupJob disables it
# Agent syncs and removes from ipv4_ranges table
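The cleanup step only reaches Agents if disabling a rule also advances its updated_at, since that column is the sync cursor. The sketch below shows the idea with an in-memory stand-in; `RuleRow` and `disable_expired` are hypothetical, and the actual ExpiredRulesCleanupJob implementation is not shown in this document.

```ruby
# Hypothetical in-memory stand-in for a row in the rules table.
RuleRow = Struct.new(:id, :enabled, :expires_at, :updated_at, keyword_init: true)

# Disable every enabled rule whose expires_at has passed and return the changed rows.
# updated_at is bumped as well; otherwise an incremental sync
# (updated_at >= since) would never deliver the change to Agents.
def disable_expired(rules, now: Time.now)
  rules.select { |rule| rule.enabled && rule.expires_at && rule.expires_at < now }
       .each do |rule|
         rule.enabled = false
         rule.updated_at = now
       end
end
```

In ActiveRecord, `update_all` does not touch updated_at on its own, so a job written that way would need to set the column explicitly.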
Rule Sources
The source field tracks rule origin for audit and filtering.
Source Formats:
- manual - Created by user via UI
- auto:scanner_detected - Auto-generated from scanner pattern
- auto:rate_limit_exceeded - Auto-generated from rate limit abuse
- auto:bot_detected - Auto-generated from bot behavior
- imported:fail2ban - Imported from external source
- imported:crowdsec - Imported from CrowdSec
- default:scanner_paths - Default rule set
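Since every source value follows a `kind` or `kind:detail` shape, filtering by origin reduces to a single split. A small sketch, with `parse_source` as a hypothetical helper:

```ruby
# Split "auto:scanner_detected" into kind "auto" and detail "scanner_detected".
# "manual" has no detail, so detail is nil.
def parse_source(source)
  kind, detail = source.split(":", 2)
  { kind: kind, detail: detail }
end

parse_source("auto:scanner_detected")   # => { kind: "auto", detail: "scanner_detected" }
parse_source("manual")                  # => { kind: "manual", detail: nil }
```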
Database Schema
Hub Schema
create_table "rules" do |t|
# Identification
t.integer :id, primary_key: true
t.string :source, limit: 100
# Rule definition
t.string :rule_type, null: false
t.string :action, null: false
t.json :conditions, null: false
t.json :metadata
# Priority & lifecycle
t.integer :priority
t.datetime :expires_at
t.boolean :enabled, default: true, null: false
# Timestamps (updated_at is sync cursor!)
t.timestamps
# Indexes
t.index [:updated_at, :id] # Primary sync query
t.index :enabled
t.index :expires_at
t.index :source
t.index :rule_type
end
Agent Schema (Existing)
create_table "ipv4_ranges" do |t|
t.integer :network_start, limit: 8, null: false
t.integer :network_end, limit: 8, null: false
t.integer :network_prefix, null: false
t.integer :waf_action, default: 0, null: false
t.integer :priority, default: 100
t.string :redirect_url, limit: 500
t.integer :redirect_status
t.string :source, limit: 50
t.timestamps
t.index [:network_start, :network_end, :network_prefix]
t.index :waf_action
end
create_table "ipv6_ranges" do |t|
t.binary :network_start, limit: 16, null: false
t.binary :network_end, limit: 16, null: false
t.integer :network_prefix, null: false
t.integer :waf_action, default: 0, null: false
t.integer :priority, default: 100
t.string :redirect_url, limit: 500
t.integer :redirect_status
t.string :source, limit: 50
t.timestamps
t.index [:network_start, :network_end, :network_prefix]
t.index :waf_action
end
Agent Rule Processing
Network Rules
# Agent receives network rule from Hub:
rule = {
id: 12341,
rule_type: "network_v4",
action: "deny",
conditions: { cidr: "10.0.0.0/8" },
priority: 8,
enabled: true
}
# Agent converts to ipv4_ranges entry:
require "ipaddr"
cidr = IPAddr.new("10.0.0.0/8")
Ipv4Range.upsert({
source: "hub:12341",
network_start: cidr.to_i,
network_end: cidr.to_range.end.to_i,
network_prefix: 8,
waf_action: 0, # deny
priority: 8
}, unique_by: :source)
# Agent evaluates request:
# SELECT * FROM ipv4_ranges
# WHERE ? BETWEEN network_start AND network_end
# ORDER BY network_prefix DESC
# LIMIT 1
Rate Limit Rules
# Agent stores in memory:
@rate_limit_rules = {
"global" => { limit: 100, window: 60, cidr: "0.0.0.0/0" }
}
@rate_counters = {
"1.2.3.4" => { count: 50, window_start: Time.now }
}
# On each request:
def check_rate_limit(ip)
rule = find_most_specific_rate_limit_rule(ip)
counter = @rate_counters[ip] ||= { count: 0, window_start: Time.now }
# Reset window if expired
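if Time.now - counter[:window_start] > rule[:window]
# Store the fresh counter back in the hash so the reset persists across requests
counter = @rate_counters[ip] = { count: 0, window_start: Time.now }
end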
counter[:count] += 1
if counter[:count] > rule[:limit]
{ action: "rate_limit", status: 429 }
else
{ action: "allow" }
end
end
Path Pattern Rules
# Agent evaluates patterns:
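# Dots are escaped so they match literally; %r{} avoids escaping the slashes
PATH_PATTERNS = [%r{/\.env\z}, %r{/\.git(/|\z)}, %r{/wp-admin(/|\z)}]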
def check_path_patterns(path)
matched = PATH_PATTERNS.any? { |pattern| path.match?(pattern) }
if matched
# Send event to Hub with flag
send_event_to_hub(
path: path,
matched_pattern: true,
waf_action: "log" # Don't block yet
)
# Hub will analyze and create IP block rule if needed
end
end
Hub Intelligence (Auto-Generation)
Scanner Detection
# PathScannerDetectorJob
class PathScannerDetectorJob < ApplicationJob
SCANNER_PATHS = %w[/.env /.git /wp-admin /phpMyAdmin /.aws]
def perform
# Find IPs hitting scanner paths
scanner_ips = Event
.where("request_path IN (?)", SCANNER_PATHS)
.where("timestamp > ?", 5.minutes.ago)
.group(:ip_address)
.having("COUNT(*) >= 3")
.pluck(:ip_address)
scanner_ips.each do |ip|
# Create 24h ban rule
Rule.create!(
rule_type: "network_v4",
action: "deny",
conditions: { cidr: "#{ip}/32" },
priority: 32,
expires_at: 24.hours.from_now,
source: "auto:scanner_detected",
metadata: {
reason: "Hit #{SCANNER_PATHS.join(', ')}",
auto_generated: true
}
)
end
end
end
Rate Limit Abuse Detection
# RateLimitAnomalyJob
class RateLimitAnomalyJob < ApplicationJob
def perform
# Find IPs exceeding normal rate
abusive_ips = Event
.where("timestamp > ?", 1.minute.ago)
.group(:ip_address)
.having("COUNT(*) > 200") # >200 req/min
.pluck(:ip_address)
abusive_ips.each do |ip|
# Create aggressive rate limit or block
Rule.create!(
rule_type: "rate_limit",
action: "rate_limit",
conditions: { cidr: "#{ip}/32", scope: "global" },
priority: 32,
expires_at: 1.hour.from_now,
source: "auto:rate_limit_exceeded",
metadata: {
limit: 10,
window: 60,
per_ip: true
}
)
end
end
end
Performance Characteristics
Hub
- Rule query: O(log n) with (updated_at, id) index
- Version check: Single index lookup
- Rule generation: Background jobs, no request impact
Agent
- Network rule lookup: O(log n) via B-tree index on (network_start, network_end)
- Rate limit check: O(1) hash lookup in memory
- Path pattern check: O(n) regex match (n = number of patterns)
- Overall request evaluation: <1ms for typical case
Sync Efficiency
- Incremental sync: Only changed rules since last sync
- Typical sync payload: <10 KB for 50 rules
- Sync frequency: Every 10s or 1000 events
- Version check: <1 KB response
Future Enhancements (Phase 2+)
Per-Path Rate Limiting
- Different limits for /api/*, /login, /admin
- Agent tracks multiple counters per IP
Path-Based Event Sampling
- Send all /admin requests
- Skip /health, /metrics
- Sample 10% of regular traffic
Challenge Actions
- CAPTCHA challenges for suspicious IPs
- JavaScript challenges for bot detection
Scheduled Rules
- Block during maintenance windows
- Time-of-day rate limits
Multi-Project Rules (Phase 10+)
- Global rules across all projects
- Per-project rule overrides
Summary
The Baffle Hub rule system provides:
- Fast local enforcement (sub-millisecond)
- Centralized intelligence (Hub analytics)
- Efficient synchronization (timestamp-based incremental sync)
- Dynamic adaptation (backpressure control via sampling)
- Temporal flexibility (auto-expiring rules)
- Audit trail (soft deletes, source tracking)
This architecture scales from single-server deployments to distributed multi-agent installations while maintaining simplicity and pragmatic design choices focused on the "low-hanging fruit" of WAF functionality.