
# MaxMind GeoIP Integration
This document describes the MaxMind GeoIP integration implemented in the Baffle Hub WAF analytics system.
## Overview
The Baffle Hub application uses MaxMind's free GeoLite2-Country database to provide geographic location information for IP addresses. The system automatically enriches WAF events with country codes and provides manual lookup capabilities for both IPv4 and IPv6 addresses.
## Features
- **On-demand lookup** - Country code lookup by IP address
- **Automatic enrichment** - Events are enriched with geo-location data during processing
- **Manual lookup capability** - Rake tasks and model methods for manual lookups
- **GeoLite2-Country database** - Uses MaxMind's free country-level database
- **Automatic updates** - Weekly background job updates the database
- **IPv4/IPv6 support** - Full protocol support for both IP versions
- **Performance optimized** - In-memory database and efficient lookups
- **Graceful degradation** - Fallback handling when database is unavailable
## Architecture
### Core Components
#### 1. GeoIpService
- Central service for all IP geolocation operations
- Handles database loading from file system
- Provides batch lookup capabilities
- Manages database updates from MaxMind CDN
- Uses MaxMind's built-in metadata for version information
#### 2. UpdateGeoIpDatabaseJob
- Background job for automatic database updates
- Runs weekly to keep the database current
- Simple file-based validation and updates
#### 3. Enhanced Models
- **Event Model** - Automatic geo-location enrichment for WAF events
- **Ipv4Range/Ipv6Range Models** - Manual lookup methods for IP ranges
#### 4. File-System Management
- Database stored as single file: `db/geoip/GeoLite2-Country.mmdb`
- Version information queried directly from MaxMind database metadata
- No database tables needed - simplified approach
## Installation & Setup
### Dependencies
The integration uses the following gems:
- `maxmind-db` - Official MaxMind database reader (with built-in caching)
- `httparty` - HTTP client for database downloads
### Database Storage
- Location: `db/geoip/GeoLite2-Country.mmdb`
- Automatic creation of storage directory
- File validation and integrity checking
- Version information queried directly from database metadata
- No additional caching needed - MaxMind DB has its own internal caching
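
The file validation mentioned above is not specified further. One cheap structural check follows from the MaxMind DB format specification: the metadata section sits at the end of the file, starts with the byte sequence `\xAB\xCD\xEFMaxMind.com`, and may be at most 128 KiB long including that marker. The sketch below only looks for the marker in that window; the helper name `mmdb_structurally_valid?` is hypothetical, and this is not necessarily how `GeoIpService` validates files.

```ruby
# Hypothetical helper: cheap structural check for a MaxMind DB (.mmdb) file.
# Per the MaxMind DB spec, the metadata section sits at the end of the file,
# starts with this marker, and is at most 128 KiB including the marker.
METADATA_MARKER = "\xAB\xCD\xEFMaxMind.com".b
METADATA_MAX_SIZE = 128 * 1024

def mmdb_structurally_valid?(path)
  return false unless File.file?(path)

  size = File.size(path)
  tail_size = [size, METADATA_MAX_SIZE].min
  # Read only the last (up to) 128 KiB in binary mode.
  tail = File.open(path, "rb") do |f|
    f.seek(size - tail_size)
    f.read(tail_size) || ""
  end
  # Valid only if the marker appears inside that final window.
  !tail.rindex(METADATA_MARKER).nil?
end
```

A check like this cheaply catches truncated downloads or error pages saved as `.mmdb`, but it does not replace checksum verification or opening the file with the `maxmind-db` reader.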
### Initial Setup
```bash
# Install dependencies
bundle install
# Download the GeoIP database
rails geoip:update
# Verify installation
rails geoip:status
```
## Configuration
The system is configurable via environment variables or application configuration:
| Variable | Default | Description |
|----------|---------|-------------|
| `MAXMIND_DATABASE_URL` | MaxMind CDN URL | Database download URL |
| `MAXMIND_AUTO_UPDATE` | `true` | Enable automatic weekly updates |
| `MAXMIND_UPDATE_INTERVAL_DAYS` | `7` | Days between update checks |
| `MAXMIND_MAX_AGE_DAYS` | `30` | Maximum database age before forced update |
| `MAXMIND_FALLBACK_COUNTRY` | `nil` | Fallback country when lookup fails |
| `MAXMIND_ENABLE_FALLBACK` | `false` | Enable fallback country usage |

Note: MaxMind DB has built-in caching, so no additional caching configuration is needed.
### Example Configuration
```bash
# .env file (shell-style KEY=value assignments)
MAXMIND_AUTO_UPDATE=true
MAXMIND_UPDATE_INTERVAL_DAYS=7
MAXMIND_MAX_AGE_DAYS=30
MAXMIND_FALLBACK_COUNTRY=US
MAXMIND_ENABLE_FALLBACK=true
# Note: No caching configuration needed - MaxMind has built-in caching
```
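
The table leaves the parsing rules implicit. A minimal sketch of reading these variables with the documented defaults, and of applying the fallback country only when `MAXMIND_ENABLE_FALLBACK` is on, could look like this. `GeoIpConfig`, `load_geoip_config`, and `country_with_fallback` are hypothetical names, and treating `true`/`1`/`yes`/`on` as truthy is an assumption rather than the application's confirmed behavior.

```ruby
# Hypothetical sketch: read the MAXMIND_* variables with the documented defaults.
# Accepts any Hash-like source so it can be tested without touching ENV.
GeoIpConfig = Struct.new(:auto_update, :update_interval_days, :max_age_days,
                         :fallback_country, :enable_fallback, keyword_init: true)

# Assumed boolean spellings; anything else counts as false.
def env_flag?(value)
  %w[1 true yes on].include?(value.to_s.strip.downcase)
end

def load_geoip_config(env = ENV)
  GeoIpConfig.new(
    auto_update: env_flag?(env.fetch("MAXMIND_AUTO_UPDATE", "true")),
    update_interval_days: Integer(env.fetch("MAXMIND_UPDATE_INTERVAL_DAYS", "7")),
    max_age_days: Integer(env.fetch("MAXMIND_MAX_AGE_DAYS", "30")),
    fallback_country: env["MAXMIND_FALLBACK_COUNTRY"],
    enable_fallback: env_flag?(env.fetch("MAXMIND_ENABLE_FALLBACK", "false"))
  )
end

# A real lookup result always wins; the fallback is used only when enabled.
def country_with_fallback(country, config)
  return country if country

  config.enable_fallback ? config.fallback_country : nil
end
```

Passing a plain Hash instead of `ENV` keeps the parsing testable, e.g. `load_geoip_config({ "MAXMIND_MAX_AGE_DAYS" => "14" }).max_age_days # => 14`.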
## Usage
### Rake Tasks
#### Database Management
```bash
# Download/update the GeoIP database
rails geoip:update
# Check database status and configuration
rails geoip:status
# Test the implementation with sample IPs
rails geoip:test
# Manual lookup for a specific IP (quoted so shells such as zsh don't treat the brackets as a glob)
rails "geoip:lookup[8.8.8.8]"
rails "geoip:lookup[2001:4860:4860::8888]"
```
#### Data Management
```bash
# Enrich existing events missing country codes
rails geoip:enrich_missing
# Clean up old inactive database records
rails geoip:cleanup
```
### Ruby API
#### Service-Level Lookups
```ruby
# Direct country lookup
country = GeoIpService.lookup_country('8.8.8.8')
# => "US"
# Batch lookup
countries = GeoIpService.new.lookup_countries(['8.8.8.8', '1.1.1.1'])
# => { "8.8.8.8" => "US", "1.1.1.1" => nil }
# Check database availability
service = GeoIpService.new
service.database_available? # => true/false
service.database_info # => Database metadata
```
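
How `lookup_countries` is implemented is not documented here. A minimal sketch of the contract shown above (every input IP maps to a country code or `nil`) looks up each distinct address once; the free-standing method and the `lookup` callable are hypothetical stand-ins for `GeoIpService` internals.

```ruby
# Hypothetical sketch of a batch lookup built on a single-IP lookup callable.
# Duplicate addresses are looked up once; every input maps to a country or nil.
def lookup_countries(ip_addresses, lookup)
  ip_addresses.uniq.each_with_object({}) do |ip, result|
    result[ip] = lookup.call(ip)
  end
end

# Usage with an in-memory stand-in for the database:
fake_db = { "8.8.8.8" => "US" }
lookup_countries(["8.8.8.8", "1.1.1.1", "8.8.8.8"], ->(ip) { fake_db[ip] })
# => { "8.8.8.8" => "US", "1.1.1.1" => nil }
```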
#### Event Model Integration
```ruby
# Automatic enrichment during event processing
event = Event.find(123)
event.enrich_geo_location! # Updates event with country code
event.lookup_country # => "US" (with fallback to service)
event.has_geo_data? # => true/false
event.geo_location # => { country_code: "US", city: nil, ... }
# Batch enrichment of existing events
updated_count = Event.enrich_geo_location_batch
puts "Enriched #{updated_count} events with geo data"
```
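
`Event.enrich_geo_location_batch` (and `enrich_missing_geo_data(limit:)` on the range models) is documented only by its return value, the number of updated rows. The plain-Ruby sketch below illustrates that contract under stated assumptions: `EventRecord` is a hypothetical stand-in for an ActiveRecord model, candidates are selected with the same "IP present, country blank" guard used in `ProcessWafEventJob`, and a `nil` lookup result leaves the record unchanged and uncounted.

```ruby
# Hypothetical stand-in for an ActiveRecord model.
EventRecord = Struct.new(:ip_address, :country_code)

# Hypothetical sketch of batch enrichment: fill in country codes only for
# records that have an IP address but no country yet, and count the updates.
def enrich_missing(records, lookup, limit: nil)
  candidates = records.select do |r|
    !r.ip_address.to_s.strip.empty? && r.country_code.to_s.strip.empty?
  end
  candidates = candidates.first(limit) if limit

  candidates.count do |r|
    country = lookup.call(r.ip_address)
    next false if country.nil? # lookup failed: leave the record untouched
    r.country_code = country
    true
  end
end
```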
#### IP Range Model Integration
```ruby
# IPv4 Range lookups
range = Ipv4Range.find(123)
range.geo_lookup_country! # Updates range with country code
range.geo_lookup_country # => "US" (without updating)
range.has_country_info? # => true/false
range.primary_country # => "US" (best available country)
# Class methods
country = Ipv4Range.lookup_country_by_ip('8.8.8.8')
updated_count = Ipv4Range.enrich_missing_geo_data(limit: 1000)
# IPv6 Range lookups (same interface)
country = Ipv6Range.lookup_country_by_ip('2001:4860:4860::8888')
updated_count = Ipv6Range.enrich_missing_geo_data(limit: 1000)
```
### Background Processing
#### Automatic Updates
The system schedules database updates automatically; the job can also be enqueued by hand:
```ruby
# Manually trigger an update (usually scheduled automatically)
UpdateGeoIpDatabaseJob.perform_later
# Force update regardless of age
UpdateGeoIpDatabaseJob.perform_later(force_update: true)
```
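
The interplay of `MAXMIND_AUTO_UPDATE`, `MAXMIND_UPDATE_INTERVAL_DAYS`, `MAXMIND_MAX_AGE_DAYS`, and `force_update` is not spelled out. One plausible reading is sketched below; it assumes the database age is taken from the file's modification time and that exceeding the maximum age forces an update even when automatic updates are disabled. `update_due?` is a hypothetical helper, not the job's confirmed logic.

```ruby
# Hypothetical sketch: decide whether the update job should download a new
# database. db_mtime is the local file's modification time (nil if missing).
SECONDS_PER_DAY = 86_400

def update_due?(db_mtime, now:, auto_update:, interval_days:, max_age_days:, force_update: false)
  return true if force_update
  return true if db_mtime.nil? # database missing: always download

  age_days = (now - db_mtime) / SECONDS_PER_DAY
  # Assumption: past the maximum age, update even if auto-update is off.
  return true if age_days >= max_age_days

  auto_update && age_days >= interval_days
end
```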
#### Event Processing Integration
Geo-location enrichment is automatically included in WAF event processing:
```ruby
# This is called automatically in ProcessWafEventJob
event = Event.create_from_waf_payload!(event_id, payload, project)
event.enrich_geo_location! if event.ip_address.present? && event.country_code.blank?
```
## Database Information
### GeoLite2-Country Database
- **Source**: MaxMind GeoLite2-Country (free version)
- **Update Frequency**: Weekly (Tuesdays)
- **Size**: ~9.5 MB
- **Coverage**: Global IP-to-country mapping
- **Format**: MaxMind DB (.mmdb)
### Database Fields
- `country.iso_code` - Two-letter ISO country code
- Supports both IPv4 and IPv6 addresses
- Includes anonymous/proxy detection metadata
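
The `maxmind-db` reader returns each match as a nested Hash with string keys (its README reads `record['country']['iso_code']`) and `nil` when the address is not found. A defensive extraction of the two-letter code, with a hypothetical method name, might be:

```ruby
# Hypothetical sketch: pull the two-letter country code out of a decoded record.
# Records are nested hashes with string keys; missing or malformed data yields nil.
def iso_country_code(record)
  return nil unless record.is_a?(Hash)

  code = record.dig("country", "iso_code")
  code.is_a?(String) && code.match?(/\A[A-Z]{2}\z/) ? code : nil
end
```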
## Performance Considerations
### Lookup Performance
- MaxMind DB has built-in internal caching optimized for lookups, so no additional caching layer is needed
- Typical lookup time: <1ms
- Database size optimized for fast lookups
- Efficient range queries for IP networks
### Memory Usage
- Database loaded into memory for fast access
- Approximate memory usage: 15-20 MB for the country database
- Automatic cleanup of old database files
## Error Handling
### Graceful Degradation
- Service returns `nil` when database unavailable
- Logging at appropriate levels for different error types
- Event processing continues even if geo-location fails
### Common Error Scenarios
1. **Database Missing** - Automatic download triggered
2. **Database Corrupted** - Automatic re-download attempted
3. **Network Issues** - Graceful fallback with error logging
4. **Invalid IP Address** - Returns `nil` with warning log
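
The degradation rules above can be sketched as a wrapper around a single lookup: input is parsed with Ruby's standard `IPAddr` (IPv4 and IPv6), invalid addresses produce a warning and `nil`, and any other error from the reader is logged and also turned into `nil` so event processing can continue. `safe_lookup_country` and the `lookup` callable are hypothetical; the log wording is illustrative.

```ruby
require "ipaddr"
require "logger"

# Hypothetical sketch of graceful degradation around a single lookup:
# invalid input and reader errors both yield nil instead of raising.
def safe_lookup_country(ip_address, lookup, logger: Logger.new($stderr))
  ip = IPAddr.new(ip_address.to_s.strip) # accepts IPv4 and IPv6
  lookup.call(ip.to_s)
rescue IPAddr::Error
  logger.warn("GeoIP: invalid IP address #{ip_address.inspect}")
  nil
rescue StandardError => e
  logger.error("GeoIP: lookup failed for #{ip_address}: #{e.class}: #{e.message}")
  nil
end
```

`IPAddr.new` also accepts CIDR notation such as `10.0.0.0/8` and normalizes it to the network address, so callers that must reject ranges need an extra check.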
## Troubleshooting
### Check System Status
```bash
# Verify database status
rails geoip:status
# Test with known IPs
rails geoip:test
# Check logs for errors
tail -f log/production.log | grep GeoIP
```
### Common Issues
#### Database Not Available
```bash
# Force database update
rails geoip:update
# Check file permissions
ls -la db/geoip/
```
#### Lookup Failures
```bash
# Test specific IPs
rails "geoip:lookup[8.8.8.8]"
# Check database validity
rails runner "puts GeoIpService.new.database_available?"
```
#### Performance Issues
- No cache size setting exists to tune; MaxMind DB handles caching internally
- Check memory usage on deployment server
- Monitor lookup times with application metrics
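
For monitoring lookup times, a small wrapper using Ruby's monotonic clock (so wall-clock adjustments don't skew measurements) is enough to compare against the <1ms target. `timed_lookup` and the `on_timing` callback are hypothetical; the callback is where a real metrics client would be called.

```ruby
# Hypothetical sketch: time each lookup with a monotonic clock and hand the
# duration in milliseconds to a metrics callback. The timing is reported even
# if the lookup raises; the exception still propagates to the caller.
def timed_lookup(ip_address, lookup, on_timing:)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  lookup.call(ip_address)
ensure
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
  on_timing.call(ip_address, elapsed_ms)
end
```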
## Monitoring & Maintenance
### Health Checks
```ruby
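# Rails console health check
service = GeoIpService.new
puts "Database available: #{service.database_available?}"
puts "Database info: #{service.database_info.inspect}"
# Age of the local file, i.e. time since the last successful download
db_path = Rails.root.join("db/geoip/GeoLite2-Country.mmdb")
puts "Database age: #{((Time.now - File.mtime(db_path)) / 86_400).floor} days" if File.exist?(db_path)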
```
### Scheduled Maintenance
- Database automatically updated weekly
- Old database files cleaned up after 7 days
- No manual maintenance required
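
The cleanup of old files is described only by its 7-day threshold. A sketch under assumptions: superseded copies are kept next to the active file with names matching `*.mmdb*` (the real naming scheme is not documented), and the active file is never deleted regardless of age. `cleanup_old_databases` is a hypothetical helper.

```ruby
require "fileutils"

# Hypothetical sketch: delete superseded database files in the GeoIP directory
# that are older than max_age_days, never touching the active database file.
# Returns the list of deleted paths.
def cleanup_old_databases(dir, active_file:, max_age_days: 7, now: Time.now)
  cutoff = now - max_age_days * 86_400
  Dir.glob(File.join(dir, "*.mmdb*")).filter_map do |path|
    next if File.basename(path) == active_file
    next unless File.mtime(path) < cutoff

    FileUtils.rm_f(path)
    path
  end
end
```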
### Monitoring Metrics
Consider monitoring:
- Database update success/failure rates
- Lookup performance (response times)
- Database age and freshness
- Lookup failure rates (lookups returning `nil`)
## Security & Privacy
### Data Privacy
- No personal data stored in the GeoIP database
- Only country-level information provided
- No tracking or logging of IP lookups by default
### Network Security
- Database downloaded from official MaxMind CDN
- File integrity validated with MD5 checksums
- Secure temporary file handling during updates
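
The checksum validation and temporary-file handling above can be combined into a single install step: verify the downloaded bytes, write them to a temporary file in the target directory, and `File.rename` it over the old database, which is atomic on POSIX filesystems when source and target share a filesystem. This sketch uses MD5 because that is the checksum named above; the `digest:` parameter is a hypothetical design choice that also accepts `Digest::SHA256`. `install_database!` is a hypothetical name, and fetching the bytes and expected checksum is left out.

```ruby
require "digest"
require "fileutils"
require "tempfile"

# Hypothetical sketch: verify downloaded bytes against an expected checksum,
# then swap them into place atomically so readers never see a partial file.
def install_database!(bytes, target_path, expected_hex, digest: Digest::MD5)
  actual = digest.hexdigest(bytes)
  unless actual == expected_hex.downcase
    raise ArgumentError, "checksum mismatch: expected #{expected_hex}, got #{actual}"
  end

  dir = File.dirname(target_path)
  FileUtils.mkdir_p(dir)
  # Create the temp file in the target directory so the rename stays on one filesystem.
  tmp = Tempfile.new(["geoip", ".tmp"], dir, binmode: true)
  begin
    tmp.write(bytes)
    tmp.flush
    tmp.fsync
    tmp.close
    File.rename(tmp.path, target_path) # atomic replace on POSIX filesystems
  ensure
    tmp.close! # removes the temp file if the rename did not happen
  end
  target_path
end
```

MD5 is adequate for catching corrupted or truncated downloads, but it is not collision-resistant, so on its own it does not defend against a deliberately tampered file.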
## API Reference
### GeoIpService
#### Class Methods
- `lookup_country(ip_address)` - Direct lookup
- `update_database!` - Force database update
#### Instance Methods
- `lookup_country(ip_address)` - Country lookup
- `lookup_countries(ip_addresses)` - Batch lookup
- `database_available?` - Check database status
- `database_info` - Get database metadata
- `update_from_remote!` - Download new database
### Model Methods
#### Event Model
- `enrich_geo_location!` - Update with country code
- `lookup_country` - Get country code (with fallback)
- `has_geo_data?` - Check if geo data exists
- `geo_location` - Get full geo location hash
#### Ipv4Range/Ipv6Range Models
- `geo_lookup_country!` - Update range with country code
- `geo_lookup_country` - Get country code (without update)
- `has_country_info?` - Check for existing country data
- `primary_country` - Get best available country code
- `lookup_country_by_ip(ip)` - Class method for IP lookup
- `enrich_missing_geo_data(limit:)` - Class method for batch enrichment
## Support & Resources
### MaxMind Documentation
- [MaxMind Developer Site](https://dev.maxmind.com/)
- [GeoLite2 Databases](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data)
- [Database Accuracy](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data#accuracy)
### Ruby Libraries
- [maxmind-db gem](https://github.com/maxmind/MaxMind-DB-Reader-ruby)
- [httparty gem](https://github.com/jnunemaker/httparty)
### Troubleshooting Resources
- Application logs: `log/production.log`
- Rails console for manual testing
- Database status via `rails geoip:status`