[이제와서 시작하는 Metabase 마스터하기 #13] 멀티 데이터소스 - 여러 DB 활용
[이제와서 시작하는 Metabase 마스터하기 #13] 멀티 데이터소스 - 여러 DB 활용
학습 목표
이 포스트를 마치면 다음을 할 수 있습니다:
- 여러 데이터베이스 연결 및 관리
- Cross-database joins로 서로 다른 DB 데이터 결합
- 데이터 통합 전략 수립
- 성능과 보안 고려사항 이해
- 실전 멀티 소스 분석 구현
멀티 데이터소스 개요
왜 필요한가?
graph TD
A[회사 데이터] --> B[PostgreSQL]
A --> C[MySQL]
A --> D[MongoDB]
A --> E[Google Sheets]
B --> B1[Transactional DB]
C --> C1[Legacy System]
D --> D1[User Events]
E --> E1[Manual Data]
F[Metabase] --> B
F --> C
F --> D
F --> E
F --> G[통합 분석]
일반적인 시나리오:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Production DB (PostgreSQL):
- Orders
- Customers
- Products
Analytics DB (BigQuery):
- Web analytics
- User behavior
- Campaign data
CRM (MySQL):
- Sales leads
- Customer interactions
Spreadsheets (Google Sheets):
- Manual forecasts
- Budget data
데이터베이스 연결
지원되는 데이터베이스
Commercial:
- PostgreSQL
- MySQL / MariaDB
- Microsoft SQL Server
- Oracle
- IBM DB2
Cloud:
- Amazon RDS / Aurora
- Google BigQuery
- Snowflake
- Amazon Redshift
- Azure SQL
NoSQL:
- MongoDB
- DynamoDB (via SQL)
Others:
- Google Sheets
- SQLite
- Presto / Trino
- Apache Druid
데이터베이스 추가
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Admin > Databases > Add database
Connection details:
Name: Production PostgreSQL
Type: PostgreSQL
Host: db.company.com
Port: 5432
Database name: production
Username: metabase_readonly
Password: ••••••••
Advanced options:
☑ Use SSL
☑ Tunnel through SSH
Schema: public
Test connection → Save
연결 전략
읽기 전용 사용자 생성:
1
2
3
4
5
6
7
8
9
-- PostgreSQL
CREATE USER metabase_readonly WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE production TO metabase_readonly;
GRANT USAGE ON SCHEMA public TO metabase_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO metabase_readonly;
-- 새 테이블에도 자동 권한
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO metabase_readonly;
Read Replica 사용:
1
2
3
4
5
6
7
8
9
10
Production Master (writes):
app.company.com:5432
Read Replica (analytics):
analytics.company.com:5432 ← Metabase 연결
장점:
- Production DB에 부하 없음
- 복잡한 쿼리 실행 가능
- 분석 전용 최적화
Models로 데이터 통합
개념
Models를 사용하여 서로 다른 DB의 데이터를 Metabase 내에서 결합
graph LR
A[PostgreSQL Orders] --> C[Model: Order Details]
B[MySQL CRM] --> C
C --> D[Dashboard]
실전 예제: 주문 + CRM 통합
Step 1: PostgreSQL Model (Orders)
1
2
3
4
5
6
7
8
9
10
11
12
-- Model: "Order Base"
-- Database: Production PostgreSQL
SELECT
id as order_id,
created_at as order_date,
total as order_total,
customer_id,
status
FROM orders
WHERE created_at >= '2024-01-01'
AND status = 'completed'
Step 2: MySQL Model (CRM Data)
1
2
3
4
5
6
7
8
9
10
11
-- Model: "Customer CRM"
-- Database: CRM MySQL
SELECT
id as customer_id,
account_owner as sales_rep,
customer_tier,
industry,
annual_contract_value as acv
FROM accounts
WHERE is_active = true
Step 3: Question에서 Models 결합
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
New Question > Pick data
Step 1: Select "Order Base" model
Step 2: Join with "Customer CRM" model
Join type: Left join
Order Base.customer_id = Customer CRM.customer_id
Step 3: Select fields:
- Order date
- Order total
- Sales rep
- Customer tier
- Industry
Step 4: Summarize:
Sum of Order total
Group by: Sales rep
Result:
영업 담당자별 매출 (Production DB + CRM DB 통합)
Cross-Database Limitations
Metabase의 제약:
1
2
3
4
5
6
7
8
9
✅ 가능:
- GUI builder로 Models 간 join
- 각 DB에서 Model 생성 후 결합
- Dashboard 레벨에서 통합 뷰
❌ 불가능:
- Native SQL에서 직접 cross-database join
(SELECT * FROM pg.orders JOIN mysql.customers ...)
- 매우 복잡한 multi-step transformations
해결 방법:
1
2
3
4
5
6
7
8
9
10
11
Option 1: Models + GUI builder
- 각 DB에서 Model 생성
- GUI로 결합
Option 2: ETL to single DB
- DBT, Airflow 등으로 데이터 이동
- 단일 DB에서 분석
Option 3: Data warehouse
- Snowflake, BigQuery 등에 모든 데이터 집중
- Metabase는 warehouse에만 연결
통합 분석 패턴
패턴 1: 주문 + 웹 분석
시나리오:
1
2
3
4
5
6
7
8
주문 DB (PostgreSQL):
- Orders table
웹 분석 (BigQuery):
- GA4 events
목표:
광고 클릭부터 구매까지 전체 퍼널 분석
구현:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Model 1: Orders (PostgreSQL)
CREATE MODEL orders_with_ref AS
SELECT
id,
created_at,
total,
referral_source, -- utm_source
campaign_id -- utm_campaign
FROM orders
-- Model 2: Campaign Clicks (BigQuery)
CREATE MODEL campaign_clicks AS
SELECT
campaign_id,
DATE(event_timestamp) as click_date,
COUNT(*) as clicks,
SUM(cost) as ad_spend
FROM ga4_events
WHERE event_name = 'campaign_click'
GROUP BY campaign_id, DATE(event_timestamp)
-- Question: ROI by Campaign
Data: orders_with_ref
Join: campaign_clicks ON campaign_id
Metrics:
- Total revenue (from orders)
- Total ad spend (from clicks)
- ROI = (Revenue - Spend) / Spend
Group by: campaign_id
패턴 2: 판매 + 재고
시나리오:
1
2
3
4
5
6
7
8
판매 DB (MySQL):
- Orders
재고 DB (PostgreSQL):
- Inventory
목표:
재고 부족 예측 및 재주문 추천
구현:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Model 1: Sales Velocity (MySQL)
CREATE MODEL product_sales_velocity AS
SELECT
product_id,
AVG(daily_sales) as avg_daily_sales,
STDDEV(daily_sales) as sales_variance
FROM (
SELECT
product_id,
DATE(created_at) as sale_date,
SUM(quantity) as daily_sales
FROM order_items
WHERE created_at >= CURRENT_DATE - INTERVAL 90 DAY
GROUP BY product_id, DATE(created_at)
) daily_stats
GROUP BY product_id
-- Model 2: Current Inventory (PostgreSQL)
CREATE MODEL current_inventory AS
SELECT
product_id,
stock_quantity,
reorder_point,
lead_time_days
FROM inventory
WHERE is_active = true
-- Question: Reorder Recommendations
Data: current_inventory
Join: product_sales_velocity ON product_id
Custom field: days_until_stockout
= stock_quantity / avg_daily_sales
Filter: days_until_stockout < lead_time_days
Sort: days_until_stockout ASC
Result:
우선 재주문해야 할 상품 목록
패턴 3: 실적 + 예산
시나리오:
1
2
3
4
5
6
7
8
실적 DB (PostgreSQL):
- Actual sales
예산 (Google Sheets):
- Budget forecasts
목표:
예산 대비 실적 추적
구현:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Model 1: Actual Sales (PostgreSQL)
-- (이미 존재)
-- Model 2: Budget (Google Sheets 연결)
Database type: Google Sheets
Spreadsheet: "2025 Budget"
Sheet: "Monthly Targets"
Columns:
- month (YYYY-MM)
- department
- budget_amount
-- Dashboard: Budget vs Actual
Question 1: Actual by Month
Data: Actual Sales
Metrics: Sum(revenue)
Group by: month, department
Question 2: Budget by Month
Data: Budget sheet
Metrics: Sum(budget_amount)
Group by: month, department
Question 3: Variance
Custom Expression:
[Actual] - [Budget]
Color:
Green if positive
Red if negative
Visualization: Line chart with 2 series
- Actual (solid line)
- Budget (dashed line)
성능 최적화
1. Models 미리 집계
❌ 나쁜 예:
1
2
3
4
5
-- Model에서 raw data 그대로
SELECT * FROM orders -- 1M rows
-- Question에서 집계
-- 매번 1M rows 처리
✅ 좋은 예:
1
2
3
4
5
6
7
8
9
10
11
-- Model에서 미리 집계
SELECT
DATE(created_at) as order_date,
customer_id,
SUM(total) as total_revenue,
COUNT(*) as order_count
FROM orders
GROUP BY DATE(created_at), customer_id
-- 10K rows only
-- Question은 집계된 데이터 사용
2. 캐싱 활용
1
2
3
4
5
6
7
Model settings:
Cache TTL: 1 hour
Reasoning:
- Cross-DB join은 비용이 높음
- 결과를 캐시하여 재사용
- 실시간성이 덜 중요한 분석에 적합
3. 데이터 이동 최소화
1
2
3
4
5
6
7
8
9
Principle:
"Bring compute to data, not data to compute"
✅ 좋은 방법:
- 각 DB에서 최대한 집계
- Metabase는 결과만 결합
❌ 나쁜 방법:
- Raw data를 모두 가져와서 Metabase에서 처리
4. Incremental Models
1
2
3
4
5
6
7
-- Full refresh (매일)
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
-- Incremental (매시간)
SELECT * FROM orders
WHERE created_at >= CURRENT_DATE
보안 고려사항
1. 최소 권한 원칙
1
2
3
4
5
-- ❌ 나쁜 예
GRANT ALL ON DATABASE production TO metabase;
-- ✅ 좋은 예
GRANT SELECT ON specific_tables TO metabase_readonly;
2. Row-level Security
1
2
3
4
5
6
7
8
9
10
-- Model에서 보안 적용
SELECT *
FROM orders
WHERE
-- 민감한 데이터 제외
order_total < 10000
-- 특정 상태만
AND status IN ('completed', 'shipped')
-- 최근 데이터만
AND created_at >= CURRENT_DATE - INTERVAL '2 years'
3. 데이터 마스킹
1
2
3
4
5
6
7
8
-- PII 마스킹
SELECT
id,
CONCAT('Customer ', customer_id) as customer_ref, -- ID 대신
LEFT(email, 3) || '***@' || SPLIT_PART(email, '@', 2) as masked_email,
-- john@example.com → joh***@example.com
order_total
FROM orders
4. 네트워크 보안
1
2
3
4
5
6
7
8
9
10
VPN / SSH Tunnel:
☑ Use SSH tunnel
SSH host: bastion.company.com
SSH port: 22
SSH username: metabase
Private key: [upload]
SSL/TLS:
☑ Use SSL connection
Certificate: [upload CA certificate]
실전 연습 문제
연습 1: 기본 멀티 DB 분석 (초급)
과제: 주문 DB와 고객 DB를 결합하여 고객 세그먼트별 매출 분석
DB 1 (PostgreSQL): Orders DB 2 (MySQL): Customers with segments
해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Model 1: Orders Summary (PostgreSQL)
CREATE MODEL orders_summary AS
SELECT
customer_id,
COUNT(*) as order_count,
SUM(total) as total_revenue,
AVG(total) as avg_order_value,
MAX(created_at) as last_order_date
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
-- Model 2: Customer Segments (MySQL)
CREATE MODEL customer_segments AS
SELECT
id as customer_id,
name,
segment, -- VIP, Regular, New
industry,
signup_date
FROM customers
-- Question: Revenue by Segment
Data: orders_summary
Join: customer_segments
ON orders_summary.customer_id = customer_segments.customer_id
Metrics:
- SUM(total_revenue)
- COUNT(customer_id)
- AVG(avg_order_value)
Group by: segment
Visualization: Bar chart
X-axis: segment
Y-axis: SUM(total_revenue)
Color: by segment
Result:
VIP: $500K (100 customers, $5K avg)
Regular: $300K (500 customers, $600 avg)
New: $50K (200 customers, $250 avg)
연습 2: 마케팅 퍼널 분석 (중급)
과제: 광고 클릭 (BigQuery) → 가입 (PostgreSQL) → 구매 (PostgreSQL) 퍼널
해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- Model 1: Ad Clicks (BigQuery)
CREATE MODEL ad_clicks AS
SELECT
DATE(event_timestamp) as click_date,
campaign_id,
user_pseudo_id,
COUNT(*) as clicks
FROM `project.dataset.events_*`
WHERE event_name = 'campaign_click'
AND _TABLE_SUFFIX BETWEEN '20250101' AND '20251231'
GROUP BY click_date, campaign_id, user_pseudo_id
-- Model 2: Signups (PostgreSQL)
CREATE MODEL signups AS
SELECT
DATE(created_at) as signup_date,
id as user_id,
referral_campaign_id as campaign_id
FROM users
WHERE created_at >= '2025-01-01'
-- Model 3: Purchases (PostgreSQL)
CREATE MODEL first_purchases AS
SELECT
user_id,
MIN(DATE(created_at)) as first_purchase_date,
SUM(total) as first_purchase_amount
FROM orders
WHERE status = 'completed'
GROUP BY user_id
-- Dashboard: Marketing Funnel
Question 1: Funnel Overview
Data: ad_clicks (base)
Join 1: signups
ON ad_clicks.user_pseudo_id = signups.user_id::text
AND ad_clicks.click_date = signups.signup_date
Join 2: first_purchases
ON signups.user_id = first_purchases.user_id
Metrics:
- Total clicks: COUNT(ad_clicks.user_pseudo_id)
- Signups: COUNT(signups.user_id)
- Purchases: COUNT(first_purchases.user_id)
- Signup rate: signups / clicks * 100
- Purchase rate: purchases / signups * 100
- Overall conversion: purchases / clicks * 100
Group by: campaign_id
Question 2: Funnel Visualization
Visualization: Funnel
Steps:
1. Clicked ad: 10,000 (100%)
2. Signed up: 2,000 (20%)
3. Made purchase: 400 (4% of clicks, 20% of signups)
Question 3: ROI by Campaign
Custom expressions:
- CAC = Ad Spend / Purchases
- LTV (estimate from first purchase)
- ROI = (LTV - CAC) / CAC * 100
Filter: ROI > 0 (profitable campaigns only)
Sort: ROI descending
연습 3: 통합 리포팅 시스템 (고급)
과제: 4개 데이터소스를 통합한 Executive Dashboard
Sources:
- Sales DB (PostgreSQL)
- CRM (MySQL)
- Marketing (BigQuery)
- Finance (Google Sheets)
해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
Architecture:
1. Sales DB (PostgreSQL)
→ Model: "Sales Metrics"
- Daily revenue
- Order count
- Product mix
2. CRM (MySQL)
→ Model: "Pipeline Metrics"
- Lead count by stage
- Conversion rates
- Sales cycle length
3. Marketing (BigQuery)
→ Model: "Marketing Metrics"
- Campaign performance
- Channel attribution
- CAC by channel
4. Finance (Google Sheets)
→ Database connection
- Budget by department
- Forecast by month
Dashboard: "Executive Summary"
Row 1: Company KPIs (4 Number cards)
┌─────────────┬─────────────┬─────────────┬─────────────┐
│ Revenue │ Pipeline │ Marketing │ Burn Rate │
│ (Sales) │ (CRM) │ (BigQuery)│ (Sheets) │
└─────────────┴─────────────┴─────────────┴─────────────┘
1. Monthly Revenue (Sales DB)
Model: Sales Metrics
Metric: SUM(revenue) WHERE month = CURRENT_MONTH
vs Last Month: +12%
2. Pipeline Value (CRM)
Model: Pipeline Metrics
Metric: SUM(deal_value) WHERE stage IN ('Proposal', 'Negotiation')
3. Marketing ROI (BigQuery)
Model: Marketing Metrics
Calculation: (Revenue from campaigns - Ad spend) / Ad spend
4. Burn Rate (Google Sheets)
Source: "Finance" sheet
Metric: Monthly expenses / Cash balance
Runway: 18 months
Row 2: Trends (2 Line charts)
┌──────────────────────────┬──────────────────────────┐
│ Revenue vs Budget │ Customer Acquisition │
│ (Sales + Sheets) │ (Marketing + Sales) │
└──────────────────────────┴──────────────────────────┘
Chart 1: Revenue vs Budget
Data sources:
- Actual: Sales Metrics (PostgreSQL)
- Budget: Finance sheet (Google Sheets)
Join: BY month
Series:
- Actual Revenue (solid line, blue)
- Budget (dashed line, gray)
- Variance (area fill, green/red)
Chart 2: Customer Acquisition
Data sources:
- Marketing spend: BigQuery
- New customers: PostgreSQL
Metrics:
- New customers (bar)
- CAC (line)
Row 3: Detailed Breakdowns (2 Tables)
┌──────────────────────────┬──────────────────────────┐
│ Sales by Channel │ Top Campaigns │
│ (CRM + Sales + Mktg) │ (BigQuery + Sales) │
└──────────────────────────┴──────────────────────────┘
Table 1: Sales by Channel
Data model integration:
- CRM: Lead source
- Sales: Revenue
- Marketing: Campaign data
Columns:
- Channel
- Leads (CRM)
- Customers (Sales)
- Revenue (Sales)
- Conversion % (calc)
- CAC (Marketing)
- LTV (Sales)
- LTV:CAC ratio (calc)
Table 2: Top Campaigns
Joined models:
- Marketing Metrics (BigQuery): Campaign details, spend
- Sales Metrics (PostgreSQL): Revenue attribution
Columns:
- Campaign name
- Impressions
- Clicks
- CTR
- Conversions
- Revenue
- ROAS
Filters (Dashboard-level):
- Date range: Last 90 days (default)
- Department: All (for budget comparison)
- Channel: All
Refresh schedule:
- Sales data: Every 15 minutes (recent)
- CRM data: Every hour (slower changes)
- Marketing data: Every 30 minutes (campaign updates)
- Finance data: Daily (manual updates)
Caching strategy:
- Sales metrics: 5 minutes
- CRM metrics: 1 hour
- Marketing metrics: 15 minutes
- Finance data: 24 hours
Performance:
- Dashboard load time: < 5 seconds
- Each model pre-aggregated
- Cross-DB joins minimized
- Heavy lifting done in Models
다음 단계
멀티 데이터소스를 마스터했습니다. 다음 포스트에서는:
- 커스터마이징: 화이트라벨링, 플러그인
- 고급 설정: 환경 변수, 설정 관리
- 확장성: Scale-out, 고가용성
요약
멀티 소스 전략
graph TD
A[Data Strategy] --> B{Complexity}
B -->|Simple| C[Models + GUI]
B -->|Medium| D[ETL + Warehouse]
B -->|Complex| E[Data Lake + Federation]
C --> C1[2-3 databases]
D --> D1[3-10 databases]
E --> E1[10+ databases]
체크리스트
연결 설정:
- Read-only 사용자 생성
- Read replica 사용 (가능한 경우)
- SSL/TLS 활성화
- 방화벽 규칙 설정
Models 설계:
- 각 DB에서 미리 집계
- 명확한 join key
- 적절한 캐싱 TTL
- 문서화
성능:
- 인덱스 최적화
- 쿼리 복잡도 최소화
- 캐싱 전략
- 모니터링 설정
보안:
- 최소 권한
- Row-level security
- 데이터 마스킹
- 접근 로그 감사
다음 포스트에서는 Metabase를 조직의 요구사항에 맞게 커스터마이징하는 방법을 배웁니다!
📚 시리즈 전체 목차
🚀 기초편 (1-5화)
💪 활용편 (6-10화)
🎯 고급편 (11-16화)
이 기사는 저작권자의 CC BY 4.0 라이센스를 따릅니다.