포스트

[이제와서 시작하는 Metabase 마스터하기 #13] 멀티 데이터소스 - 여러 DB 활용

[이제와서 시작하는 Metabase 마스터하기 #13] 멀티 데이터소스 - 여러 DB 활용

학습 목표

이 포스트를 마치면 다음을 할 수 있습니다:

  • 여러 데이터베이스 연결 및 관리
  • Cross-database joins로 서로 다른 DB 데이터 결합
  • 데이터 통합 전략 수립
  • 성능과 보안 고려사항 이해
  • 실전 멀티 소스 분석 구현

멀티 데이터소스 개요

왜 필요한가?

graph TD
    A[회사 데이터] --> B[PostgreSQL]
    A --> C[MySQL]
    A --> D[MongoDB]
    A --> E[Google Sheets]

    B --> B1[Transactional DB]
    C --> C1[Legacy System]
    D --> D1[User Events]
    E --> E1[Manual Data]

    F[Metabase] --> B
    F --> C
    F --> D
    F --> E

    F --> G[통합 분석]

일반적인 시나리오:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Production DB (PostgreSQL):
  - Orders
  - Customers
  - Products

Analytics DB (BigQuery):
  - Web analytics
  - User behavior
  - Campaign data

CRM (MySQL):
  - Sales leads
  - Customer interactions

Spreadsheets (Google Sheets):
  - Manual forecasts
  - Budget data

데이터베이스 연결

지원되는 데이터베이스

Commercial:

  • PostgreSQL
  • MySQL / MariaDB
  • Microsoft SQL Server
  • Oracle
  • IBM DB2

Cloud:

  • Amazon RDS / Aurora
  • Google BigQuery
  • Snowflake
  • Amazon Redshift
  • Azure SQL

NoSQL:

  • MongoDB
  • DynamoDB (via SQL)

Others:

  • Google Sheets
  • SQLite
  • Presto / Trino
  • Apache Druid

데이터베이스 추가

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Admin > Databases > Add database

Connection details:
  Name: Production PostgreSQL
  Type: PostgreSQL
  Host: db.company.com
  Port: 5432
  Database name: production
  Username: metabase_readonly
  Password: ••••••••

Advanced options:
  ☑ Use SSL
  ☑ Tunnel through SSH
  Schema: public

Test connection → Save

연결 전략

읽기 전용 사용자 생성:

1
2
3
4
5
6
7
8
9
-- PostgreSQL
CREATE USER metabase_readonly WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE production TO metabase_readonly;
GRANT USAGE ON SCHEMA public TO metabase_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO metabase_readonly;

-- 새 테이블에도 자동 권한
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO metabase_readonly;

Read Replica 사용:

1
2
3
4
5
6
7
8
9
10
Production Master (writes):
  app.company.com:5432

Read Replica (analytics):
  analytics.company.com:5432 ← Metabase 연결

장점:
  - Production DB에 부하 없음
  - 복잡한 쿼리 실행 가능
  - 분석 전용 최적화

Models로 데이터 통합

개념

Models를 사용하여 서로 다른 DB의 데이터를 Metabase 내에서 결합

graph LR
    A[PostgreSQL Orders] --> C[Model: Order Details]
    B[MySQL CRM] --> C

    C --> D[Dashboard]

실전 예제: 주문 + CRM 통합

Step 1: PostgreSQL Model (Orders)

1
2
3
4
5
6
7
8
9
10
11
12
-- Model: "Order Base"
-- Database: Production PostgreSQL

SELECT
  id as order_id,
  created_at as order_date,
  total as order_total,
  customer_id,
  status
FROM orders
WHERE created_at >= '2024-01-01'
  AND status = 'completed'

Step 2: MySQL Model (CRM Data)

1
2
3
4
5
6
7
8
9
10
11
-- Model: "Customer CRM"
-- Database: CRM MySQL

SELECT
  id as customer_id,
  account_owner as sales_rep,
  customer_tier,
  industry,
  annual_contract_value as acv
FROM accounts
WHERE is_active = true

Step 3: Question에서 Models 결합

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
New Question > Pick data

Step 1: Select "Order Base" model
Step 2: Join with "Customer CRM" model
  Join type: Left join
  Order Base.customer_id = Customer CRM.customer_id

Step 3: Select fields:
  - Order date
  - Order total
  - Sales rep
  - Customer tier
  - Industry

Step 4: Summarize:
  Sum of Order total
  Group by: Sales rep

Result:
  영업 담당자별 매출 (Production DB + CRM DB 통합)

Cross-Database Limitations

Metabase의 제약:

1
2
3
4
5
6
7
8
9
✅ 가능:
  - GUI builder로 Models 간 join
  - 각 DB에서 Model 생성 후 결합
  - Dashboard 레벨에서 통합 뷰

❌ 불가능:
  - Native SQL에서 직접 cross-database join
    (SELECT * FROM pg.orders JOIN mysql.customers ...)
  - 매우 복잡한 multi-step transformations

해결 방법:

1
2
3
4
5
6
7
8
9
10
11
Option 1: Models + GUI builder
  - 각 DB에서 Model 생성
  - GUI로 결합

Option 2: ETL to single DB
  - DBT, Airflow 등으로 데이터 이동
  - 단일 DB에서 분석

Option 3: Data warehouse
  - Snowflake, BigQuery 등에 모든 데이터 집중
  - Metabase는 warehouse에만 연결

통합 분석 패턴

패턴 1: 주문 + 웹 분석

시나리오:

1
2
3
4
5
6
7
8
주문 DB (PostgreSQL):
  - Orders table

웹 분석 (BigQuery):
  - GA4 events

목표:
  광고 클릭부터 구매까지 전체 퍼널 분석

구현:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Model 1: Orders (PostgreSQL)
CREATE MODEL orders_with_ref AS
SELECT
  id,
  created_at,
  total,
  referral_source,  -- utm_source
  campaign_id       -- utm_campaign
FROM orders

-- Model 2: Campaign Clicks (BigQuery)
CREATE MODEL campaign_clicks AS
SELECT
  campaign_id,
  DATE(event_timestamp) as click_date,
  COUNT(*) as clicks,
  SUM(cost) as ad_spend
FROM ga4_events
WHERE event_name = 'campaign_click'
GROUP BY campaign_id, DATE(event_timestamp)

-- Question: ROI by Campaign
Data: orders_with_ref
Join: campaign_clicks ON campaign_id
Metrics:
  - Total revenue (from orders)
  - Total ad spend (from clicks)
  - ROI = (Revenue - Spend) / Spend
Group by: campaign_id

패턴 2: 판매 + 재고

시나리오:

1
2
3
4
5
6
7
8
판매 DB (MySQL):
  - Orders

재고 DB (PostgreSQL):
  - Inventory

목표:
  재고 부족 예측 및 재주문 추천

구현:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Model 1: Sales Velocity (MySQL)
CREATE MODEL product_sales_velocity AS
SELECT
  product_id,
  AVG(daily_sales) as avg_daily_sales,
  STDDEV(daily_sales) as sales_variance
FROM (
  SELECT
    product_id,
    DATE(created_at) as sale_date,
    SUM(quantity) as daily_sales
  FROM order_items
  WHERE created_at >= CURRENT_DATE - INTERVAL 90 DAY
  GROUP BY product_id, DATE(created_at)
) daily_stats
GROUP BY product_id

-- Model 2: Current Inventory (PostgreSQL)
CREATE MODEL current_inventory AS
SELECT
  product_id,
  stock_quantity,
  reorder_point,
  lead_time_days
FROM inventory
WHERE is_active = true

-- Question: Reorder Recommendations
Data: current_inventory
Join: product_sales_velocity ON product_id
Custom field: days_until_stockout
  = stock_quantity / avg_daily_sales

Filter: days_until_stockout < lead_time_days
Sort: days_until_stockout ASC

Result:
  우선 재주문해야  상품 목록

패턴 3: 실적 + 예산

시나리오:

1
2
3
4
5
6
7
8
실적 DB (PostgreSQL):
  - Actual sales

예산 (Google Sheets):
  - Budget forecasts

목표:
  예산 대비 실적 추적

구현:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Model 1: Actual Sales (PostgreSQL)
-- (이미 존재)

-- Model 2: Budget (Google Sheets 연결)
Database type: Google Sheets
Spreadsheet: "2025 Budget"
Sheet: "Monthly Targets"

Columns:
  - month (YYYY-MM)
  - department
  - budget_amount

-- Dashboard: Budget vs Actual
Question 1: Actual by Month
  Data: Actual Sales
  Metrics: Sum(revenue)
  Group by: month, department

Question 2: Budget by Month
  Data: Budget sheet
  Metrics: Sum(budget_amount)
  Group by: month, department

Question 3: Variance
  Custom Expression:
    [Actual] - [Budget]

  Color:
    Green if positive
    Red if negative

Visualization: Line chart with 2 series
  - Actual (solid line)
  - Budget (dashed line)

성능 최적화

1. Models 미리 집계

❌ 나쁜 예:

1
2
3
4
5
-- Model에서 raw data 그대로
SELECT * FROM orders  -- 1M rows

-- Question에서 집계
-- 매번 1M rows 처리

✅ 좋은 예:

1
2
3
4
5
6
7
8
9
10
11
-- Model에서 미리 집계
SELECT
  DATE(created_at) as order_date,
  customer_id,
  SUM(total) as total_revenue,
  COUNT(*) as order_count
FROM orders
GROUP BY DATE(created_at), customer_id
-- 10K rows only

-- Question은 집계된 데이터 사용

2. 캐싱 활용

1
2
3
4
5
6
7
Model settings:
  Cache TTL: 1 hour

Reasoning:
  - Cross-DB join은 비용이 높음
  - 결과를 캐시하여 재사용
  - 실시간성이 덜 중요한 분석에 적합

3. 데이터 이동 최소화

1
2
3
4
5
6
7
8
9
Principle:
  "Bring compute to data, not data to compute"

✅ 좋은 방법:
  - 각 DB에서 최대한 집계
  - Metabase는 결과만 결합

❌ 나쁜 방법:
  - Raw data를 모두 가져와서 Metabase에서 처리

4. Incremental Models

1
2
3
4
5
6
7
-- Full refresh (매일)
SELECT * FROM orders
WHERE created_at >= '2024-01-01'

-- Incremental (매시간)
SELECT * FROM orders
WHERE created_at >= CURRENT_DATE

보안 고려사항

1. 최소 권한 원칙

1
2
3
4
5
-- ❌ 나쁜 예
GRANT ALL ON DATABASE production TO metabase;

-- ✅ 좋은 예
GRANT SELECT ON specific_tables TO metabase_readonly;

2. Row-level Security

1
2
3
4
5
6
7
8
9
10
-- Model에서 보안 적용
SELECT *
FROM orders
WHERE
  -- 민감한 데이터 제외
  order_total < 10000
  -- 특정 상태만
  AND status IN ('completed', 'shipped')
  -- 최근 데이터만
  AND created_at >= CURRENT_DATE - INTERVAL '2 years'

3. 데이터 마스킹

1
2
3
4
5
6
7
8
-- PII 마스킹
SELECT
  id,
  CONCAT('Customer ', customer_id) as customer_ref,  -- ID 대신
  LEFT(email, 3) || '***@' || SPLIT_PART(email, '@', 2) as masked_email,
  -- john@example.com → joh***@example.com
  order_total
FROM orders

4. 네트워크 보안

1
2
3
4
5
6
7
8
9
10
VPN / SSH Tunnel:
  ☑ Use SSH tunnel
  SSH host: bastion.company.com
  SSH port: 22
  SSH username: metabase
  Private key: [upload]

SSL/TLS:
  ☑ Use SSL connection
  Certificate: [upload CA certificate]

실전 연습 문제

연습 1: 기본 멀티 DB 분석 (초급)

과제: 주문 DB와 고객 DB를 결합하여 고객 세그먼트별 매출 분석

DB 1 (PostgreSQL): Orders DB 2 (MySQL): Customers with segments

해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- Model 1: Orders Summary (PostgreSQL)
CREATE MODEL orders_summary AS
SELECT
  customer_id,
  COUNT(*) as order_count,
  SUM(total) as total_revenue,
  AVG(total) as avg_order_value,
  MAX(created_at) as last_order_date
FROM orders
WHERE status = 'completed'
GROUP BY customer_id

-- Model 2: Customer Segments (MySQL)
CREATE MODEL customer_segments AS
SELECT
  id as customer_id,
  name,
  segment,  -- VIP, Regular, New
  industry,
  signup_date
FROM customers

-- Question: Revenue by Segment
Data: orders_summary
Join: customer_segments
  ON orders_summary.customer_id = customer_segments.customer_id

Metrics:
  - SUM(total_revenue)
  - COUNT(customer_id)
  - AVG(avg_order_value)

Group by: segment

Visualization: Bar chart
  X-axis: segment
  Y-axis: SUM(total_revenue)
  Color: by segment

Result:
  VIP: $500K (100 customers, $5K avg)
  Regular: $300K (500 customers, $600 avg)
  New: $50K (200 customers, $250 avg)

연습 2: 마케팅 퍼널 분석 (중급)

과제: 광고 클릭 (BigQuery) → 가입 (PostgreSQL) → 구매 (PostgreSQL) 퍼널

해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- Model 1: Ad Clicks (BigQuery)
CREATE MODEL ad_clicks AS
SELECT
  DATE(event_timestamp) as click_date,
  campaign_id,
  user_pseudo_id,
  COUNT(*) as clicks
FROM `project.dataset.events_*`
WHERE event_name = 'campaign_click'
  AND _TABLE_SUFFIX BETWEEN '20250101' AND '20251231'
GROUP BY click_date, campaign_id, user_pseudo_id

-- Model 2: Signups (PostgreSQL)
CREATE MODEL signups AS
SELECT
  DATE(created_at) as signup_date,
  id as user_id,
  referral_campaign_id as campaign_id
FROM users
WHERE created_at >= '2025-01-01'

-- Model 3: Purchases (PostgreSQL)
CREATE MODEL first_purchases AS
SELECT
  user_id,
  MIN(DATE(created_at)) as first_purchase_date,
  SUM(total) as first_purchase_amount
FROM orders
WHERE status = 'completed'
GROUP BY user_id

-- Dashboard: Marketing Funnel

Question 1: Funnel Overview
Data: ad_clicks (base)
Join 1: signups
  ON ad_clicks.user_pseudo_id = signups.user_id::text
  AND ad_clicks.click_date = signups.signup_date
Join 2: first_purchases
  ON signups.user_id = first_purchases.user_id

Metrics:
  - Total clicks: COUNT(ad_clicks.user_pseudo_id)
  - Signups: COUNT(signups.user_id)
  - Purchases: COUNT(first_purchases.user_id)
  - Signup rate: signups / clicks * 100
  - Purchase rate: purchases / signups * 100
  - Overall conversion: purchases / clicks * 100

Group by: campaign_id

Question 2: Funnel Visualization
Visualization: Funnel
Steps:
  1. Clicked ad: 10,000 (100%)
  2. Signed up: 2,000 (20%)
  3. Made purchase: 400 (4% of clicks, 20% of signups)

Question 3: ROI by Campaign
Custom expressions:
  - CAC = Ad Spend / Purchases
  - LTV (estimate from first purchase)
  - ROI = (LTV - CAC) / CAC * 100

Filter: ROI > 0 (profitable campaigns only)
Sort: ROI descending

연습 3: 통합 리포팅 시스템 (고급)

과제: 4개 데이터소스를 통합한 Executive Dashboard

Sources:

  1. Sales DB (PostgreSQL)
  2. CRM (MySQL)
  3. Marketing (BigQuery)
  4. Finance (Google Sheets)
해답 보기
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
Architecture:

1. Sales DB (PostgreSQL)
   → Model: "Sales Metrics"
     - Daily revenue
     - Order count
     - Product mix

2. CRM (MySQL)
   → Model: "Pipeline Metrics"
     - Lead count by stage
     - Conversion rates
     - Sales cycle length

3. Marketing (BigQuery)
   → Model: "Marketing Metrics"
     - Campaign performance
     - Channel attribution
     - CAC by channel

4. Finance (Google Sheets)
   → Database connection
     - Budget by department
     - Forecast by month

Dashboard: "Executive Summary"

Row 1: Company KPIs (4 Number cards)
  ┌─────────────┬─────────────┬─────────────┬─────────────┐
  │   Revenue   │   Pipeline  │  Marketing  │   Burn Rate │
  │   (Sales)   │    (CRM)    │   (BigQuery)│  (Sheets)   │
  └─────────────┴─────────────┴─────────────┴─────────────┘

  1. Monthly Revenue (Sales DB)
     Model: Sales Metrics
     Metric: SUM(revenue) WHERE month = CURRENT_MONTH
     vs Last Month: +12%

  2. Pipeline Value (CRM)
     Model: Pipeline Metrics
     Metric: SUM(deal_value) WHERE stage IN ('Proposal', 'Negotiation')

  3. Marketing ROI (BigQuery)
     Model: Marketing Metrics
     Calculation: (Revenue from campaigns - Ad spend) / Ad spend

  4. Burn Rate (Google Sheets)
     Source: "Finance" sheet
     Metric: Monthly expenses / Cash balance
     Runway: 18 months

Row 2: Trends (2 Line charts)
  ┌──────────────────────────┬──────────────────────────┐
  │  Revenue vs Budget       │  Customer Acquisition    │
  │  (Sales + Sheets)        │  (Marketing + Sales)     │
  └──────────────────────────┴──────────────────────────┘

  Chart 1: Revenue vs Budget
    Data sources:
      - Actual: Sales Metrics (PostgreSQL)
      - Budget: Finance sheet (Google Sheets)

    Join: BY month

    Series:
      - Actual Revenue (solid line, blue)
      - Budget (dashed line, gray)
      - Variance (area fill, green/red)

  Chart 2: Customer Acquisition
    Data sources:
      - Marketing spend: BigQuery
      - New customers: PostgreSQL

    Metrics:
      - New customers (bar)
      - CAC (line)

Row 3: Detailed Breakdowns (2 Tables)
  ┌──────────────────────────┬──────────────────────────┐
  │  Sales by Channel        │  Top Campaigns           │
  │  (CRM + Sales + Mktg)    │  (BigQuery + Sales)      │
  └──────────────────────────┴──────────────────────────┘

  Table 1: Sales by Channel
    Data model integration:
      - CRM: Lead source
      - Sales: Revenue
      - Marketing: Campaign data

    Columns:
      - Channel
      - Leads (CRM)
      - Customers (Sales)
      - Revenue (Sales)
      - Conversion % (calc)
      - CAC (Marketing)
      - LTV (Sales)
      - LTV:CAC ratio (calc)

  Table 2: Top Campaigns
    Joined models:
      - Marketing Metrics (BigQuery): Campaign details, spend
      - Sales Metrics (PostgreSQL): Revenue attribution

    Columns:
      - Campaign name
      - Impressions
      - Clicks
      - CTR
      - Conversions
      - Revenue
      - ROAS

Filters (Dashboard-level):
  - Date range: Last 90 days (default)
  - Department: All (for budget comparison)
  - Channel: All

Refresh schedule:
  - Sales data: Every 15 minutes (recent)
  - CRM data: Every hour (slower changes)
  - Marketing data: Every 30 minutes (campaign updates)
  - Finance data: Daily (manual updates)

Caching strategy:
  - Sales metrics: 5 minutes
  - CRM metrics: 1 hour
  - Marketing metrics: 15 minutes
  - Finance data: 24 hours

Performance:
  - Dashboard load time: < 5 seconds
  - Each model pre-aggregated
  - Cross-DB joins minimized
  - Heavy lifting done in Models

다음 단계

멀티 데이터소스를 마스터했습니다. 다음 포스트에서는:

  • 커스터마이징: 화이트라벨링, 플러그인
  • 고급 설정: 환경 변수, 설정 관리
  • 확장성: Scale-out, 고가용성

요약

멀티 소스 전략

graph TD
    A[Data Strategy] --> B{Complexity}
    B -->|Simple| C[Models + GUI]
    B -->|Medium| D[ETL + Warehouse]
    B -->|Complex| E[Data Lake + Federation]

    C --> C1[2-3 databases]
    D --> D1[3-10 databases]
    E --> E1[10+ databases]

체크리스트

연결 설정:

  • Read-only 사용자 생성
  • Read replica 사용 (가능한 경우)
  • SSL/TLS 활성화
  • 방화벽 규칙 설정

Models 설계:

  • 각 DB에서 미리 집계
  • 명확한 join key
  • 적절한 캐싱 TTL
  • 문서화

성능:

  • 인덱스 최적화
  • 쿼리 복잡도 최소화
  • 캐싱 전략
  • 모니터링 설정

보안:

  • 최소 권한
  • Row-level security
  • 데이터 마스킹
  • 접근 로그 감사

다음 포스트에서는 Metabase를 조직의 요구사항에 맞게 커스터마이징하는 방법을 배웁니다!

📚 시리즈 전체 목차

🚀 기초편 (1-5화)

  1. Metabase 소개와 핵심 개념
  2. 설치와 초기 설정
  3. 샘플 데이터 둘러보기
  4. 첫 차트 만들기 - 실습 완전정복
  5. 대시보드 만들기 - 한 화면에 모으기

💪 활용편 (6-10화)

  1. 필터와 파라미터
  2. SQL 네이티브 쿼리
  3. 데이터 모델링
  4. 자동화와 알림
  5. 권한과 보안

🎯 고급편 (11-16화)

  1. 임베딩과 공유
  2. 성능 최적화
  3. [멀티 데이터소스] (현재 글)
  4. 커스터마이징
  5. 운영과 모니터링
  6. 실전 프로젝트
이 기사는 저작권자의 CC BY 4.0 라이센스를 따릅니다.