[Python 100-Day Challenge] Day 43 - JSON Data Processing

Published 2025/04/12

JSON Data Processing

By YonYonWare

20 min read


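json.loads('{"name":"철수", "age":25}') → turns into a Python dictionary! 😊

Weather APIs, map APIs, chat APIs… most web services talk to each other in JSON. KakaoTalk chatbots, stock APIs, fetching GitHub data: all of these run on JSON!

(40-50 min to complete ⭐⭐⭐ Difficulty: Intermediate)

📚 Prerequisites

Day 41: File I/O Basics
Day 42: Advanced Text File Processing
Object-oriented programming concepts from Phase 4

🎯 Learning Goal 1: Understanding the JSON Format

1.1 Characteristics of JSON

JSON (JavaScript Object Notation):

Lightweight data-interchange format
Easy for humans to read and for machines to parse
Language-independent (libraries exist for virtually every language)
The de facto standard for web APIs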

  
{
  "name": "Alice",
  "age": 25,
  "skills": ["Python", "JavaScript", "SQL"],
  "active": true,
  "address": {
    "city": "Seoul",
    "country": "Korea"
  }
}

1.2 JSON Data Types

JSON type	Python type	Example
Object `{}`	`dict`	`{"key": "value"}`
Array `[]`	`list`	`[1, 2, 3]`
String	`str`	`"Hello"`
Number	`int`, `float`	`42`, `3.14`
Boolean	`bool`	`true`, `false`
null	`None`	`null`
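The mapping in the table above can be checked directly. The following is a minimal sketch; the key names (`obj`, `arr`, and so on) are made up for illustration and carry no special meaning.

```python
import json

# A JSON document that uses every JSON type from the table above.
# The key names are hypothetical, chosen only for this demonstration.
raw = (
    '{"obj": {"k": "v"}, "arr": [1, 2, 3], "text": "Hello", '
    '"whole": 42, "real": 3.14, "flag": true, "nothing": null}'
)

parsed = json.loads(raw)

# Record the Python type name that each JSON value was decoded into.
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key, name in type_names.items():
    print(f"{key}: {name}")
```

Note that a JSON number becomes an `int` when it has no fractional part or exponent, and a `float` otherwise.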

🎯 Learning Goal 2: Serializing Data with the json Module

2.1 Reading a JSON File

  
import json

# Read a JSON file
with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

print(type(data))  # <class 'dict'>
print(data['name'])

2.2 Writing a JSON File

  
# Python dictionary
data = {
    "name": "Alice",
    "age": 25,
    "skills": ["Python", "JavaScript"],
    "active": True
}

# Save to a JSON file
with open('output.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

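json.dump() options:

ensure_ascii=False: keeps Korean and other non-ASCII characters as-is instead of writing \uXXXX escapes
indent=2: indentation (for readability)
sort_keys=True: sorts keys alphabetically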
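To see what each option changes, it can help to compare the output strings side by side. This sketch uses `json.dumps()`, which accepts the same options as `json.dump()`; the `record` dictionary is made-up sample data.

```python
import json

# Hypothetical sample record containing a Korean name.
record = {"name": "철수", "city": "Seoul", "age": 25}

# Default: non-ASCII characters are escaped as \uXXXX sequences.
default_text = json.dumps(record)

# ensure_ascii=False: Korean characters are written as-is.
readable_text = json.dumps(record, ensure_ascii=False)

# sort_keys=True: keys come out in alphabetical order.
sorted_text = json.dumps(record, ensure_ascii=False, sort_keys=True)

# indent=2: one key per line, indented by two spaces.
pretty_text = json.dumps(record, ensure_ascii=False, indent=2)

print(default_text)
print(readable_text)
print(sorted_text)
print(pretty_text)
```

Both the escaped and the unescaped forms are valid JSON and decode to the same data; the difference is only in how readable the file is.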

2.3 Converting Between Strings and JSON

  
# Python object → JSON string
data = {"name": "Bob", "age": 30}
json_str = json.dumps(data, ensure_ascii=False)
print(json_str)  # {"name": "Bob", "age": 30}

# JSON string → Python object
parsed = json.loads(json_str)
print(type(parsed))  # <class 'dict'>

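🎯 Learning Goal 3: Reading and Writing JSON Files

3.1 What Is the ISO 8601 Date/Time Format?

ISO 8601 is the international standard for representing dates and times:

Format: YYYY-MM-DDTHH:MM:SS
Example: 2025-04-12T10:30:00
The format conventionally used to store dates in JSON, which has no native date type
Easy to produce with Python's datetime.isoformat() method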
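A minimal round-trip sketch of the points above: a `datetime` is turned into an ISO 8601 string, stored in JSON, and parsed back. The `created_at` key is a made-up name for illustration.

```python
import json
from datetime import datetime

moment = datetime(2025, 4, 12, 10, 30, 0)

# datetime → ISO 8601 string
text = moment.isoformat()
print(text)  # 2025-04-12T10:30:00

# Store the string in JSON ("created_at" is a hypothetical key name).
payload = json.dumps({"created_at": text})

# JSON → string → datetime
loaded = json.loads(payload)
restored = datetime.fromisoformat(loaded["created_at"])
print(restored == moment)  # True
```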

3.2 Serializing Complex Objects

  
from datetime import datetime

class User:
    def __init__(self, name, joined_at):
        self.name = name
        self.joined_at = joined_at

# ❌ Cannot be serialized directly
user = User("Alice", datetime.now())
# json.dumps(user)  # TypeError!

# ✅ Custom encoder
class UserEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, User):
            return {
                'name': obj.name,
                'joined_at': obj.joined_at.isoformat()  # datetime → ISO 8601 string
            }
        if isinstance(obj, datetime):
            return obj.isoformat()  # convert to ISO 8601 format

        return super().default(obj)

# Usage
json_str = json.dumps(user, cls=UserEncoder)
print(json_str)  # e.g. {"name": "Alice", "joined_at": "2025-04-12T10:00:00.123456"}

3.3 Deserialization with Object Restoration

  
def user_decoder(dct):
    """Restore a User object from JSON"""
    if 'name' in dct and 'joined_at' in dct:
        return User(
            name=dct['name'],
            joined_at=datetime.fromisoformat(dct['joined_at'])  # ISO 8601 → datetime
        )
    return dct

# Usage
json_str = '{"name": "Alice", "joined_at": "2025-04-12T10:00:00"}'
user = json.loads(json_str, object_hook=user_decoder)
print(type(user))  # <class '__main__.User'>

🎯 Learning Goal 4: Practical JSON Usage Patterns

4.1 Accessing Deeply Nested Data

  
data = {
    "user": {
        "name": "Alice",
        "profile": {
            "email": "alice@example.com",
            "address": {
                "city": "Seoul",
                "country": "Korea"
            }
        }
    }
}

# Safe access (get method)
email = data.get('user', {}).get('profile', {}).get('email')
print(email)

# Or as a function
def get_nested(data, *keys, default=None):
    """Safely get a value from a nested dictionary"""
    for key in keys:
        if isinstance(data, dict):
            data = data.get(key, default)
        else:
            return default
    return data

city = get_nested(data, 'user', 'profile', 'address', 'city')
print(city)  # Seoul

4.2 Merging JSON Data

  
def merge_json(obj1, obj2):
    """Merge two JSON objects (values in obj2 take precedence)"""
    if isinstance(obj1, dict) and isinstance(obj2, dict):
        result = obj1.copy()
        for key, value in obj2.items():
            if key in result:
                result[key] = merge_json(result[key], value)
            else:
                result[key] = value
        return result
    else:
        return obj2

# Usage
config_default = {"debug": False, "timeout": 30}
config_user = {"debug": True, "retry": 3}

merged = merge_json(config_default, config_user)
print(merged)  # {'debug': True, 'timeout': 30, 'retry': 3}
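The example above merges flat dictionaries, so the recursive branch never runs. The sketch below repeats the same `merge_json` function so that it is self-contained, then applies it to nested data; the `defaults` and `overrides` dictionaries are made-up sample data.

```python
def merge_json(obj1, obj2):
    """Merge two JSON objects; values in obj2 take precedence."""
    if isinstance(obj1, dict) and isinstance(obj2, dict):
        result = obj1.copy()
        for key, value in obj2.items():
            if key in result:
                # Both sides have this key: merge recursively.
                result[key] = merge_json(result[key], value)
            else:
                result[key] = value
        return result
    # Anything that is not a pair of dicts is simply replaced.
    return obj2

# Hypothetical nested configuration data.
defaults = {"db": {"host": "localhost", "port": 5432}, "tags": ["a"]}
overrides = {"db": {"port": 6543}, "tags": ["b"]}

merged_nested = merge_json(defaults, overrides)
print(merged_nested)
# {'db': {'host': 'localhost', 'port': 6543}, 'tags': ['b']}
```

Two behaviors are worth noticing: nested dictionaries are merged key by key, while lists are replaced wholesale rather than concatenated. Because `copy()` is shallow, nested values that `obj2` does not touch are shared with `obj1` rather than duplicated.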

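4.3 Searching Through JSON Paths

🤔 New to recursive functions?

Recursion means "a function that calls itself." It is very handy for walking through nested data!

Real-life analogy: opening every doll in a set of Russian nesting dolls (matryoshka)

1. Open a doll
2. Is there another doll inside? → Go back to step 1 (recursion!)
3. No more dolls? Stop

JSON's nested structure works exactly the same way! 🎎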

  
def find_all_values(data, target_key):
    """Find the values of a key at every nesting level"""
    results = []

    def search(obj):
        """
        Recursive function:
        a pattern in which a function calls itself again.
        Very useful for traversing nested data structures!
        """
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key == target_key:
                    results.append(value)
                search(value)  # 🔄 call itself again (recursion)
        elif isinstance(obj, list):
            for item in obj:
                search(item)  # 🔄 recurse into each list item too

    search(data)
    return results

# Usage
data = {
    "users": [
        {"name": "Alice", "profile": {"name": "Alice Park"}},
        {"name": "Bob", "profile": {"name": "Bob Kim"}}
    ]
}

names = find_all_values(data, 'name')
print(names)  # ['Alice', 'Alice Park', 'Bob', 'Bob Kim']

🎯 Learning Goal 5: Validating JSON

5.1 Basic Validation

  
def validate_json_file(filename):
    """Validate a JSON file"""
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            json.load(f)
        print(f"✅ {filename} is valid JSON")
        return True
    except json.JSONDecodeError as e:
        print(f"❌ JSON parsing error: {e}")
        return False
    except FileNotFoundError:
        print(f"❌ File not found: {filename}")
        return False

# validate_json_file('data.json')

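5.2 Schema Validation (jsonschema Library)

What is jsonschema?

A library for defining and validating the structure of JSON data
Useful for validating API requests/responses and configuration files
Install: pip install jsonschema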

  
# pip install jsonschema

from jsonschema import validate, ValidationError

# Define the schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number", "minimum": 0},
        # Note: "format" is only enforced if a format_checker is passed to validate()
        "email": {"type": "string", "format": "email"}
    },
    "required": ["name", "age"]
}

# Validate
data_valid = {"name": "Alice", "age": 25, "email": "alice@example.com"}
data_invalid = {"name": "Bob"}  # age is missing

try:
    validate(instance=data_valid, schema=schema)
    print("✅ Valid data")
except ValidationError as e:
    print(f"❌ Validation failed: {e.message}")

try:
    validate(instance=data_invalid, schema=schema)
except ValidationError as e:
    print(f"❌ Validation failed: {e.message}")

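🎯 Learning Goal 6: Real-World API Integration and Comprehensive Examples

6.1 JSON API Requests (Using requests)

What is the requests library?

One of the most popular Python libraries for sending HTTP requests
Widely used for web API integration, web scraping, and more
Install: pip install requests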

  
# pip install requests

import requests

def fetch_github_user(username):
    """Fetch user information from the GitHub API"""
    url = f"https://api.github.com/users/{username}"

    response = requests.get(url, timeout=10)  # timeout so a stalled request cannot hang forever

    if response.status_code == 200:
        user_data = response.json()  # parses the JSON body automatically
        return user_data
    else:
        print(f"Error: {response.status_code}")
        return None

# Usage
# user = fetch_github_user("torvalds")
# if user:
#     print(f"Name: {user['name']}")
#     print(f"Followers: {user['followers']}")

6.2 Caching API Responses

  
import json
import os
from datetime import datetime, timedelta

class JSONCache:
    """Cache for JSON API responses"""
    def __init__(self, cache_dir='cache', ttl_hours=24):
        self.cache_dir = cache_dir
        self.ttl = timedelta(hours=ttl_hours)
        os.makedirs(cache_dir, exist_ok=True)

    def _get_cache_path(self, key):
        return os.path.join(self.cache_dir, f"{key}.json")

    def get(self, key):
        """Get data from the cache"""
        cache_path = self._get_cache_path(key)

        if not os.path.exists(cache_path):
            return None

        # Check whether the cache entry has expired
        mtime = datetime.fromtimestamp(os.path.getmtime(cache_path))
        if datetime.now() - mtime > self.ttl:
            return None

        with open(cache_path, 'r', encoding='utf-8') as f:
            return json.load(f)

    def set(self, key, data):
        """Save data to the cache"""
        cache_path = self._get_cache_path(key)

        with open(cache_path, 'w', encoding='utf-8') as f:
            json.dump(data, f, ensure_ascii=False, indent=2)

# Usage
cache = JSONCache()

def get_user_with_cache(username):
    """Look up user information with caching applied"""
    # Check the cache
    cached = cache.get(username)
    if cached is not None:
        print("💾 Loaded from cache")
        return cached

    # API request
    print("🌐 API request")
    user = fetch_github_user(username)

    if user:
        cache.set(username, user)

    return user

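💡 Practical Tips & Cautions

External Libraries at a Glance

External libraries used in today's lesson:

Library	Purpose	Install command
`jsonschema`	JSON schema validation	`pip install jsonschema`
`requests`	HTTP API requests	`pip install requests`

Note: the json module is part of the Python standard library, so it needs no separate installation!

Cautions When Working with JSON

Korean text: pass ensure_ascii=False to keep characters readable (the default \uXXXX escapes are still valid JSON)
Large files: consider the JSON Lines (JSONL) format
Dates/times: the ISO 8601 format is recommended
Nested access: use the safe .get() method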

📊 Comprehensive Practical Examples

Example 1: Configuration File Management

  
import json
import os

class ConfigManager:
    """JSON-based configuration manager"""
    def __init__(self, config_file='config.json'):
        self.config_file = config_file
        self.config = self._load()

    def _load(self):
        """Load the configuration file"""
        if os.path.exists(self.config_file):
            with open(self.config_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        return {}

    def get(self, key, default=None):
        """Get a setting value (supports nested keys such as 'database.host')"""
        keys = key.split('.')
        value = self.config

        for k in keys:
            if isinstance(value, dict):
                value = value.get(k)
            else:
                return default

        return value if value is not None else default

    def set(self, key, value):
        """Set a setting value (supports nested keys such as 'database.host')"""
        keys = key.split('.')
        config = self.config

        for k in keys[:-1]:
            if k not in config:
                config[k] = {}
            config = config[k]

        config[keys[-1]] = value
        self._save()

    def _save(self):
        """Save the configuration file"""
        with open(self.config_file, 'w', encoding='utf-8') as f:
            json.dump(self.config, f, ensure_ascii=False, indent=2)

# Usage
config = ConfigManager()
config.set('database.host', 'localhost')
config.set('database.port', 5432)
config.set('app.debug', True)

print(config.get('database.host'))  # localhost
print(config.get('app.timeout', 30))  # 30 (default value)

Example 2: JSON Data Transformer

  
class JSONTransformer:
    """JSON data transformation"""
    @staticmethod
    def flatten(nested_dict, parent_key='', sep='_'):
        """Flatten a nested dictionary"""
        items = []

        for k, v in nested_dict.items():
            new_key = f"{parent_key}{sep}{k}" if parent_key else k

            if isinstance(v, dict):
                items.extend(
                    JSONTransformer.flatten(v, new_key, sep).items()
                )
            else:
                items.append((new_key, v))

        return dict(items)

    @staticmethod
    def unflatten(flat_dict, sep='_'):
        """Rebuild a nested structure from a flattened dictionary (assumes the original keys do not contain sep)"""
        result = {}

        for key, value in flat_dict.items():
            parts = key.split(sep)
            d = result

            for part in parts[:-1]:
                if part not in d:
                    d[part] = {}
                d = d[part]

            d[parts[-1]] = value

        return result

# Usage
nested = {
    "user": {
        "name": "Alice",
        "address": {
            "city": "Seoul"
        }
    }
}

flat = JSONTransformer.flatten(nested)
print(flat)  # {'user_name': 'Alice', 'user_address_city': 'Seoul'}

restored = JSONTransformer.unflatten(flat)
print(restored)  # the original structure is restored

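Example 3: Processing JSON Lines (JSONL)

What is JSON Lines (JSONL)?

A file format in which each line is an independent JSON value (typically an object)
Well suited to large datasets (records can be parsed one line at a time)
Supports streaming (no need to load the whole file into memory)

  
def read_jsonl(filename):
    """Read a JSON Lines file (each line is a JSON object)"""
    data = []

    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:  # read line by line; note that all results still accumulate in `data`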
            if line.strip():
                data.append(json.loads(line))

    return data

def write_jsonl(filename, data):
    """Write a JSON Lines file"""
    with open(filename, 'w', encoding='utf-8') as f:
        for item in data:
            f.write(json.dumps(item, ensure_ascii=False) + '\n')

# Usage (well suited to large datasets)
users = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 30},
    {"name": "Charlie", "age": 35}
]

write_jsonl('users.jsonl', users)
loaded = read_jsonl('users.jsonl')
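`read_jsonl` above returns a list, so every record ends up in memory at once. When the file is truly large, a generator keeps only the current line in memory. The following is a sketch of that variation; `iter_jsonl` is a hypothetical helper name that does not appear in the original lesson, and the demo writes to a temporary directory so that it leaves no files behind.

```python
import json
import os
import tempfile

def iter_jsonl(filename):
    """Yield one parsed record at a time (hypothetical streaming variant)."""
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)

# Demo with made-up sample data in a temporary file.
sample_users = [{"name": "Alice", "age": 25}, {"name": "Bob", "age": 30}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'users.jsonl')
    with open(path, 'w', encoding='utf-8') as f:
        for user in sample_users:
            f.write(json.dumps(user, ensure_ascii=False) + '\n')

    # Records are consumed one by one; no full list is ever built.
    total_age = sum(user["age"] for user in iter_jsonl(path))

print(total_age)  # 55
```

The trade-off is that a generator can only be traversed once, so the list-returning version remains the more convenient choice when the data fits in memory and is needed repeatedly.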

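Example 4: Filtering JSON Data

A quick introduction to lambda functions:

A way to write a small anonymous function in a single line
Form: lambda parameters: return_value
Example: lambda x: x >= 30 means "take x and return True if x is 30 or greater"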
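As a small illustration of the points above, the sketch below writes the same condition twice, once as a regular `def` function and once as a lambda, and shows that both give the same result. The `ages` list and the function name `is_thirty_or_older` are made up for this example.

```python
ages = [25, 30, 35]

# The condition written as a regular named function.
def is_thirty_or_older(x):
    return x >= 30

# filter() keeps the items for which the function returns True.
with_def = list(filter(is_thirty_or_older, ages))

# The same condition written inline as a lambda.
with_lambda = list(filter(lambda x: x >= 30, ages))

print(with_lambda)              # [30, 35]
print(with_def == with_lambda)  # True
```

A lambda tends to be most readable when the function is short and used only once, for instance as an argument to another function.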

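💡 What is the callable() function?

A built-in function that checks whether an object can be called (for example, a function).

  
callable(lambda x: x > 5)  # True (a lambda is a function)
callable(10)               # False (a number is not callable)
callable(print)            # True (print is a function too)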

  
def filter_json_array(data, conditions):
    """Filter a JSON array"""
    results = []

    for item in data:
        match = True

        for key, value in conditions.items():
            if callable(value):
                # Check the condition with a function (lambdas work here)
                if not value(item.get(key)):
                    match = False
                    break
            else:
                # Compare directly against the value
                if item.get(key) != value:
                    match = False
                    break

        if match:
            results.append(item)

    return results

# Usage
users = [
    {"name": "Alice", "age": 25, "city": "Seoul"},
    {"name": "Bob", "age": 30, "city": "Busan"},
    {"name": "Charlie", "age": 35, "city": "Seoul"}
]

# Age 30 or older, living in Seoul
filtered = filter_json_array(users, {
    'age': lambda x: x >= 30,  # lambda: checks whether the age is 30 or greater
    'city': 'Seoul'  # direct comparison: checks equality with Seoul
})

print(filtered)  # [{'name': 'Charlie', 'age': 35, 'city': 'Seoul'}]

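📝 Summary of What We Learned Today

Key Points

JSON basics:
- json.load(): file → Python
- json.dump(): Python → file
- json.loads(): string → Python
- json.dumps(): Python → string
Options:
- ensure_ascii=False: keep Korean text
- indent=2: readability
- sort_keys=True: sort keys
Nested data: safe access, merging, searching
Validation: the jsonschema library
Practical use: API integration, caching, configuration management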

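Checklist

Understand how JSON data types map to Python types
Know how to use json.load() and json.dump()
Understand the ISO 8601 date format
Access nested dictionaries safely
Understand the concept of recursive functions
Validate data with jsonschema
Call an API with requests
Understand the basics of lambda functions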

🧪 Practice Problems

Problem 1: Merging JSON Files

Write a function that merges several JSON files into one.

Solution

  
def merge_json_files(file_list, output_file):
    """Merge several JSON files into one"""
    merged = []

    for filename in file_list:
        with open(filename, 'r', encoding='utf-8') as f:
            data = json.load(f)

            if isinstance(data, list):
                merged.extend(data)
            else:
                merged.append(data)

    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)

# Test
files = ['users1.json', 'users2.json', 'users3.json']
merge_json_files(files, 'all_users.json')

Problem 2: Extracting JSON Fields

Extract only specific fields from a JSON file and save them to a new file.
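One possible approach to Problem 2 is sketched below. It is not the lesson's official solution; the function names `extract_fields` and `extract_fields_file` are hypothetical, and the sketch assumes the input file holds either a single JSON object or an array of objects.

```python
import json

def extract_fields(records, fields):
    """Keep only the requested fields from each record (hypothetical helper)."""
    return [
        {field: record[field] for field in fields if field in record}
        for record in records
    ]

def extract_fields_file(input_file, output_file, fields):
    """Read a JSON file, keep only the given fields, and write a new file."""
    with open(input_file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Assumption: a single object is treated as a one-item list.
    records = data if isinstance(data, list) else [data]
    extracted = extract_fields(records, fields)

    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(extracted, f, ensure_ascii=False, indent=2)

    return extracted

# In-memory check with made-up sample data.
sample = [
    {"name": "Alice", "age": 25, "city": "Seoul"},
    {"name": "Bob", "age": 30},
]
picked = extract_fields(sample, ["name", "city"])
print(picked)  # [{'name': 'Alice', 'city': 'Seoul'}, {'name': 'Bob'}]
```

Fields that a record lacks are skipped rather than filled with `None`. Whether that is the right behavior depends on how the output will be consumed.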