Data comes in many formats. Whether you’re consuming an API, processing spreadsheets, or parsing configuration files, you’ll encounter JSON, CSV, and XML. Understanding how to work with these formats efficiently is essential for any Python developer.
This guide covers all three formats, showing you how to read, write, parse, and manipulate data in each. By the end, you’ll know which format to use for different scenarios and how to handle each one effectively.
## Understanding Data Formats
Before diving into code, let’s understand when to use each format:
- JSON: Lightweight, human-readable, ideal for APIs and configuration. Best for nested data structures.
- CSV: Simple tabular format, widely supported, perfect for spreadsheets and data analysis.
- XML: Hierarchical, self-describing, good for complex documents and enterprise systems.
## Part 1: Working with JSON

### What is JSON?

JSON (JavaScript Object Notation) is a lightweight data format that’s easy for humans to read and for machines to parse. It’s the de facto standard for web APIs and modern data interchange.

### Reading JSON Files

```python
import json

# Read JSON from a file
with open('data.json', 'r') as file:
    data = json.load(file)

# Access the data
print(data)
print(data['name'])
print(data['users'][0]['email'])
```
### Writing JSON Files

```python
import json

# Create data structure
data = {
    'name': 'My Application',
    'version': '1.0.0',
    'users': [
        {'id': 1, 'name': 'Alice', 'email': '[email protected]'},
        {'id': 2, 'name': 'Bob', 'email': '[email protected]'}
    ],
    'settings': {
        'debug': True,
        'timeout': 30
    }
}

# Write to file
with open('output.json', 'w') as file:
    json.dump(data, file, indent=2)

# Or get JSON string
json_string = json.dumps(data, indent=2)
print(json_string)
```
### Parsing JSON Strings

```python
import json

# JSON string from API response
json_string = '''
{
    "status": "success",
    "data": {
        "user_id": 123,
        "username": "alice",
        "created_at": "2025-12-16"
    }
}
'''

# Parse JSON string
data = json.loads(json_string)
print(data['data']['username'])  # Output: alice
```
### Practical Example: Processing API Response

```python
import json
from datetime import datetime

def process_api_response(json_response):
    """Process and validate API response"""
    try:
        data = json.loads(json_response)

        # Validate required fields
        if 'status' not in data or 'data' not in data:
            raise ValueError("Invalid API response structure")

        # Extract and transform data
        users = []
        for user in data['data'].get('users', []):
            users.append({
                'id': user['id'],
                'name': user['name'],
                'email': user['email'],
                'joined': datetime.fromisoformat(user['created_at'])
            })

        return {
            'status': data['status'],
            'user_count': len(users),
            'users': users
        }
    except json.JSONDecodeError as e:
        print(f"Invalid JSON: {e}")
        return None
    except (KeyError, ValueError) as e:
        print(f"Data validation error: {e}")
        return None

# Example usage
api_response = '''
{
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "email": "[email protected]", "created_at": "2025-01-01"},
            {"id": 2, "name": "Bob", "email": "[email protected]", "created_at": "2025-01-02"}
        ]
    }
}
'''

result = process_api_response(api_response)
if result:
    print(f"Processed {result['user_count']} users")
```
### Handling Special Data Types

```python
import json
from datetime import datetime
from decimal import Decimal

# Custom JSON encoder for special types
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        elif isinstance(obj, Decimal):
            return float(obj)
        elif isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Data with special types
data = {
    'timestamp': datetime.now(),
    'price': Decimal('19.99'),
    'tags': {'python', 'json', 'tutorial'}
}

# Serialize with custom encoder
json_string = json.dumps(data, cls=CustomEncoder, indent=2)
print(json_string)
```
### JSON Best Practices

- Validate data: Always check for required fields and correct types
- Handle errors: Use try-except for JSON parsing
- Use indent: Make JSON readable with `indent=2` or `indent=4`
- Be careful with large files: Stream large JSON files instead of loading all at once
- Use custom encoders: For non-standard types like datetime or Decimal
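The streaming bullet deserves a sketch. One common pattern is JSON Lines (one complete JSON object per line), which lets you parse a single record at a time with nothing but the standard `json` module; the filename below is illustrative:

```python
import json

# Write a sample JSON Lines file: one complete JSON object per line
with open('events.jsonl', 'w') as f:
    for i in range(3):
        f.write(json.dumps({'id': i, 'type': 'click'}) + '\n')

# Stream it back: only one record is held in memory at a time
count = 0
with open('events.jsonl') as f:
    for line in f:
        record = json.loads(line)
        count += 1

print(count)
```

This works because each line is independently valid JSON; for a single huge JSON document you would need an incremental parser instead.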
## Part 2: Working with CSV

### What is CSV?

CSV (Comma-Separated Values) is a simple format for tabular data. It’s widely used for spreadsheets, data exports, and data analysis.

### Reading CSV Files

```python
import csv

# Read CSV file
with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each row is a list

# Read with header
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)  # Each row is a dictionary
```
### Writing CSV Files

```python
import csv

# Write CSV file
data = [
    ['Name', 'Email', 'Age'],
    ['Alice', '[email protected]', 30],
    ['Bob', '[email protected]', 25],
    ['Charlie', '[email protected]', 35]
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

# Write with dictionaries
data_dicts = [
    {'name': 'Alice', 'email': '[email protected]', 'age': 30},
    {'name': 'Bob', 'email': '[email protected]', 'age': 25},
    {'name': 'Charlie', 'email': '[email protected]', 'age': 35}
]

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['name', 'email', 'age']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data_dicts)
```
### Practical Example: Processing Sales Data

```python
import csv
from collections import defaultdict

def analyze_sales_data(filename):
    """Analyze sales data from CSV file"""
    sales_by_region = defaultdict(float)
    sales_by_product = defaultdict(int)
    total_sales = 0
    try:
        with open(filename, 'r') as file:
            reader = csv.DictReader(file)
            for row in reader:
                region = row['region']
                product = row['product']
                amount = float(row['amount'])
                sales_by_region[region] += amount
                sales_by_product[product] += 1
                total_sales += amount
        return {
            'total_sales': total_sales,
            'by_region': dict(sales_by_region),
            'by_product': dict(sales_by_product)
        }
    except FileNotFoundError:
        print(f"File {filename} not found")
        return None
    except (KeyError, ValueError) as e:
        print(f"Error processing data: {e}")
        return None

# Example: Create sample CSV
sample_data = [
    ['region', 'product', 'amount'],
    ['North', 'Widget A', '1000'],
    ['North', 'Widget B', '1500'],
    ['South', 'Widget A', '800'],
    ['South', 'Widget C', '2000'],
    ['East', 'Widget B', '1200']
]

with open('sales.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(sample_data)

# Analyze the data
results = analyze_sales_data('sales.csv')
if results:
    print(f"Total Sales: ${results['total_sales']}")
    print(f"By Region: {results['by_region']}")
    print(f"By Product: {results['by_product']}")
```
### Handling Different Delimiters

```python
import csv

# Tab-separated values
with open('data.tsv', 'r') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

# Semicolon-separated (common in Europe)
with open('data.csv', 'r') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        print(row)

# Custom delimiter
with open('data.txt', 'r') as file:
    reader = csv.reader(file, delimiter='|')
    for row in reader:
        print(row)
```
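If you don’t know the delimiter ahead of time, the standard library’s `csv.Sniffer` can often detect it from a sample. A sketch (Sniffer is heuristic, so verify it against your own data):

```python
import csv
import io

sample = "name;email\nAlice;alice@example.com\nBob;bob@example.com\n"

# Sniffer inspects the sample and guesses the dialect (delimiter, quoting)
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)

# Use the detected dialect to parse
reader = csv.reader(io.StringIO(sample), dialect)
rows = list(reader)
print(rows)
```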
### CSV Best Practices

- Always use `newline=''`: When opening CSV files for writing
- Use DictReader/DictWriter: For better readability with headers
- Handle encoding: Specify encoding if needed: `open(file, encoding='utf-8')`
- Quote fields with special characters: Use `quoting=csv.QUOTE_MINIMAL`
- Validate data types: Convert strings to appropriate types (int, float, etc.)
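The quoting bullet in action: with `csv.QUOTE_MINIMAL` (the default), the writer quotes only fields that contain the delimiter, the quote character, or a newline. A sketch using an in-memory buffer:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)

# Only the second field needs quotes: it contains the delimiter
writer.writerow(['Alice', '123 Main St, Apt 4'])

print(buffer.getvalue().strip())  # Alice,"123 Main St, Apt 4"
```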
## Part 3: Working with XML

### What is XML?

XML (eXtensible Markup Language) is a hierarchical format for structured data. It’s self-describing and widely used in enterprise systems and configuration files.

### Reading XML Files

```python
import xml.etree.ElementTree as ET

# Parse XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Access elements
for child in root:
    print(child.tag, child.attrib, child.text)

# Find specific elements
users = root.findall('user')
for user in users:
    name = user.find('name').text
    email = user.find('email').text
    print(f"{name}: {email}")
```
### Writing XML Files

```python
import xml.etree.ElementTree as ET

# Create XML structure
root = ET.Element('users')

# Add users
for user_data in [
    {'id': 1, 'name': 'Alice', 'email': '[email protected]'},
    {'id': 2, 'name': 'Bob', 'email': '[email protected]'}
]:
    user = ET.SubElement(root, 'user', id=str(user_data['id']))
    name = ET.SubElement(user, 'name')
    name.text = user_data['name']
    email = ET.SubElement(user, 'email')
    email.text = user_data['email']

# Write to file
tree = ET.ElementTree(root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

# Or get XML string
xml_string = ET.tostring(root, encoding='unicode')
print(xml_string)
```
### Parsing XML Strings

```python
import xml.etree.ElementTree as ET

xml_string = '''
<response>
    <status>success</status>
    <data>
        <user id="1">
            <name>Alice</name>
            <email>[email protected]</email>
        </user>
        <user id="2">
            <name>Bob</name>
            <email>[email protected]</email>
        </user>
    </data>
</response>
'''

# Parse XML string
root = ET.fromstring(xml_string)

# Extract data
status = root.find('status').text
users = root.findall('.//user')
for user in users:
    user_id = user.get('id')
    name = user.find('name').text
    email = user.find('email').text
    print(f"User {user_id}: {name} ({email})")
```
### Practical Example: Processing Configuration File

```python
import xml.etree.ElementTree as ET

def load_config(filename):
    """Load configuration from XML file"""
    try:
        tree = ET.parse(filename)
        root = tree.getroot()
        config = {}

        # Parse database settings
        db = root.find('database')
        if db is not None:
            config['database'] = {
                'host': db.find('host').text,
                'port': int(db.find('port').text),
                'name': db.find('name').text,
                'user': db.find('user').text
            }

        # Parse server settings
        server = root.find('server')
        if server is not None:
            config['server'] = {
                'host': server.find('host').text,
                'port': int(server.find('port').text),
                'debug': server.find('debug').text.lower() == 'true'
            }

        # Parse features
        features = root.find('features')
        if features is not None:
            config['features'] = {}
            for feature in features.findall('feature'):
                name = feature.get('name')
                enabled = feature.text.lower() == 'true'
                config['features'][name] = enabled

        return config
    except ET.ParseError as e:
        print(f"XML parsing error: {e}")
        return None
    except (AttributeError, ValueError) as e:
        print(f"Configuration error: {e}")
        return None

# Example: Create sample XML config
config_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<config>
    <database>
        <host>localhost</host>
        <port>5432</port>
        <name>myapp</name>
        <user>admin</user>
    </database>
    <server>
        <host>0.0.0.0</host>
        <port>8000</port>
        <debug>true</debug>
    </server>
    <features>
        <feature name="caching">true</feature>
        <feature name="logging">true</feature>
        <feature name="analytics">false</feature>
    </features>
</config>
'''

with open('config.xml', 'w') as f:
    f.write(config_xml)

# Load and display config
config = load_config('config.xml')
if config:
    print("Database:", config['database'])
    print("Server:", config['server'])
    print("Features:", config['features'])
```
### XPath Queries

```python
import xml.etree.ElementTree as ET

xml_string = '''
<library>
    <book id="1">
        <title>Python Basics</title>
        <author>John Doe</author>
        <year>2020</year>
    </book>
    <book id="2">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
        <year>2022</year>
    </book>
</library>
'''

root = ET.fromstring(xml_string)

# Find all books
all_books = root.findall('.//book')
print(f"Total books: {len(all_books)}")

# Find books by year
for book in root.findall(".//book[year='2022']"):
    title = book.find('title').text
    print(f"Book from 2022: {title}")

# Get book by ID
book_1 = root.find(".//book[@id='1']")
if book_1 is not None:
    title = book_1.find('title').text
    print(f"Book 1: {title}")
```
### XML Best Practices
- Use ElementTree: It’s part of the standard library and sufficient for most use cases
- Handle namespaces: Use namespace-aware parsing for complex XML
- Validate structure: Check for required elements before accessing
- Use try-except: Handle parsing errors gracefully
- Consider lxml: For advanced XML processing and validation
- Pretty print: Format XML output for readability
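The pretty-print bullet can be handled with the standard library alone: `xml.etree.ElementTree.indent()` (available in Python 3.9+) rewrites a tree in place, inserting whitespace between elements. A sketch:

```python
import xml.etree.ElementTree as ET

root = ET.Element('users')
user = ET.SubElement(root, 'user', id='1')
ET.SubElement(user, 'name').text = 'Alice'

# indent() mutates the tree, adding newlines and indentation (Python 3.9+)
ET.indent(root, space='  ')
xml_pretty = ET.tostring(root, encoding='unicode')
print(xml_pretty)
```

On older Python versions, `xml.dom.minidom`'s `toprettyxml()` is a common alternative.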
## Comparison: JSON vs CSV vs XML
| Feature | JSON | CSV | XML |
|---|---|---|---|
| Readability | High | High | Medium |
| Nested Data | Excellent | Poor | Excellent |
| Tabular Data | Awkward | Excellent | Awkward |
| File Size | Small | Small | Large |
| Parsing Speed | Fast | Fast | Slower |
| Schema Support | Yes (JSON Schema) | No | Yes (XSD) |
| Best For | APIs, Config | Spreadsheets, Data | Documents, Config |
| Learning Curve | Easy | Easy | Medium |
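To make the table concrete, here is the same record serialized in all three formats (an illustrative sketch; the field names and values are invented):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

record = {'name': 'Alice', 'email': 'alice@example.com'}

# JSON: compact key-value notation
as_json = json.dumps(record)

# CSV: header row plus one data row
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['name', 'email'])
writer.writeheader()
writer.writerow(record)
as_csv = buf.getvalue()

# XML: every field becomes a tagged element, hence the larger size
elem = ET.Element('user')
for key, value in record.items():
    ET.SubElement(elem, key).text = value
as_xml = ET.tostring(elem, encoding='unicode')

for label, text in [('JSON', as_json), ('CSV', as_csv.strip()), ('XML', as_xml)]:
    print(f"{label} ({len(text)} chars): {text}")
```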
## Choosing the Right Format
Use JSON when:
- Building APIs or web services
- Storing configuration files
- Working with nested data structures
- You need human-readable format with small file size
Use CSV when:
- Working with tabular data
- Exporting/importing spreadsheets
- Data analysis and reporting
- You need simplicity and wide compatibility
Use XML when:
- Working with complex hierarchical data
- You need schema validation
- Building enterprise systems
- You need self-describing data with metadata
## Common Pitfalls and Solutions
### JSON Pitfalls

```python
import json
from datetime import datetime

# ❌ Pitfall: Trying to serialize non-serializable objects
data = {'date': datetime.now()}  # datetime is not JSON serializable
# json.dumps(data)  # Raises TypeError

# ✅ Solution: Use a custom encoder or convert to a string first
data = {'date': datetime.now().isoformat()}
json.dumps(data)
```
### CSV Pitfalls

```python
import csv

# ❌ Pitfall: Forgetting the newline='' parameter
# with open('file.csv', 'w') as f:  # Wrong!
#     writer = csv.writer(f)

# ✅ Solution: Always use newline=''
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Email'])
```
### XML Pitfalls

```python
import xml.etree.ElementTree as ET

# A document with a default namespace (illustrative URI)
root = ET.fromstring('<data xmlns="http://example.com/schema"><user>Alice</user></data>')

# ❌ Pitfall: Not handling namespaces
# root.findall('user')  # Won't find elements under a default namespace

# ✅ Solution: Use namespace-aware queries
namespaces = {'ns': 'http://example.com/schema'}
print(root.findall('ns:user', namespaces))
```
## Conclusion
JSON, CSV, and XML are essential data formats for modern Python development. Each has strengths and appropriate use cases:
- JSON is ideal for APIs and configuration with its simplicity and readability
- CSV excels at tabular data and spreadsheet integration
- XML provides structure and validation for complex hierarchical data
Key takeaways:
- Choose the right format for your specific use case
- Always validate data when parsing from external sources
- Handle errors gracefully with try-except blocks
- Use standard library modules for basic needs
- Consider performance for large datasets
- Document your data structures for maintainability
By mastering these formats, you’ll be equipped to handle data in virtually any Python project. Start with the format that matches your current needs, and expand your skills as you encounter new requirements.