
Working with JSON, CSV, and XML in Python: A Practical Guide

Data comes in many formats. Whether you’re consuming an API, processing spreadsheets, or parsing configuration files, you’ll encounter JSON, CSV, and XML. Understanding how to work with these formats efficiently is essential for any Python developer.

This guide covers all three formats, showing you how to read, write, parse, and manipulate data in each. By the end, you’ll know which format to use for different scenarios and how to handle each one effectively.


Understanding Data Formats

Before diving into code, let’s understand when to use each format:

  • JSON: Lightweight, human-readable, ideal for APIs and configuration. Best for nested data structures.
  • CSV: Simple tabular format, widely supported, perfect for spreadsheets and data analysis.
  • XML: Hierarchical, self-describing, good for complex documents and enterprise systems.

Part 1: Working with JSON

What is JSON?

JSON (JavaScript Object Notation) is a lightweight data format that’s easy for humans to read and for machines to parse. It’s the de facto standard for web APIs and modern data interchange.

Reading JSON Files

import json

# Read JSON from a file
with open('data.json', 'r') as file:
    data = json.load(file)

# Access the data
print(data)
print(data['name'])
print(data['users'][0]['email'])

Writing JSON Files

import json

# Create data structure
data = {
    'name': 'My Application',
    'version': '1.0.0',
    'users': [
        {'id': 1, 'name': 'Alice', 'email': '[email protected]'},
        {'id': 2, 'name': 'Bob', 'email': '[email protected]'}
    ],
    'settings': {
        'debug': True,
        'timeout': 30
    }
}

# Write to file
with open('output.json', 'w') as file:
    json.dump(data, file, indent=2)

# Or get JSON string
json_string = json.dumps(data, indent=2)
print(json_string)

Parsing JSON Strings

import json

# JSON string from API response
json_string = '''
{
    "status": "success",
    "data": {
        "user_id": 123,
        "username": "alice",
        "created_at": "2025-12-16"
    }
}
'''

# Parse JSON string
data = json.loads(json_string)
print(data['data']['username'])  # Output: alice

Practical Example: Processing API Response

import json
from datetime import datetime

def process_api_response(json_response):
    """Process and validate API response"""
    try:
        data = json.loads(json_response)
        
        # Validate required fields
        if 'status' not in data or 'data' not in data:
            raise ValueError("Invalid API response structure")
        
        # Extract and transform data
        users = []
        for user in data['data'].get('users', []):
            users.append({
                'id': user['id'],
                'name': user['name'],
                'email': user['email'],
                'joined': datetime.fromisoformat(user['created_at'])
            })
        
        return {
            'status': data['status'],
            'user_count': len(users),
            'users': users
        }
    
    except json.JSONDecodeError as e:
        print(f"Invalid JSON: {e}")
        return None
    except (KeyError, ValueError) as e:
        print(f"Data validation error: {e}")
        return None

# Example usage
api_response = '''
{
    "status": "success",
    "data": {
        "users": [
            {"id": 1, "name": "Alice", "email": "[email protected]", "created_at": "2025-01-01"},
            {"id": 2, "name": "Bob", "email": "[email protected]", "created_at": "2025-01-02"}
        ]
    }
}
'''

result = process_api_response(api_response)
if result:
    print(f"Processed {result['user_count']} users")

Handling Special Data Types

import json
from datetime import datetime
from decimal import Decimal

# Custom JSON encoder for special types
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        elif isinstance(obj, Decimal):
            return float(obj)
        elif isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Data with special types
data = {
    'timestamp': datetime.now(),
    'price': Decimal('19.99'),
    'tags': {'python', 'json', 'tutorial'}
}

# Serialize with custom encoder
json_string = json.dumps(data, cls=CustomEncoder, indent=2)
print(json_string)
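The encoder above handles serialization; to restore special types when reading JSON back, `json.loads` accepts an `object_hook` callable that is applied to every decoded object. A minimal sketch (the `timestamp` key name is just an assumption for illustration):

```python
import json
from datetime import datetime

def decode_special(obj):
    """Convert ISO-format timestamp strings back to datetime objects."""
    if 'timestamp' in obj:
        obj['timestamp'] = datetime.fromisoformat(obj['timestamp'])
    return obj

json_string = '{"timestamp": "2025-01-01T12:00:00", "price": 19.99}'
data = json.loads(json_string, object_hook=decode_special)
print(type(data['timestamp']).__name__)  # datetime
```

Note that the hook fires for every nested object, so real code should check types before converting.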

JSON Best Practices

  • Validate data: Always check for required fields and correct types
  • Handle errors: Use try-except for JSON parsing
  • Use indent: Make JSON readable with indent=2 or indent=4
  • Be careful with large files: Stream large JSON files instead of loading all at once
  • Use custom encoders: For non-standard types like datetime or Decimal
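For the large-file point above, one stdlib-only approach is JSON Lines (one JSON object per line), which lets you process records one at a time instead of loading the whole file into memory. A sketch, using an illustrative `users.jsonl` filename:

```python
import json

# Write records as JSON Lines: one complete object per line
records = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
with open('users.jsonl', 'w') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')

# Read back one record at a time -- memory use stays constant
# no matter how large the file grows
names = []
with open('users.jsonl') as f:
    for line in f:
        record = json.loads(line)
        names.append(record['name'])

print(names)  # ['Alice', 'Bob']
```

For a single large JSON document that can't be split into lines, a third-party streaming parser such as ijson is the usual choice.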

Part 2: Working with CSV

What is CSV?

CSV (Comma-Separated Values) is a simple format for tabular data. It’s widely used for spreadsheets, data exports, and data analysis.

Reading CSV Files

import csv

# Read CSV file
with open('data.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each row is a list

# Read with header
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)  # Each row is a dictionary

Writing CSV Files

import csv

# Write CSV file
data = [
    ['Name', 'Email', 'Age'],
    ['Alice', '[email protected]', 30],
    ['Bob', '[email protected]', 25],
    ['Charlie', '[email protected]', 35]
]

with open('output.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)

# Write with dictionaries
data_dicts = [
    {'name': 'Alice', 'email': '[email protected]', 'age': 30},
    {'name': 'Bob', 'email': '[email protected]', 'age': 25},
    {'name': 'Charlie', 'email': '[email protected]', 'age': 35}
]

with open('output.csv', 'w', newline='') as file:
    fieldnames = ['name', 'email', 'age']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data_dicts)

Practical Example: Processing Sales Data

import csv
from collections import defaultdict

def analyze_sales_data(filename):
    """Analyze sales data from CSV file"""
    sales_by_region = defaultdict(float)
    sales_by_product = defaultdict(int)
    total_sales = 0
    
    try:
        with open(filename, 'r') as file:
            reader = csv.DictReader(file)
            
            for row in reader:
                region = row['region']
                product = row['product']
                amount = float(row['amount'])
                
                sales_by_region[region] += amount
                sales_by_product[product] += 1
                total_sales += amount
        
        return {
            'total_sales': total_sales,
            'by_region': dict(sales_by_region),
            'by_product': dict(sales_by_product)
        }
    
    except FileNotFoundError:
        print(f"File {filename} not found")
        return None
    except (KeyError, ValueError) as e:
        print(f"Error processing data: {e}")
        return None

# Example: Create sample CSV
sample_data = [
    ['region', 'product', 'amount'],
    ['North', 'Widget A', '1000'],
    ['North', 'Widget B', '1500'],
    ['South', 'Widget A', '800'],
    ['South', 'Widget C', '2000'],
    ['East', 'Widget B', '1200']
]

with open('sales.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(sample_data)

# Analyze the data
results = analyze_sales_data('sales.csv')
if results:
    print(f"Total Sales: ${results['total_sales']}")
    print(f"By Region: {results['by_region']}")
    print(f"By Product: {results['by_product']}")

Handling Different Delimiters

import csv

# Tab-separated values
with open('data.tsv', 'r') as file:
    reader = csv.reader(file, delimiter='\t')
    for row in reader:
        print(row)

# Semicolon-separated (common in Europe)
with open('data.csv', 'r') as file:
    reader = csv.reader(file, delimiter=';')
    for row in reader:
        print(row)

# Custom delimiter
with open('data.txt', 'r') as file:
    reader = csv.reader(file, delimiter='|')
    for row in reader:
        print(row)
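If you don't know the delimiter in advance, `csv.Sniffer` can usually guess it from a sample of the file. A small sketch using an in-memory sample:

```python
import csv
import io

sample = "name;email;age\nAlice;[email protected];30\n"

# Sniff the dialect (delimiter, quoting) from the first chunk of data
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)  # ';'

rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['Alice', '[email protected]', '30']
```

Sniffing is heuristic, so for files whose format you control, passing `delimiter=` explicitly is more reliable.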

CSV Best Practices

  • Always use newline='': When opening CSV files for writing
  • Use DictReader/DictWriter: For better readability with headers
  • Handle encoding: Specify encoding if needed: open(file, encoding='utf-8')
  • Quote fields with special characters: Use quoting=csv.QUOTE_MINIMAL
  • Validate data types: Convert strings to appropriate types (int, float, etc.)
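The quoting point above matters as soon as a field contains the delimiter, a quote character, or a newline. The `csv` module handles this automatically on write; `quoting=csv.QUOTE_MINIMAL` (the default) quotes only the fields that need it:

```python
import csv
import io

rows = [
    ['name', 'bio'],
    ['Alice', 'Engineer, likes "clean" code'],  # comma and quotes in one field
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)

print(buffer.getvalue())
# name,bio
# Alice,"Engineer, likes ""clean"" code"
```

The field with the embedded comma is wrapped in quotes and the inner quotes are doubled, so a standard CSV reader recovers the original value exactly.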

Part 3: Working with XML

What is XML?

XML (eXtensible Markup Language) is a hierarchical format for structured data. It’s self-describing and widely used in enterprise systems and configuration files.

Reading XML Files

import xml.etree.ElementTree as ET

# Parse XML file
tree = ET.parse('data.xml')
root = tree.getroot()

# Access elements
for child in root:
    print(child.tag, child.attrib, child.text)

# Find specific elements
users = root.findall('user')
for user in users:
    name = user.find('name').text
    email = user.find('email').text
    print(f"{name}: {email}")

Writing XML Files

import xml.etree.ElementTree as ET

# Create XML structure
root = ET.Element('users')

# Add users
for user_data in [
    {'id': 1, 'name': 'Alice', 'email': '[email protected]'},
    {'id': 2, 'name': 'Bob', 'email': '[email protected]'}
]:
    user = ET.SubElement(root, 'user', id=str(user_data['id']))
    
    name = ET.SubElement(user, 'name')
    name.text = user_data['name']
    
    email = ET.SubElement(user, 'email')
    email.text = user_data['email']

# Write to file
tree = ET.ElementTree(root)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)

# Or get XML string
xml_string = ET.tostring(root, encoding='unicode')
print(xml_string)

Parsing XML Strings

import xml.etree.ElementTree as ET

xml_string = '''
<response>
    <status>success</status>
    <data>
        <user id="1">
            <name>Alice</name>
            <email>[email protected]</email>
        </user>
        <user id="2">
            <name>Bob</name>
            <email>[email protected]</email>
        </user>
    </data>
</response>
'''

# Parse XML string
root = ET.fromstring(xml_string)

# Extract data
status = root.find('status').text
users = root.findall('.//user')

for user in users:
    user_id = user.get('id')
    name = user.find('name').text
    email = user.find('email').text
    print(f"User {user_id}: {name} ({email})")

Practical Example: Processing Configuration File

import xml.etree.ElementTree as ET

def load_config(filename):
    """Load configuration from XML file"""
    try:
        tree = ET.parse(filename)
        root = tree.getroot()
        
        config = {}
        
        # Parse database settings
        db = root.find('database')
        if db is not None:
            config['database'] = {
                'host': db.find('host').text,
                'port': int(db.find('port').text),
                'name': db.find('name').text,
                'user': db.find('user').text
            }
        
        # Parse server settings
        server = root.find('server')
        if server is not None:
            config['server'] = {
                'host': server.find('host').text,
                'port': int(server.find('port').text),
                'debug': server.find('debug').text.lower() == 'true'
            }
        
        # Parse features
        features = root.find('features')
        if features is not None:
            config['features'] = {}
            for feature in features.findall('feature'):
                name = feature.get('name')
                enabled = feature.text.lower() == 'true'
                config['features'][name] = enabled
        
        return config
    
    except ET.ParseError as e:
        print(f"XML parsing error: {e}")
        return None
    except (AttributeError, ValueError) as e:
        print(f"Configuration error: {e}")
        return None

# Example: Create sample XML config
config_xml = '''<?xml version="1.0" encoding="UTF-8"?>
<config>
    <database>
        <host>localhost</host>
        <port>5432</port>
        <name>myapp</name>
        <user>admin</user>
    </database>
    <server>
        <host>0.0.0.0</host>
        <port>8000</port>
        <debug>true</debug>
    </server>
    <features>
        <feature name="caching">true</feature>
        <feature name="logging">true</feature>
        <feature name="analytics">false</feature>
    </features>
</config>
'''

with open('config.xml', 'w') as f:
    f.write(config_xml)

# Load and display config
config = load_config('config.xml')
if config:
    print("Database:", config['database'])
    print("Server:", config['server'])
    print("Features:", config['features'])

XPath Queries

import xml.etree.ElementTree as ET

xml_string = '''
<library>
    <book id="1">
        <title>Python Basics</title>
        <author>John Doe</author>
        <year>2020</year>
    </book>
    <book id="2">
        <title>Advanced Python</title>
        <author>Jane Smith</author>
        <year>2022</year>
    </book>
</library>
'''

root = ET.fromstring(xml_string)

# Find all books
all_books = root.findall('.//book')
print(f"Total books: {len(all_books)}")

# Find books by year
for book in root.findall(".//book[year='2022']"):
    title = book.find('title').text
    print(f"Book from 2022: {title}")

# Get book by ID
book_1 = root.find(".//book[@id='1']")
if book_1 is not None:
    title = book_1.find('title').text
    print(f"Book 1: {title}")

XML Best Practices

  • Use ElementTree: It’s part of the standard library and sufficient for most use cases
  • Handle namespaces: Use namespace-aware parsing for complex XML
  • Validate structure: Check for required elements before accessing
  • Use try-except: Handle parsing errors gracefully
  • Consider lxml: For advanced XML processing and validation
  • Pretty print: Format XML output for readability
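For the pretty-print point above, Python 3.9+ ships `xml.etree.ElementTree.indent`, which inserts whitespace into the tree in place before serializing:

```python
import xml.etree.ElementTree as ET

root = ET.Element('users')
user = ET.SubElement(root, 'user', id='1')
ET.SubElement(user, 'name').text = 'Alice'

# indent() mutates the tree in place (requires Python 3.9+)
ET.indent(root)
print(ET.tostring(root, encoding='unicode'))
# <users>
#   <user id="1">
#     <name>Alice</name>
#   </user>
# </users>
```

On older Python versions, `xml.dom.minidom`'s `toprettyxml()` is the usual stdlib fallback.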

Comparison: JSON vs CSV vs XML

| Feature        | JSON         | CSV                | XML               |
|----------------|--------------|--------------------|-------------------|
| Readability    | High         | High               | Medium            |
| Nested Data    | Excellent    | Poor               | Excellent         |
| Tabular Data   | Awkward      | Excellent          | Awkward           |
| File Size      | Small        | Small              | Large             |
| Parsing Speed  | Fast         | Fast               | Slower            |
| Schema Support | No           | No                 | Yes (XSD)         |
| Best For       | APIs, Config | Spreadsheets, Data | Documents, Config |
| Learning Curve | Easy         | Easy               | Medium            |

Choosing the Right Format

Use JSON when:

  • Building APIs or web services
  • Storing configuration files
  • Working with nested data structures
  • You need human-readable format with small file size

Use CSV when:

  • Working with tabular data
  • Exporting/importing spreadsheets
  • Data analysis and reporting
  • You need simplicity and wide compatibility

Use XML when:

  • Working with complex hierarchical data
  • You need schema validation
  • Building enterprise systems
  • You need self-describing data with metadata
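To make the trade-offs concrete, here is the same record serialized in all three formats with only the standard library:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

user = {'id': 1, 'name': 'Alice', 'email': '[email protected]'}

# JSON: nested-friendly, one call
as_json = json.dumps(user)

# CSV: a flat row plus a header line
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['id', 'name', 'email'])
writer.writeheader()
writer.writerow(user)
as_csv = buf.getvalue()

# XML: explicit element construction
elem = ET.Element('user', id=str(user['id']))
ET.SubElement(elem, 'name').text = user['name']
ET.SubElement(elem, 'email').text = user['email']
as_xml = ET.tostring(elem, encoding='unicode')

print(as_json)  # {"id": 1, "name": "Alice", "email": "[email protected]"}
print(as_csv)   # id,name,email  /  1,Alice,[email protected]
print(as_xml)   # <user id="1"><name>Alice</name><email>[email protected]</email></user>
```

One record fits all three, but the effort differs: JSON is a single call, CSV requires flattening to columns, and XML requires building the tree by hand.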

Common Pitfalls and Solutions

JSON Pitfalls

import json
from datetime import datetime

# ❌ Pitfall: Trying to serialize non-serializable objects
data = {'date': datetime.now()}  # datetime is not JSON serializable
# json.dumps(data)  # Raises TypeError

# ✓ Solution: Use a custom encoder or convert to a string first
data = {'date': datetime.now().isoformat()}
json.dumps(data)

CSV Pitfalls

import csv

# ❌ Pitfall: Forgetting the newline='' parameter
# with open('file.csv', 'w') as f:  # Wrong! May produce blank rows on Windows
#     writer = csv.writer(f)

# ✓ Solution: Always use newline=''
with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Email'])

XML Pitfalls

import xml.etree.ElementTree as ET

# โŒ Pitfall: Not handling namespaces
# root.findall('user')  # Won't find elements with namespace

# โœ“ Solution: Use namespace-aware queries
namespaces = {'ns': 'http://example.com/schema'}
root.findall('ns:user', namespaces)

Conclusion

JSON, CSV, and XML are essential data formats for modern Python development. Each has strengths and appropriate use cases:

  • JSON is ideal for APIs and configuration with its simplicity and readability
  • CSV excels at tabular data and spreadsheet integration
  • XML provides structure and validation for complex hierarchical data

Key takeaways:

  1. Choose the right format for your specific use case
  2. Always validate data when parsing from external sources
  3. Handle errors gracefully with try-except blocks
  4. Use standard library modules for basic needs
  5. Consider performance for large datasets
  6. Document your data structures for maintainability

By mastering these formats, you’ll be equipped to handle data in virtually any Python project. Start with the format that matches your current needs, and expand your skills as you encounter new requirements.
