Descriptors and metaclasses are among Python’s most powerful and misunderstood features. They’re the mechanisms that power frameworks like Django, SQLAlchemy, and Pydantic. Understanding them transforms you from someone who uses these frameworks to someone who can build similar abstractions.
This guide demystifies these advanced concepts by building from fundamentals to real-world applications.
Part 1: Descriptors
What Are Descriptors?
A descriptor is an object that implements the descriptor protocol: a set of special methods that control how attributes are accessed. When you access an attribute on an object, Python’s attribute lookup mechanism checks whether the attribute found on the object’s class is a descriptor and, if so, calls its special methods.
The descriptor protocol consists of three methods:
class Descriptor:
    def __get__(self, obj, objtype=None):
        """Called when the attribute is accessed"""
        pass

    def __set__(self, obj, value):
        """Called when the attribute is assigned"""
        pass

    def __delete__(self, obj):
        """Called when the attribute is deleted"""
        pass
An object is a descriptor if it implements at least one of these methods. If it implements __set__ or __delete__, it’s a data descriptor. If it only implements __get__, it’s a non-data descriptor.
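The rule above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named descriptor_kind that is not part of Python itself; it inspects an object’s type for the protocol methods and reports which category the object falls into.

```python
def descriptor_kind(obj):
    """Classify obj as 'data', 'non-data', or 'none' (hypothetical helper)."""
    cls = type(obj)  # protocol methods are looked up on the type, not the instance
    has_get = hasattr(cls, "__get__")
    has_set = hasattr(cls, "__set__")
    has_delete = hasattr(cls, "__delete__")
    if has_set or has_delete:
        return "data"
    if has_get:
        return "non-data"
    return "none"


def plain_function():
    pass


print(descriptor_kind(property()))                   # data
print(descriptor_kind(plain_function))               # non-data
print(descriptor_kind(staticmethod(plain_function))) # non-data
print(descriptor_kind(42))                           # none
```

A property counts as a data descriptor even when it has no setter, because the property type always defines __set__.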
Why Descriptors Exist
Descriptors solve a fundamental problem: how do you intercept attribute access? Without descriptors, you’d need to override __getattr__ and __setattr__ on every class, which is inefficient and error-prone.
Descriptors are used throughout Python:
- property is a descriptor
- staticmethod and classmethod are descriptors
- Functions are descriptors, which is how they become bound methods
- Django exposes model fields on model classes through descriptors
The Descriptor Protocol in Detail
Understanding __get__
The __get__ method is called when an attribute is accessed. It receives three parameters:
def __get__(self, obj, objtype=None):
    """
    obj: The instance through which the descriptor was accessed (None if accessed on class)
    objtype: The type of the instance
    """

class Descriptor:
    def __get__(self, obj, objtype=None):
        print(f"__get__ called: obj={obj}, objtype={objtype}")
        return "descriptor value"

class MyClass:
    attr = Descriptor()

# Accessing through instance
instance = MyClass()
print(instance.attr) # __get__ called: obj=<MyClass object>, objtype=<class 'MyClass'>

# Accessing through class
print(MyClass.attr) # __get__ called: obj=None, objtype=<class 'MyClass'>
Understanding __set__
The __set__ method is called when an attribute is assigned. It receives the instance and the value being assigned:
class ValidatedDescriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, None)

    def __set__(self, obj, value):
        if not isinstance(value, int):
            raise TypeError(f"{self.name} must be an integer")
        obj.__dict__[self.name] = value

class Person:
    age = ValidatedDescriptor('age')

    def __init__(self, name, age):
        self.name = name
        self.age = age # Calls __set__

person = Person("Alice", 30)
print(person.age) # Calls __get__
person.age = 31 # Valid
person.age = "thirty" # Raises TypeError
Understanding __delete__
The __delete__ method is called when an attribute is deleted:
class DeletableDescriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value

    def __delete__(self, obj):
        print(f"Deleting {self.name}")
        del obj.__dict__[self.name]

class Document:
    content = DeletableDescriptor('content')

doc = Document()
doc.content = "Hello"
print(doc.content) # Hello
del doc.content # Deleting content
print(doc.content) # None
Data Descriptors vs Non-Data Descriptors
The distinction matters for attribute lookup order:
Data descriptors (implement __set__ or __delete__) take precedence over instance attributes.
Non-data descriptors (only implement __get__) are overridden by instance attributes.
class DataDescriptor:
    def __get__(self, obj, objtype=None):
        return "data descriptor"

    def __set__(self, obj, value):
        pass

class NonDataDescriptor:
    def __get__(self, obj, objtype=None):
        return "non-data descriptor"

class Example:
    data_desc = DataDescriptor()
    non_data_desc = NonDataDescriptor()

example = Example()

# Data descriptor takes precedence
example.__dict__['data_desc'] = "instance value"
print(example.data_desc) # Output: data descriptor

# Non-data descriptor is overridden
example.__dict__['non_data_desc'] = "instance value"
print(example.non_data_desc) # Output: instance value
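The precedence rules can be written out as an ordinary function. This is a rough sketch of instance attribute lookup, not CPython’s actual implementation: the lookup function and the _MISSING sentinel are names invented for illustration, and the model ignores __getattr__ and __slots__.

```python
_MISSING = object()  # sentinel: "no class attribute with this name"


def lookup(obj, name):
    """Rough model of instance attribute lookup (illustrative only)."""
    cls = type(obj)
    # 1. Search the class and its bases, in MRO order, for the name.
    class_attr = _MISSING
    for base in cls.__mro__:
        if name in vars(base):
            class_attr = vars(base)[name]
            break
    attr_type = type(class_attr)
    has_get = hasattr(attr_type, "__get__")
    is_data = hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    # 2. A data descriptor found on the class wins over the instance dict.
    if class_attr is not _MISSING and has_get and is_data:
        return attr_type.__get__(class_attr, obj, cls)
    # 3. Otherwise the instance dictionary is consulted.
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # 4. Then a non-data descriptor, then a plain class attribute.
    if class_attr is not _MISSING:
        if has_get:
            return attr_type.__get__(class_attr, obj, cls)
        return class_attr
    raise AttributeError(name)


class Data:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        pass


class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Holder:
    data = Data()
    non_data = NonData()


h = Holder()
h.__dict__["data"] = "instance value"
h.__dict__["non_data"] = "instance value"
print(lookup(h, "data"))      # from data descriptor
print(lookup(h, "non_data"))  # instance value
```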
Practical Descriptor Examples
Example 1: Lazy Loading
class LazyProperty:
    """Load expensive data only when accessed"""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Compute value and cache it
        value = self.func(obj)
        # Store in instance dict to avoid calling __get__ again
        obj.__dict__[self.name] = value
        return value

class DataModel:
    def __init__(self, data_file):
        self.data_file = data_file

    @LazyProperty
    def data(self):
        """Load data from file only when accessed"""
        print(f"Loading data from {self.data_file}...")
        # Simulate expensive operation
        return {"records": 1000, "size": "10MB"}

# Usage
model = DataModel("data.csv")
print("Model created")
print(model.data) # Loads data
print(model.data) # Returns cached value (no print statement)
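For comparison, the standard library ships functools.cached_property (Python 3.8+), which rests on the same idea of storing the computed value in the instance’s __dict__. The sketch below uses a hypothetical Report class and a load_count counter added purely to make the caching visible.

```python
from functools import cached_property


class Report:
    def __init__(self, path):
        self.path = path
        self.load_count = 0  # illustrative counter, not needed in real code

    @cached_property
    def data(self):
        # Runs only on first access; the result is then stored on the instance.
        self.load_count += 1
        return {"records": 1000, "source": self.path}


report = Report("data.csv")
first = report.data
second = report.data
print(report.load_count)       # 1
print(first is second)         # True
print("data" in vars(report))  # True: the value now lives in the instance dict
```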
Example 2: Type Validation
class TypedProperty:
    """Enforce type checking on assignment"""
    def __init__(self, name, expected_type):
        self.name = name
        self.expected_type = expected_type

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            # expected_type may be a single type or a tuple of types,
            # and a tuple has no __name__, so build the message from each type
            types = self.expected_type
            if not isinstance(types, tuple):
                types = (types,)
            expected = " or ".join(t.__name__ for t in types)
            raise TypeError(
                f"{self.name} must be {expected}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value

class Product:
    name = TypedProperty('name', str)
    price = TypedProperty('price', (int, float))
    quantity = TypedProperty('quantity', int)

    def __init__(self, name, price, quantity):
        self.name = name
        self.price = price
        self.quantity = quantity

# Usage
product = Product("Laptop", 999.99, 5)
product.price = 899.99 # Valid
product.quantity = "five" # Raises TypeError
Example 3: Computed Properties with Caching
class CachedProperty:
    """Cache computed property value"""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Check cache
        cache_attr = f'_cache_{self.name}'
        if not hasattr(obj, cache_attr):
            # Compute and cache
            value = self.func(obj)
            setattr(obj, cache_attr, value)
        return getattr(obj, cache_attr)

class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @CachedProperty
    def area(self):
        print("Computing area...")
        return self.width * self.height

# Usage
rect = Rectangle(5, 10)
print(rect.area) # Computing area... 50
print(rect.area) # 50 (cached, no print)
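One consequence of this caching scheme is that the cached value goes stale if the inputs change afterwards. The sketch below restates the descriptor so the block runs on its own and adds a __delete__ method as one possible way to invalidate the cache; that method is an illustrative addition, not part of the example above.

```python
class CachedProperty:
    """Same caching scheme as above, plus a hypothetical invalidation hook."""
    def __init__(self, func):
        self.func = func
        self.cache_attr = f"_cache_{func.__name__}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not hasattr(obj, self.cache_attr):
            setattr(obj, self.cache_attr, self.func(obj))
        return getattr(obj, self.cache_attr)

    def __delete__(self, obj):
        # Illustrative addition: `del obj.attr` drops the cached value.
        obj.__dict__.pop(self.cache_attr, None)


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @CachedProperty
    def area(self):
        return self.width * self.height


rect = Rectangle(5, 10)
print(rect.area)  # 50
rect.width = 6
print(rect.area)  # 50 (stale: nothing invalidated the cache)
del rect.area     # clears the cache through __delete__
print(rect.area)  # 60
```

Whether to invalidate explicitly like this, or to avoid caching values derived from mutable state, is a design choice that depends on how the class is used.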
Example 4: Bound Methods (How Methods Work)
Functions are non-data descriptors: accessing one through an instance calls its __get__, which returns a bound method. The MethodDescriptor class below is a simplified model of that mechanism; the Example class then shows the built-in behavior it imitates:
class MethodDescriptor:
    """Simplified version of how methods work"""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func
        # Return a bound method
        def bound_method(*args, **kwargs):
            return self.func(obj, *args, **kwargs)
        return bound_method

class Example:
    def method(self):
        return "Hello"

# When you access example.method, Python calls the function's __get__
example = Example()
print(example.method) # <bound method>
print(example.method()) # Hello
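The binding step can also be triggered by hand, which makes the mechanism easier to see. This is a small sketch using a hypothetical Greeter class; it calls the function’s own __get__ directly and compares the result with ordinary attribute access.

```python
class Greeter:
    def greet(self):
        return "Hello"


g = Greeter()

# The class dictionary holds a plain function, not a bound method.
raw_function = vars(Greeter)["greet"]

# Calling the function's __get__ by hand performs the binding step.
bound = raw_function.__get__(g, Greeter)

print(bound())                         # Hello
print(bound.__self__ is g)             # True
print(bound.__func__ is raw_function)  # True
print(g.greet == bound)                # True
```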
Real-World Descriptor Application: ORM Field
ORMs such as Django and SQLAlchemy rely on descriptors to expose model fields. Here’s a simplified sketch of the idea:
class Field:
    """Base descriptor for ORM fields"""
    def __init__(self, column_name=None, required=True):
        self.column_name = column_name
        self.required = required
        self.name = None

    def __set_name__(self, owner, name):
        """Called when descriptor is assigned to a class attribute"""
        self.name = name
        if self.column_name is None:
            self.column_name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if self.required and value is None:
            raise ValueError(f"{self.name} is required")
        obj.__dict__[self.name] = value

class StringField(Field):
    def __set__(self, obj, value):
        if value is not None and not isinstance(value, str):
            raise TypeError(f"{self.name} must be a string")
        super().__set__(obj, value)

class IntegerField(Field):
    def __set__(self, obj, value):
        if value is not None and not isinstance(value, int):
            raise TypeError(f"{self.name} must be an integer")
        super().__set__(obj, value)

class Model:
    """Base class for ORM models"""
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

class User(Model):
    username = StringField(required=True)
    email = StringField(required=True)
    age = IntegerField(required=False)

# Usage
user = User(username="alice", email="[email protected]", age=30)
print(user.username) # alice
user.age = "thirty" # Raises TypeError
Part 2: Metaclasses
What Are Metaclasses?
A metaclass is a class whose instances are classes. In Python, type is the default metaclass: it’s the class of every class that doesn’t specify a different one.
class MyClass:
    pass

print(type(MyClass)) # <class 'type'>
print(type(int)) # <class 'type'>
print(type(str)) # <class 'type'>

# Every class here is an instance of type
print(isinstance(MyClass, type)) # True
print(isinstance(int, type)) # True
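The chain from instance to class to metaclass can be followed with type() at each level. This is a minimal sketch with hypothetical Meta and Widget names; note that ordinary instances are not instances of type, only classes are.

```python
class Meta(type):
    pass


class Widget(metaclass=Meta):
    pass


w = Widget()
print(type(w) is Widget)     # True: the instance's class
print(type(Widget) is Meta)  # True: the class's metaclass
print(type(Meta) is type)    # True: a metaclass is itself a class
print(type(type) is type)    # True: type is its own type
print(isinstance(w, type))   # False: an ordinary instance is not a class
```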
Understanding type
The type function has two uses:
- Get the type of an object: type(obj)
- Create a class dynamically: type(name, bases, dict)
# Creating a class dynamically
MyClass = type('MyClass', (), {'x': 10})
instance = MyClass()
print(instance.x) # 10
# Equivalent to:
class MyClass:
    x = 10
The three arguments to type are:
- name: The class name (string)
- bases: Tuple of base classes
- dict: Dictionary of class attributes and methods
def method(self):
    return "Hello"

MyClass = type('MyClass', (object,), {
    'x': 10,
    'method': method
})

instance = MyClass()
print(instance.x) # 10
print(instance.method()) # Hello
Creating Custom Metaclasses
A custom metaclass is a class that inherits from type. It can override methods to customize class creation:
class Meta(type):
    def __new__(mcs, name, bases, namespace):
        """Called when a class is created"""
        print(f"Creating class {name}")
        return super().__new__(mcs, name, bases, namespace)

    def __init__(cls, name, bases, namespace):
        """Called after the class is created"""
        print(f"Initializing class {name}")
        super().__init__(name, bases, namespace)

class MyClass(metaclass=Meta):
    pass

# Output:
# Creating class MyClass
# Initializing class MyClass
Metaclass Methods
__new__ vs __init__
- __new__ creates the class object
- __init__ initializes the class object
class TracingMeta(type):
    def __new__(mcs, name, bases, namespace):
        print(f"__new__: Creating {name}")
        cls = super().__new__(mcs, name, bases, namespace)
        return cls

    def __init__(cls, name, bases, namespace):
        print(f"__init__: Initializing {name}")
        super().__init__(name, bases, namespace)

class Example(metaclass=TracingMeta):
    pass

# Output:
# __new__: Creating Example
# __init__: Initializing Example
__call__
A metaclass’s __call__ method is invoked whenever one of its classes is instantiated, which lets the metaclass control instance creation:
class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        """Ensure only one instance exists"""
        if cls not in cls._instances:
            instance = super().__call__(*args, **kwargs)
            cls._instances[cls] = instance
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self):
        print("Initializing database")
        self.connection = "connected"

# Usage
db1 = Database() # Initializing database
db2 = Database() # No output (returns same instance)
print(db1 is db2) # True
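To see what super().__call__ does in the singleton above, it helps to spell out the default behavior. The following is a rough sketch of what type.__call__ does during instantiation, written as a hypothetical ExplicitCallMeta; the real implementation lives in C and handles more cases.

```python
class ExplicitCallMeta(type):
    """Spells out, approximately, the default instantiation steps."""
    def __call__(cls, *args, **kwargs):
        # Step 1: allocate the instance through the class's __new__.
        instance = cls.__new__(cls, *args, **kwargs)
        # Step 2: run __init__, but only if __new__ returned an instance of cls.
        if isinstance(instance, cls):
            instance.__init__(*args, **kwargs)
        return instance


class Point(metaclass=ExplicitCallMeta):
    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
print(p.x, p.y)          # 1 2
print(type(p) is Point)  # True
```

Because the singleton metaclass skips super().__call__ after the first instantiation, neither __new__ nor __init__ runs again on later calls.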
Practical Metaclass Examples
Example 1: Enforcing Class Attributes
class RequiredAttributesMeta(type):
    """Ensure subclasses define the attributes their base class requires"""
    def __new__(mcs, name, bases, namespace):
        # The base class itself has no bases, so the loop skips it
        for base in bases:
            # required_attrs is read from the base class, not the metaclass
            for attr in getattr(base, 'required_attrs', []):
                if attr not in namespace:
                    raise TypeError(
                        f"Class {name} must define {attr}"
                    )
        return super().__new__(mcs, name, bases, namespace)

class Plugin(metaclass=RequiredAttributesMeta):
    required_attrs = ['name', 'version', 'execute']

class MyPlugin(Plugin):
    name = "My Plugin"
    version = "1.0"

    def execute(self):
        pass

# This would raise TypeError:
# class BadPlugin(Plugin):
#     name = "Bad Plugin"
#     # Missing version and execute
Example 2: Automatic Method Registration
class RegistryMeta(type):
    """Automatically register methods with specific decorator"""
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._registry = {}
        for attr_name, attr_value in namespace.items():
            if hasattr(attr_value, '_registered'):
                cls._registry[attr_value._registered] = attr_value
        return cls

def register(key):
    """Decorator to register a method"""
    def decorator(func):
        func._registered = key
        return func
    return decorator

class CommandHandler(metaclass=RegistryMeta):
    @register('help')
    def handle_help(self):
        return "Help command"

    @register('status')
    def handle_status(self):
        return "Status command"

# Usage
handler = CommandHandler()
print(CommandHandler._registry) # {'help': <function>, 'status': <function>}
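A registry like this is usually consumed by a dispatcher. The sketch below restates the metaclass and decorator so it runs on its own and adds a dispatch method, which is a hypothetical helper not present in the example above. Since the registry stores plain functions, the instance has to be passed in explicitly.

```python
class RegistryMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        cls._registry = {
            value._registered: value
            for value in namespace.values()
            if hasattr(value, "_registered")
        }
        return cls


def register(key):
    def decorator(func):
        func._registered = key
        return func
    return decorator


class CommandHandler(metaclass=RegistryMeta):
    @register("help")
    def handle_help(self):
        return "Help command"

    @register("status")
    def handle_status(self):
        return "Status command"

    def dispatch(self, command):
        """Hypothetical helper: look up a command and call its handler."""
        try:
            func = self._registry[command]
        except KeyError:
            raise ValueError(f"Unknown command: {command}") from None
        # The registry holds unbound functions, so pass self explicitly.
        return func(self)


handler = CommandHandler()
print(handler.dispatch("help"))    # Help command
print(handler.dispatch("status"))  # Status command
```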
Example 3: Automatic Property Creation
def _make_property(private_name):
    """Build a property that reads and writes the given private attribute"""
    def getter(self):
        return getattr(self, private_name)

    def setter(self, value):
        setattr(self, private_name, value)

    return property(getter, setter)

class AutoPropertyMeta(type):
    """Automatically create properties for private class-level attributes"""
    def __new__(mcs, name, bases, namespace):
        new_namespace = dict(namespace)
        for attr_name in namespace:
            # Single leading underscore only; skip dunder names
            if attr_name.startswith('_') and not attr_name.startswith('__'):
                public_name = attr_name[1:] # Remove underscore
                new_namespace[public_name] = _make_property(attr_name)
        return super().__new__(mcs, name, bases, new_namespace)

class Person(metaclass=AutoPropertyMeta):
    # The metaclass only sees names defined in the class body, so the
    # private attributes are declared here, not just assigned in __init__
    _name = None
    _age = None

    def __init__(self, name, age):
        self._name = name
        self._age = age

# Usage
person = Person("Alice", 30)
print(person.name) # Alice (via property)
person.age = 31 # Via property setter
Example 4: Validation Metaclass
class ValidatedMeta(type):
    """Validate class attributes against type hints"""
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Store type hints for validation
        cls._validators = {}
        if hasattr(cls, '__annotations__'):
            cls._validators = cls.__annotations__.copy()
        return cls

    def __call__(cls, *args, **kwargs):
        instance = super().__call__(*args, **kwargs)
        # Validate attributes
        for attr_name, expected_type in cls._validators.items():
            if hasattr(instance, attr_name):
                value = getattr(instance, attr_name)
                if not isinstance(value, expected_type):
                    raise TypeError(
                        f"{attr_name} must be {expected_type.__name__}, "
                        f"got {type(value).__name__}"
                    )
        return instance

class ValidatedClass(metaclass=ValidatedMeta):
    name: str
    age: int

    def __init__(self, name, age):
        self.name = name
        self.age = age

# Usage
obj = ValidatedClass("Alice", 30) # Valid
obj = ValidatedClass("Bob", "thirty") # Raises TypeError
Example 5: Singleton Pattern
class SingletonMeta(type):
    """Metaclass for singleton pattern"""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Logger(metaclass=SingletonMeta):
    def __init__(self):
        self.logs = []

    def log(self, message):
        self.logs.append(message)

# Usage
logger1 = Logger()
logger2 = Logger()
print(logger1 is logger2) # True
logger1.log("Message 1")
print(logger2.logs) # ["Message 1"]
Real-World Metaclass Application: ORM Model
Django’s models rely on a metaclass to collect field definitions when a model class is created. Here’s a simplified sketch of the idea:
class ModelMeta(type):
    """Metaclass for ORM models"""
    def __new__(mcs, name, bases, namespace):
        # Collect fields
        fields = {}
        for key, value in list(namespace.items()):
            if isinstance(value, Field):
                fields[key] = value
                value.name = key
        namespace['_fields'] = fields
        cls = super().__new__(mcs, name, bases, namespace)
        return cls

class Field:
    def __init__(self, field_type, required=True):
        self.field_type = field_type
        self.required = required
        self.name = None

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if self.required and value is None:
            raise ValueError(f"{self.name} is required")
        if value is not None and not isinstance(value, self.field_type):
            raise TypeError(f"{self.name} must be {self.field_type.__name__}")
        obj.__dict__[self.name] = value

class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def save(self):
        print(f"Saving {self.__class__.__name__}")
        for field_name, field in self._fields.items():
            value = getattr(self, field_name)
            print(f" {field_name}: {value}")

class User(Model):
    username = Field(str, required=True)
    email = Field(str, required=True)
    age = Field(int, required=False)

# Usage
user = User(username="alice", email="[email protected]", age=30)
user.save()
# Output:
# Saving User
#  username: alice
#  email: [email protected]
#  age: 30
Combining Descriptors and Metaclasses
The most powerful applications combine both concepts:
class ValidatedField:
    """Descriptor for validated fields"""
    def __init__(self, field_type, required=True):
        self.field_type = field_type
        self.required = required
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        if self.required and value is None:
            raise ValueError(f"{self.name} is required")
        if value is not None and not isinstance(value, self.field_type):
            # field_type may be a tuple of types, which has no __name__
            types = self.field_type
            if not isinstance(types, tuple):
                types = (types,)
            expected = " or ".join(t.__name__ for t in types)
            raise TypeError(
                f"{self.name} must be {expected}, "
                f"got {type(value).__name__}"
            )
        obj.__dict__[self.name] = value

class ModelMeta(type):
    """Metaclass that collects fields and provides introspection"""
    def __new__(mcs, name, bases, namespace):
        fields = {}
        for key, value in namespace.items():
            if isinstance(value, ValidatedField):
                fields[key] = value
        namespace['_fields'] = fields
        cls = super().__new__(mcs, name, bases, namespace)
        return cls

class Model(metaclass=ModelMeta):
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        """Convert model to dictionary"""
        return {
            name: getattr(self, name)
            for name in self._fields
        }

    def validate(self):
        """Validate all fields"""
        for name, field in self._fields.items():
            value = getattr(self, name)
            if field.required and value is None:
                raise ValueError(f"{name} is required")

class Product(Model):
    name = ValidatedField(str, required=True)
    price = ValidatedField((int, float), required=True)
    description = ValidatedField(str, required=False)

# Usage
product = Product(name="Laptop", price=999.99, description="High-end laptop")
print(product.to_dict())
# Output: {'name': 'Laptop', 'price': 999.99, 'description': 'High-end laptop'}
product.validate() # Passes
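One limitation of the metaclass above is that _fields only contains fields declared directly in each class body, so a subclass of Product would not list the fields it inherits. The following sketch shows one possible way to merge inherited fields; InheritingModelMeta, the stripped-down Field, and DiscountedProduct are hypothetical names used only for illustration.

```python
class Field:
    """Minimal descriptor, reduced to what this sketch needs."""
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        obj.__dict__[self.name] = value


class InheritingModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        fields = {}
        # Start from the fields already collected on the base classes;
        # reversed() lets earlier bases override later ones, like the MRO.
        for base in reversed(bases):
            fields.update(getattr(base, "_fields", {}))
        # Then add (or override with) fields declared in this class body.
        fields.update(
            {key: value for key, value in namespace.items()
             if isinstance(value, Field)}
        )
        namespace["_fields"] = fields
        return super().__new__(mcs, name, bases, namespace)


class Model(metaclass=InheritingModelMeta):
    pass


class Product(Model):
    name = Field()
    price = Field()


class DiscountedProduct(Product):
    discount = Field()


print(list(Product._fields))            # ['name', 'price']
print(list(DiscountedProduct._fields))  # ['name', 'price', 'discount']
```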
When to Use Descriptors and Metaclasses
Use Descriptors When:
- You need to intercept attribute access
- You want to validate data on assignment
- You need computed properties
- You’re implementing lazy loading
- You’re building an ORM or validation framework
Use Metaclasses When:
- You need to customize class creation
- You want to enforce class structure
- You need to register classes automatically
- You’re implementing design patterns like Singleton
- You’re building frameworks that need introspection
Avoid When:
- A simple property would suffice
- You’re trying to be clever without clear benefit
- The code becomes harder to understand
- There’s a simpler alternative
Common Pitfalls
Pitfall 1: Overcomplicating with Metaclasses
# Wrong: Using metaclass when not needed
class Meta(type):
    def __new__(mcs, name, bases, namespace):
        # Unnecessary complexity
        return super().__new__(mcs, name, bases, namespace)

class MyClass(metaclass=Meta):
    pass

# Correct: Use only when necessary
class MyClass:
    pass
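Many jobs that look like they need a metaclass, such as registering subclasses or checking that they define certain attributes, can often be handled by the __init_subclass__ hook (Python 3.6+) on an ordinary base class. The sketch below is one such alternative; the Plugin registry and the CsvExporter class are hypothetical names.

```python
class Plugin:
    registry = {}  # hypothetical registry mapping plugin name -> class

    def __init_subclass__(cls, **kwargs):
        # Runs once for every subclass, right after the subclass is created.
        super().__init_subclass__(**kwargs)
        if "name" not in cls.__dict__:
            raise TypeError(f"Class {cls.__name__} must define name")
        Plugin.registry[cls.name] = cls


class CsvExporter(Plugin):
    name = "csv"


print(Plugin.registry["csv"] is CsvExporter)  # True

try:
    class Nameless(Plugin):
        pass
except TypeError as exc:
    print(exc)  # Class Nameless must define name
```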
Pitfall 2: Forgetting __set_name__
# Wrong: Descriptor doesn't know its name
class Descriptor:
    def __get__(self, obj, objtype=None):
        return obj.__dict__.get('???') # What's the name?

# Correct: Use __set_name__
class Descriptor:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        return obj.__dict__.get(self.name)
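It is worth knowing when __set_name__ actually fires: Python calls it while the class is being created, and does not call it for descriptors attached to a class afterwards. The sketch below illustrates both cases with hypothetical Named and Config classes.

```python
class Named:
    def __set_name__(self, owner, name):
        self.owner = owner
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)


class Config:
    host = Named()  # __set_name__(Config, "host") runs at class creation


print(Config.host.name)             # host
print(Config.host.owner is Config)  # True

# Attached after class creation: __set_name__ is not called automatically.
late = Named()
Config.port = late
print(hasattr(late, "name"))        # False

late.__set_name__(Config, "port")   # so it has to be called by hand
print(Config.port.name)             # port
```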
Pitfall 3: Metaclass Conflicts
# Wrong: Conflicting metaclasses
class Meta1(type):
    pass

class Meta2(type):
    pass

class Base(metaclass=Meta1):
    pass

# This raises TypeError: metaclass conflict
# class Derived(Base, metaclass=Meta2):
#     pass

# Correct: Create a unified metaclass
class UnifiedMeta(Meta1, Meta2):
    pass

class Derived(Base, metaclass=UnifiedMeta):
    pass
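The conflict and its resolution can be observed directly. This sketch restates the classes so it runs on its own, catches the TypeError instead of commenting the failing class out, and then checks that the unified metaclass satisfies both parents; the Broken class name is illustrative.

```python
class Meta1(type):
    pass


class Meta2(type):
    pass


class Base(metaclass=Meta1):
    pass


error = None
try:
    # Meta2 is not a subclass of Meta1, so Python cannot pick a metaclass.
    class Broken(Base, metaclass=Meta2):
        pass
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)


class UnifiedMeta(Meta1, Meta2):
    pass


class Derived(Base, metaclass=UnifiedMeta):
    pass


print(type(Derived) is UnifiedMeta)  # True
print(isinstance(Derived, Meta1))    # True
print(isinstance(Derived, Meta2))    # True
```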
Conclusion
Descriptors and metaclasses are powerful tools that enable Python’s most sophisticated frameworks:
Descriptors control attribute access through the descriptor protocol. They’re used for validation, lazy loading, computed properties, and implementing framework features like ORM fields.
Metaclasses customize class creation and behavior. They’re used for enforcing class structure, automatic registration, design patterns, and framework introspection.
Understanding these concepts transforms your ability to:
- Read and understand framework code
- Build your own abstractions and frameworks
- Debug complex attribute access issues
- Implement sophisticated design patterns
Start with descriptors: they’re more commonly needed and easier to understand. Use metaclasses only when you need to customize class creation itself. Remember: with great power comes great responsibility. Use these features judiciously, and always prioritize code clarity over cleverness.