Use when optimizing Swift code performance, reducing memory usage, improving runtime efficiency, dealing with COW, ARC overhead, generics specialization, or collection optimization
This skill inherits all available tools. When active, it can use any tool Claude has access to.
Core Principle: Optimize Swift code by understanding language-level performance characteristics—value semantics, ARC behavior, generic specialization, and memory layout—to write fast, efficient code without premature micro-optimization.
Swift Version: Swift 6.0+ (Swift 6.2 for @concurrent)
Xcode: 16+
Platforms: iOS 18+, macOS 15+
Related Skills:
performance-profiling — Use Instruments to measure (do this first!)
swiftui-performance — SwiftUI-specific optimizations
build-performance — Compilation speed
swift-concurrency — Correctness-focused concurrency patterns

Routing: run performance-profiling first to measure; for SwiftUI-specific issues use the swiftui-performance skill instead; for compilation speed use the build-performance skill instead.

Performance issue identified?
│
├─ Profiler shows excessive copying?
│ └─ → Part 1: Noncopyable Types
│ └─ → Part 2: Copy-on-Write
│
├─ Retain/release overhead in Time Profiler?
│ └─ → Part 4: ARC Optimization
│
├─ Generic code in hot path?
│ └─ → Part 5: Generics & Specialization
│
├─ Collection operations slow?
│ └─ → Part 7: Collection Performance
│
├─ Async/await overhead visible?
│ └─ → Part 8: Concurrency Performance
│
├─ Struct vs class decision?
│ └─ → Part 3: Value vs Reference
│
└─ Memory layout concerns?
└─ → Part 9: Memory Layout
Swift 6.0+ introduces noncopyable types for performance-critical scenarios where you want to avoid implicit copies.
import Darwin // for open(2)/close(2)

enum FileError: Error { case openFailed }

// Noncopyable type - exactly one owner, no implicit copies
struct FileHandle: ~Copyable {
    private let fd: Int32
    init(path: String) throws {
        self.fd = open(path, O_RDONLY)
        guard fd != -1 else { throw FileError.openFailed }
    }
    deinit {
        close(fd)
    }
    // Consuming method: ends the value's lifetime early (deinit runs here)
    consuming func close() {
        _ = consume self
    }
}
// Usage
func processFile() throws {
let handle = try FileHandle(path: "/data.txt")
// handle is automatically consumed at end of scope
// Cannot accidentally copy handle
}
// consuming - callee takes ownership (avoids a retain/release pair;
// a noncopyable argument cannot be used by the caller afterwards)
func process(_ data: consuming [UInt8]) {
    // data is owned here
}

// borrowing - temporary access without ownership transfer
func validate(_ data: borrowing [UInt8]) -> Bool {
    // caller keeps ownership and can still use data
    return data.count > 0
}

// inout - mutable access in place
func modify(_ data: inout [UInt8]) {
    data.append(0)
}
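Callers can hand off ownership explicitly with the consume operator (Swift 5.9+). A small sketch; after the consume, the compiler rejects any further use of the variable:

let buffer: [UInt8] = [0x01, 0x02, 0x03]
process(consume buffer) // ownership forwarded to the consuming parameter
// print(buffer)        // ❌ compile error: 'buffer' used after consume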
Swift collections use COW for efficient memory sharing. Understanding when copies happen is critical for performance.
var array1 = [1, 2, 3] // single allocation
var array2 = array1    // shares storage (no copy yet)
array2.append(4)       // first write while shared triggers the copy (COW)
final class Storage<T> {
var data: [T]
init(_ data: [T]) { self.data = data }
}
struct COWArray<T> {
private var storage: Storage<T>
init(_ data: [T]) {
self.storage = Storage(data)
}
// COW check before mutation
private mutating func ensureUnique() {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(storage.data)
}
}
mutating func append(_ element: T) {
ensureUnique() // Copy if shared
storage.data.append(element)
}
subscript(index: Int) -> T {
get { storage.data[index] }
set {
ensureUnique() // Copy before mutation
storage.data[index] = newValue
}
}
}
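A usage sketch showing where the copy actually happens:

let a = COWArray([1, 2, 3])
var b = a   // shares storage: just a reference copy plus a retain
b.append(4) // isKnownUniquelyReferenced(&storage) is false, so storage is copied here
// a still reads [1, 2, 3]; b now holds [1, 2, 3, 4]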
// ❌ Accidental copy in loop
for i in 0..<array.count {
array[i] = transform(array[i]) // Copy on first mutation if shared!
}
// ✅ Reserve capacity first (ensures unique)
array.reserveCapacity(array.count)
for i in 0..<array.count {
array[i] = transform(array[i])
}
// ❌ Multiple mutations trigger multiple uniqueness checks
array.append(1)
array.append(2)
array.append(3)
// ✅ Single reservation
array.reserveCapacity(array.count + 3)
array.append(contentsOf: [1, 2, 3])
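To see the geometric growth yourself (and confirm that reserveCapacity eliminates it), sample capacity while appending; a quick diagnostic sketch:

var probe: [Int] = []
var reallocations = 0
for i in 0..<10_000 {
    let before = probe.capacity
    probe.append(i)
    if probe.capacity != before { reallocations += 1 }
}
print(reallocations) // ~14 growth events without reserveCapacity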
Choosing between struct and class has significant performance implications.
| Factor | Use Struct | Use Class |
|---|---|---|
| Size | ≤ 64 bytes | > 64 bytes or contains large data |
| Identity | No identity needed | Needs identity (===) |
| Inheritance | Not needed | Inheritance required |
| Mutation | Infrequent | Frequent in-place updates |
| Sharing | No sharing needed | Must be shared across scope |
// ✅ Fast - fits in registers, no heap allocation
struct Point {
var x: Double // 8 bytes
var y: Double // 8 bytes
} // Total: 16 bytes - excellent for struct
struct Color {
var r, g, b, a: UInt8 // 4 bytes total - perfect for struct
}
// ❌ Expensive when copies are mutated - duplicates the 1MB buffer
struct HugeData {
    var buffer: [UInt8] // 1MB (COW heap storage)
    var metadata: String
}
func process(_ data: HugeData) { // passing is cheap (COW), but mutating
    // a shared copy pays for the full 1MB duplication
}
// ✅ Use reference semantics for large data
final class HugeData {
var buffer: [UInt8]
var metadata: String
}
func process(_ data: HugeData) { // Only copies pointer (8 bytes)
// ...
}
// Best of both worlds
struct LargeDataWrapper {
private final class Storage {
var largeBuffer: [UInt8]
init(_ buffer: [UInt8]) { self.largeBuffer = buffer }
}
private var storage: Storage
init(buffer: [UInt8]) {
self.storage = Storage(buffer)
}
// Value semantics externally, reference internally
var buffer: [UInt8] {
get { storage.largeBuffer }
set {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(newValue)
} else {
storage.largeBuffer = newValue
}
}
}
}
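Usage sketch: copies of the wrapper stay cheap until one of them writes:

let original = LargeDataWrapper(buffer: [1, 2, 3])
var copy = original     // cheap: one reference copy
copy.buffer = [9, 9, 9] // storage was shared, so a new Storage is allocated for copy
// original.buffer is still [1, 2, 3]; value semantics preserved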
Automatic Reference Counting adds overhead. Minimize it where possible.
class Parent {
var child: Child?
}
class Child {
// ❌ Weak adds overhead (optional, thread-safe zeroing)
weak var parent: Parent?
}
// ✅ Unowned when you know lifetime guarantees
class Child {
unowned let parent: Parent // cheaper than weak; traps if parent is deallocated
}
Performance: unowned is ~2x faster than weak (no side-table lookup or zeroing on access).
Use when: Child lifetime < Parent lifetime (guaranteed).
import Foundation

class DataProcessor {
    var data: [Int] = []

    // ❌ Captures self weakly, then upgrades to strong - pays weak-reference
    // overhead and keeps all of self alive while the closure runs
    func process(completion: @escaping () -> Void) {
        DispatchQueue.global().async { [weak self] in
            guard let self else { return }
            self.data.forEach { print($0) }
            completion()
        }
    }

    // ✅ Capture only what you need
    func processFast(completion: @escaping () -> Void) {
        let data = self.data // copy the value type
        DispatchQueue.global().async {
            data.forEach { print($0) } // no self captured
            completion()
        }
    }
}
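A capture list expresses the same fix more tersely, capturing the property's value when the closure is created (sketch, same DataProcessor context as above):

// Inside DataProcessor:
func processWithCaptureList(completion: @escaping () -> Void) {
    DispatchQueue.global().async { [data = self.data] in // captures the value, not self
        data.forEach { print($0) }
        completion()
    }
}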
// ❌ Multiple retain/release pairs
for object in objects {
process(object) // retain, release
}
// ✅ Single retain for entire loop
func processAll(_ objects: [MyClass]) {
// Compiler optimizes to single retain/release
for object in objects {
process(object)
}
}
From WWDC 2021-10216: Object lifetimes end at last use, not at closing brace.
// ❌ Relying on observed lifetime is fragile
class Traveler {
    var name = "Lily"
    deinit {
        print("Deinitialized") // may run BEFORE the closing brace with ARC optimizations!
    }
}
class Account {
    weak var traveler: Traveler?
    init(traveler: Traveler) { self.traveler = traveler }
    func printSummary() { print(traveler?.name ?? "nil - traveler already gone") }
}
func test() {
let traveler = Traveler()
let account = Account(traveler: traveler)
// traveler's last use is above - may deallocate here!
account.printSummary() // weak reference may be nil!
}
// ✅ Explicitly extend lifetime when needed
func test() {
let traveler = Traveler()
let account = Account(traveler: traveler)
withExtendedLifetime(traveler) {
account.printSummary() // traveler guaranteed to live
}
}
// Alternative: defer at end of scope
func test() {
let traveler = Traveler()
defer { withExtendedLifetime(traveler) {} }
let account = Account(traveler: traveler)
account.printSummary()
}
Why This Matters: Observed object lifetimes are an emergent property of compiler optimizations and can change between compiler versions, optimization levels, and unrelated edits to the surrounding code.
Build Setting: Enable "Optimize Object Lifetimes" (Xcode 13+) during development to expose hidden lifetime bugs early.
Generic code can be fast or slow depending on specialization.
// Generic function
func process<T>(_ value: T) {
print(value)
}
// Calling with concrete type
process(42) // Compiler specializes: process_Int(42)
process("hello") // Compiler specializes: process_String("hello")
protocol Drawable {
func draw()
}
// ❌ Existential container - expensive (heap allocation, indirection)
func drawAll(shapes: [any Drawable]) {
for shape in shapes {
shape.draw() // Dynamic dispatch through witness table
}
}
// ✅ Generic with constraint - can specialize
func drawAll<T: Drawable>(shapes: [T]) {
for shape in shapes {
shape.draw() // Static dispatch after specialization
}
}
Performance: The generic version can be ~10x faster in tight loops (specialization removes witness-table dispatch and existential boxing).
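One way to sanity-check that ratio on your own hardware: a minimal micro-benchmark sketch (Dot is a hypothetical conforming type; results vary with optimization level, so build with -O):

struct Dot: Drawable {
    func draw() {}
}

func benchmarkDispatch() {
    let dots = Array(repeating: Dot(), count: 1_000_000)
    let clock = ContinuousClock()
    let existential = clock.measure { drawAll(shapes: dots as [any Drawable]) }
    let generic = clock.measure { drawAll(shapes: dots) } // resolves to the generic overload
    print("existential: \(existential), generic: \(generic)")
}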
From WWDC 2016-416: any Protocol uses an existential container with specific performance characteristics.
// Existential Container Memory Layout (64-bit systems)
//
// Small Type (≤24 bytes):
// ┌──────────────────┬──────────────┬────────────────┐
// │ Value (inline) │ Type │ Protocol │
// │ 3 words max │ Metadata │ Witness Table │
// │ (24 bytes) │ (8 bytes) │ (8 bytes) │
// └──────────────────┴──────────────┴────────────────┘
// ↑ No heap allocation - value stored directly
//
// Large Type (>24 bytes):
// ┌──────────────────┬──────────────┬────────────────┐
// │ Heap Pointer → │ Type │ Protocol │
// │ (8 bytes) │ Metadata │ Witness Table │
// │ │ (8 bytes) │ (8 bytes) │
// └──────────────────┴──────────────┴────────────────┘
// ↑ Heap allocation required - pointer to actual value
//
// Total container size: 40 bytes (5 words on 64-bit)
// Threshold: 3 words (24 bytes) determines inline vs heap
// Small type example - stored inline (FAST)
struct Point: Drawable {
    var x, y, z: Double // 24 bytes - fits the inline buffer!
    func draw() {}
}
let small: any Drawable = Point(x: 1, y: 2, z: 3)
// ✅ Point stored directly in the container (no heap allocation)

// Large type example - heap allocated (SLOWER)
struct Rectangle: Drawable {
    var x, y, width, height: Double // 32 bytes - exceeds the inline buffer
    func draw() {}
}
let large: any Drawable = Rectangle(x: 0, y: 0, width: 10, height: 20)
// ❌ Rectangle allocated on the heap; the container stores a pointer
// Performance comparison:
// - Small existential (≤24 bytes): ~5ns access time
// - Large existential (>24 bytes): ~15ns access time (heap indirection)
// - Generic `some Drawable`: ~2ns access time (no container)
Design Tip: Keep protocol-conforming types ≤24 bytes when used as any Protocol for best performance. Use some Protocol instead of any Protocol when possible to eliminate all container overhead.
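MemoryLayout offers a quick check for whether a given type fits that inline buffer:

print(MemoryLayout<Point>.size <= 24)     // true: stored inline in the container
print(MemoryLayout<Rectangle>.size <= 24) // false: boxed on the heap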
@_specialize Attribute

// Force specialization for common types
@_specialize(where T == Int)
@_specialize(where T == String)
func process<T: Comparable>(_ value: T) -> T {
// Expensive generic operation
return value
}
// Compiler generates:
// - func process_Int(_ value: Int) -> Int
// - func process_String(_ value: String) -> String
// - Generic fallback for other types
any vs some

// ❌ any - existential, runtime overhead
func makeDrawable() -> any Drawable {
return Circle() // Heap allocation
}
// ✅ some - opaque type, compile-time type
func makeDrawable() -> some Drawable {
return Circle() // No overhead, type known at compile time
}
Inlining eliminates function call overhead but increases code size.
// ✅ Small, frequently called functions
@inlinable
public func fastAdd(_ a: Int, _ b: Int) -> Int {
return a + b
}
// ❌ Large functions - code bloat
@inlinable // Don't do this!
public func complexAlgorithm() {
// 100 lines of code...
}
// Framework code
public struct Point {
public var x: Double
public var y: Double
// ✅ Inlinable for cross-module optimization
@inlinable
public func distance(to other: Point) -> Double {
let dx = x - other.x
let dy = y - other.y
return sqrt(dx*dx + dy*dy)
}
}
// Client code
let p1 = Point(x: 0, y: 0)
let p2 = Point(x: 3, y: 4)
let d = p1.distance(to: p2) // Inlined across module boundary
@usableFromInline

// Internal helper that can be inlined
@usableFromInline
internal func helperFunction() { }
// Public API that uses it
@inlinable
public func publicAPI() {
helperFunction() // Can inline internal function
}
Trade-off: @inlinable exposes the function body as part of your module's public interface, which constrains future changes to the implementation.
Choosing the right collection and using it correctly matters.
// ❌ Array<T> - may use NSArray bridging (Swift/ObjC interop)
let array: Array<Int> = [1, 2, 3]
// ✅ ContiguousArray<T> - guaranteed contiguous memory (no bridging)
let array: ContiguousArray<Int> = [1, 2, 3]
Use ContiguousArray when: no ObjC bridging is needed (pure Swift); it can be ~15% faster, especially with class or @objc element types.
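When just one hot loop needs contiguity, you can also request direct buffer access without changing the array's type; a sketch using the standard library's withContiguousStorageIfAvailable:

let values = [1, 2, 3, 4]
let sum = values.withContiguousStorageIfAvailable { buffer in
    buffer.reduce(0, +) // contiguous access guaranteed inside this closure
} ?? values.reduce(0, +) // fallback if contiguous storage is unavailable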
// ❌ Multiple reallocations
var array: [Int] = []
for i in 0..<10000 {
array.append(i) // Reallocates ~14 times
}
// ✅ Single allocation
var array: [Int] = []
array.reserveCapacity(10000)
for i in 0..<10000 {
array.append(i) // No reallocations
}
struct BadKey: Hashable {
var data: [Int]
// ❌ Expensive hash (iterates entire array)
func hash(into hasher: inout Hasher) {
for element in data {
hasher.combine(element)
}
}
}
struct GoodKey: Hashable {
var id: UUID // Fast hash
var data: [Int] // Not hashed
// ✅ Hash only the unique identifier
func hash(into hasher: inout Hasher) {
hasher.combine(id)
}
}
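If id alone identifies a key, you can align == with the hash so lookups never touch data. This is a design choice, not a requirement; the synthesized == would compare data as well:

extension GoodKey {
    static func == (lhs: GoodKey, rhs: GoodKey) -> Bool {
        lhs.id == rhs.id // must stay consistent with hash(into:)
    }
}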
// ❌ Eager evaluation - processes entire array
let result = array
.map { expensive($0) }
.filter { $0 > 0 }
.first // Only need first element!
// ✅ Lazy evaluation - stops at first match
let result = array
.lazy
.map { expensive($0) }
.filter { $0 > 0 }
.first // Only evaluates until first match
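One caveat: lazy sequences recompute on every pass; nothing is cached:

let pipeline = array.lazy.map { expensive($0) }.filter { $0 > 0 }
_ = pipeline.first // runs expensive(_:) until the first match
_ = pipeline.first // runs it all again! Materialize with Array(pipeline) if you need reuse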
Async/await and actors add overhead. Use appropriately.
actor Counter {
private var value = 0
// ❌ Actor call overhead for simple operation
func increment() {
value += 1
}
}
// Calling from different isolation domain
for _ in 0..<10000 {
await counter.increment() // 10,000 actor hops!
}
// ✅ Batch operations to reduce actor overhead
actor Counter {
private var value = 0
func incrementBatch(_ count: Int) {
value += count
}
}
await counter.incrementBatch(10000) // Single actor hop
// async/await overhead: ~20-30μs per suspension point
// ❌ Unnecessary async for fast synchronous operation
func compute() async -> Int {
return 42 // Instant, but pays async overhead
}
// ✅ Keep synchronous operations synchronous
func compute() -> Int {
return 42 // No overhead
}
// ❌ Creating task per item (~100μs overhead each)
for item in items {
Task {
await process(item)
}
}
// ✅ Single task for batch
Task {
for item in items {
await process(item)
}
}
// ✅ Or use TaskGroup for parallelism
await withTaskGroup(of: Void.self) { group in
for item in items {
group.addTask {
await process(item)
}
}
}
@concurrent Attribute (Swift 6.2)

// Force background execution (valid on nonisolated async functions)
@concurrent
func expensiveComputation() async -> Int {
    // Always runs off the caller's actor, even when called from the main actor
    return complexCalculation()
}
// Safe to call from main actor without blocking
@MainActor
func updateUI() async {
let result = await expensiveComputation() // Guaranteed off main thread
label.text = "\(result)"
}
actor DataStore {
    private var data: [Int] = []

    // ❌ Isolated - every external read pays an actor hop
    func getCount() -> Int {
        data.count
    }

    // ✅ nonisolated for immutable state - callable without await
    nonisolated let label = "DataStore"

    // nonisolated members cannot read isolated mutable state:
    // nonisolated var storedCount: Int { data.count } // ❌ compile error
}
Understanding memory layout helps optimize cache performance and reduce allocations.
// ❌ Poor layout - 24-byte stride due to padding
struct BadLayout {
    var a: Bool  // 1 byte + 7 bytes padding
    var b: Int64 // 8 bytes
    var c: Bool  // 1 byte + 7 bytes tail padding
}
print(MemoryLayout<BadLayout>.stride) // 24 bytes per array element

// ✅ Optimized layout - 16-byte stride
struct GoodLayout {
    var b: Int64 // 8 bytes
    var a: Bool  // 1 byte
    var c: Bool  // 1 byte + 6 bytes tail padding
}
print(MemoryLayout<GoodLayout>.stride) // 16 bytes
// Query alignment
print(MemoryLayout<Double>.alignment) // 8
print(MemoryLayout<Int32>.alignment) // 4
// Structs align to largest member
struct Mixed {
var int32: Int32 // 4 bytes, 4-byte aligned
var double: Double // 8 bytes, 8-byte aligned
}
print(MemoryLayout<Mixed>.alignment) // 8 (largest member)
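MemoryLayout.offset(of:) can confirm where the padding lands; a quick sanity check for Mixed:

print(MemoryLayout<Mixed>.offset(of: \.int32) ?? -1)  // 0
print(MemoryLayout<Mixed>.offset(of: \.double) ?? -1) // 8 (4 bytes of padding after int32)
print(MemoryLayout<Mixed>.stride)                     // 16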
// ❌ Poor cache locality - traversal chases pointers
final class Node {
    var value = 0
    var next: Node? // each hop is a potential cache miss
}

// ✅ Array-based storage for cache locality
struct ArrayBased {
    var data: ContiguousArray<Int> // contiguous memory
}
// Array iteration can be ~10x faster due to cache prefetching
Typed throws can be faster than untyped by avoiding existential overhead.
// Untyped - existential container for error
func fetchData() throws -> Data {
// Can throw any Error
throw NetworkError.timeout
}
// Typed - concrete error type
func fetchData() throws(NetworkError) -> Data {
// Can only throw NetworkError
throw NetworkError.timeout
}
// Measure with tight loop
func untypedThrows() throws -> Int {
throw GenericError.failed
}
func typedThrows() throws(GenericError) -> Int {
throw GenericError.failed
}
// Benchmark: typed ~5-10% faster (no existential overhead)
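A minimal harness for that comparison (GenericError is assumed to be a one-case enum; absolute numbers depend heavily on optimizer settings):

enum GenericError: Error { case failed }

func benchmarkThrows() {
    let clock = ContinuousClock()
    let typed = clock.measure {
        for _ in 0..<1_000_000 {
            do { _ = try typedThrows() } catch {}
        }
    }
    let untyped = clock.measure {
        for _ in 0..<1_000_000 {
            do { _ = try untypedThrows() } catch {}
        }
    }
    print("typed: \(typed), untyped: \(untyped)")
}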
final class Storage<T> {
var value: T
init(_ value: T) { self.value = value }
}
struct COWWrapper<T> {
private var storage: Storage<T>
init(_ value: T) {
storage = Storage(value)
}
var value: T {
get { storage.value }
set {
if !isKnownUniquelyReferenced(&storage) {
storage = Storage(newValue)
} else {
storage.value = newValue
}
}
}
}
func processLargeArray(_ input: [Int]) -> [Int] {
var result = ContiguousArray<Int>()
result.reserveCapacity(input.count)
for element in input {
result.append(transform(element))
}
return Array(result)
}
// Stored state must be visible to inlinable code (private would not compile)
@usableFromInline
internal var cache: [Key: Value] = [:]

@inlinable
func getCached(_ key: Key) -> Value? {
    return cache[key] // body can be inlined into client modules
}
// Don't optimize without measuring first!
// ❌ Complex optimization with no measurement
struct OverEngineered {
@usableFromInline var data: ContiguousArray<UInt8>
// 100 lines of COW logic...
}
// ✅ Start simple, measure, then optimize
struct Simple {
var data: [UInt8]
}
// Profile → Optimize if needed
class Manager {
// ❌ Unnecessary weak reference overhead
weak var delegate: Delegate?
weak var dataSource: DataSource?
weak var observer: Observer?
}
// ✅ Use unowned when lifetime is guaranteed
class Manager {
unowned let delegate: Delegate // Delegate outlives Manager
weak var dataSource: DataSource? // Optional, may be nil
}
// ❌ Actor overhead for simple synchronous data
actor SimpleCounter {
private var count = 0
func increment() {
count += 1
}
}
// ✅ Use lock-free atomics (swift-atomics package) with @unchecked Sendable
import Atomics

struct AtomicCounter: @unchecked Sendable {
    private let count = ManagedAtomic<Int>(0)
    func increment() {
        count.wrappingIncrement(ordering: .relaxed)
    }
}
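Usage sketch, assuming swift-atomics is declared as a package dependency; concurrentPerform hammers the counter from many threads at once:

import Dispatch

let counter = AtomicCounter()
DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    counter.increment() // lock-free; safe under concurrent access
}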
- isKnownUniquelyReferenced before mutation
- reserveCapacity when size is known
- some instead of any where possible
- @_specialize for hot generic code
- ContiguousArray over Array
- Cheap hash(into:) implementations
- Avoid unnecessary async
- @concurrent (Swift 6.2)

The Pressure: Manager sees "slow" in profiler, demands immediate action.
Red Flags:
Time Cost Comparison:
How to Push Back Professionally:
"I want to optimize effectively. Let me spend 30 minutes with Instruments
to find the actual bottleneck. This prevents wasting time on code that's
not the problem. I've seen this save days of work."
The Pressure: Team adopts Swift 6, decides "everything should be an actor."
Red Flags:
Time Cost Comparison:
How to Push Back Professionally:
"Actors are great for isolation, but they add overhead. For this simple
counter, lock-free atomics are 10x faster. Let's use actors where we need
them—shared mutable state—and avoid them for pure value types."
The Pressure: Someone reads that inlining is faster, marks everything @inlinable.
Red Flags:
- Everything marked @inlinable

Time Cost Comparison:
How to Push Back Professionally:
"Inlining trades code size for speed. The compiler already inlines when
beneficial. Manual @inlinable should be for small, frequently called
functions. Let's profile and inline the 3 actual hotspots, not everything."
Problem: Processing 1000 images takes 30 seconds.
Investigation:
// Original code
func processImages(_ images: [UIImage]) -> [ProcessedImage] {
var results: [ProcessedImage] = []
for image in images {
results.append(expensiveProcess(image)) // Reallocations!
}
return results
}
Solution:
func processImages(_ images: [UIImage]) -> [ProcessedImage] {
var results = ContiguousArray<ProcessedImage>()
results.reserveCapacity(images.count) // Single allocation
for image in images {
results.append(expensiveProcess(image))
}
return Array(results)
}
Result: 30s → 8s (73% faster) by eliminating reallocations.
Problem: Actor counter in tight loop causes UI jank.
Investigation:
// Original - 10,000 actor hops
for _ in 0..<10000 {
await counter.increment() // ~100μs each = 1 second total!
}
Solution:
// Batch operations
actor Counter {
private var value = 0
func incrementBatch(_ count: Int) {
value += count
}
}
await counter.incrementBatch(10000) // Single actor hop
Result: 1000ms → 0.1ms (10,000x faster) by batching.
Problem: Protocol-based rendering is slow.
Investigation:
// Original - existential overhead
func render(shapes: [any Shape]) {
for shape in shapes {
shape.draw() // Dynamic dispatch
}
}
Solution:
// Specialized generic
func render<S: Shape>(shapes: [S]) {
for shape in shapes {
shape.draw() // Static dispatch after specialization
}
}
// Or use @_specialize
@_specialize(where S == Circle)
@_specialize(where S == Rectangle)
func render<S: Shape>(shapes: [S]) { }
Result: 100ms → 10ms (10x faster) by eliminating witness table overhead.
| Session | Title | Key Topics |
|---|---|---|
| WWDC 2024-10217 | Explore Swift performance | Function calls, memory allocation, layout, copying |
| WWDC 2016-416 | Understanding Swift Performance | Value vs reference, protocol witness tables, COW |
| WWDC 2021-10216 | ARC in Swift: Basics and beyond | Object lifetimes, weak/unowned, withExtendedLifetime |
| WWDC 2024-10170 | Consume noncopyable types in Swift | ~Copyable, ownership, consuming/borrowing |
performance-profiling — Instruments workflows
swift-concurrency — Correctness-focused concurrency
swiftui-performance — SwiftUI-specific optimizations

Last Updated: 2025-12-18
Swift Version: 6.0+ (6.2 for @concurrent)
Status: Production-ready
Remember: Profile first, optimize later. Readability > micro-optimizations.