axiom-networking-diag by charleswiltgen/axiom
npx skills add https://github.com/charleswiltgen/axiom --skill axiom-networking-diag
Core principle: 85% of networking problems stem from misunderstood connection states, unhandled network transitions, or improper error handling, not from Network.framework defects.
Network.framework is battle-tested in every iOS app (powers URLSession internally), handles trillions of requests daily, and provides smart connection establishment with Happy Eyeballs, proxy evaluation, and WiFi Assist. If your connection is failing, timing out, or behaving unexpectedly, the issue is almost always in how you're using the framework, not the framework itself.
This skill provides systematic diagnostics to identify root causes in minutes, not hours.
If you see ANY of these, suspect a networking misconfiguration, not framework breakage:
Connection times out after 60 seconds with no clear error
TLS handshake fails with "certificate invalid" on some networks
Data sent but never arrives at receiver
Connection drops when switching WiFi to cellular
Works perfectly on WiFi but fails 100% of time on cellular
Works in simulator but fails on real device
Connection succeeds on your network but fails for users
❌ FORBIDDEN "Network.framework is broken, we should rewrite with sockets"
Critical distinction: The Simulator uses the macOS networking stack (not iOS), hides cellular-specific issues (IPv6-only networks), and doesn't simulate network transitions. MANDATORY: Test on a real device under real network conditions.
ALWAYS run these commands FIRST (before changing code):
// 1. Enable Network.framework logging
// Add to Xcode scheme: Product → Scheme → Edit Scheme → Arguments
// -NWLoggingEnabled 1
// -NWConnectionLoggingEnabled 1
// 2. Check connection state history
connection.stateUpdateHandler = { state in
print("\(Date()): Connection state: \(state)")
// Log every state transition with timestamp
}
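// 3. Check TLS configuration
// If using custom TLS parameters, log the values at the point where you set them
// (NWProtocolTLS.Options only exposes securityProtocolOptions, which is configured through setter functions):
// sec_protocol_options_set_min_tls_protocol_version(tlsOptions.securityProtocolOptions, .TLSv12)
print("TLS minimum version configured: TLS 1.2")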
// 4. Test with packet capture (Charles Proxy or Wireshark)
// On device: Settings → WiFi → (i) → Configure Proxy → Manual
// Charles: Help → SSL Proxying → Install Charles Root Certificate on iOS
// 5. Test on different networks
// - WiFi
// - Cellular (disable WiFi)
// - Airplane Mode → WiFi (test waiting state)
// - VPN active
// - IPv6-only (some cellular carriers)
| Observation | Diagnosis | Next Step |
|---|---|---|
| Stuck in .preparing > 5 seconds | DNS failure or network down | Pattern 1a |
| Moves to .waiting immediately | No connectivity (Airplane Mode, no signal) | Pattern 1b |
| .failed with POSIX error 61 | Connection refused (server not listening) | Pattern 1c |
| .failed with POSIX error 50 | Network down (interface disabled) | Pattern 1d |
| .ready then immediate .failed | TLS handshake failure | Pattern 2b |
| .ready, send succeeds, no data arrives | Framing problem or receiver not processing | Pattern 3a |
| Works WiFi, fails cellular | IPv6-only network (hardcoded IPv4) | Pattern 5a |
| Works without VPN, fails with VPN | Proxy interference or DNS override | Pattern 5b |
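The state and error rows of the table above can be turned into a logging helper, so every state transition prints the pattern to look at next. This is a minimal sketch, not part of the skill itself: `diagnosticHint(for:)` is a hypothetical name, and the mapping only covers the rows that can be detected from `NWConnection.State` alone (POSIX 61 is `ECONNREFUSED`, 50 is `ENETDOWN`, 54 is `ECONNRESET` on Apple platforms).

```swift
import Foundation
import Network

/// Hypothetical helper: maps a connection state to the matching pattern in the table.
func diagnosticHint(for state: NWConnection.State) -> String {
    switch state {
    case .setup:
        return "Not started yet: call start(queue:)"
    case .preparing:
        return "Preparing: if this lasts more than 5 seconds, see Pattern 1a"
    case .waiting(.dns(let code)), .failed(.dns(let code)):
        return "DNS error \(code): Pattern 1a"
    case .waiting(let error):
        return "Waiting (\(error)): Pattern 1b"
    case .failed(.posix(.ECONNREFUSED)):
        return "POSIX 61, connection refused: Pattern 1c"
    case .failed(.posix(.ENETDOWN)):
        return "POSIX 50, network down: Pattern 1d"
    case .failed(.posix(.ECONNRESET)):
        return "POSIX 54, connection reset: Pattern 2d"
    case .failed(.tls(let status)):
        return "TLS error \(status): Pattern 2b"
    case .failed(let error):
        return "Failed (\(error)): use the decision tree"
    case .ready:
        return "Ready"
    case .cancelled:
        return "Cancelled"
    @unknown default:
        return "Unknown state"
    }
}
```

One way to use it is inside the state handler: `connection.stateUpdateHandler = { print("\(Date()): \(diagnosticHint(for: $0))") }`.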
Before changing ANY code, identify ONE of these:
Use this to reach the correct diagnostic pattern in 2 minutes:
Network problem?
├─ Connection never reaches .ready?
│ ├─ Stuck in .preparing for >5 seconds?
│ │ ├─ DNS lookup timing out? → Pattern 1a (DNS Failure)
│ │ ├─ Network available but can't reach host? → Pattern 1c (Connection Refused)
│ │ └─ First connection slow, subsequent fast? → Pattern 1e (DNS Caching)
│ │
│ ├─ Moves to .waiting immediately?
│ │ ├─ Airplane Mode or no signal? → Pattern 1b (No Connectivity)
│ │ ├─ Cellular blocked by parameters? → Pattern 1b (Interface Restrictions)
│ │ └─ VPN connecting? → Wait and retry
│ │
│ ├─ .failed with POSIX error 61?
│ │ └─ → Pattern 1c (Connection Refused)
│ │
│ └─ .failed with POSIX error 50?
│ └─ → Pattern 1d (Network Down)
│
├─ Connection reaches .ready, then fails?
│ ├─ Fails immediately after .ready?
│ │ ├─ TLS error -9806? → Pattern 2b (Certificate Validation)
│ │ ├─ TLS error -9801? → Pattern 2b (Protocol Version)
│ │ └─ POSIX error 54? → Pattern 2d (Connection Reset)
│ │
│ ├─ Fails after network change (WiFi → cellular)?
│ │ ├─ No viabilityUpdateHandler? → Pattern 2a (Viability Not Handled)
│ │ ├─ Didn't detect better path? → Pattern 2a (Better Path)
│ │ └─ IPv6 → IPv4 transition? → Pattern 5a (Dual Stack)
│ │
│ ├─ Fails after timeout?
│ │ └─ → Pattern 2c (Receiver Not Responding)
│ │
│ └─ Random disconnects?
│ └─ → Pattern 2d (Network Instability)
│
├─ Data not arriving?
│ ├─ Send succeeds, receive never returns?
│ │ ├─ No message framing? → Pattern 3a (Framing Problem)
│ │ ├─ Wrong byte count? → Pattern 3b (Min/Max Bytes)
│ │ └─ Receiver not calling receive()? → Check receiver code
│ │
│ ├─ Partial data arrives?
│ │ ├─ receive(exactly:) too large? → Pattern 3b (Chunking)
│ │ ├─ Sender closing too early? → Check sender lifecycle
│ │ └─ Buffer overflow? → Pattern 3b (Buffer Management)
│ │
│ ├─ Data corrupted?
│ │ ├─ TLS disabled? → Pattern 3c (No Encryption)
│ │ ├─ Binary vs text encoding? → Check ContentType
│ │ └─ Byte order (endianness)? → Use network byte order
│ │
│ └─ Works sometimes, fails intermittently?
│ └─ → Pattern 3d (Race Condition)
│
├─ Performance degrading?
│ ├─ Latency increasing over time?
│ │ ├─ TCP congestion? → Pattern 4a (Congestion Control)
│ │ ├─ No contentProcessed pacing? → Pattern 4a (Buffering)
│ │ └─ Server overloaded? → Check server metrics
│ │
│ ├─ Throughput decreasing?
│ │ ├─ Network transition WiFi → cellular? → Pattern 4b (Bandwidth Change)
│ │ ├─ Packet loss increasing? → Pattern 4b (Network Quality)
│ │ └─ Multiple streams competing? → Pattern 4b (Prioritization)
│ │
│ ├─ High CPU usage?
│ │ ├─ Not using batch for UDP? → Pattern 4c (Batching)
│ │ ├─ Too many small sends? → Pattern 4c (Coalescing)
│ │ └─ Using sockets instead of Network.framework? → Migrate (30% CPU savings)
│ │
│ └─ Memory growing?
│ ├─ Not releasing connections? → Pattern 4d (Connection Leaks)
│ ├─ Not cancelling on deinit? → Pattern 4d (Lifecycle)
│ └─ Missing [weak self]? → Pattern 4d (Retain Cycles)
│
└─ Works on WiFi, fails on cellular/VPN?
├─ IPv6-only cellular network?
│ ├─ Hardcoded IPv4 address? → Pattern 5a (IPv4 Literal)
│ ├─ getaddrinfo with AF_INET only? → Pattern 5a (Address Family)
│ └─ Works on some carriers, not others? → Pattern 5a (Regional IPv6)
│
├─ Corporate VPN active?
│ ├─ Proxy configuration failing? → Pattern 5b (PAC)
│ ├─ DNS override blocking hostname? → Pattern 5b (DNS)
│ └─ Certificate pinning failing? → Pattern 5b (TLS in VPN)
│
├─ Port blocked by firewall?
│ ├─ Non-standard port? → Pattern 5c (Firewall)
│ ├─ Outbound only? → Pattern 5c (NATing)
│ └─ Works on port 443, not 8080? → Pattern 5c (Port Scanning)
│
├─ Peer-to-peer connection failing?
│ ├─ NAT traversal issue? → Pattern 5d (STUN/TURN)
│ ├─ Symmetric NAT? → Pattern 5d (NAT Type)
│ └─ Local network only? → Pattern 5d (Bonjour/mDNS)
│
└─ URLSession fails but NWConnection works?
├─ HTTP URL blocked? → Pattern 6a (ATS HTTP Block)
├─ "SSL error" on HTTPS? → Pattern 6b (ATS TLS Version)
└─ Works on older iOS? → Pattern 6a/6b (ATS enforcement)
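The tree points to Pattern 2a when a connection fails after a WiFi to cellular change and no viability handler is installed. As a rough sketch of what that instrumentation might look like (the function name `attachTransitionLogging(to:)` is an assumption made for illustration), both `viabilityUpdateHandler` and `betterPathUpdateHandler` can be set before the connection is started:

```swift
import Foundation
import Network

/// Hypothetical helper: logs viability and better-path changes with timestamps.
func attachTransitionLogging(to connection: NWConnection) {
    // Called when the current path can or can no longer carry data
    connection.viabilityUpdateHandler = { isViable in
        print("\(Date()): connection viable = \(isViable)")
    }
    // Called when a preferable path (for example WiFi after cellular) appears or disappears
    connection.betterPathUpdateHandler = { hasBetterPath in
        print("\(Date()): better path available = \(hasBetterPath)")
    }
}
```

If the log shows `viable = false` right before a failure, the transition was not handled; what to do next (wait, or open a new connection on the better path) depends on the app.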
Before proceeding to a pattern:
Time cost 10-15 minutes
// Enable DNS logging
// -NWLoggingEnabled 1
// Check DNS resolution manually
// Terminal: nslookup example.com
// Terminal: dig example.com
// Logs show:
// "DNS lookup timed out"
// "getaddrinfo failed: 8 (nodename nor servname provided)"
// ❌ WRONG — Adding timeout doesn't fix DNS
/*
let parameters = NWParameters.tls
parameters.expiredDNSBehavior = .allow // Doesn't help if DNS never resolves
*/
// ✅ CORRECT — Verify hostname, test DNS manually
// 1. Test DNS manually:
// $ nslookup your-hostname.com
// If this fails, DNS is the problem (not your code)
// 2. If DNS works manually but not in app:
// Check if VPN or enterprise config blocking app DNS
// 3. If hostname doesn't exist:
let connection = NWConnection(
host: NWEndpoint.Host("correct-hostname.com"), // Fix typo
port: 443,
using: .tls
)
// 4. If DNS caching issue (rare):
// Restart device to clear DNS cache
// Or use IP address temporarily while investigating DNS server issue
nslookup your-hostname.com should return an IP in under 1 second
Time cost 15-20 minutes
-9806 (kSSLPeerCertInvalid)
-9807 (kSSLPeerCertExpired)
-9801 (kSSLProtocol)
# Test TLS manually with openssl
openssl s_client -connect example.com:443 -showcerts
# Check certificate details
openssl s_client -connect example.com:443 | openssl x509 -noout -dates
# notBefore: Jan 1 00:00:00 2024 GMT
# notAfter: Dec 31 23:59:59 2024 GMT ← Check if expired
# Check certificate chain
openssl s_client -connect example.com:443 -showcerts | grep "CN="
# Should show: Subject CN=example.com, Issuer CN=Trusted CA
// ❌ WRONG — Never disable certificate validation in production
/*
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(tlsOptions.securityProtocolOptions, { ... }, .main)
// This disables validation → security vulnerability
*/
// ✅ CORRECT — Fix the certificate on server
// 1. Renew expired certificate (Let's Encrypt, DigiCert, etc.)
// 2. Ensure hostname matches (CN=example.com or SAN includes example.com)
// 3. Include intermediate CA certificates on server
// 4. Test with: openssl s_client -connect example.com:443
// ⚠️ ONLY for development/staging
#if DEBUG
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(
tlsOptions.securityProtocolOptions,
{ (sec_protocol_metadata, sec_trust, sec_protocol_verify_complete) in
// Trust any certificate (DEV ONLY)
sec_protocol_verify_complete(true)
},
.main
)
let parameters = NWParameters(tls: tlsOptions)
let connection = NWConnection(host: "dev-server.example.com", port: 443, using: parameters)
#endif
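// Production-grade certificate pinning (SecTrustCopyCertificateChain requires iOS 15+)
let pinnedCertificateData = Data(/* your cert */)
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_verify_block(
    tlsOptions.securityProtocolOptions,
    { (metadata, secTrust, complete) in
        // Convert sec_trust_t to SecTrust and run the normal chain evaluation first
        let trust = sec_trust_copy_ref(secTrust).takeRetainedValue()
        var evaluationError: CFError?
        guard SecTrustEvaluateWithError(trust, &evaluationError),
              let chain = SecTrustCopyCertificateChain(trust) as? [SecCertificate],
              let leaf = chain.first else {
            complete(false)
            return
        }
        // Compare the server's leaf certificate with the pinned certificate
        let serverCertificateData = SecCertificateCopyData(leaf) as Data
        complete(serverCertificateData == pinnedCertificateData) // Reject non-pinned certificates
    },
    .main
)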
openssl s_client -connect example.com:443 shows Verify return code: 0 (ok)
Time cost 20-30 minutes
// Enable detailed logging
connection.send(content: data, completion: .contentProcessed { error in
if let error = error {
print("Send error: \(error)")
} else {
print("✅ Sent \(data.count) bytes at \(Date())")
}
})
connection.receive(minimumIncompleteLength: 1, maximumLength: 65536) { data, context, isComplete, error in
if let error = error {
print("Receive error: \(error)")
} else if let data = data {
print("✅ Received \(data.count) bytes at \(Date())")
}
}
// Use Charles Proxy or Wireshark to verify bytes on wire
Common cause: Stream protocols (TCP/TLS) don't preserve message boundaries.
// Sender sends 3 messages:
send("Hello") // 5 bytes
send("World") // 5 bytes
send("!") // 1 byte
// Receiver might get:
receive() → "HelloWorld!" // All 11 bytes at once
// Or:
receive() → "Hel" // 3 bytes
receive() → "loWorld!" // 8 bytes
// Message boundaries lost!
// NetworkConnection with TLV
let connection = NetworkConnection(
to: .hostPort(host: "example.com", port: 1029)
) {
TLV {
TLS()
}
}
// Send typed messages
enum MessageType: Int {
case chat = 1
case ping = 2
}
let chatData = Data("Hello".utf8)
try await connection.send(chatData, type: MessageType.chat.rawValue)
// Receive typed messages
let (data, metadata) = try await connection.receive()
if metadata.type == MessageType.chat.rawValue {
print("Chat message: \(String(data: data, encoding: .utf8)!)")
}
// Sender: Prefix message with UInt32 length
func sendMessage(_ message: Data) {
var length = UInt32(message.count).bigEndian
let lengthData = Data(bytes: &length, count: 4)
connection.send(content: lengthData, completion: .contentProcessed { _ in
connection.send(content: message, completion: .contentProcessed { _ in
print("Sent message with length prefix")
})
})
}
// Receiver: Read length, then read message
func receiveMessage() {
// 1. Read 4-byte length
connection.receive(minimumIncompleteLength: 4, maximumLength: 4) { lengthData, _, _, error in
guard let lengthData = lengthData else { return }
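// Data is not guaranteed to be 4-byte aligned, so load unaligned and convert from network byte order
let length = lengthData.withUnsafeBytes { UInt32(bigEndian: $0.loadUnaligned(as: UInt32.self)) }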
// 2. Read message of exact length
connection.receive(minimumIncompleteLength: Int(length), maximumLength: Int(length)) { messageData, _, _, error in
guard let messageData = messageData else { return }
print("Received complete message: \(messageData.count) bytes")
}
}
}
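The length-prefix scheme can also be exercised without a network, which helps confirm that framing (and not the connection) is the problem. The sketch below is an illustration under assumptions: `LengthPrefixDeframer` is a hypothetical type, not part of Network.framework, and it uses the same 4-byte big-endian length header as the sender above. It buffers whatever chunks the stream delivers and only returns messages once they are complete.

```swift
import Foundation

/// Hypothetical helper: accumulates raw stream bytes and yields complete messages.
struct LengthPrefixDeframer {
    private var buffer = Data()

    /// Prefixes a payload with its length as a big-endian UInt32.
    static func frame(_ payload: Data) -> Data {
        var length = UInt32(payload.count).bigEndian
        var framed = Data(bytes: &length, count: 4)
        framed.append(payload)
        return framed
    }

    /// Appends a chunk as delivered by the stream and returns every message completed so far.
    mutating func append(_ chunk: Data) -> [Data] {
        buffer.append(chunk)
        var messages: [Data] = []
        while buffer.count >= 4 {
            // Decode the big-endian length byte by byte (no alignment requirements)
            let length = buffer.prefix(4).reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
            let total = 4 + Int(length)
            guard buffer.count >= total else { break } // wait for more bytes
            messages.append(Data(buffer.dropFirst(4).prefix(Int(length))))
            buffer = Data(buffer.dropFirst(total))
        }
        return messages
    }
}
```

Feeding it the bytes of two framed messages split at an arbitrary offset should return nothing for the first chunk and both messages after the second, which is the behavior a raw `receive` loop over TCP needs.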
Time cost 15-25 minutes
// Monitor send completion time
let sendStart = Date()
connection.send(content: data, completion: .contentProcessed { error in
let elapsed = Date().timeIntervalSince(sendStart)
print("Send completed in \(elapsed)s") // Should be < 0.1s normally
// If > 1s, TCP congestion or receiver not draining fast enough
})
// Profile with Instruments
// Xcode → Product → Profile → Network template
// Check "Bytes Sent" vs "Time" graph
// Should be smooth line, not stepped/stalled
// ❌ WRONG — Sending without pacing
/*
for frame in videoFrames {
connection.send(content: frame, completion: .contentProcessed { _ in })
// Buffers all frames immediately → memory spike → congestion
}
*/
// ✅ CORRECT — Pace with contentProcessed callback
func sendFrameWithPacing() {
guard let nextFrame = getNextFrame() else { return }
connection.send(content: nextFrame, completion: .contentProcessed { [weak self] error in
if let error = error {
print("Send error: \(error)")
return
}
// contentProcessed = network stack consumed frame
// NOW send next frame (pacing)
self?.sendFrameWithPacing()
})
}
// Start pacing
sendFrameWithPacing()
// NetworkConnection with natural back pressure
func sendFrames() async throws {
for frame in videoFrames {
try await connection.send(frame)
// Suspends automatically if network can't keep up
// Built-in back pressure, no manual pacing needed
}
}
Time cost 10-15 minutes
# Check if hostname has IPv6
dig AAAA example.com
# Check if device is on IPv6-only network
# Settings → WiFi/Cellular → (i) → IP Address
# If starts with "2001:" or "fe80:" → IPv6
# If "192.168" or "10." → IPv4
# Test with IPv6-only simulator
# Xcode → Devices → (device) → Use as Development Target
# Settings → Developer → Networking → DNS64/NAT64
// ❌ WRONG — Hardcoded IPv4
/*
let host = "192.168.1.100" // Fails on IPv6-only cellular
*/
// ❌ WRONG — Forcing IPv4
/*
let parameters = NWParameters.tcp
parameters.requiredInterfaceType = .wifi
(parameters.defaultProtocolStack.internetProtocol as? NWProtocolIP.Options)?.version = .v4 // Fails on IPv6-only
*/
// ✅ CORRECT — Use hostname, let framework handle IPv4/IPv6
let connection = NWConnection(
host: NWEndpoint.Host("example.com"), // Hostname, not IP
port: 443,
using: .tls
)
// Framework automatically:
// 1. Resolves both A (IPv4) and AAAA (IPv6) records
// 2. Tries IPv6 first (if available)
// 3. Falls back to IPv4 (Happy Eyeballs)
// 4. Works on any network (IPv4, IPv6, dual-stack)
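Hardcoded IPv4 literals are easy to miss when the host comes from configuration or a server response. A small guard like the one below could flag them in debug builds; it is a sketch under assumptions, with `isIPv4Literal(_:)` being a hypothetical helper name built on the `IPv4Address` string initializer from the Network framework.

```swift
import Network

/// Hypothetical helper: true when the host string is a literal IPv4 address rather than a hostname.
func isIPv4Literal(_ host: String) -> Bool {
    return IPv4Address(host) != nil
}
```

For example, `assert(!isIPv4Literal(configuredHost))` before creating the connection would catch the Pattern 5a case during development (`configuredHost` stands in for wherever the app gets its host string).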
dig AAAA your-hostname.com to verify IPv6 record exists
"Just roll back to v4.1"
"Disable TLS temporarily to narrow it down"
"It works on my device, must be user error"
"Let's add retry logic and more timeouts"
You have 1 hour to provide CEO with:
// Check what changed in v4.2
git diff v4.1 v4.2 -- NetworkClient.swift
// Most likely culprits:
// - TLS configuration changed
// - Added certificate pinning
// - Changed connection parameters
// - Updated hostname
// Check failure pattern:
// - Random 15%? Or specific user segment?
// - Specific iOS version? (check analytics)
// - Specific network? (WiFi vs cellular)
// Enable logging on production builds (emergency flag):
#if PRODUCTION
if UserDefaults.standard.bool(forKey: "EnableNetworkLogging") {
// -NWLoggingEnabled 1
}
#endif
// Ask Customer Support to enable for affected users
// Check logs for specific error code
// Found in git diff:
// v4.1:
let parameters = NWParameters.tls
// v4.2:
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_min_tls_protocol_version(tlsOptions.securityProtocolOptions, .TLSv13) // ← SMOKING GUN
let parameters = NWParameters(tls: tlsOptions)
Root cause identified: Some users' backend infrastructure (load balancers, proxy servers) doesn't support TLS 1.3. v4.1 negotiated TLS 1.2; v4.2 requires TLS 1.3, so the connection fails.
// Fix: Support both TLS 1.2 and TLS 1.3
let tlsOptions = NWProtocolTLS.Options()
sec_protocol_options_set_min_tls_protocol_version(tlsOptions.securityProtocolOptions, .TLSv12) // ✅ Support older infrastructure
// TLS 1.3 will still be used where supported (automatic negotiation)
let parameters = NWParameters(tls: tlsOptions)
# Build hotfix v4.2.1
# Test on affected user's network (critical!)
# Submit to App Store with expedited review request
# Explain: "Production outage affecting 15% of users"
Found root cause: v4.2 requires TLS 1.3, but 15% of users are on older infrastructure
(enterprise proxies, older load balancers) that only supports TLS 1.2.
Fix: Change minimum TLS version to 1.2 (backward compatible, 1.3 still used when available).
ETA: Hotfix v4.2.1 in App Store in 1 hour (expedited review).
Full rollout to users: 24 hours.
Mitigation now: Telling affected users to update immediately when available.
Root cause: TLS version requirement changed in v4.2 (TLS 1.3 only).
15% of users behind infrastructure that doesn't support TLS 1.3.
Technical fix: Set the minimum TLS version to 1.2 with sec_protocol_options_set_min_tls_protocol_version(tlsOptions.securityProtocolOptions, .TLSv12)
This allows backward compatibility while still using TLS 1.3 where supported.
Testing: Verified fix on user's network (enterprise VPN with old proxy).
Deployment: Hotfix build in progress, ETA 30 minutes to submit.
Prevention: Add TLS compatibility testing to pre-release checklist.
Update: We've identified the issue and have a fix deploying within 1 hour.
Affected users: Those on enterprise networks or older ISP infrastructure.
Workaround: None (network level issue).
Expected resolution: v4.2.1 will be available in App Store in 1 hour.
Ask users to update immediately.
Updates: I'll notify you every 30 minutes.
| Approach | Time to Resolution | User Impact |
|---|---|---|
| ❌ Panic rollback | 1-2 hours app review + 24 hours user updates = up to 26 hours | 10K users down for up to 26 hours |
| ❌ "Add more retries" | Unknown (doesn't fix root cause) | Permanent 15% failure rate |
| ❌ "Works for me" | Days of debugging wrong thing | Frustrated users, bad reviews |
| ✅ Systematic diagnosis | 30 min diagnosis + 20 min fix + 1 hour review ≈ 2 hours | 10K users down for about 2 hours |
| Symptom | Likely Cause | First Check | Pattern | Fix Time |
|---|---|---|---|---|
| Stuck in .preparing | DNS failure | nslookup hostname | 1a | 10-15 min |
| .waiting immediately | No connectivity | Airplane Mode? | 1b | 5 min |
| .failed POSIX 61 | Connection refused | Server listening? | 1c | 5-10 min |
| .failed POSIX 50 | Network down | Check interface | 1d | 5 min |
| TLS error -9806 | Certificate invalid | openssl s_client | 2b | 15-20 min |
Time cost: 5-15 minutes
ATS enforces HTTPS for URLSession and WKWebView connections by default (iOS 9+). ATS failures are easy to miss: connections fail with generic errors, and no ATS-specific message appears in the console unless OS-level logging is enabled.
The request fails with NSURLErrorSecureConnectionFailed (-1200) or NSURLErrorAppTransportSecurityRequiresSecureConnection (-1022).
# Check if ATS is blocking the connection
nscurl --ats-diagnostics https://yourserver.com
# Shows exactly which ATS policy the server fails
// In console, look for:
// "App Transport Security has blocked a cleartext HTTP (http://) resource load"
// This only appears if OS-level logging is enabled
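Because the console message is easy to miss, the app can flag these two error codes itself. The following is a minimal sketch, assuming the failure arrives as a `URLError` from URLSession; `atsHint` is a hypothetical helper name, and the pattern labels come from this guide's quick reference table.

```swift
import Foundation

/// Hypothetical helper: returns a diagnostic hint when a URLSession error
/// matches one of the two ATS-related codes, otherwise nil.
func atsHint(for error: Error) -> String? {
    guard let urlError = error as? URLError else { return nil }
    switch urlError.code {
    case .appTransportSecurityRequiresSecureConnection: // -1022
        return "ATS blocked a cleartext http:// load; check Info.plist (Pattern 6a)"
    case .secureConnectionFailed: // -1200
        return "TLS handshake failed; run nscurl --ats-diagnostics on the host (Pattern 6b)"
    default:
        return nil
    }
}
```

Calling this from a data task's completion handler and logging any non-nil result points straight at the ATS patterns instead of a generic failure message.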
<!-- Info.plist — exception for specific domain only -->
<key>NSAppTransportSecurity</key>
<dict>
<key>NSExceptionDomains</key>
<dict>
<key>api.legacy-server.com</key>
<dict>
<key>NSExceptionAllowsInsecureHTTPLoads</key>
<true/>
</dict>
</dict>
</dict>
Do NOT use NSAllowsArbitraryLoads: it disables ATS entirely. App Store Review flags this and may reject the app. Use domain-specific exceptions instead.
nscurl --ats-diagnostics reports a TLS version failure.
# Check server's TLS version
openssl s_client -connect yourserver.com:443 -tls1_2
# If this fails but -tls1 succeeds → server doesn't support TLS 1.2
<!-- Info.plist — allow TLS 1.0 for specific domain (temporary) -->
<key>NSAppTransportSecurity</key>
<dict>
<key>NSExceptionDomains</key>
<dict>
<key>legacy-api.example.com</key>
<dict>
<key>NSExceptionMinimumTLSVersion</key>
<string>TLSv1.0</string>
</dict>
</dict>
</dict>
Better fix: Upgrade the server to TLS 1.2+. ATS exceptions for TLS downgrade trigger App Store Review scrutiny.
ATS applies to URLSession and WKWebView connections. Network.framework (NWConnection/NetworkConnection) is NOT subject to ATS — it handles TLS configuration directly via tlsOptions. If URLSession fails but NWConnection succeeds for the same server, ATS is almost certainly the cause.
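The URLSession versus NWConnection comparison can be run side by side. This is a rough sketch under stated assumptions: `compareStacks` is a hypothetical function, it should run inside the app target (ATS policy comes from that app's Info.plist), and it reports `.waiting` as a failure for simplicity.

```swift
import Foundation
import Network

/// Hypothetical check: loads https://<host>/ with URLSession and opens a TLS
/// NWConnection to the same host, then prints both outcomes.
func compareStacks(host: String) {
    guard let url = URL(string: "https://\(host)/") else {
        print("Invalid host: \(host)")
        return
    }
    // URLSession path: subject to ATS.
    URLSession.shared.dataTask(with: url) { _, _, error in
        print("URLSession:", error.map { "failed (\($0.localizedDescription))" } ?? "ok")
    }.resume()

    // Network.framework path: TLS configured directly, not subject to ATS.
    let connection = NWConnection(host: NWEndpoint.Host(host), port: .https, using: .tls)
    connection.stateUpdateHandler = { state in
        switch state {
        case .ready:
            print("NWConnection: ok")
            connection.cancel()
        case .failed(let error), .waiting(let error):
            print("NWConnection: failed (\(error))")
            connection.cancel()
        default:
            break
        }
    }
    connection.start(queue: .global())
}
```

If the URLSession line reports a failure while the NWConnection line reports ok, that matches the ATS diagnosis described above.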
Problem: Trying to debug networking issues without seeing the framework's internal state.
Why it fails: You're guessing what's happening. Logs show exact state transitions, error codes, and timing.
// Add to Xcode scheme BEFORE debugging:
// -NWLoggingEnabled 1
// -NWConnectionLoggingEnabled 1
// Or programmatically:
#if DEBUG
setenv("NW_LOGGING_ENABLED", "1", 1) // ProcessInfo.processInfo.environment is read-only; call this before creating any connection
#endif
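Alongside framework logging, the app can keep its own timestamped history of state transitions, which makes a ">5 seconds in .preparing" observation measurable. A minimal sketch, assuming all callbacks arrive on one serial queue; `StateRecorder` is a hypothetical name.

```swift
import Foundation
import Network

/// Hypothetical helper: keeps a timestamped history of connection states.
final class StateRecorder {
    struct Entry {
        let elapsed: TimeInterval        // seconds since the recorder was created
        let state: NWConnection.State
    }

    private(set) var entries: [Entry] = []
    private let start = Date()

    /// Call from stateUpdateHandler; not thread-safe on its own.
    func record(_ state: NWConnection.State) {
        let entry = Entry(elapsed: Date().timeIntervalSince(start), state: state)
        entries.append(entry)
        print("+\(String(format: "%.3f", entry.elapsed))s \(state)")
    }
}
```

Creating the recorder just before `connection.start(queue:)` and calling `record(_:)` from `stateUpdateHandler` gives the elapsed time of every transition.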
Problem: WiFi and cellular have different characteristics (IPv6-only, proxy configs, packet loss).
Why it fails: 40% of connection failures are network-specific. If you only test WiFi, you miss cellular issues.
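When testing across networks, it helps to log which network each run actually used. The sketch below uses `NWPathMonitor` to print the current path's characteristics; the queue label and output format are arbitrary choices. On an IPv6-only carrier I would expect `supportsIPv4` to be false, though that is an assumption worth confirming on the network in question.

```swift
import Foundation
import Network

// Log the kind of network the device is on, so each test run can be
// tagged with the conditions it actually exercised.
let monitor = NWPathMonitor()
monitor.pathUpdateHandler = { path in
    let interface: String
    if path.usesInterfaceType(.wifi) {
        interface = "wifi"
    } else if path.usesInterfaceType(.cellular) {
        interface = "cellular"
    } else if path.usesInterfaceType(.wiredEthernet) {
        interface = "ethernet"
    } else {
        interface = "other"
    }
    print("status=\(path.status) interface=\(interface) " +
          "ipv4=\(path.supportsIPv4) ipv6=\(path.supportsIPv6) expensive=\(path.isExpensive)")
}
monitor.start(queue: DispatchQueue(label: "path-monitor"))
```

The handler fires again on every path change, so switching WiFi off mid-test also shows up in the log.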
Problem: Seeing .failed(let error) and just showing a generic "Connection failed" to the user.
Why it fails: Different error codes require different fixes. POSIX 61 = server issue, POSIX 50 = client network issue.
if case .failed(let error) = state {
    // NWError wraps POSIX, DNS, and TLS failures; match the POSIX codes explicitly
    switch error {
    case .posix(.ECONNREFUSED): // POSIX 61
        print("Server not listening, check server logs")
    case .posix(.ENETDOWN): // POSIX 50
        print("Network interface down, check WiFi/cellular")
    case .posix(.ETIMEDOUT): // POSIX 60
        print("Connection timeout, check firewall/DNS")
    default:
        print("Connection failed: \(error)")
    }
}
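The same idea extends to DNS and TLS failures. As a sketch, the hypothetical `diagnosticPattern(for:)` below maps an `NWError` to the pattern labels used in this guide's quick reference table; the mapping is a simplification and the hint strings are illustrative.

```swift
import Network

/// Hypothetical helper: maps an NWError to the diagnostic pattern named in
/// the quick reference table.
func diagnosticPattern(for error: NWError) -> String {
    switch error {
    case .dns:
        return "1a: DNS failure, try nslookup on the hostname"
    case .posix(.ECONNREFUSED):
        return "1c: connection refused, is the server listening?"
    case .posix(.ENETDOWN):
        return "1d: network down, check the interface"
    case .tls:
        return "2b: TLS failure, inspect with openssl s_client"
    default:
        return "unclassified: \(error)"
    }
}
```

Logging this string next to the raw error keeps the original detail while pointing at the right diagnostic pattern.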
Problem: Testing only the happy path (.preparing → .ready). Not testing .waiting, network changes, or failures.
Why it fails: Real users experience network transitions (WiFi → cellular), Airplane Mode, and weak signal.
// Test with Network Link Conditioner:
// 1. 100% Loss: verify .waiting state shows "Waiting for network"
// 2. WiFi → None → WiFi: verify automatic reconnection
// 3. 3% packet loss: verify performance degrades gracefully
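For the first check to pass, the UI needs a distinct message per state. A minimal sketch follows, assuming a simple one-string-per-state design; `statusMessage(for:)` is a hypothetical helper, and only the "Waiting for network" text comes from the checklist above.

```swift
import Network

/// Hypothetical helper: user-facing text for each connection state.
func statusMessage(for state: NWConnection.State) -> String {
    switch state {
    case .setup, .preparing:
        return "Connecting"
    case .waiting:
        return "Waiting for network"   // shown under 100% Loss
    case .ready:
        return "Connected"
    case .failed:
        return "Connection failed"
    case .cancelled:
        return "Disconnected"
    @unknown default:
        return "Unknown state"
    }
}
```

Keeping this as a pure function means the state-to-text mapping can be unit tested without a live connection.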
Problem: Testing only in the simulator. The simulator uses macOS networking (different from iOS) and has no cellular.
Why it fails: The simulator hides IPv6-only issues, doesn't simulate network transitions, and has different DNS.
networking skill — Discipline-enforcing anti-patterns:
network-framework-ref skill — Complete API documentation:
swift-concurrency skill — If using async/await:
Last Updated: 2025-12-02
Status: Production-ready diagnostics from WWDC 2018/2025
Tested: Diagnostic patterns validated against real production issues
| Symptom | Likely Cause | First Check | Pattern | Fix Time |
|---|---|---|---|---|
| Data not received | Framing problem | Packet capture | 3a | 20-30 min |
| Partial data | Min/max bytes wrong | Check receive() params | 3b | 10 min |
| Latency increasing | TCP congestion | contentProcessed pacing | 4a | 15-25 min |
| High CPU | No batching | Use connection.batch | 4c | 10 min |
| Memory growing | Connection leaks | Check [weak self] | 4d | 10-15 min |
| Works WiFi, fails cellular | IPv6-only network | dig AAAA hostname | 5a | 10-15 min |
| Works without VPN, fails with VPN | Proxy interference | Test PAC file | 5b | 20-30 min |
| Port blocked | Firewall | Try 443 vs 8080 | 5c | 10 min |
| HTTP URL blocked silently | ATS enforcement | Check Info.plist | 6a | 5-10 min |
| "An SSL error has occurred" | ATS TLS requirements | Check server TLS version | 6b | 10-15 min |