GPU Memory & Profiling
- Global vs shared vs local memory; registers and occupancy trade-offs
- Host↔Device transfers, pageable vs pinned memory
- Streams, concurrency, overlapping compute and copies
- Profiling tools: Nsight Systems (system-wide timelines) and Nsight Compute (per-kernel counters); reading timelines and counters
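
The register and occupancy trade-off in the first topic can be made concrete with a little arithmetic. The sketch below is a simplified estimate, not a vendor tool: the function name `estimate_occupancy` and the default per-SM limits (2048 threads, 32 blocks, 65536 registers, 64 KiB shared memory) are illustrative assumptions. Real limits depend on the GPU architecture, and real hardware allocates registers and shared memory in granules that this model ignores.

```python
def estimate_occupancy(
    threads_per_block,
    regs_per_thread,
    smem_per_block,
    # Assumed per-SM limits; look up the real values for your GPU.
    max_threads_per_sm=2048,
    max_blocks_per_sm=32,
    regs_per_sm=65536,
    smem_per_sm=65536,
    warp_size=32,
):
    """Rough fraction of an SM's warp slots that a kernel can keep resident."""
    # Threads are scheduled in whole warps, so round the block size up.
    warps_per_block = -(-threads_per_block // warp_size)
    max_warps_per_sm = max_threads_per_sm // warp_size

    # How many blocks fit under each resource limit.
    limit_threads = max_warps_per_sm // warps_per_block
    regs_per_block = regs_per_thread * warps_per_block * warp_size
    limit_regs = regs_per_sm // regs_per_block if regs_per_block else max_blocks_per_sm
    limit_smem = smem_per_sm // smem_per_block if smem_per_block else max_blocks_per_sm

    # The tightest limit decides how many blocks are resident at once.
    resident_blocks = min(max_blocks_per_sm, limit_threads, limit_regs, limit_smem)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    print(estimate_occupancy(256, 32, 0))          # 1.0
    print(estimate_occupancy(256, 64, 0))          # 0.5   (register-limited)
    print(estimate_occupancy(256, 32, 48 * 1024))  # 0.125 (shared-memory-limited)
```

Under these assumed limits, doubling registers per thread from 32 to 64 halves the estimated occupancy. Whether that is acceptable depends on the kernel; higher occupancy does not always mean higher throughput.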