What are OpenCL devices?
Belong to platforms; represent GPUs, CPUs, or other accelerators
What are OpenCL contexts?
Coordinate interaction between host and devices; a context can contain one or more devices from a single platform
What are command queues?
Used to submit work (kernel launches, memory transfers) to a device; usually one queue per device
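The three cards above can be sketched in host code. This is a minimal, illustrative setup assuming a single GPU; error checking is trimmed, and a real program should test every returned `err` against CL_SUCCESS.

```c
#include <CL/cl.h>

/* Sketch: find the first platform's first GPU, then create
 * one context and one command queue for it. */
void setup(void)
{
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);
}
```

clCreateCommandQueue() is the OpenCL 1.x call; on OpenCL 2.0+ runtimes the replacement is clCreateCommandQueueWithProperties().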
What is simpleOpenContext_GPU()?
Helper function that finds first GPU and creates context/queue
What is device memory allocation?
Use clCreateBuffer() to allocate GPU memory
What are memory flags?
CL_MEM_READ_ONLY, CL_MEM_WRITE_ONLY, CL_MEM_READ_WRITE, CL_MEM_COPY_HOST_PTR
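A sketch combining the two cards above: allocating device buffers with different flags. The names `ctx`, `h_in`, `N`, `d_in`, and `d_out` are illustrative, not from the original.

```c
/* Read-only input buffer, filled from host memory at creation time
 * via CL_MEM_COPY_HOST_PTR (no separate write call needed). */
cl_mem d_in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                              N * sizeof(float), h_in, &err);

/* Write-only output buffer, uninitialized on the device. */
cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                              N * sizeof(float), NULL, &err);
```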
What is a kernel?
Function executed on the device, declared with the __kernel qualifier
What is get_global_id(0)?
Returns the current work item's global index in dimension 0
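The two cards above come together in a minimal kernel sketch (the name `vecAdd` is illustrative): each work item uses its global index to process one element.

```c
/* OpenCL C device code: one work item per output element. */
__kernel void vecAdd(__global const float *a,
                     __global const float *b,
                     __global float *c)
{
    int i = get_global_id(0);   /* this work item's global index */
    c[i] = a[i] + b[i];
}
```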
What is kernel compilation?
OpenCL kernels are compiled at runtime: clCreateProgramWithSource() creates the program, clBuildProgram() compiles it
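A sketch of the runtime-compilation step, assuming `ctx`, `device`, and a source string `source` already exist; the kernel name "vecAdd" is illustrative.

```c
/* Create the program object from source, build it for the device,
 * then extract a kernel object by name. */
cl_program program = clCreateProgramWithSource(ctx, 1, &source, NULL, &err);
err = clBuildProgram(program, 1, &device, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
    /* On a build failure, the compiler log explains why. */
    char log[4096];
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                          sizeof(log), log, NULL);
}
cl_kernel kernel = clCreateKernel(program, "vecAdd", &err);
```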
What is compileKernelFromFile()?
Helper function that reads, compiles, and creates kernel from file
What are kernel arguments?
Set with clSetKernelArg() before enqueueing
What is clEnqueueNDRangeKernel()?
Launches kernel with specified work dimensions and sizes
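The two cards above in sequence, as a sketch: bind each argument by index, then enqueue a 1-D launch. `kernel`, `queue`, `d_in`, `d_out`, and `N` are assumed from earlier steps.

```c
/* Arguments are set by position, matching the kernel's parameter list. */
clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);

/* Launch N work items in groups of 64; in OpenCL 1.x the local size
 * must evenly divide the global size. */
size_t global = N;
size_t local  = 64;
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local,
                       0, NULL, NULL);
```

Passing NULL for the local size lets the runtime pick a work-group size itself.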
What are work items?
Basic unit of GPU execution, maps to hardware thread
What is work group size?
Number of work items per work group, affects performance
What is NDRange?
N-dimensional range of all work items in kernel launch
What is the hierarchy: work items and work groups?
Work items are grouped into work groups; synchronization and local-memory sharing are possible only within a group
What are local indices?
get_local_id() gives position within work group
What are global indices?
get_global_id() gives position in entire NDRange
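A device-code sketch relating the two index spaces for a 1-D launch (the kernel name and buffer are illustrative):

```c
__kernel void whereAmI(__global int *group_of)
{
    int gid = get_global_id(0);   /* position in the whole NDRange   */
    int lid = get_local_id(0);    /* position within this work group */
    int wg  = get_group_id(0);    /* index of this work group        */

    /* For a 1-D launch: gid == wg * get_local_size(0) + lid */
    group_of[gid] = wg;
}
```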
What is the advantage of local memory?
Faster than global, shared within work group
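A common pattern using local memory, sketched below: each work item stages one element into a shared `__local` tile, the group synchronizes, then everyone reads from the fast copy. Names are illustrative.

```c
__kernel void stageAndUse(__global const float *in,
                          __global float *out,
                          __local  float *tile)
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);

    tile[lid] = in[gid];            /* each item loads one element      */
    barrier(CLK_LOCAL_MEM_FENCE);   /* wait until the whole group wrote */
    out[gid] = tile[lid] * 2.0f;    /* now read from local memory       */
}
```

The `__local` buffer is sized from the host via clSetKernelArg() with a NULL argument pointer.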
What is memory coalescing?
GPU memory optimization: when adjacent work items access adjacent addresses, the hardware combines them into fewer, wider memory transactions
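A sketch contrasting coalesced and strided access patterns (kernel names are illustrative): the first lets neighboring work items read neighboring addresses; the second scatters them.

```c
__kernel void coalesced(__global const float *in, __global float *out)
{
    int i = get_global_id(0);
    out[i] = in[i];              /* neighbors read neighboring addresses */
}

__kernel void strided(__global const float *in, __global float *out,
                      int stride)
{
    int i = get_global_id(0);
    out[i] = in[i * stride];     /* neighbors read far-apart addresses  */
}
```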
What is the copy pattern: host to device?
clEnqueueWriteBuffer(), or CL_MEM_COPY_HOST_PTR at buffer-creation time
What is the copy pattern: device to host?
clEnqueueReadBuffer() with CL_TRUE for blocking
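A sketch of the blocking read-back, assuming `queue`, `d_out`, `h_out`, and `N` from earlier steps:

```c
/* CL_TRUE makes the call block until the transfer finishes,
 * so h_out is safe to use on the very next line. */
clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0,
                    N * sizeof(float), h_out, 0, NULL, NULL);
```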
What is the typical GPU program flow?
Allocate memory → copy to GPU → launch kernel → copy results back
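The whole flow above, condensed into one host-side sketch. It assumes `ctx`, `queue`, and `device` were created earlier; the `scale` kernel and all variable names are illustrative, and error checks are trimmed.

```c
const char *src =
    "__kernel void scale(__global float *x) {\n"
    "    int i = get_global_id(0);\n"
    "    x[i] *= 2.0f;\n"
    "}\n";

float  h_x[1024];                /* host data (assumed initialized) */
size_t n = 1024, bytes = sizeof(h_x);
cl_int err;

/* 1. allocate device memory + copy host data in one step */
cl_mem d_x = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                            bytes, h_x, &err);

/* 2. compile the kernel at runtime */
cl_program p = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
clBuildProgram(p, 1, &device, NULL, NULL, NULL);
cl_kernel k = clCreateKernel(p, "scale", &err);

/* 3. bind arguments and launch one work item per element */
clSetKernelArg(k, 0, sizeof(cl_mem), &d_x);
clEnqueueNDRangeKernel(queue, k, 1, NULL, &n, NULL, 0, NULL, NULL);

/* 4. blocking copy of the results back to the host */
clEnqueueReadBuffer(queue, d_x, CL_TRUE, 0, bytes, h_x, 0, NULL, NULL);
```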