Create a tensor from Python data
import torch
x = torch.tensor([1, 2, 3])
Create common tensors (zeros/ones/rand)
x0 = torch.zeros(3, 4)
x1 = torch.ones(3, 4)
xr = torch.rand(3, 4)
Create tensor like another tensor (shape/dtype/device)
y = torch.zeros_like(x)
z = torch.randn_like(x.float())  # randn_like needs a floating-point dtype, hence .float()
Set dtype explicitly
x = torch.tensor([1, 2, 3], dtype=torch.float32)
Move tensor to device (CPU/GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = x.to(device)
Change dtype (cast)
x = x.to(torch.float16) # or x = x.float()
Tensor shape/size
x.shape
x.size()  # equivalent to x.shape
Reshape (view/reshape)
y = x.reshape(2, -1)  # copies if needed
y = x.view(2, -1)  # requires contiguous memory; raises otherwise
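A minimal sketch of the view/reshape difference, assuming a small integer tensor: after a transpose the data is non-contiguous, so view() would raise but reshape() still works because it copies when it has to.

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                    # transpose -> non-contiguous view of x
y = t.reshape(-1)            # fine: reshape copies when needed
z = t.contiguous().view(-1)  # view works once memory is made contiguous
# t.view(-1) here would raise a RuntimeError
```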
Add/remove dimensions
y = x.unsqueeze(0)
z = y.squeeze(0)
z = y.squeeze()
Flatten tensor
y = torch.flatten(x, start_dim=1)
Permute dimensions (reorder axes)
y = x.permute(0, 2, 1)
Transpose last two dims
y = x.transpose(-1, -2)
Concatenate tensors
y = torch.cat([a, b], dim=0)
Stack tensors (new dimension)
y = torch.stack([a, b], dim=0)
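A quick sketch of the cat/stack distinction on two toy tensors: cat joins along an existing dimension, while stack creates a new one.

```python
import torch

a = torch.zeros(2, 3)
b = torch.ones(2, 3)
c = torch.cat([a, b], dim=0)    # joins along dim 0 -> shape (4, 3)
s = torch.stack([a, b], dim=0)  # adds a new dim 0 -> shape (2, 2, 3)
```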
Indexing & slicing
y = x[0, :3]
z = x[:, -1]
Boolean masking
mask = x > 0
y = x[mask]
Cap values to a range
y = torch.clamp(x, min=0.0, max=1.0)
Where (conditional select)
y = torch.where(x > 0, x, torch.zeros_like(x))
Argmax / Top-k
pred = logits.argmax(dim=1)
vals, idx = logits.topk(k=5, dim=1)
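A small sketch on hypothetical toy logits (batch of 2, 4 classes): argmax gives the predicted class per row, and topk returns the top values together with their indices.

```python
import torch

logits = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.2, 3.0]])
pred = logits.argmax(dim=1)          # predicted class per row
vals, idx = logits.topk(k=2, dim=1)  # top-2 values and their indices per row
```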
Softmax / LogSoftmax
p = torch.softmax(logits, dim=1)
logp = torch.log_softmax(logits, dim=1)
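A sketch with toy logits: softmax rows sum to 1, and log_softmax matches log(softmax(...)) while being more numerically stable.

```python
import torch

logits = torch.tensor([[1.0, 2.0, 3.0]])
p = torch.softmax(logits, dim=1)           # probabilities, each row sums to 1
logp = torch.log_softmax(logits, dim=1)    # stable equivalent of p.log()
```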
Basic math operations
y = a + b
y = a * b
y = torch.matmul(a, b)
Broadcasting (concept)
# shapes like (B, 1, D) + (B, T, D) broadcast on dim=1
y = a + b
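A concrete sketch of that broadcast: the size-1 dim of a (B, 1, D) tensor is expanded virtually to match (B, T, D), so the result has shape (B, T, D).

```python
import torch

a = torch.zeros(2, 1, 4)  # (B=2, 1, D=4)
b = torch.ones(2, 3, 4)   # (B=2, T=3, D=4)
y = a + b                 # size-1 dim stretches to T=3 -> shape (2, 3, 4)
```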
Random seed for reproducibility
import torch
torch.manual_seed(0)
Disable gradient tracking (inference)
with torch.no_grad():
    y = model(x)
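A minimal end-to-end sketch using a toy nn.Linear as a hypothetical stand-in for `model`: outputs computed inside no_grad carry no autograd history.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)       # toy stand-in model
x = torch.randn(3, 4)
with torch.no_grad():         # no graph is built for anything inside
    y = model(x)
print(y.requires_grad)  # False
```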