
CUDA Memory Operators

Tensor new_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate an at::Tensor with unified managed memory (UVM), set its preferred storage location to CPU (host memory), and establish mappings on the CUDA device to the host memory.

Parameters:
  • self – The input tensor

  • sizes – The target tensor dimensions

Returns:

A new tensor backed by UVM
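
A minimal usage sketch from the Python side, assuming fbgemm_gpu is installed (importing it registers this operator under torch.ops.fbgemm) and that the new tensor takes its dtype and device from self:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  # Prototype tensor supplying dtype and device for the new allocation
  proto = torch.empty(0, dtype=torch.float32, device="cuda")

  # 1024 x 256 UVM tensor whose pages prefer to reside in host memory
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [1024, 256])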

Tensor new_managed_tensor_meta(const Tensor &self, const std::vector<std::int64_t> &sizes)

Placeholder operator for the Meta dispatch key.

Parameters:
  • self – The input tensor

  • sizes – The target tensor dimensions

Returns:

A new empty tensor
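
A sketch of how the Meta kernel is exercised, assuming the meta implementation is registered under the same operator name: with inputs on the meta device (e.g., during shape tracing), dispatch selects this placeholder, which only produces an empty tensor of the target shape:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  meta_proto = torch.empty(0, device="meta")
  out = torch.ops.fbgemm.new_managed_tensor(meta_proto, [1024, 256])
  # out is a meta tensor of shape (1024, 256); no UVM memory is allocated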

Tensor new_host_mapped_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate the at::Tensor with host-mapped memory.

Parameters:
  • self – The input tensor

  • sizes – The target tensor dimensions

Returns:

A new tensor backed by host-mapped memory
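
The call pattern mirrors new_managed_tensor; a sketch, assuming the operator is exposed as torch.ops.fbgemm.new_host_mapped_tensor:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  # Backed by host-mapped (pinned, device-accessible) memory rather than UVM
  hm_t = torch.ops.fbgemm.new_host_mapped_tensor(proto, [1024, 256])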

Tensor new_unified_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes, bool is_host_mapped)

Allocate the at::Tensor with either unified managed memory (UVM) or host-mapped memory.

Parameters:
  • self – The input tensor

  • sizes – The target tensor dimensions

  • is_host_mapped – If true, allocate host-mapped memory; otherwise, allocate UVM memory

Returns:

A new tensor backed by UVM or host-mapped memory, depending on the value of is_host_mapped
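
A sketch of choosing the backing memory with the flag, passed positionally per the C++ signature (assuming torch.ops.fbgemm.new_unified_tensor):

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  # is_host_mapped=False -> UVM; is_host_mapped=True -> host-mapped memory
  uvm_t = torch.ops.fbgemm.new_unified_tensor(proto, [1024, 256], False)
  hm_t = torch.ops.fbgemm.new_unified_tensor(proto, [1024, 256], True)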

Tensor new_vanilla_managed_tensor(const Tensor &self, const std::vector<std::int64_t> &sizes)

Allocate an at::Tensor with unified managed memory (UVM), but allow for its preferred storage location to be automatically managed.

Parameters:
  • self – The input tensor

  • sizes – The target tensor dimensions

Returns:

A new tensor backed by UVM
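
A sketch (assuming torch.ops.fbgemm.new_vanilla_managed_tensor); unlike new_managed_tensor, no preferred location is pinned, so the driver migrates pages between host and device on demand:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  # UVM allocation with driver-managed page placement
  uvm_t = torch.ops.fbgemm.new_vanilla_managed_tensor(proto, [1024, 256])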

bool uvm_storage(const Tensor &self)

Check whether a tensor is allocated with UVM (applies to both CPU and GPU tensors).

Parameters:

self – The input tensor

Returns:

true if the tensor is allocated with UVM, otherwise false
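
A sketch (assuming the operator is exposed as torch.ops.fbgemm.uvm_storage):

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  print(torch.ops.fbgemm.uvm_storage(uvm_t))          # True
  print(torch.ops.fbgemm.uvm_storage(torch.ones(4)))  # False: ordinary CPU allocation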

bool is_uvm_tensor(const Tensor &self)

Check whether a tensor is allocated with UVM and is not a CPU tensor.

Parameters:

self – The input tensor

Returns:

true if the tensor is a non-CPU tensor allocated with UVM, otherwise false
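
The contrast with uvm_storage: a CPU tensor viewing UVM storage (e.g., the result of uvm_to_cpu, documented below) passes uvm_storage but fails is_uvm_tensor. A sketch under the same torch.ops.fbgemm assumption:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])
  cpu_view = torch.ops.fbgemm.uvm_to_cpu(uvm_t)

  print(torch.ops.fbgemm.is_uvm_tensor(uvm_t))     # True: non-CPU tensor on UVM
  print(torch.ops.fbgemm.is_uvm_tensor(cpu_view))  # False: it is a CPU tensor
  print(torch.ops.fbgemm.uvm_storage(cpu_view))    # True: the storage is still UVM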

Tensor uvm_to_cpu(const Tensor &self)

Convert a UVM tensor to a CPU tensor.

Parameters:

self – The input tensor

Returns:

A new tensor that is effectively the input moved from UVM to CPU
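
A sketch (assuming torch.ops.fbgemm.uvm_to_cpu):

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  cpu_t = torch.ops.fbgemm.uvm_to_cpu(uvm_t)
  assert cpu_t.device.type == "cpu"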

Tensor uvm_to_device(const Tensor &self, const Tensor &prototype)

Create a new UVM tensor that shares the input tensor’s UVM storage but takes its device from prototype.

Parameters:
  • self – The input tensor

  • prototype – The tensor supplying the target device for the new tensor

Returns:

A new tensor on prototype’s device, sharing the input tensor’s UVM storage
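
A sketch for the two-GPU case (assuming torch.ops.fbgemm.uvm_to_device and at least two visible CUDA devices):

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto0 = torch.empty(0, dtype=torch.float32, device="cuda:0")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto0, [16, 16])

  # View the same UVM storage from cuda:1, the device of proto1
  proto1 = torch.empty(0, device="cuda:1")
  t_on_1 = torch.ops.fbgemm.uvm_to_device(uvm_t, proto1)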

void uvm_cuda_mem_advise(const Tensor &self, int64_t cuda_memory_advise)

Call cudaMemAdvise() on a UVM tensor’s storage. The cudaMemoryAdvise enum is available on the Python side in the fbgemm_gpu.uvm namespace; see that documentation for valid values.

See also

See the CUDA Runtime API documentation for more information on the cudaMemoryAdvise enum.

Parameters:
  • self – The input tensor

  • cuda_memory_advise – The cudaMemoryAdvise enum value, as an integer
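
A sketch using the Python-side enum; the member spelling mirrors CUDA’s cudaMemoryAdvise, but treat the exact import path and member names as version-dependent:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators
  from fbgemm_gpu.uvm import cudaMemoryAdvise

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  # Hint that this region will mostly be read, rarely written
  torch.ops.fbgemm.uvm_cuda_mem_advise(
      uvm_t, cudaMemoryAdvise.cudaMemAdviseSetReadMostly.value
  )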

void uvm_cuda_mem_prefetch_async(const Tensor &self, std::optional<Tensor> device_t)

Call cudaMemPrefetchAsync() on a UVM tensor’s storage to prefetch memory to a destination device.

See also

See the CUDA Runtime API documentation for more information on cudaMemPrefetchAsync().

Parameters:
  • self – The input tensor

  • device_t – [OPTIONAL] The tensor whose device will be the prefetch destination
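
A sketch (assuming torch.ops.fbgemm.uvm_cuda_mem_prefetch_async); the destination device is taken from the optional tensor argument:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  # Prefetch the UVM pages toward proto's device ahead of the next kernel
  torch.ops.fbgemm.uvm_cuda_mem_prefetch_async(uvm_t, proto)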

void uvm_mem_advice_dont_fork(const Tensor &self)

Call madvise(..., MADV_DONTFORK) on a UVM tensor’s storage. This is a workaround for an issue where the UVM kernel driver un-maps UVM storage pages from the page table on fork(), causing a slowdown on the next CPU access.

See also

See the madvise(2) man page for more information on madvise().

Parameters:

self – The input tensor
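
A sketch (assuming torch.ops.fbgemm.uvm_mem_advice_dont_fork); call it before the process forks, e.g., before spawning DataLoader workers:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  # Keep the UVM pages mapped in the parent across a subsequent fork()
  torch.ops.fbgemm.uvm_mem_advice_dont_fork(uvm_t)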

Tensor uvm_to_cpu_clone(const Tensor &self)

Copy a UVM tensor’s contiguous storage (i.e., uvm_storage(self) returns true) into a new CPU tensor. The copy is performed with a single-threaded memcpy().

Parameters:

self – The input tensor

Returns:

A new CPU tensor containing the data copied from the UVM tensor
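
A sketch (assuming torch.ops.fbgemm.uvm_to_cpu_clone); unlike uvm_to_cpu, the result owns a separate CPU storage:

  import torch
  import fbgemm_gpu  # loading this module registers the fbgemm operators

  proto = torch.empty(0, dtype=torch.float32, device="cuda")
  uvm_t = torch.ops.fbgemm.new_managed_tensor(proto, [16, 16])

  cpu_copy = torch.ops.fbgemm.uvm_to_cpu_clone(uvm_t)
  # cpu_copy is an independent CPU copy; later writes to uvm_t do not affect it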
