

fix: replace unsafe d2h/h2d MemcpyAsync calls with synchronous Memcpy#107

Closed
kilinchange wants to merge 2 commits into master from fix/memcpyasync-host-lifetime


kilinchange (Collaborator) commented Feb 12, 2026

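Follow-up to a review comment on #103: replace the unsafe device-to-host (D2H) and host-to-device (H2D) cudaMemcpyAsync calls, whose host buffers may go out of scope before the copies complete, with synchronous cudaMemcpy.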
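A minimal sketch of the synchronous pattern this PR switches to. The helpers UploadSync and DownloadSync are hypothetical, and CHECK_CUDA is a local stand-in for the project's CUDA_CHECK. For H2D, cudaMemcpy returns only after the host buffer has been consumed (copied to a staging buffer or transferred), and for D2H it returns only after the data has landed in host memory, so a local std::vector can be destroyed or read right after the call.

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the project's CUDA_CHECK macro.
#define CHECK_CUDA(call)                                                       \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::abort();                                                      \
        }                                                                      \
    } while (0)

// H2D: cudaMemcpy does not return until the host buffer is no longer needed
// by the transfer, so `host` may go out of scope as soon as this returns.
template <typename T>
T *UploadSync(const std::vector<T> &host) {
    T *dev = nullptr;
    CHECK_CUDA(cudaMalloc(&dev, sizeof(T) * host.size()));
    CHECK_CUDA(cudaMemcpy(dev, host.data(), sizeof(T) * host.size(), cudaMemcpyHostToDevice));
    return dev;
}

// D2H: cudaMemcpy returns only after the copy into host memory has completed,
// so the returned vector can be read immediately.
template <typename T>
std::vector<T> DownloadSync(const T *dev, size_t n) {
    std::vector<T> host(n);
    CHECK_CUDA(cudaMemcpy(host.data(), dev, sizeof(T) * n, cudaMemcpyDeviceToHost));
    return host;
}
```

The cost of this approach is that every call blocks the host thread until its transfer has progressed far enough.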

kilinchange force-pushed the fix/memcpyasync-host-lifetime branch 2 times, most recently from 82a8322 to 616ad5b (February 13, 2026 02:09)
kilinchange (Collaborator, Author) commented:

[image not preserved]

kilinchange force-pushed the fix/memcpyasync-host-lifetime branch from cf5b07f to c35455d (February 13, 2026 09:37)
```diff
 CUDA_CHECK(cudaMallocAsync(&device_input_ptrs, sizeof(T *) * num_inputs, stream));
-CUDA_CHECK(cudaMemcpyAsync(device_input_ptrs, host_input_ptrs.data(), sizeof(T *) * num_inputs,
-                           cudaMemcpyHostToDevice, stream));
+CUDA_CHECK(cudaMemcpy(device_input_ptrs, host_input_ptrs.data(), sizeof(T *) * num_inputs,
```
Contributor commented:

If a stream sync is added at the end, shouldn't Async also work here?
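A minimal sketch of the reviewer's suggestion applied to the device_input_ptrs upload above. The kernel SumFirstElements and the helper SumFirstElementsAsync are hypothetical and not from the PR. The pointer-array upload stays cudaMemcpyAsync, the D2H copy of the result is asynchronous too, and a single cudaStreamSynchronize(stream) before returning guarantees that no pending copy still references host_input_ptrs or host_result. Note that host_result is read only after the sync; reading it earlier would race with the D2H copy regardless of its lifetime.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the project's CUDA_CHECK macro.
#define CHECK_CUDA(call)                                                       \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::abort();                                                      \
        }                                                                      \
    } while (0)

// Hypothetical kernel: reads the first element of each input through the
// device-side pointer array and writes their sum to *out.
__global__ void SumFirstElements(const float *const *inputs, int num_inputs, float *out) {
    float acc = 0.0f;
    for (int i = 0; i < num_inputs; ++i) {
        acc += inputs[i][0];
    }
    *out = acc;
}

float SumFirstElementsAsync(const std::vector<float *> &inputs, cudaStream_t stream) {
    const int num_inputs = static_cast<int>(inputs.size());
    // Local host buffers that pending async copies read from or write to.
    std::vector<float *> host_input_ptrs(inputs.begin(), inputs.end());
    float host_result = 0.0f;

    float **device_input_ptrs = nullptr;
    float *device_out = nullptr;
    CHECK_CUDA(cudaMallocAsync(&device_input_ptrs, sizeof(float *) * num_inputs, stream));
    CHECK_CUDA(cudaMallocAsync(&device_out, sizeof(float), stream));
    CHECK_CUDA(cudaMemcpyAsync(device_input_ptrs, host_input_ptrs.data(), sizeof(float *) * num_inputs,
                               cudaMemcpyHostToDevice, stream));
    SumFirstElements<<<1, 1, 0, stream>>>(device_input_ptrs, num_inputs, device_out);
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaMemcpyAsync(&host_result, device_out, sizeof(float), cudaMemcpyDeviceToHost, stream));
    CHECK_CUDA(cudaFreeAsync(device_input_ptrs, stream));
    CHECK_CUDA(cudaFreeAsync(device_out, stream));

    // Wait for every copy enqueued above; only after this is it safe to read
    // host_result and to let host_input_ptrs go out of scope.
    CHECK_CUDA(cudaStreamSynchronize(stream));
    return host_result;
}
```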

```cpp
// NOTE(dcj):
// Synchronize the stream here to ensure all preceding H2D/D2H memcpy
// operations have completed before the host buffers go out of scope.
CUDA_CHECK(cudaStreamSynchronize(stream));
```
Contributor commented:

The host memory has not been freed yet at this point, so with the stream sync added here, all the preceding memcpy calls can stay async.

```cpp
stream));
CUDA_CHECK(cudaMemcpy(out_dims_dev, idx_dims.data(), num_dims * sizeof(int64_t), cudaMemcpyHostToDevice));
CUDA_CHECK(cudaMemcpy(in_strides_dev, in_strides.data(), num_dims * sizeof(int64_t), cudaMemcpyHostToDevice));
CUDA_CHECK(cudaMemcpy(out_strides_dev, out_strides.data(), num_dims * sizeof(int64_t), cudaMemcpyHostToDevice));
```
Contributor commented:

Same here; these can stay async.
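The same reasoning applied to the dims/strides metadata, as a sketch. The kernel StridedGather and the helper StridedGatherAsync are hypothetical, and using out_dims, in_strides and out_strides to drive a strided gather is an illustrative assumption rather than what the PR's kernel does. All three H2D copies and the final D2H copy stay asynchronous, and one cudaStreamSynchronize covers them before the local vectors are destroyed.

```cuda
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the project's CUDA_CHECK macro.
#define CHECK_CUDA(call)                                                       \
    do {                                                                       \
        cudaError_t err_ = (call);                                             \
        if (err_ != cudaSuccess) {                                             \
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err_)); \
            std::abort();                                                      \
        }                                                                      \
    } while (0)

// Hypothetical kernel: recover each output element's coordinates from
// out_dims/out_strides, then gather from the input using in_strides.
__global__ void StridedGather(const float *in, float *out, const int64_t *out_dims,
                              const int64_t *in_strides, const int64_t *out_strides,
                              int num_dims, int64_t total) {
    int64_t idx = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (idx >= total) return;
    int64_t in_offset = 0;
    for (int d = 0; d < num_dims; ++d) {
        in_offset += ((idx / out_strides[d]) % out_dims[d]) * in_strides[d];
    }
    out[idx] = in[in_offset];
}

// Assumes at least one dimension.
std::vector<float> StridedGatherAsync(const float *dev_in, const std::vector<int64_t> &out_dims,
                                      const std::vector<int64_t> &in_strides, cudaStream_t stream) {
    const int num_dims = static_cast<int>(out_dims.size());
    // Local host buffers: contiguous output strides and the result.
    std::vector<int64_t> out_strides(num_dims, 1);
    for (int d = num_dims - 2; d >= 0; --d) out_strides[d] = out_strides[d + 1] * out_dims[d + 1];
    const int64_t total = out_strides[0] * out_dims[0];
    std::vector<float> host_out(total);
    const size_t bytes = num_dims * sizeof(int64_t);

    int64_t *out_dims_dev = nullptr, *in_strides_dev = nullptr, *out_strides_dev = nullptr;
    float *dev_out = nullptr;
    CHECK_CUDA(cudaMallocAsync(&out_dims_dev, bytes, stream));
    CHECK_CUDA(cudaMallocAsync(&in_strides_dev, bytes, stream));
    CHECK_CUDA(cudaMallocAsync(&out_strides_dev, bytes, stream));
    CHECK_CUDA(cudaMallocAsync(&dev_out, total * sizeof(float), stream));
    // All H2D copies stay asynchronous.
    CHECK_CUDA(cudaMemcpyAsync(out_dims_dev, out_dims.data(), bytes, cudaMemcpyHostToDevice, stream));
    CHECK_CUDA(cudaMemcpyAsync(in_strides_dev, in_strides.data(), bytes, cudaMemcpyHostToDevice, stream));
    CHECK_CUDA(cudaMemcpyAsync(out_strides_dev, out_strides.data(), bytes, cudaMemcpyHostToDevice, stream));
    const int threads = 256;
    const int blocks = static_cast<int>((total + threads - 1) / threads);
    StridedGather<<<blocks, threads, 0, stream>>>(dev_in, dev_out, out_dims_dev, in_strides_dev,
                                                  out_strides_dev, num_dims, total);
    CHECK_CUDA(cudaGetLastError());
    CHECK_CUDA(cudaMemcpyAsync(host_out.data(), dev_out, total * sizeof(float),
                               cudaMemcpyDeviceToHost, stream));
    CHECK_CUDA(cudaFreeAsync(out_dims_dev, stream));
    CHECK_CUDA(cudaFreeAsync(in_strides_dev, stream));
    CHECK_CUDA(cudaFreeAsync(out_strides_dev, stream));
    CHECK_CUDA(cudaFreeAsync(dev_out, stream));
    // One sync covers every copy above, so out_strides and host_out are
    // guaranteed to outlive the transfers that use them.
    CHECK_CUDA(cudaStreamSynchronize(stream));
    return host_out;
}
```

Compared with three blocking cudaMemcpy calls, the host can enqueue the copies and the kernel back to back and wait only once.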

kilinchange force-pushed the fix/memcpyasync-host-lifetime branch 3 times, most recently from 40ce151 to 03a6c57 (February 24, 2026 07:54)
kilinchange force-pushed the fix/memcpyasync-host-lifetime branch from 03a6c57 to d569171 (February 24, 2026 08:04)

2 participants