Nothing Special   »   [go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLPack overhaul #5661

Merged
merged 9 commits into from
Oct 10, 2024
Merged

DLPack overhaul #5661

merged 9 commits into from
Oct 10, 2024

Conversation

mzient
Copy link
Contributor
@mzient mzient commented Oct 7, 2024

Category:

New feature (non-breaking change which adds functionality)
Refactoring (Redesign of existing code that doesn't affect functionality)

Description:

This PR reworks the DLPack support in DALI.
New features:

  • shared buffer ownership (not just views)
  • add pinned memory support
    Refactoring:
  • remove DLTensorResource inheritance
  • unify JAX DLTensorResource and other resources

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

Existing tests: ExternalSource, PythonFunction, JAX, dali_test.bin:DL
New tests: buffer sharing (reference counting test).

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: N/A

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19098946]: BUILD STARTED

@banasraf banasraf self-assigned this Oct 7, 2024
if (dl_type.code < std::size(code_str))
ss << code_str[dl_type.code];
else
ss << "<unknown" << dl_type.code + 0 << ">";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ss << "<unknown" << dl_type.code + 0 << ">";
ss << "<unknown:" << dl_type.code + 0 << ">";

*
* The text representation looks like:
* <type><bits>[x<lanes>]
* with <lanes>x present only if the number of lanes is > 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* with <lanes>x present only if the number of lanes is > 1
* with x<lanes> present only if the number of lanes is > 1

* +-- ...
* ```
*
* You can use any payload structure of your choice, but it must privde the storage for DLTensor's
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* You can use any payload structure of your choice, but it must privde the storage for DLTensor's
* You can use any payload structure of your choice, but it must provide the storage for DLTensor's

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19098946]: BUILD PASSED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
- Remove inheritance.
- Make more stuff inline.
- Make function names consistent.
- Add DLPack resources which share tensor ownership
- Add reference counting tests.
- Add CPU pinned memory support.

TODO(michalz): Unify with JAX.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19136647]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19136647]: BUILD PASSED

@@ -19,18 +19,18 @@

namespace dali {

DLDataType GetDLType(DALIDataType type) {
DLDataType ToDLType(DALIDataType type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a side note, the TYPE_SWITCH here seems like an overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but then again it's almost certainly faster than using TypeTable.

Comment on lines 287 to 296
template <typename Backend>
std::vector<DLMTensorPtr> GetSharedDLTensorList(TensorList<Backend> &tensor_list) {
int device_id = tensor_list.device_id();
bool pinned = tensor_list.is_pinned();

std::vector<DLMTensorPtr> dl_tensors{};
dl_tensors.reserve(tensor_list.num_samples());

for (int i = 0; i < tensor_list.num_samples(); ++i)
dl_tensors.push_back(GetDLTensorView(tensor_list[i], pinned, device_id));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a copy-paste for GetDLTensorListView. It probably should use unsafe_sample_owner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19179834]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [19179834]: BUILD PASSED

@mzient mzient merged commit 7a51e09 into NVIDIA:main Oct 10, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants