Don't transfer input ArrayBuffers #566

reillyeon · 2024-02-15T02:10:31Z

Right now the input and output ArrayBuffers passed to MLContext's compute() method are transferred so that their contents cannot be modified by script during the computation. It seems appropriate that an ArrayBuffer passed to MLGraphBuilder's constant() method should be treated the same way, though it would be better if the transfer were delayed until build() is called so that multiple ArrayBufferViews using the same backing ArrayBuffer can be passed as constants. build() would construct a list of unique backing ArrayBuffers to transfer rather than the naive approach of simply iterating over the ArrayBufferViews themselves.

I'll note that transferring ArrayBuffer parameters makes an API harder to call from WebAssembly, but I think that any argument for why compute inputs and outputs are transferred should also apply to constants.

The text was updated successfully, but these errors were encountered:

huningxin · 2024-02-21T08:31:24Z

@reillyeon

but I think that any argument for why compute inputs and outputs are transferred should also apply to constants.

Agreed.

I'll note that transferring ArrayBuffer parameters makes an API harder to call from WebAssembly

This is true.

Should we consider copying the content of input (and constant) ArrayBuffer instead? Notice that the Chromium implementation actually copies the input and constant ArrayBuffers for XNNPACK backend and DirectML (mojo) backend. It means for each input ArrayBuffer, WebAssembly path may need two copies, one is done by JS that copies from Wasm heap to an ArrayBuffer that can be transferred, another one is done by Chromium implementation that copies from ArrayBuffer to internal storage.

reillyeon · 2024-02-29T01:41:13Z

Given the constraints on implementations based on the underlying platform APIs I think it's reasonable to define constant and input handling such that MLGraphBuilder and compute() copy any provided ArrayBuffers. If developers want to avoid a copy they can use MLBuffers instead.

In theory Chromium's XNNPACK backend could avoid copies but I'm also planning to delete it in favor of a more flexible TFLite backend. The zero-copy option only works for CPU backends anyways (without MLBuffer).

reillyeon · 2024-02-29T01:42:04Z

Renamed this issue given my current position on how to resolve the inconsistency between handling of constant and input ArrayBuffers.

- Adds validation that passed outputs are neither inputs nor constants, matching the Chromium implementation. - Traverses the graph, starting with outputs, to visit all connected operands. - Previously the outputs were iterated over to process inputs and constants, which didn't make any sense. Inputs are hooked up to [[inputsDescriptors]]. Nothing is specifically mentioned about constants, but an issue about transferring is referenced. (webmachinelearning#566) - The impact of MLGraphBuilder re-use (webmachinelearning#567) is called out, since it could allow for removing the graph traversal. - Populates graph's [[outputDescriptors]], which was previously just missing. (webmachinelearning#448) - Makes most of the validation behavior of build() happen synchronously rather than "in parallel". A promise is still returned, of course. - Converting the graph into an implementation-defined format is called out, which remains in "in parallel" steps. Fixes webmachinelearning#448, fixes webmachinelearning#457, fixes webmachinelearning#552.

* Content: Define build() steps more rigorously - Adds validation that passed outputs are neither inputs nor constants, matching the Chromium implementation. - Traverses the graph, starting with outputs, to visit all connected operands. - Previously the outputs were iterated over to process inputs and constants, which didn't make any sense. Inputs are hooked up to [[inputsDescriptors]]. Nothing is specifically mentioned about constants, but an issue about transferring is referenced. (#566) - The impact of MLGraphBuilder re-use (#567) is called out, since it could allow for removing the graph traversal. - Populates graph's [[outputDescriptors]], which was previously just missing. (#448) - Makes most of the validation behavior of build() happen synchronously rather than "in parallel". A promise is still returned, of course. - Converting the graph into an implementation-defined format is called out, which remains in "in parallel" steps. Fixes #448, fixes #457, fixes #552. * Update index.bs Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com> * Build collection of input operands, and simplify steps * Populate operand and operator sets too --------- Co-authored-by: Reilly Grant <reillyeon@users.noreply.github.com>

a-sully · 2024-06-03T18:54:27Z

#700 naively addresses this issue by copying (rather than transferring) inputs. This is a bit awkward, though:

MLComputeResult contains the copied inputs. This is completely superfluous
It's a bit odd that the inputs - but not the outputs - are transferred
Should we bother passing outputs to compute() in the first place? As @reillyeon mentioned above, if the developer wants zero/minimal-copy, they should use MLBuffer. Passing outputs doesn't actually reduce copies in the Chromium implementation anyways

Thoughts on these changes?

dictionary MLComputeResult {
-  MLNamedArrayBufferViews inputs;
  MLNamedArrayBufferViews outputs;
};

-Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
+Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs);

Note that we'd like to deprecate compute() for MLBuffer + dispatch() at some point so perhaps this is not worth much discussion... Though MLBuffer isn't specified at all yet, so that may be a ways off :)

zolkis · 2024-06-03T19:07:00Z

(moved the comment here from the PR)
For me copying is fine, but also asking @inexorabletash: in general for the web platform, should we express more concisely the use case (policy) behind this, i.e. the expected immutability of inputs (transfer of ownership) and that copying is just a way (mechanism) to ensure that... we'd rather want something like "lock input" steps (choose a better name) - in most cases implemented (even spec'd) as copying the input. I understand in this case it will always end up being a copy, but (coming from RTOS times) something hurts inside :).

inexorabletash · 2024-06-03T19:40:21Z

The closest I've found (thanks to Webdex) are these annotated invocations:

The one in Encoding is probably the closest to what you're looking for:

Normative spec text just says copy, i.e. implementations can do anything, but it must be indistinguishable from a copy
Informative note encouraging implementations to optimize if possible
Warning about security hazards of the data mutating

I think the equivalent here would be to add an informative note to the call sites (in #700, "Let copiedInputs be the result of copying MLNamedArrayBufferViews inputs ...") describing why a copy is made. I think this is helpful for reviewers, future editors, and implementers to understand why it is specified this way. This could be written as just NOTE: A copy is performed because... Implementations are encouraged to... or perhaps <details class=note><summary>Why a copy?</summary>A copy is performed because...</details>

a-sully · 2024-06-04T19:57:54Z

Thanks for the input, @inexorabletash. Adding a NOTE to #700 SGTM

Thoughts on these changes?

dictionary MLComputeResult {
-  MLNamedArrayBufferViews inputs;
  MLNamedArrayBufferViews outputs;
};

-Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs, MLNamedArrayBufferViews outputs);
+Promise<MLComputeResult> compute(MLGraph graph, MLNamedArrayBufferViews inputs);

Any objections to expanding the scope of this issue to the above diff? i.e. Don't transfer input ArrayBuffers, and return output ArrayBuffers rather than passing them to compute()

reillyeon · 2024-06-04T22:53:47Z

Accepting a typed array to write the result into (or "BYOB") is a valuable pattern for reducing memory allocation overhead (and thus garbage collection overhead).

I don't think it's worth changing the interface here but we should consider providing a BYOB version of readBuffer() which accepts a typed array to copy the contents of the MLBuffer into. By specifying that the typed array is written to on the ML task source I think we can avoid the need to transfer the buffer since script won't have concurrent access to its contents (anything written to the buffer before the copy is performed will be overwritten).

reillyeon · 2024-08-26T18:51:00Z

I've prototyped a BYOB version of readBuffer() in Chromium, so if we want to go on a deprecation spree we could remove both compute() and the version of readBuffer() which always allocates a new ArrayBuffer.

inexorabletash added the question label Feb 16, 2024

reillyeon changed the title ~~Transferring constant ArrayBuffers~~ Don't transfer input ArrayBuffers Feb 29, 2024

inexorabletash mentioned this issue Mar 13, 2024

Content: Define build() steps more rigorously #603

Merged

inexorabletash mentioned this issue Mar 27, 2024

Conventions: Ensure all dict members have definitions #621

Merged

a-sully linked a pull request Jun 3, 2024 that will close this issue

Copy input ArrayBuffers #700

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't transfer input ArrayBuffers #566

Don't transfer input ArrayBuffers #566

Don't transfer input ArrayBuffers #566

Don't transfer input ArrayBuffers #566

Comments