Add indices #345

jakirkham · 2016-10-24T18:26:04Z

Adds a function for iterating over shapes. Can be handy when working with ndarrays or other such objects.

llllllllll · 2016-10-24T19:06:19Z

toolz/itertoolz.py

+    [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
+    """
+
+    return(itertools.product(*[range(_) for _ in sizes]))


no need to add parens around the return:

return itertools.product(*map(range, sizes))
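For reference, a minimal sketch of the function with this suggestion applied. The name `indices` and the `*sizes` signature are inferred from the docstring example in the diff (`indices(3, 2)`); the full docstring from the PR is not reproduced here.

```python
import itertools


def indices(*sizes):
    """Iterate over every index tuple of an N-D shape (sketch)."""
    # One range per dimension; product yields tuples in row-major
    # order, so the last index varies fastest.
    return itertools.product(*map(range, sizes))


print(list(indices(3, 2)))
# [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```

Because `itertools.product` is lazy, nothing is materialized until the result is iterated.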

In general, I would avoid using _ like this, most people use that when they want to ignore a variable but must assign it like:

for _ in range(num_retries):
    # code that wants to run `num_retries` times but doesn't
    # need to know the count

or

# I just want the first and last, but don't care about the
# middle of an iterator.
first, *_, last = sequence

Part of the reason for this convention is that many Python shells (default, IPython, bpython, etc.) bind _ to the result of the last evaluated expression. For example:

In [16]: 2 + 2
Out[16]: 4

In [17]: _
Out[17]: 4

This makes it hard to use a repl to reason about code that uses the name _ because it will get trampled and reassigned a lot.
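The trampling can be reproduced outside a shell. A small sketch, assuming the default CPython display hook (`sys.__displayhook__`), which the plain interpreter calls to echo each result; IPython and bpython have their own equivalents that are not shown here:

```python
import builtins
import sys

# The default display hook prints the repr of a non-None value and,
# as a side effect, stores that value in builtins._
sys.__displayhook__(2 + 2)
print(builtins._)
```

Any code pasted into the shell that relies on its own `_` will see it overwritten in the same way after the next expression is echoed.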

Yep, sorry, this was kind of sloppy on my part. Will clean this up. Thanks for the tips.

llllllllll · 2016-10-24T19:07:25Z

Do you have an example of where you would use this? I understand how this works, but I am not sure when to apply it. Maybe provide a functional example in the docstring with an array?

jakirkham · 2016-10-24T19:46:47Z

toolz/itertoolz.py

+    l[1][0] = 3
+    l[1][1] = 4
+    l[2][0] = 5
+    l[2][1] = 6


Not sure what you had in mind for an example, but does this help? If not, do you have some other ideas of what you might like to see?

I guess I am not used to using index access inside for loops; normally people just loop over the values directly, and in numpy you don't want to be doing a bunch of scalar accesses like this. To help me understand, can you describe some real code that you have written that uses this?

I'll try. 😄

So in some cases I have binary data that I need to split up into smaller blocks, work on in separate processes, and potentially combine the results at different stages. This data normally is on disk and may be a single file or split across multiple files. In these cases, I need an index for each block that I will work with. While I suppose one could compute a single index for each block, it makes the code much harder to reason about, and it is already somewhat complex code (e.g. adds halos to data blocks, slices out halos afterwards, etc.). Being able to have indices like this makes it easier to reason about these cases and handle arbitrary dimensions. Not to mention stitching the pieces together becomes much more straightforward.
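A rough sketch of the block-indexing pattern described here, not the actual code being referred to. `block_slices` is a hypothetical helper, `indices` is assumed to behave like the function in this PR, and halo handling is left out entirely:

```python
import itertools


def indices(*sizes):
    # Stand-in for the function proposed in this PR.
    return itertools.product(*map(range, sizes))


def block_slices(shape, block_shape):
    """Yield (block_index, slices) for each block tiling `shape`."""
    # Number of blocks along each axis, rounding up for a ragged edge.
    counts = [-(-n // b) for n, b in zip(shape, block_shape)]
    for idx in indices(*counts):
        # Clamp the stop of each slice so edge blocks stay in bounds.
        slices = tuple(
            slice(i * b, min((i + 1) * b, n))
            for i, b, n in zip(idx, block_shape, shape)
        )
        yield idx, slices


for idx, slices in block_slices((5, 4), (2, 2)):
    print(idx, slices)
```

Each block keeps an N-D index, so a result computed in another process can be written back to the matching slices without converting to and from a flat block number.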

Hopefully that makes sense.

I think I understand, thanks for clarifying! Looking through some of my numpy code I see there are places where I could have used something like this; however, I realized that this is in numpy as numpy.indices. I wonder if I would want this when working with normal lists/tuples where numpy was not available. If we are going the route of allowing more functions into toolz but selectively curating the top level namespace, then I would be +1 on adding this, but -0 on putting it in the top level. This is because I think it is not immediately obvious when this is the right function to use over just standard looping or slice indexing, so it is more "advanced" than other functions in toolz.

Sounds reasonable. I'm ok with not including it in the main namespace.

Yeah numpy.indices is pretty different from this. Instead of doing something like this, it creates a massive array such that each index combination is specified. This ends up being pretty expensive for large arrays.

We can actually do much better if we note that much of this information is redundant and we are willing to part with having it in one big array. For most use cases, these are safe assumptions. Following them we get something like this. For decent sized arrays, it is not unreasonable to see an order of magnitude or potentially a few orders of magnitude speed up by following this strategy.*

Even if we do need a full array with all combinations like numpy.indices, we can pack the result from the xnumpy function linked above into an array and still cut down the creation time to roughly half.*

* My benchmarking is still rather primitive at this point, but it does seem reliable thus far.
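To make the redundancy concrete, here is a pure-Python sketch (no NumPy required) that counts how many integers each representation stores. `dense_index_count` and `per_axis_index_count` are hypothetical helper names, and the per-axis layout is an assumption about what "parting with one big array" looks like. The counts model storage only, not the timings mentioned above.

```python
import math


def dense_index_count(shape):
    # A dense grid in the style of numpy.indices has shape
    # (len(shape),) + shape: one full-size array per dimension.
    return len(shape) * math.prod(shape)


def per_axis_index_count(shape):
    # Keeping one 1-D range per axis stores each coordinate once.
    return sum(shape)


shape = (1000, 1000)
print(dense_index_count(shape))     # 2000000
print(per_axis_index_count(shape))  # 2000
```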

mrocklin · 2016-10-25T18:54:32Z

I think that we should create a separate repository for these kinds of
functions rather than put them in toolz.

If someone wants to set this up it could live under the pytoolz github org
and be linked to from the toolz doc pages (if the creator wanted it to).

On Mon, Oct 24, 2016 at 4:57 PM, Joe Jevnik notifications@github.com wrote:

@llllllllll commented on this pull request.

In toolz/itertoolz.py #345:

    [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

    This can help nicely index an array.

    >>> l = [[1, 2],
    ...      [3, 4],
    ...      [5, 6]]
    >>> for i, j in indices(3, 2):
    ...     print("l[%i][%i] = %i" % (i, j, l[i][j]))
    l[0][0] = 1
    l[0][1] = 2
    l[1][0] = 3
    l[1][1] = 4
    l[2][0] = 5
    l[2][1] = 6


llllllllll reviewed Oct 24, 2016


jakirkham added 2 commits October 24, 2016 15:43

itertoolz: Adds a function for iterating over indices.

4ad9583

test_itertoolz: Add some unit tests for indices.

eb618f3

jakirkham force-pushed the add_indices branch from f52c034 to eb618f3 Compare October 24, 2016 19:43
