Robert Peszek edited this page Sep 22, 2013 · 1 revision

Fpiglet Lazy I/O Streams - Current Idea

### Idea Description

I have been intrigued by the following idea (which has its limitations, but I think it is interesting):

For simplicity, assume for a moment that we want to tokenize an input stream by reading lines. I want to create a lazy list whose elements look more or less like this:

[ { -> io.readLine()}, { -> io.readLine()}, ...]

You could almost think of this as

io -> (map { io.readLine() } << repeat())

only with better handling of the end of the stream. There is an obvious problem with that. Quoting Simon Peyton Jones: "Side-effects do not mix with Laziness". If the client program decides to get the second element first, that element will read the first line!

Fortunately, FunLists are only semi-lazy: the head is not lazily evaluated. Thus, this is implementable. However, each call to tail needs to cache/memoize its result so that the lines are read in the proper order and this code has a chance of passing:

assert thirdLine == head << tail << tail << lazyIO
assert firstLine == head << lazyIO //head was cached/memoized during previous call
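
The memoization requirement can be illustrated with a minimal sketch in plain Groovy. This is not Fpiglet's implementation: `LazyLines` and its `from` factory are hypothetical names, and `null` stands in for the empty list. The head is read eagerly when a node is created, and each node reads from the stream at most once, when its tail is first requested.

```groovy
// Hypothetical sketch in plain Groovy; LazyLines is not an Fpiglet class.
class LazyLines {
    final String head                   // read eagerly, like the head of a FunList
    private final BufferedReader reader
    private LazyLines cachedTail        // memoized result of the first tail request
    private boolean tailForced = false

    private LazyLines(String head, BufferedReader reader) {
        this.head = head
        this.reader = reader
    }

    // Returns null for an exhausted stream (stands in for the empty list).
    static LazyLines from(BufferedReader reader) {
        String line = reader.readLine()
        line == null ? null : new LazyLines(line, reader)
    }

    LazyLines getTail() {
        if (!tailForced) {              // touch the stream at most once per node
            cachedTail = from(reader)
            tailForced = true
        }
        cachedTail
    }
}

def lazyIO = LazyLines.from(new BufferedReader(new StringReader('a\nb\nc\nd')))
assert lazyIO.tail.tail.head == 'c'     // forces lines 2 and 3, in stream order
assert lazyIO.head == 'a'               // served from memory, no new read
assert lazyIO.tail.head == 'b'          // memoized tail, stream not touched again
```

Because a node can only be reached through the tail of its predecessor, lines are always pulled from the stream in order, no matter which element the client asks for first.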

I find this idea interesting even though it has the limitation of holding too much of the I/O in memory. Still, I am keeping it for now. For streams that support seeking (like files), memoization could remember just the position.
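
The position-only variant for seekable streams might look roughly like the sketch below. It is an assumption about how such a design could work, not something Fpiglet provides: `SeekableLines` and `at` are hypothetical names, `null` again stands in for the empty list, and `RandomAccessFile.readLine()` is used for brevity even though it does not decode multi-byte characters.

```groovy
// Hypothetical sketch in plain Groovy; SeekableLines is not an Fpiglet class.
class SeekableLines {
    private final RandomAccessFile file
    private final long offset       // byte position where this node's line starts
    private long nextOffset = -1    // memoized start of the next line; -1 = unknown

    private SeekableLines(RandomAccessFile file, long offset) {
        this.file = file
        this.offset = offset
    }

    // Returns null at end of file (stands in for the empty list).
    static SeekableLines at(RandomAccessFile file, long offset = 0L) {
        offset < file.length() ? new SeekableLines(file, offset) : null
    }

    // Seeks back and re-reads the line instead of holding it in memory.
    String getHead() {
        file.seek(offset)
        String line = file.readLine()
        nextOffset = file.filePointer
        line
    }

    SeekableLines getTail() {
        if (nextOffset < 0) getHead()   // one read to learn where this line ends
        at(file, nextOffset)
    }
}

def tmp = File.createTempFile('lines', '.txt')
tmp.deleteOnExit()
tmp.text = 'a\nb\nc\n'
def raf = new RandomAccessFile(tmp, 'r')
def lazyIO = SeekableLines.at(raf)
assert lazyIO.tail.tail.head == 'c'
assert lazyIO.head == 'a'               // re-read from disk via the remembered offset
assert lazyIO.tail.tail.tail == null    // end of file
raf.close()
```

The trade-off in this sketch is extra seeks and re-reads in exchange for keeping only a pair of offsets per node in memory.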

### Idea Implemented

So, Fpiglet allows you to treat an InputStream as a lazy 'functional' list.
Obviously, I/O has side-effects all over, so I am placing 'functional' in quotes. This functionality is another example of FunctorPolymorphism.

By default, this is done with tokenization similar to FunListToString, because using RegEx would violate the Functor Laws. However, you can provide your own tokenize and untokenize closures when using the funlistIn/funlistOut methods!
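
One plausible reading of the Functor Laws remark is that mapping the identity function over the tokens must give back the original input, which requires untokenize to exactly undo tokenize. The sketch below illustrates that reading in plain Groovy. The four closures are hypothetical and do not use Fpiglet's API; they only contrast a lossy regex split with a lossless single-delimiter split.

```groovy
// Hypothetical closures in plain Groovy; they do not use Fpiglet's API.

// Regex-based pair: every run of whitespace collapses into a single space.
def regexTokenize   = { String s -> s.split(/\s+/) as List }
def regexUntokenize = { List tokens -> tokens.join(' ') }

// Single-delimiter pair: limit -1 keeps empty tokens, so nothing is lost.
def lineTokenize   = { String s -> s.split('\n', -1) as List }
def lineUntokenize = { List tokens -> tokens.join('\n') }

def text = 'a  b\n\nc'
assert regexUntokenize(regexTokenize(text)) == 'a b c'   // delimiters were lost
assert lineUntokenize(lineTokenize(text)) == text        // round trip is identity
```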

TODO needs good example.

As with FunListToStrings, we can use the funlistIn / funlistOut and withFunList functions as well as direct functorial shortcuts. The examples below use functorial shortcuts.

It is worth noting that file records are retrieved lazily, reading only as much data as needed. So if some records are not needed in the calculation, the stream data beyond them is never accessed:

def io = TokenizedInputStreamAsFunList.withLines()
  
def first3lines = io.take(3) << myInputStream

Also, the implementation will read each 'token' from the file only once:

//only first 3 lines are read

def first3lines = io.take(3) << myInputStream
def againFirst3Please = io.take(3) << myInputStream
assert first3lines == againFirst3Please 

Here is one example of how a tokenized input stream can be used:

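def io = TokenizedInputStreamAsFunList.withLines()
new File('myfile').withInputStream { is ->
   // Assumption: a filter shortcut analogous to take; the threshold 80 is arbitrary
   def longLines = io.filter { it.length() > 80 } << is
   //... do something with long lines
}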

Important note: this functionality assumes that the InputStream is not used outside of the FunList. If the client program closes the stream, reads from it, or modifies its state in any other way, things will not work.

Another note: Because the list needs to 'remember/memoize' what was read the first time, this solution does not scale to large data.

However, I think it is an interesting design idea even if it is suitable only for small amounts of data.
