Testing and Documenting Perl Code

Scott Wiersdorf
NTT/Verio

Why Developers Should Test and Document Their Code
or "Why Something is Better Than Nothing"

Assertion
I want to convince you that just as building a house without plans is
cobbling, not engineering, writing software without knowing in advance
how the software should work is also cobbling and not software
engineering.
This is not to say that you must know all of the methods or even how
the software will precisely work at the beginning of the project.
Because of the complex nature of software writing, we do it
non-linearly, which allows us to discover new things and make
adjustments as we go.

Writing tests and documentation (which is what this presentation is
about) are essential to the discovery process. By writing tests and
documentation before you begin coding, you are creating plans for
your code. Tests and documentation are the architectural plans for a
piece of software, without which the engineering process is simply an
exercise in trial-and-error and the software itself is highly fragile
and likely poorly-designed. Testing and documenting before coding
allows you to achieve a more succinct, stable, and readable API than
might otherwise be realized.
In addition to providing plans for new code, tests and documentation
provide a reference for existing code. Regression tests give you as
the programmer a tool that can tell you almost instantly if any
refactoring you've done has broken something. This lets you go back
and clean up that section of code you've got marked as 'FIXME' that
you've been meaning to go back and fix for a year but haven't had
time. Without regression tests, the FIXME's stay forever.

Roadmap: Testing and Documenting Code

About Me (briefly)

Guilt Trip: Why Not Test/Document

A Testing Example

Fleshing Out Our Example

Tests
Documentation
Writing Code
Future Improvements

The Take-away

What you should know about me

I am not a testing expert!

I am a programmer

I do not always write tests or documentation for all my code

I do write tests and documentation for nearly all of my code

I have (good) intentions to go back when it's time to refactor
old code and write tests and documentation for it, too

Why Not Testing

Roadmap: Why Not Testing

Ignorance

Someone Else's Problem

Looming deadline/short on time

Extra work

Testing is for weenies!

Why Not Test: Ignorance

I'm supposed to test?

Yes!
Software engineering has come a long way since "top-down
modular design" was innovative
Testing is less effort than debugging
Learning to test your code will make you a more portable
programmer
You'll have more friends if you test your code

I don't make mistakes (denial)

Yeah, right.
You write your programs once...and then write them all over again
when a change needs to be made (you're a real hard worker)
Meskimen's Law: There's never time to do it right, but there's
always time to do it over.

Work smarter, not harder: regression tests are your friends

Why Not Test: Ignorance (cont'd)

I think someone else in our department tests (see next slide)

Could be. After all, what are users for?


The best unit tests are done by the developer who wrote the unit

I don't know how to set up tests

That's why you're paying careful attention!


We'll cover that later

Why Not Test: Someone Else's Problem (SEP)

We already have a department that does that

Ok, that's a valid point

But here are a few more valid points:

It's also your problem because it's your code


No one knows the code like you do
No one cares about your code like you do

If you don't care about your code as you write it, you are a curse to your
employer
You won't care much about any of your code after a couple of weeks
anyway

Point: the best time to write tests is before and while you are
coding, not after (though after is better than nothing). You're
the best person to do that

Why Not Test: Looming Deadline

"We try to solve the problem by rushing through the design
process so that enough time will be left at the end of the
project to uncover errors that were made because we rushed
through the design process." --Glenford Myers

This also applies to testing

The best time to write tests is at the beginning of a project

You'll save time either at the beginning of the project (by not
writing tests) or at the end of the project (less debugging)

With regression tests, you will save time at the end of and
throughout the project

You will have more provably correct code sooner if you test

You will save oodles of time in the future with regression tests

There really aren't many good time-based arguments for not
writing tests

Why Not Test: Extra Work

You will always be afraid of touching a finished, working piece
of code until you have tests

Fourth Law of Code: Code that has no regression or unit tests
is either dead code or very fragile code that breaks at
inconvenient times (doubly so for production code)

Corollary: Working code with a complete suite of regression
and unit tests is living, growing code and will be useful to many
people over a longer period of time and will tolerate a high
degree of change and internal reworking

You will never have real confidence in your code until you
have tests

Your software will never be world-class quality until you have
tests (almost all CPAN modules have tests and many CPAN
modules have very thorough tests)

Why Not Test: Only Weenies Test!

Some weenies:

Larry Wall
and a cast of thousands, including
Tom Christiansen
Gurusamy Sarathy
Damian Conway
Simon Cozens
and more: consider CPAN

Don't you wish you were a weenie?

Why Not Documentation

Roadmap: Why Not Documentation

Same reasons as testing

Ignorance
Someone Else's Problem (documentation team)
Looming deadline/short on time
Extra work
Documenting is for weenies!

We'll go through this quickly

Why Not Document: Ignorance

What good is an undocumented module?

No one can use it except you

Do you really want to own this module for eternity?

By documenting your module, you free the module to be
owned by someone else

You also free yourself from having to explain to others how it
works

Why Not Document: Someone Else's Problem

Isn't this why we have a documentation team?

Can they understand your code?

(No, they can't. For the love of Pete, have a heart and
document your code)

(And another thing... if documenting your code is not part of
your job description, it should be. Generally speaking, tech
writers are not programmers. To think they should document
your code is evidence of misunderstanding their function. --B.
Heaton)

You really are the only one who knows how this code works--if
you document it, then someone else will understand it too (and
you will be free!)

If you're really a poor writer, do the best you can and let the
doc team clean it up later

The doc team loves it when developers document their code

Why Not Document: Looming Deadline

I'm short on time--I'm not going to document

Good point: given a choice between documenting and coding,
choose coding

This is why we document before we code!

It will shorten the amount of time you spend coding by more
than the time spent documenting

You will be more productive (only anecdotal evidence to
support this, but I believe in it)

Why Not Document: Extra Work

You have fallen victim to false laziness

False laziness says that less work up front yields less total
work in the long run

True laziness says that more work testing and documenting up
front yields less total work in the long run

If you document your code, you'll spend less time in the
following areas:

explaining how your code works to others (this lasts as long as the
module and is the #1 benefit in my book)
rewriting code to get the API right
"We try to solve the problem by rushing through the design process
so that enough time will be left at the end of the project to uncover
errors that were made because we rushed through the design
process." --Glenford Myers
time spent documenting is time spent designing and is well spent
mental exertion

Why Not Document: Extra Work (cont'd)

Substantiated Rumor: People who document well also think
well

Writing solidifies nebulous thoughts: "How can I know what I
think till I see what I say?" --E. M. Forster

If you take the time to document, you'll resolve most of the
really hard logic problems and interface issues early, saving
you labor later

Why Not Document: Only Weenies Document

Some more weenies:

Larry Wall
and a cast of thousands
and more: consider CPAN

A Testing Example

Roadmap: An Example of Writing Tests

Creating a module

Editing test.pl

Running tests

Fixing mistakes

Creating a Module

Many ways to create a module

Some more popular than others

We're going to use 'h2xs' because it's easy and found
everywhere

Creating a Module (cont'd)

create a module:

h2xs -AX Foo::Bar

This will create the following hierarchy:

Foo/Bar/Bar.pm
Foo/Bar/Changes
Foo/Bar/MANIFEST
Foo/Bar/Makefile.PL
Foo/Bar/test.pl

Notice the 'test.pl' file

Example 1: Create a module

Goal: a module that can parse an Apache log file and give
summary information in the form of an object

Start with h2xs:

h2xs -AX Log::Apache::Object
Writing Log/Apache/Object/Object.pm
Writing Log/Apache/Object/Makefile.PL
Writing Log/Apache/Object/test.pl
Writing Log/Apache/Object/Changes
Writing Log/Apache/Object/MANIFEST

The first thing we do is edit 'test.pl'

Depending on your version of Perl, you'll see varying amounts
of stuff cluttering your file

We'll cover what's in 'test.pl' next

Example 1: sample test.pl

A sample test.pl: this much will be created for you

#########################
# change 'tests => 1' to 'tests => last_test_to_print';
use Test;
BEGIN { plan tests => 1 };
use Log::Apache::Object;
ok(1); # If we made it this far, we're ok.
#########################

Example 1: Edit test.pl

now we hack on test.pl

we write one or two tests for each "feature" we're about to add

use Test;
BEGIN { plan tests => 1 };
use Log::Apache::Object;
ok(1); # If we made it this far, we're ok.
## first test
my $obj;
ok( $obj = new Log::Apache::Object );

Example 1: Running test.pl

tests are run automatically when we type 'make test'

% perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Log::Apache::Object
% make
cp Object.pm blib/lib/Log/Apache/Object.pm
Manifying blib/man3/Log::Apache::Object.3pm
% make test
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib \
-I/usr/local/lib/perl5/5.6.1/i386-freebsd \
-I/usr/local/lib/perl5/5.6.1 test.pl
1..1
ok 1
Can't locate object method "new" via package \
"Log::Apache::Object" (perhaps you forgot to load \
"Log::Apache::Object"?) at test.pl line 7.
*** Error code 255
Stop in /usr/home/scottw/testing/Log/Apache/Object.

we forgot to update the number of tests

we haven't written a "new" method yet

Example 1: Fixing Mistakes

Let's update the tests:

BEGIN { plan tests => 2 };

and write a 'new' method:

sub new {
    my $self = { };
    my $proto = shift;
    my $class = ref($proto) || $proto;
    bless $self, $class;
    return $self;
}

and run the tests again

% make test
cp Object.pm blib/lib/Log/Apache/Object.pm
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib test.pl
1..2
ok 1
ok 2

Success!
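
The constructor idiom above (ref($proto) || $proto) lets 'new' be called either on the class name or on an existing object. Here is a minimal, self-contained sketch of testing both cases. It makes two assumptions that are not in the slides: it uses Test::More instead of Test (so that done_testing, available in Test::More 0.88 and later, removes the need to update a test count by hand), and it uses a hypothetical Demo::Object package as a stand-in for Log::Apache::Object.

```perl
use strict;
use warnings;
use Test::More;   # assumption: the slides use Test, not Test::More

package Demo::Object;   # hypothetical stand-in for Log::Apache::Object

sub new {
    my $self  = { };
    my $proto = shift;
    my $class = ref($proto) || $proto;   # works for a class name or an object
    bless $self, $class;
    return $self;
}

package main;

my $obj = Demo::Object->new;   # called as a class method
isa_ok( $obj, 'Demo::Object' );

my $copy = $obj->new;          # called on an existing object
isa_ok( $copy, 'Demo::Object' );

done_testing();   # no 'plan tests => N' to keep in sync
```

With a fixed plan, as in the slides, forgetting to bump the count is itself reported as a problem; done_testing trades that check for convenience.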

Wasn't That Satisfying?

We now have a module that works and we can prove it

We can get instant feedback when we make changes

We'll never have to write a test for the constructor again

...unless we add/change functionality in the constructor

We can be bold in making wide-impact changes such as
refactoring entire methods or changing global defaults, etc.

Fleshing Out Our Example

Roadmap: More Tests, Features and Documentation

Documenting What We're About To Do

Adding New Functionality

Writing Code

What We Just Did

Future Improvements

Documenting What We're About To Do

What good is a module with no documentation?

Documenting is easy with POD (it's already there)

Documenting before coding also helps solidify the API in the
same way testing does, but even better because putting
abstract concepts into concrete language (e.g., English) forces
your brain to conceptualize it; you quickly find ambiguities and
contradictions

Documenting: SYNOPSIS
=head1 SYNOPSIS

  use Log::Apache::Object;

  my $obj =
    new Log::Apache::Object(file => '/www/logs/access_log');
  $obj->parse();
  print "I have had " . $obj->count(status => 404) .
        " errors since the log was rotated\n";

Documenting: DESCRIPTION
=head1 DESCRIPTION

B<Log::Apache::Object> is an OO class for parsing Apache log
files. The following methods are available:

=over 4

=item B<new>

Creates a new B<Log::Apache::Object>. Valid arguments are
key/value pairs:

  file => "filename.log"

=item B<init>

Initializes the object; this is called when the object is
instantiated, but may be called any time thereafter to
re-initialize the object (e.g., with new data). Takes the same
arguments as B<new>.

Documenting: DESCRIPTION (cont'd)


=item B<file>

Sets/returns the current log file to process.

Example:

  $obj = new Log::Apache::Object;
  $obj->file('/www/logs/access_log');
  print "Working with log " . $obj->file . "\n";

Documenting: DESCRIPTION (cont'd)


=item B<parse>

Parses the log file and stores the log data internally to the
object. Future method calls will access this storage so the
log will not have to be re-parsed.

Example:

  $obj->parse();

=item B<count([$field [, $value]])>

If no argument is specified, returns the number of lines in
the log. If B<$field> is specified, the total
number of unique keys in that field will be returned (e.g.,
'host' will return unique hosts in the log). If B<$value>
is specified, the number of entries in B<$field> for
B<$value> will be returned.

Example:

  print "I have " . $obj->count . " lines\n";
  print "I have " . $obj->count('host') . " unique hosts\n";
  print "10.0.11.1 made " . $obj->count('host' => '10.0.11.1') .
        " requests\n";

=back

Documenting: BUGS

BUGS gives us a public "to do" list (nobody wants to admit to
having bugs)

=head1 BUGS

=over 4

=item *

Only parses Apache combined log format

=item *

Stores the entire file in memory

=item *

Does not cache data outside of the object (re-parse each time)

=back

What We Just Did: Documentation

We wrote the documentation before the code

This helped us solidify the API before we wrote it


Now our module is completely documented (to date)

When we do 'make install', it will "manify" the POD and make a
real manpage

We did a good enough job in the docs for somebody not
familiar with the code to use it

This is the Perl Virtue of laziness


We don't have to answer any questions about our code
We just say, "RTFM"

Admission of Guilt

I didn't write all of the documentation before I began coding

'init' and 'file' were after-thoughts

But I documented them after realizing I needed them

Documenting, testing, coding is not linear but recursive and
sometimes random: that's ok (as long as it gets done)

Adding New Functionality in test.pl

Let's edit our test.pl file (remember to update test count)

## create a file (each log entry must stay on a single line)
open TMP, ">.tmp_$$" or die "$!\n";
print TMP <<'_FILE_';
10.1.1.11 - - [24/Mar/2003:08:25:10 -0700] "GET /nonexistent HTTP/1.1" 404 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; (R1 1.3))"
10.1.1.11 - - [24/Mar/2003:08:25:15 -0700] "GET /nonexistentential HTTP/1.1" 404 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; (R1 1.3))"
10.1.1.11 - - [24/Mar/2003:08:25:24 -0700] "GET /foo/bar.pl HTTP/1.1" 200 9328 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; (R1 1.3))"
_FILE_
close TMP;
## now further tests
undef $obj;
ok( $obj = new Log::Apache::Object(file => ".tmp_$$") );
ok( $obj->parse() );
ok( $obj->count, 3 );
ok( $obj->count('status'), 2 ); ## two distinct statuses: 404 and 200
ok( $obj->count('status' => '404'), 2 );
ok( $obj->count('host' => '10.1.1.11'), 3 );
## clean up
END { unlink ".tmp_$$" }

What We Just Did: Testing

We wrote the tests before the code

This helped us clarify the API before we wrote it


Our test interface also serves as a mock-up or reference for
development on the code
Tests help us gauge our progress on the actual module

Writing tests is easy and fast!

...usually.

Writing good tests is always satisfying and productive

Writing Code: 'new' and 'init'

Now let's write some code!

our %agg   = ();
our @lines = ();
sub new {
    my $self = { };
    my $proto = shift;
    my $class = ref($proto) || $proto;
    bless $self, $class;
    $self->init(@_);
    return $self;
}
sub init {
    my $self = shift;
    my %args = @_;
    $self->file($args{'file'});
}

Writing Code: 'file' method

This could also be done with $AUTOLOAD

sub file {
    my $self = shift;
    my $file = shift;
    return $self->{'_file'} = ( defined $file
                                ? $file
                                : $self->{'_file'} );
}
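
As a rough illustration of the $AUTOLOAD remark, here is a minimal sketch of an accessor generated on demand. The package name Demo::Auto, the %is_field whitelist, and the error message are all hypothetical and not part of the slides; the setter mirrors the 'file' method above by only storing defined values.

```perl
use strict;
use warnings;

package Demo::Auto;   # hypothetical class, for illustration only

our $AUTOLOAD;
my %is_field = map { $_ => 1 } qw(file);   # accessor names we allow

sub new { my $class = shift; return bless { }, $class }

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;   # strip the package qualifier
    return if $name eq 'DESTROY';         # object destruction is not an accessor
    die "No such method: $name\n" unless $is_field{$name};
    $self->{"_$name"} = $_[0] if defined $_[0];   # set only when given a value
    return $self->{"_$name"};
}

package main;

my $obj = Demo::Auto->new;
$obj->file('/www/logs/access_log');
print "Working with log " . $obj->file . "\n";
```

The trade-off is that one AUTOLOAD can serve many simple accessors, but misspelled method names are only caught at run time, which is one more reason to have tests.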

Writing Code: 'parse' method


sub parse {
    my $self = shift;
    die "Must specify file to parse\n"
        unless $self->file;
    open FILE, $self->file
        or die "Could not open " . $self->file . ": $!\n";
    while( <FILE> ) {
        chomp;
        my %data = ();   ## fresh hash per line so each stored reference is distinct
        next unless ($data{host}, $data{user}, $data{timestamp}, $data{method},
                     $data{uri}, $data{protocol}, $data{status}, $data{bytes},
                     $data{referer}, $data{agent}) = m!
            ^(\S+)\s+\S+\s+              ## host, logname (not captured)
            (\S+)\s+\[(.+?)\]\s+         ## user, timestamp
            "(\S+)\s+(\S+)\s+(\S+)"\s+   ## method, request URI, protocol
            (\d+)\s+(\d+)\s+             ## status, bytes sent
            "(.+?)"\s+"(.+?)"$           ## referer, agent
        !x;
        ## some aggregate data
        $agg{host}->{$data{host}}++;
        $agg{user}->{$data{user}}++;
        $agg{uri}->{$data{uri}}++;
        $agg{protocol}->{$data{protocol}}++;
        $agg{status}->{$data{status}}++;
        $agg{referer}->{$data{referer}}++;
        $agg{agent}->{$data{agent}}++;
        ## store this line
        push @lines, \%data;
    }
    close FILE;
}
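
To see what the parsing regular expression captures, it can help to run it against a single line outside the module. The sketch below is a standalone illustration, not the module's code: it applies the same pattern to the first sample line from the test data and uses a hash slice (a choice made here for brevity) to name the captured fields.

```perl
use strict;
use warnings;

# One line in Apache combined log format, taken from the test data in test.pl
my $line = '10.1.1.11 - - [24/Mar/2003:08:25:10 -0700] "GET /nonexistent HTTP/1.1" '
         . '404 300 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; (R1 1.3))"';

my %data;
@data{qw(host user timestamp method uri protocol status bytes referer agent)} =
    $line =~ m!
        ^(\S+)\s+\S+\s+              ## host, logname (not captured)
        (\S+)\s+\[(.+?)\]\s+         ## user, timestamp
        "(\S+)\s+(\S+)\s+(\S+)"\s+   ## method, request URI, protocol
        (\d+)\s+(\d+)\s+             ## status, bytes sent
        "(.+?)"\s+"(.+?)"$           ## referer, agent
    !x;

# Print a few of the captured fields
print "$_ = $data{$_}\n" for qw(host method uri status bytes);
```

A line that does not match leaves every field undefined, which is why 'parse' skips such lines with 'next unless'.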

Writing Code: 'count' method


sub count {
    my $self  = shift;
    my $field = shift;
    my $value = shift;
    return scalar @lines unless $field;
    return scalar keys %{$agg{$field}} unless $value;
    return $agg{$field}->{$value};
}
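
The three calling forms of 'count' map directly onto the aggregate data structure. The sketch below is a free-standing illustration under stated assumptions: count_demo is a hypothetical function (not a method of the module), and @lines and %agg are filled in by hand with the values that parsing the three-line test log would be expected to produce.

```perl
use strict;
use warnings;

# Hand-built stand-ins for the module's storage after parsing the test log:
# three lines, two 404s and one 200, all from a single host.
my @lines = ('line 1', 'line 2', 'line 3');   # only the number of entries matters here
my %agg   = (
    status => { '404' => 2, '200' => 1 },
    host   => { '10.1.1.11' => 3 },
);

sub count_demo {   # hypothetical free-standing version of 'count'
    my ($field, $value) = @_;
    return scalar @lines                 unless defined $field;   # total lines
    return scalar keys %{ $agg{$field} } unless defined $value;   # unique keys in a field
    return $agg{$field}->{$value};                                # entries for one value
}

print count_demo(), "\n";                  # number of lines
print count_demo('status'), "\n";          # number of distinct statuses
print count_demo('status' => '404'), "\n"; # number of 404 responses
```

These are the same values the tests in test.pl expect: 3 lines, 2 distinct statuses, and 2 entries with status 404.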

Some Future Improvements

Future improvements:

split URI data into uri and arguments


split referer data into host and uri
split agent data into browser/version/realname/os
make 'count' method handle SQL-like syntax
write request data to disk as we read it (less memory)
store request data in a binary format (less disk space)

Future benefits of having regression tests:

we won't have to change our tests!


we'll be able to tell if adding functionality breaks anything
immediately and we'll likely know why

Conclusion

Practical Take-away

Testing is rewarding and gives instant feedback to how you're
progressing and how accurate your code is

Having regression tests enables you to refactor your code
whenever you feel like it without fear of breaking something

Writing documentation forces your brain to move from abstract
to concrete; this process uncovers potential logic problems

Concentrating on the API in testing and documentation helps
you create a cleaner API before you begin coding

You will save more total time by writing tests and
documentation up front and you will free your code to be
owned by someone else later

Philosophical Take-Away

The true quality of a module lies not in clever algorithms,
speed, or obscure use of the Perl flip-flop operator, but in
clean design and in a well-thought-out API

Testing and documenting before coding gives you a concrete
plan you can turn to while coding; you also are forced to face
and resolve logic problems and API conflicts long before you
even code. This saves the most time possible because you
hardly ever have to backward engineer anything

Testing and documenting before coding provides you with a
suite of regression tests that allows you to refactor your code
anytime and as often as you want; you get to keep your
confidence that the module still works because you have
instant feedback when something breaks. This lets you make
bold, sweeping changes or allows you to take advantage of
new technology confidently.

You'll have more friends if you test and document your code
