Since this experiment calls for an examination of
IP datagrams, you will probably want to begin with
the program you wrote for Experiment 12.2. Referring
again to The IP Datagram Header Format on page 328 of
Computer Networks and Internets, you must now
deal with the two fields labelled FLAGS
and FRAGMENT OFFSET. The same issue that
arose in dealing with H.LEN comes up again.
You want to examine the two least significant bits.
Is the mask 0x3 or 0xc? At least on a SPARC station,
it is 0xc. Basically, think of the layout in the book
as preserving the most-significant-first order to which
you have become accustomed from decimal arithmetic.
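For these two fields in particular, one way to sidestep the
byte-order puzzle altogether is to pull the whole 16-bit word out
with ntohs() and mask it in host order. Here is a minimal sketch,
assuming the BSD-style struct ip and the standard IP_DF, IP_MF, and
IP_OFFMASK masks from <netinet/ip.h>; the helper name is my own:

    #include <stdio.h>
    #include <netinet/ip.h>   /* struct ip, IP_DF, IP_MF, IP_OFFMASK */
    #include <arpa/inet.h>    /* ntohs() */

    /* Illustrative helper: extract the flag bits and the fragment
     * offset from the 16-bit ip_off field of an IP header. */
    void show_fragment_fields(const struct ip *iph)
    {
        unsigned short off = ntohs(iph->ip_off);  /* network to host order */

        int df = (off & IP_DF) != 0;              /* don't-fragment bit */
        int mf = (off & IP_MF) != 0;              /* more-fragments bit */
        unsigned short frag = off & IP_OFFMASK;   /* low 13 bits */

        /* The offset field counts units of eight bytes. */
        printf("DF=%d MF=%d offset=%u bytes\n", df, mf, frag * 8U);
    }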
That being said, the first part of the experiment is easy.
Things get interesting when you get to the point of handling
fragment loss and out-of-order delivery. That seems straightforward,
and it is, if you make some assumptions as to the way things can
get out of order. However, it is a fundamental principle that
production software, at least, must deal with the worst possible case.
To see what that is, look at Optional Extension Number 6. Why
in the world, you say, would you use such a complicated data structure
when a simple array would do? To answer that, check out
RFC 815. The problem is that packets may take different routes
and be duplicated, so it is possible that the same packet will arrive
on your doorstep twice, but fragmented in different ways.
In other words, you should make no assumptions about the way the
data will arrive.
On Solaris 8 machines, the header file /usr/include/inet/ip.h
includes an /* IP Fragmentation Reassembly Header */, which
will reinforce the notion that this is not as easy as one might think.
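To make the RFC 815 idea concrete, here is a minimal sketch of the
hole-descriptor list it describes. The type and function names are
mine, not taken from ip.h or from the RFC:

    #include <stdlib.h>
    #include <limits.h>

    /* A sketch of the RFC 815 hole-descriptor list.  Each hole records
     * a byte range [first, last] that has not yet arrived. */
    struct hole {
        unsigned long first;
        unsigned long last;
        struct hole *next;
    };

    /* Start with a single hole covering the entire datagram, whose
     * length is unknown until the final fragment (MF clear) arrives. */
    struct hole *holes_init(void)
    {
        struct hole *h = malloc(sizeof *h);
        h->first = 0;
        h->last = ULONG_MAX;      /* stands in for "infinity" */
        h->next = NULL;
        return h;
    }

    /* Record the arrival of a fragment covering [frag_first, frag_last].
     * Returns the updated list; reassembly is complete when it is NULL. */
    struct hole *holes_fill(struct hole *list, unsigned long frag_first,
                            unsigned long frag_last, int more_fragments)
    {
        struct hole **pp = &list;
        while (*pp != NULL) {
            struct hole *h = *pp;
            if (frag_first > h->last || frag_last < h->first) {
                pp = &h->next;    /* fragment misses this hole entirely */
                continue;
            }
            *pp = h->next;        /* fragment touches the hole: remove it */
            if (frag_first > h->first) {   /* uncovered piece on the left */
                struct hole *left = malloc(sizeof *left);
                left->first = h->first;
                left->last = frag_first - 1;
                left->next = *pp;
                *pp = left;
                pp = &left->next;
            }
            if (frag_last < h->last && more_fragments) {  /* and on the right */
                struct hole *right = malloc(sizeof *right);
                right->first = frag_last + 1;
                right->last = h->last;
                right->next = *pp;
                *pp = right;
                pp = &right->next;
            }
            free(h);
        }
        return list;
    }

The point of the list, as RFC 815 explains, is that duplicate or
overlapping fragments simply fail to intersect any remaining hole
and fall out of the computation for free.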
Writing your program the "right way" presents a dilemma in
terms of how to test it. Obviously, finding datagrams
that have been fragmented in more than one way is not going to
be easy. Eventually, I hope to write a program to do this with
a file of captured packets, but even then testing will be a
problem: if all did not go well right away, one would have
trouble knowing whether the fault lay in the reassembly software
or in the fragmenting software. So I tried writing two programs:
one to scramble a file and the other to put it back together.
This seemed a feasible way to test my implementation of the algorithm.
The scramble program represents a worst-case scenario. It
determines the length of the file in bytes, then repeatedly
selects a random starting point and a number of bytes and writes
that part of the file, together with the starting point, the
number of bytes, and a more-fragments byte, to a second file.
The program terminates when it has written each byte in the
original file at least once. A sketch of one way to write it
appears below.
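In case it helps, here is a minimal sketch of such a scramble
program. The record format and file handling are my own choices:
each record carries the starting point, the byte count, and a
more-fragments flag on one line, followed by the raw bytes.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
            return 1;
        }
        FILE *in = fopen(argv[1], "rb");
        FILE *out = fopen(argv[2], "wb");
        if (in == NULL || out == NULL) {
            perror("fopen");
            return 1;
        }
        fseek(in, 0L, SEEK_END);
        long size = ftell(in);

        char *covered = calloc((size_t)size, 1);   /* the scoreboard array */
        long remaining = size;
        char buf[512];

        srand((unsigned)time(NULL));
        while (remaining > 0) {
            long first = rand() % size;            /* random starting point */
            long len = rand() % (long)sizeof buf + 1;
            if (first + len > size)
                len = size - first;

            fseek(in, first, SEEK_SET);
            fread(buf, 1, (size_t)len, in);

            int more = (first + len < size);       /* crude more-fragments byte */
            fprintf(out, "%ld %ld %d\n", first, len, more);
            fwrite(buf, 1, (size_t)len, out);

            /* Mark the bytes just written; stop once all are covered. */
            for (long i = first; i < first + len; i++)
                if (!covered[i]) { covered[i] = 1; remaining--; }
        }
        free(covered);
        fclose(in);
        fclose(out);
        return 0;
    }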
The second program simulates
fragment reassembly by reading that second file and using it
to reconstruct the original file; a sketch of it follows as well.
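This is a correspondingly minimal sketch of the second program,
reusing the hole-list functions from the RFC 815 sketch above
(declared here so the file stands alone):

    #include <stdio.h>
    #include <stdlib.h>

    /* struct hole, holes_init(), and holes_fill() as defined in the
     * RFC 815 sketch above. */
    struct hole;
    struct hole *holes_init(void);
    struct hole *holes_fill(struct hole *list, unsigned long first,
                            unsigned long last, int more_fragments);

    int main(int argc, char *argv[])
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s scrambled outfile\n", argv[0]);
            return 1;
        }
        FILE *in = fopen(argv[1], "rb");
        FILE *out = fopen(argv[2], "wb");
        if (in == NULL || out == NULL) {
            perror("fopen");
            return 1;
        }
        struct hole *holes = holes_init();
        long first, len;
        int more;
        char buf[512];

        while (fscanf(in, "%ld %ld %d", &first, &len, &more) == 3) {
            fgetc(in);                          /* skip the newline */
            fread(buf, 1, (size_t)len, in);
            fseek(out, first, SEEK_SET);        /* drop the data into place */
            fwrite(buf, 1, (size_t)len, out);
            holes = holes_fill(holes, (unsigned long)first,
                               (unsigned long)(first + len - 1), more);
            if (holes == NULL)                  /* no holes left: done */
                break;
        }
        fclose(in);
        fclose(out);
        return 0;
    }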
This approach seems to me to be theoretically sound, but it has
severe limitations in practice. The scrambling algorithm is
horribly inefficient.
I had the right idea in testing it on files of a few thousand
bytes, but a colleague challenged me to try it on
/usr/dict/words! It would not take too much to convince
me that this program produces files whose size varies
exponentially with the size of the original file. The main
drawback, when you think about it, is that it takes forever for
the randomly chosen starting point to hit zero, and until that
happens you are not going to get the first byte of the file.
An obvious modification would be to cheat by writing the
first several bytes of the file outside the random process.
Perhaps the most important lesson to be learned from this
exercise is the power and beauty of the algorithm described in
RFC 815. The first version of the scramble program used a
sort of scoreboard approach: I set up an array, zeroed it,
and wrote a 1 in the appropriate spots every time I wrote
data. After writing the unscramble program, I realized that
I could use the RFC 815 approach in the scramble program as
well, to determine the point at which all the bytes had been
written. Now, most of the inefficiency in that program has to
do with file input/output, which will not change from one
approach to the other; even so, the difference in execution
time was significant. A snippet showing the substitution
appears below.
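Concretely, the substitution looks something like this.
holes_init_range() is a hypothetical variant of the earlier
holes_init() for the case where the total size is known up front:

    #include <stdlib.h>

    /* struct hole and holes_fill() as in the RFC 815 sketch above. */
    struct hole {
        unsigned long first;
        unsigned long last;
        struct hole *next;
    };
    struct hole *holes_fill(struct hole *, unsigned long,
                            unsigned long, int);

    /* One hole spanning a file of known size: the range [0, size-1]. */
    struct hole *holes_init_range(unsigned long size)
    {
        struct hole *h = malloc(sizeof *h);
        h->first = 0;
        h->last = size - 1;
        h->next = NULL;
        return h;
    }

    /* In the scramble loop, the scoreboard bookkeeping becomes
     *
     *     holes = holes_fill(holes, first, first + len - 1, 1);
     *
     * and the loop runs while holes != NULL: an empty hole list
     * means every byte has been written at least once. */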