Cvsplot v1.7.4

Cvsplot collects statistics from CVS controlled files, such as how the
total number of files and lines of code change over time.  It runs
under any flavour of UNIX, and under Windows (assuming Perl from
http://www.activestate.com is installed).

A simple invocation would be:

cvsplot.pl -cvsdir :ext:cvsbox:/usr/local/cvsroot -rlog product \
           -linedata linedata.txt -filedata filedata.txt

Note: with Perl 5.8, the Date::Manip module contains characters that
are not valid UTF-8.  All invocations of cvsplot.pl should therefore
be done with the LANG environment variable set to "C".  With the bash
shell, this would be:

LANG=C cvsplot.pl ...

I have been told this is no longer necessary with DateManip version
5.42 and above.

The simple invocation at the top retrieves the CVS history for all CVS
controlled files in the "product" module from the CVS repository.  The
-cvsdir argument takes the same value as the CVSROOT environment
variable.  The results are stored in the linedata.txt and filedata.txt
files in a simple text format.  Each line is a data point
(corresponding to a CVS commit), consisting of the date of the commit
and the corresponding number.

For linedata.txt, this number represents the total number of lines
in active files that exist in the repository up until that date.  For
filedata.txt, it represents the total number of files that exist in
the repository up until that date.  Note that files which have been
marked as binary in CVS are ignored.

If the period of interest is well defined, then it is possible to trim
the statistics reported by optionally specifying start and/or end
dates.  For example, if we are only interested in statistics starting
from the 28th March, 2001, then the following can be entered:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -start "28th March, 2001"

The -end option is used to specify the final date, for example:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -start "28th March, 2001" \
           -end "2nd May, 2001"

The date formats supported are very flexible, as the Date::Manip perl
module is used for date parsing and manipulation.  The above command
could have also been expressed as:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -start "2001/03/28" \
           -end "2001/05/02"

It is possible to specify filesets in order to restrict the files for
which statistics are generated.  For example, assuming we are only
interested in C files, header files and Java files, the following
command could be specified:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -include '\.java$' -include '\.c$' \
           -include '\.h$'

The argument given to the -include option is a Perl regular
expression.  To avoid shell expansion, single quotes must be used.
It is also possible to specify files that should not be included.
Assuming we are interested in Java, C and header files, but do not
want statistics for the "kernel" sub-directory, then the following
command could be issued:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -exclude '^kernel' \
           -include '\.java$' -include '\.c$' -include '\.h$'

The order of the -exclude and -include options is important.
Whenever cvsplot examines a file, it runs through the list of -exclude
and -include options in the order specified on the command line, and
the first pattern that matches decides the outcome.  If that pattern
belongs to a -exclude option, the file is skipped.  If it belongs to a
-include option, the file is included when collecting statistics.  If
no -include or -exclude options have been specified, then the default
behaviour is to include all files.  If -include or -exclude options
have been specified, and a file doesn't match any of the patterns,
then it is *not* included when collecting statistics.

The -include and -exclude options and semantics were based on the
--include and --exclude options from rsync.
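
The matching rules above can be sketched as a small shell function.
This is only an illustration of the rules as described here, not the
code cvsplot itself runs.  The function name "decide" and the
"include:"/"exclude:" rule notation are made up for the example, and
grep -E extended regular expressions stand in for Perl regular
expressions.

```shell
# decide FILE [RULE...]
# Each RULE is "include:REGEX" or "exclude:REGEX" (hypothetical notation),
# listed in the same order as the options on the command line.
decide() {
    file=$1
    shift
    # No -include or -exclude options at all: every file is included.
    if [ $# -eq 0 ]; then
        echo include
        return
    fi
    for rule in "$@"; do
        kind=${rule%%:*}      # text before the first colon
        pattern=${rule#*:}    # text after the first colon
        # The first pattern that matches decides the outcome.
        if printf '%s\n' "$file" | grep -Eq -- "$pattern"; then
            echo "$kind"
            return
        fi
    done
    # Options were given, but none matched: the file is not included.
    echo exclude
}

decide kernel/sched.c 'exclude:^kernel' 'include:\.c$'     # prints exclude
decide src/Main.java  'exclude:^kernel' 'include:\.java$'  # prints include
decide README         'exclude:^kernel' 'include:\.c$'     # prints exclude
decide README                                              # prints include
```

With the two rules swapped ('include:\.c$' before 'exclude:^kernel'),
kernel/sched.c would be included, which is why the order matters.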

It is also possible to restrict the statistics to a specific CVS
branch.  The above command can be run against the RELEASE1 branch as
follows:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -exclude '^kernel' \
           -include '\.java$' -include '\.c$' -include '\.h$' \
           -branch RELEASE1

In addition to writing statistics to text files, it is also possible
to generate plots as png files, assuming gnuplot is installed on your
system.  Gnuplot supports many other file formats as well, such as
gif, jpg and postscript.  I run cvsplot.pl in a cron job so that my
team's statistics are updated nightly on our internal web-server.

To generate png files which plot the statistics, the -gnuplotfiledata
and -gnuplotlinedata options are specified:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -gnuplotfiledata filedata.png \
           -gnuplotlinedata linedata.png

The filedata.png file is a plot, generated by gnuplot, of the
statistics in filedata.txt.  Similarly, linedata.png is a plot of the
statistics in linedata.txt.

It is also possible to generate plots which combine both the line and
file data into a single plot, using the -gnuplotlinefiledata switch:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -gnuplotfiledata filedata.png \
           -gnuplotlinedata linedata.png \
           -gnuplotlinefiledata linefiledata.png

For gnuplot users, it is possible to specify the "terminal parameters"
sent to gnuplot when generating the plots.  For example, to generate
the plots as postscript eps files using the Times-Roman font, the
following could be specified:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -gnuplotfiledata filedata.eps \
           -gnuplotlinedata linedata.eps \
           -gnuplotsetterm "post eps 'Times-Roman'"

It is also possible to specify general gnuplot commands (separated by
semi-colons) which will be executed before the final "plot" commands
that generate the graphs.  One possibility might be to change the
formatting of the x values from their default %m/%y (month/year)
format to %d/%m (day/month), and to set the title of the graph:

cvsplot.pl -cvsdir /usr/local/cvsroot -rlog product -linedata linedata.txt \
           -filedata filedata.txt -gnuplotfiledata filedata.eps \
           -gnuplotlinedata linedata.eps \
           -gnuplotsetterm "post eps 'Times-Roman'" \
           -gnuplotcommand "set format x '%d/%m'; set title 'CVS History'"

The above commands use the "cvs rlog" command to retrieve the
relevant information.  This command is properly implemented in CVS
versions >= 1.11.1.  If you have an older version of CVS, you can still
use cvsplot, however you need a checked out version of the module you
want to gather statistics from (to run "cvs log" against).  Make sure
the checkout is done without the -P flag, as directories which are
pruned will be excluded from the statistics, which is not what you
want.  The only difference in command syntax is that the argument to
-cvsdir refers to your checked out sandbox, and the -rlog option is
omitted.  For example, the previous command would be the following,
assuming the product module was checked out in the ~/product
directory:

cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
           -filedata filedata.txt -gnuplotfiledata filedata.eps \
           -gnuplotlinedata linedata.eps \
           -gnuplotsetterm "post eps 'Times-Roman'" \
           -gnuplotcommand "set format x '%d/%m'; set title 'CVS History'"

Finally, for large plots, it is definitely worth trying the -linestyle
option, as this can dramatically improve readability.

For platforms (such as Windows) where gnuplot may not be in the
standard PATH, and/or has a different name, the -gnuplot option can be
used to specify the full path to the gnuplot binary.

Since version 1.7.0, it is possible to retrieve per-user statistics
as well.  The -userdata option specifies the file in which user commit
information is stored.  This file is used as input for gnuplot when
plotting per-user information.  An example invocation is:

cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
           -filedata filedata.txt -userdata userdata.txt \
           -gnuplotfiledata filedata.png \
           -gnuplotlinedata linedata.png \
           -gnuplotuserdata userdata.png \
           -userlist fred,joe,peter,paul

This command will create userdata.png, which will contain CVS line
counts for contributions made by usernames fred, joe, peter and paul.
If there is no -userlist argument, all users found in the CVS logs are
plotted.  For some installations this list can be very large and
produce unreadable graphs.

It is also possible to specify groups of users, as another way of
reducing graph clutter.  The userdata graph will then have one line
per group rather than one per user:

cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
           -filedata filedata.txt -userdata userdata.txt \
           -gnuplotfiledata filedata.png \
           -gnuplotlinedata linedata.png \
           -gnuplotuserdata userdata.png \
           -userlist group1=fred,joe \
           -userlist group2=peter,paul

It is also possible to specify a "default" group, so that any users
not explicitly listed will automatically become members of this
group.  This is achieved via the -defaultusergroup option, as shown
below, where the default group is named "group3":

cvsplot.pl -cvsdir ~/product -linedata linedata.txt \
           -filedata filedata.txt -userdata userdata.txt \
           -gnuplotfiledata filedata.png \
           -gnuplotlinedata linedata.png \
           -gnuplotuserdata userdata.png \
           -userlist group1=fred,joe \
           -userlist group2=peter,paul \
           -defaultusergroup group3
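
The grouping in the command above amounts to the following mapping
from CVS user name to plotted line.  The function name "group_for" is
hypothetical and "mary" is an invented user who appears in no
-userlist option; this is a sketch of the described behaviour, not
cvsplot's own code.

```shell
# group_for USER
# Prints the group whose line on the userdata graph USER contributes to.
group_for() {
    case "$1" in
        fred|joe)   echo group1 ;;   # -userlist group1=fred,joe
        peter|paul) echo group2 ;;   # -userlist group2=peter,paul
        *)          echo group3 ;;   # -defaultusergroup group3
    esac
}

group_for joe     # prints group1
group_for paul    # prints group2
group_for mary    # prints group3
```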

One option which affects all line count statistics (the -linedata and
-userdata options) is -countchangedlines.  If this option is
specified, then the line counts reported are the number of lines
_changed_, not lines _added_.  For example, if a commit involved
removing two lines and adding three lines, with -countchangedlines,
this would be recorded as an addition of five lines.  Without
-countchangedlines, this would be just one line.  Some users requested
-countchangedlines, as it can be used as a form of very rough
productivity measurement.
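
The arithmetic in the example above can be written out as follows.
The function name "line_delta" and the "changed"/"net" mode labels are
invented for this illustration; cvsplot itself only exposes the
-countchangedlines switch.

```shell
# line_delta MODE ADDED REMOVED
# MODE "changed" mimics -countchangedlines; "net" mimics the default.
line_delta() {
    if [ "$1" = changed ]; then
        echo $(( $2 + $3 ))   # every added or removed line counts
    else
        echo $(( $2 - $3 ))   # only the net growth counts
    fi
}

line_delta changed 3 2   # prints 5
line_delta net 3 2       # prints 1
```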

CVS global arguments can be set for all CVS commands executed by
cvsplot.pl, via the -cvs-global-args switch.  For example, the command
below ensures that ~/.cvsrc is not read for default settings (-f), and
that all CVS commands output a reduced amount of information to
stderr (-q):

cvsplot.pl -cvs-global-args "-f -q" ...

Execute cvs --help-options for the complete list of global arguments
available.

----------------------------------------------------------------------

Requirements:

Cvsplot uses the Date::Manip and String::ShellQuote (for UNIX
platforms) Perl modules.  Run cvsplot and it will tell you if these
modules are missing, and will provide instructions on how to fetch and
install them.

Gnuplot comes with most Linux distributions, and can be found at
http://www.gnuplot.org.  If the -gnuplot* options are not used, then
it is not necessary to install gnuplot.  Windows versions are also
available.

----------------------------------------------------------------------

Updates:

Updates can be found at http://cvsplot.sourceforge.net.

Comments:

Please send comments to sits@users.sourceforge.net
Thank you!


