taggiasca.com
log scanner

--

If you are a webmaster or a site owner, you have to know who your visitors are so you can fine-tune your site to what they ask for; otherwise they won't come back.

The best way to know your users is to analyze your logs, and to do this well (look at our stats for a good example) you MUST use a good program.

The best FFF (fast, friendly and free) program I know is Stephen Turner's ANALOG: you can download it at http://www.analog.cx.

But if you need to know, for example, where the user who looked for some strange word on your site came from, you are on your own: this job is usually done by reading the logs line by line, by using the search function of a text editor, or by mailing Stephen Turner with boring, odd requests about Analog config files (his answers are usually kinder than mine: RTFM!).

So I wrote this Perl utility: it lets you search your logs for a word contained in a log line and/or for a domain/IP.

Here is a sample search of our logs for the domain .it and the word journey.

Note that this utility, until now, has some drawbacks:
  • it doesn't perform a DNS lookup
  • you can't search for more than one word at a time
  • if you look for visitors coming from ".va" (Vatican City) you also find visitors from "modem42.vaccapazza.it", because I'm too lazy to write a domain analysis routine
  • your logs MUST be in this Apache format:
     
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
    in other words, your log lines MUST look like this one:
     
    fw1.correoargentino.com.ar - - [19/Jan/2000:00:20:13 +0100] "GET /images/o.gif HTTP/1.0" 200 180 "http://www.taggiasca.com/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"
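
The format above can be split into named fields with a single regular expression instead of a split on spaces. This is only a minimal sketch: parse_combined is a hypothetical helper, not part of scannalog.pl, and it assumes that the quoted fields never contain an escaped double quote.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of scannalog.pl): split one line in
# Apache "combined" format into named fields.
# Returns undef when the line does not match the format.
sub parse_combined {
    my ($line) = @_;
    return unless $line =~
        m/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"/;
    return {
        host    => $1,    # %h
        ident   => $2,    # %l
        user    => $3,    # %u
        time    => $4,    # %t without the brackets
        request => $5,    # %r
        status  => $6,    # %>s
        bytes   => $7,    # %b
        referer => $8,
        agent   => $9,
    };
}

my $sample = 'fw1.correoargentino.com.ar - - [19/Jan/2000:00:20:13 +0100] '
           . '"GET /images/o.gif HTTP/1.0" 200 180 '
           . '"http://www.taggiasca.com/" '
           . '"Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"';
my $f = parse_combined($sample);
print "$f->{host} $f->{status} $f->{request}\n";
# prints: fw1.correoargentino.com.ar 200 GET /images/o.gif HTTP/1.0
```

With named fields, a search can be limited to the request or to the referer instead of the whole line.
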
Maybe, sooner or later, I will do something about these drawbacks. But I'm not a politician, so I don't promise anything.
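
One possible way around the ".va" drawback is to match a domain only at the end of the host name, on a dot boundary, and to treat a query made of digits and dots as an IP prefix. The sketch below is an assumption about how that could look: host_matches is a hypothetical helper, not something scannalog.pl contains, and www.example.va is a made-up host name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $host belongs to $query, else 0.
# - a query made only of digits and dots is an IP or IP prefix and must
#   match whole octets from the start of the address
# - any other query is a domain and must match whole labels at the end
#   of the host name
sub host_matches {
    my ($host, $query) = @_;
    $host  = lc $host;
    $query = lc $query;
    if ($query =~ /^[\d.]+$/) {
        $query =~ s/\.$//;    # "192.168.1." and "192.168.1" mean the same
        return ($host eq $query || index($host, "$query.") == 0) ? 1 : 0;
    }
    $query =~ s/^\.//;        # ".va" and "va" mean the same
    return ($host eq $query || $host =~ /\.\Q$query\E$/) ? 1 : 0;
}

print host_matches('modem42.vaccapazza.it', '.va'), "\n";    # prints 0
print host_matches('www.example.va',        '.va'), "\n";    # prints 1
print host_matches('modem42.vaccapazza.it', '.it'), "\n";    # prints 1
print host_matches('192.168.10.5', '192.168.1'),    "\n";    # prints 0
```
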

Also, I suggest you don't use it on your main server: with the wrong request you can get a page containing ALL your logs, which is always a bit more than your server can manage without crashing.
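
The risk described above can be reduced by refusing empty queries, reading the log line by line, and stopping after a fixed number of matches. The following is only a sketch under those assumptions: scan_handle and the limit of 3 are hypothetical, and the demo reads an in-memory string in place of a real log file.

```perl
use strict;
use warnings;

# Hypothetical helper: read lines from the filehandle $fh and return at
# most $max of those containing $word (case-insensitive, literal match).
# An empty or blank $word returns nothing instead of matching every line.
sub scan_handle {
    my ($fh, $word, $max) = @_;
    return () unless defined $word && $word =~ /\S/;
    my @hits;
    while (my $line = <$fh>) {
        next unless $line =~ /\Q$word\E/i;
        push @hits, $line;
        last if @hits >= $max;    # stop reading once the cap is reached
    }
    return @hits;
}

# Demo on an in-memory "log" so the sketch runs without real files.
my $log = '';
$log .= "host$_ - - \"GET /journey/$_.html\"\n" for 1 .. 10;
open(my $fh, '<', \$log) or die "Cannot open in-memory log: $!";
my @hits = scan_handle($fh, 'journey', 3);
close $fh;
print scalar(@hits), " lines kept\n";    # prints "3 lines kept"
```

Reading one line at a time also keeps memory use flat, whatever the size of the logs.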

Here is the listing: comments about things to change to fit your box are in this color (do you have color-impaired friends too?), and the variables have Italian names, so they don't look like strange commands.


#!/usr/bin/perl -w
# ^ set your Perl path here!
# scannalog.pl - log scanner
# (c) Jan 2000 Marco Bernardini - webmaster@taggiasca.com
# under PERL ARTISTIC LICENSE - see
# http://www.perl.com/pub/language/misc/Artistic.html
# for details

my $adesso = time;
my $dir ="D:/www/logs/taggiasca"; # directory containing your logs
my $pw="rapunzel"; # set your password here!

my $quante = 0;
my $tot = 0;
my $cosacerco = "";
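read(STDIN, $string, $ENV{'CONTENT_LENGTH'} || 0);
@buffer = split(/&/, $string);

foreach $item (@buffer) {
    ($key, $content) = split(/=/, $item, 2);
    $content = "" unless defined $content;
    $content =~ tr/+/ /;
    # decode %XX escapes; "C" is an unsigned byte, so codes above 127 work too
    $content =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;
    $content =~ s/\t/ /g;
    $campi{$key} = $content;
}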

print "Content-Type: text/html\n\n";

if ($pw ne $campi{'pw'}) {
print qq~
<font color=red>
<h1 align=center>Wrong Password</h1>
</font>
~;
exit 1;
}

$cosacerco = $campi{'q'};
$chedominio = $campi{'d'};

opendir DIR, $dir or die "Error opening directory $dir: $!\n";

@files=grep !/^\./, readdir(DIR);
chdir $dir;

foreach $logfile (@files) {
open (PAGE ,"<$logfile") || die $!;
push ( @pagina , <PAGE>);
close PAGE;
}

print qq~
<html>
<head><title>Results - query $cosacerco $chedominio</title>
</head><body bgcolor=white>
<h1>Query <font color=red>
~;
print "domain/IP $chedominio - " if ($chedominio) ;
print qq~
$cosacerco</font></h1>
<table border=1 cellspacing=0 cellpadding=5>
<tr bgcolor=#E0E0E0>
<th>IP</th>
<th>date</th>
<th>log line</th>
</tr>
~;
if ($chedominio) { # check the host only when a domain/IP was given
    foreach $linea (@pagina) {
        $tot++;
        if ($linea =~ m#\Q$cosacerco\E#i) {
            @pezzi = split(/ /, $linea);
            if ($pezzi[0] =~ m#\Q$chedominio\E#i) {
                $quante++;
                print "<tr><td>$pezzi[0]</td>";
                $pezzi[3] =~ s/\[//;
                print "<td>$pezzi[3]</td>";
                print "<td>$linea</td>";
                print "</tr>\n";
            }
        }
    }
} else {
    foreach $linea (@pagina) {
        $tot++;
        if ($linea =~ m#\Q$cosacerco\E#i) {
            @pezzi = split(/ /, $linea);
            $quante++;
            print "<tr><td>$pezzi[0]</td>";
            $pezzi[3] =~ s/\[//;
            print "<td>$pezzi[3]</td>";
            print "<td>$linea</td>";
            print "</tr>\n";
        }
    }
}

my $dopo = time;

my $tempo = $dopo-$adesso;

print qq~
</table>
<br><br>
Job time: $tempo seconds<br>
Found <b>$quante</b> lines out of
<b>$tot</b> checked
</body></html>
~;
### EOF ###
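
The listing prints the query and the matching log lines straight into the page. Since the Referer and User-Agent fields of a log line are chosen by the visitor, it may be worth escaping HTML special characters before printing. A minimal sketch, assuming a hypothetical helper named html_escape that is not part of the listing above:

```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to print inside HTML text
# or inside a double-quoted attribute value.
sub html_escape {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;     # must come first, or the other entities get re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

print html_escape('<script>alert("x")</script>'), "\n";
# prints: &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

In the listing this would be applied to $cosacerco, $chedominio, $pezzi[0], $pezzi[3] and $linea at the point where each one is printed.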

To use this utility you need a Web page like this one:

<HTML><HEAD>
<TITLE>Scan log</TITLE>
</HEAD>
<BODY bgcolor=white>
<h1>Scan log</h1>
<form action="/cgi-bin/scannalog.pl" method="post">
<table>
<tr><td><font face="Arial,Helvetica" size=2>password:</font></td><td><font size=3><input type="password" name="pw"></font></td></tr>
<tr><td><font face="Arial,Helvetica" size=2>Search for:</font></td><td><font size=3><input type="text" name="q" size=30></font></td></tr>
<tr><td><font face="Arial,Helvetica" size=2>Domain or IP:</font></td><td><font size=3><input type="text" name="d" size=30></font></td></tr>
</table>
<input type="submit" value="SEARCH">
</form>
</BODY>
</HTML>

You can use this utility freely, and you can modify and improve it (let me know if you do!): it is under the Perl Artistic License.
It may be used commercially only by prior arrangement with the author (Marco Bernardini, webmaster of taggiasca.com).


--
copyright © 1999 - 2000 by It-Web Information Technology - all rights reserved