tech-userlevel archive


Re: A log monitoring tool



Hi David,
On Tue, Jan 3, 2012 at 1:57 AM, David Holland 
<dholland-tech%netbsd.org@localhost> wrote:

> That's about right; it's not really research (it's more about applying
> current research in machine learning) but it's not trivial. Also, the
> first thing to do is go find out what other people have done along the
> same lines. I think there has been some since I originally wrote that.

So, I looked around a bit. My Google searches led me to two similar projects.

1. logwatch (http://sourceforge.net/projects/logwatch/): Quoting from
the project's page: "Logwatch is a customizable log analysis system.
Logwatch parses through your system's logs and creates a report
analyzing areas that you specify."

It is basically a Perl script that processes the text of the logs and
produces a summarized report at regular intervals. Although I did not
really look into the code, it does not seem to be a machine-learning-based
system. You can see the results of logwatch in this article:
http://www.linux-mag.com/id/7800/
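For comparison, the kind of rule-based summarizing logwatch does can be sketched in a few lines of Python. This is not logwatch's code (which is Perl); the rule names, the two regular expressions and the sample sshd lines are hypothetical. Each line is matched against hand-written rules and the matches are simply counted, with no learning involved.

```python
import re
from collections import Counter

# Hypothetical rules: category name -> regex whose group 1 is the user name.
RULES = {
    "ssh_failed_login": re.compile(r"sshd\[\d+\]: Failed password for (?:invalid user )?(\S+)"),
    "ssh_accepted_login": re.compile(r"sshd\[\d+\]: Accepted \S+ for (\S+)"),
}

def summarize(lines):
    """Count matches per (category, user) pair; the first matching rule wins."""
    counts = Counter()
    for line in lines:
        for category, pattern in RULES.items():
            m = pattern.search(line)
            if m:
                counts[(category, m.group(1))] += 1
                break
    return counts

def report(counts):
    """Format the counts as a small plain-text report, one row per pair."""
    return "\n".join(
        f"{category:20} {user:10} {n}" for (category, user), n in sorted(counts.items())
    )

if __name__ == "__main__":
    sample = [
        "Jan  3 01:57:00 host sshd[123]: Failed password for root from 192.0.2.1 port 4242 ssh2",
        "Jan  3 01:57:05 host sshd[123]: Failed password for root from 192.0.2.1 port 4243 ssh2",
        "Jan  3 01:58:00 host sshd[124]: Accepted publickey for abhinav from 192.0.2.7 port 5000 ssh2",
    ]
    print(report(summarize(sample)))
```

Every category has to be anticipated by whoever writes the rules, which is exactly the limitation a learned classifier would try to remove.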

2. syslog-ng 
(http://www.balabit.com/network-security/syslog-ng/opensource-logging-system/overview):
syslog-ng is a rewrite of the traditional Unix syslogd daemon
(licensed under LGPL). Apart from logging itself, it also provides
facilities for analysing the logs after they are collected, and in
addition to flat-file storage it also supports SQL databases.

In its most recent version (3.x) they have introduced features such as
log classification and filtering. They use a pattern database to match
the log entries and classify them accordingly. The
Wikipedia article (http://en.wikipedia.org/wiki/Syslog-ng) and this
article (http://lwn.net/Articles/369075/) provide more details.
Quoting from the latter article, a typical pattern would look like
this:

Accepted @QSTRING:auth_method: @ for @QSTRING:username: @ from \
        @QSTRING:client_addr: @ port @NUMBER:port:@ @QSTRING:protocol_version: @

Here, the tokens beginning with @ tell the parser that the data in that
field has a specific structure and should be parsed accordingly. For
example, the parser would parse an IPv4 address (@IPV4) differently
from an IPv6 address (@IPV6). So, although this comes close to what I
would like to do with this project, it still does not seem to use any
AI techniques underneath.
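To make the pattern-database idea concrete, here is a minimal Python sketch that compiles patterns with @TYPE:name@ tokens into regular expressions with named groups and classifies a message by the first pattern that matches. The token names (ESTRING, NUMBER, IPV4, IPV6) are only loosely modeled on syslog-ng's; their semantics and regexes here are simplified assumptions, and the pattern database and class labels are hypothetical, not taken from syslog-ng.

```python
import re

# Simplified field parsers (assumptions, not syslog-ng's implementations).
FIELD_REGEX = {
    "NUMBER": r"\d+",
    "IPV4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "IPV6": r"[0-9A-Fa-f:]+",  # deliberately loose
}

# Matches @TYPE:name@ or @TYPE:name:delimiter@
TOKEN = re.compile(r"@([A-Z0-9]+):([A-Za-z_]\w*)(?::([^@]*))?@")

def compile_pattern(pattern):
    """Turn 'literal @TYPE:name@ literal ...' into an anchored regex."""
    parts, pos = [], 0
    for m in TOKEN.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        ftype, name, delim = m.groups()
        if ftype == "ESTRING":
            # Hypothetical semantics: any text up to, and consuming, the delimiter.
            parts.append(f"(?P<{name}>.*?){re.escape(delim or '')}")
        elif ftype in FIELD_REGEX:
            parts.append(f"(?P<{name}>{FIELD_REGEX[ftype]})")
        else:
            raise ValueError(f"unsupported field type: {ftype}")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts) + r"\Z")

# Hypothetical pattern database: (class label, pattern).
PATTERN_DB = [
    ("ssh_login_accepted",
     "Accepted @ESTRING:auth_method: @for @ESTRING:username: @from "
     "@IPV4:client_addr@ port @NUMBER:port@"),
    ("ssh_login_failed",
     "Failed @ESTRING:auth_method: @for @ESTRING:username: @from "
     "@IPV4:client_addr@ port @NUMBER:port@"),
]
COMPILED = [(label, compile_pattern(p)) for label, p in PATTERN_DB]

def classify(message):
    """Return (class label, extracted fields) for the first matching pattern."""
    for label, rx in COMPILED:
        m = rx.match(message)
        if m:
            return label, m.groupdict()
    return "unknown", {}

if __name__ == "__main__":
    print(classify("Accepted publickey for abhinav from 192.0.2.7 port 5000"))
```

Anything that no pattern anticipates falls through to "unknown", which illustrates why such a database, however precise, is still hand-maintained rather than learned.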

I think that, given enough training data, techniques like logistic
regression or neural networks could be used to develop a more
sophisticated model for classifying the logs. Although I have never
implemented such systems myself, I think it is an area worth
investigating. Recently Google developed a bug prediction system
built on a model derived from their SCM logs
(http://google-engtools.blogspot.com/2011/12/bug-prediction-at-google.html),
so there is hope.
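As a rough illustration of the logistic-regression idea, here is a minimal sketch in plain Python (no external libraries) that learns to label log lines as "needs attention" (1) or "routine" (0). The training lines and labels are made up, and the feature scheme (a bag of words with digit runs collapsed to "#") is an assumption; this is plain logistic regression fitted by stochastic gradient descent, not a neural network, and a real system would need far more labeled data and proper evaluation.

```python
import math
import re
from collections import defaultdict

def features(line):
    """Bag-of-words feature set; digit runs collapse to '#' so that PIDs,
    ports and addresses do not each become a separate feature."""
    normalized = re.sub(r"\d+", "#", line.lower())
    return set(re.findall(r"[a-z]+|#", normalized))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(examples, epochs=300, lr=0.1, l2=1e-3):
    """Fit binary logistic regression by stochastic gradient descent.
    examples: list of (log_line, label) with label in {0, 1}."""
    w = defaultdict(float)
    b = 0.0
    for _ in range(epochs):
        for line, y in examples:
            feats = features(line)
            z = b + sum(w[f] for f in feats)
            err = sigmoid(z) - y  # derivative of the log-loss with respect to z
            b -= lr * err
            for f in feats:
                # Log-loss gradient plus a small L2 penalty on the active weights.
                w[f] -= lr * (err + l2 * w[f])
    return dict(w), b

def predict_proba(model, line):
    """Estimated probability that a line needs attention."""
    w, b = model
    return sigmoid(b + sum(w.get(f, 0.0) for f in features(line)))

# Hypothetical hand-labeled training data.
TRAIN = [
    ("sshd[101]: Failed password for root from 192.0.2.1 port 4242 ssh2", 1),
    ("sshd[102]: Failed password for invalid user admin from 192.0.2.9 port 5151 ssh2", 1),
    ("kernel: wd0: uncorrectable data error reading fsbn 123456", 1),
    ("sshd[103]: Accepted publickey for abhinav from 192.0.2.7 port 5000 ssh2", 0),
    ("cron[200]: (root) CMD (/usr/libexec/atrun)", 0),
    ("ntpd[300]: synchronized to 192.0.2.123, stratum 2", 0),
]

if __name__ == "__main__":
    model = train(TRAIN)
    for line in ("sshd[9]: Failed password for guest from 198.51.100.4 port 22 ssh2",
                 "sshd[9]: Accepted publickey for abhinav from 198.51.100.4 port 22 ssh2"):
        print(f"{predict_proba(model, line):.2f}  {line}")
```

Unlike the pattern-based approaches above, nothing here names "Failed" or "Accepted" explicitly; the weights for those words come from the labeled examples.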

> I'm not qualified to advise it though :-/

I am not sure I can do it either, but I have just learned some machine
learning and I thought this would be a cool project to try. If I can
get a clear understanding of what is really expected of such a tool,
then perhaps implementing it will become much easier. I think you can
advise or help with that :)

Thanks
Abhinav

