Command: High Performance Log Analyzer and Parser

Overview

Library

Command

The sequence command is developed to demonstrate the use of this package. You can find it in the sequence directory. The sequence command implements the sequential semantic log parser.

Usage:
  sequence [command]

Available Commands:
  scan                      scan will tokenize a log file or message and output a list of tokens
  analyze                   analyze will analyze a log file and output a list of patterns that will match all the log messages
  parse                     parse will parse a log file and output a list of parsed tokens for each of the log messages
  bench                     benchmark scanning or parsing of a log file, no output is provided
  help [command]            Help about any command

 Available Flags:
  -c, --config="./sequence.toml": TOML-formatted configuration file
  -f, --fmt="general": format of the message to tokenize, can be 'json' or 'general'
  -h, --help=false: help for sequence
  -i, --infile="": input file, required
  -o, --outfile="": output file, if empty, to stdout
  -d, --patdir="": pattern directory,, all files in directory will be used
  -p, --patfile="": initial pattern file, optional

Use "sequence help [command]" for more information about that command.

Scan

Usage:
  sequence scan [flags]

 Available Flags:
  -h, --help=false: help for scan
  -m, --msg="": message to tokenize

Example

  $ ./sequence scan -m "jan 14 10:15:56 testserver sudo:    gonner : tty=pts/3 ; pwd=/home/gonner ; user=root ; command=/bin/su - ustream"
  #   0: { Field="%funknown%", Type="%ts%", Value="jan 14 10:15:56" }
  #   1: { Field="%funknown%", Type="%literal%", Value="testserver" }
  #   2: { Field="%funknown%", Type="%literal%", Value="sudo" }
  #   3: { Field="%funknown%", Type="%literal%", Value=":" }
  #   4: { Field="%funknown%", Type="%literal%", Value="gonner" }
  #   5: { Field="%funknown%", Type="%literal%", Value=":" }
  #   6: { Field="%funknown%", Type="%literal%", Value="tty" }
  #   7: { Field="%funknown%", Type="%literal%", Value="=" }
  #   8: { Field="%funknown%", Type="%string%", Value="pts/3" }
  #   9: { Field="%funknown%", Type="%literal%", Value=";" }
  #  10: { Field="%funknown%", Type="%literal%", Value="pwd" }
  #  11: { Field="%funknown%", Type="%literal%", Value="=" }
  #  12: { Field="%funknown%", Type="%string%", Value="/home/gonner" }
  #  13: { Field="%funknown%", Type="%literal%", Value=";" }
  #  14: { Field="%funknown%", Type="%literal%", Value="user" }
  #  15: { Field="%funknown%", Type="%literal%", Value="=" }
  #  16: { Field="%funknown%", Type="%string%", Value="root" }
  #  17: { Field="%funknown%", Type="%literal%", Value=";" }
  #  18: { Field="%funknown%", Type="%literal%", Value="command" }
  #  19: { Field="%funknown%", Type="%literal%", Value="=" }
  #  20: { Field="%funknown%", Type="%string%", Value="/bin/su" }
  #  21: { Field="%funknown%", Type="%literal%", Value="-" }
  #  22: { Field="%funknown%", Type="%literal%", Value="ustream" }

Parse

Usage:
  sequence parse [flags]

 Available Flags:
  -h, --help=false: help for parse
  -i, --infile="": input file, required
  -o, --outfile="": output file, if empty, to stdout
  -d, --patdir="": pattern directory,, all files in directory will be used
  -p, --patfile="": initial pattern file, required

The following command parses a file based on existing rules. Note that the performance number (9570.20 msgs/sec) is mostly due to reading/writing to disk. To get a more realistic performance number, see the benchmark section below.

  $ ./sequence parse -d ../../patterns -i ../../data/sshd.all  -o parsed.sshd
  Parsed 212897 messages in 22.25 secs, ~ 9570.20 msgs/sec

This is an entry from the output file:

  Jan 15 19:39:26 jlz sshd[7778]: pam_unix(sshd:session): session opened for user jlz by (uid=0)
  #   0: { Field="%createtime%", Type="%ts%", Value="jan 15 19:39:26" }
  #   1: { Field="%apphost%", Type="%string%", Value="jlz" }
  #   2: { Field="%appname%", Type="%string%", Value="sshd" }
  #   3: { Field="%funknown%", Type="%literal%", Value="[" }
  #   4: { Field="%sessionid%", Type="%integer%", Value="7778" }
  #   5: { Field="%funknown%", Type="%literal%", Value="]" }
  #   6: { Field="%funknown%", Type="%literal%", Value=":" }
  #   7: { Field="%funknown%", Type="%string%", Value="pam_unix" }
  #   8: { Field="%funknown%", Type="%literal%", Value="(" }
  #   9: { Field="%funknown%", Type="%literal%", Value="sshd" }
  #  10: { Field="%funknown%", Type="%literal%", Value=":" }
  #  11: { Field="%funknown%", Type="%string%", Value="session" }
  #  12: { Field="%funknown%", Type="%literal%", Value=")" }
  #  13: { Field="%funknown%", Type="%literal%", Value=":" }
  #  14: { Field="%object%", Type="%string%", Value="session" }
  #  15: { Field="%action%", Type="%string%", Value="opened" }
  #  16: { Field="%funknown%", Type="%literal%", Value="for" }
  #  17: { Field="%funknown%", Type="%literal%", Value="user" }
  #  18: { Field="%dstuser%", Type="%string%", Value="jlz" }
  #  19: { Field="%funknown%", Type="%literal%", Value="by" }
  #  20: { Field="%funknown%", Type="%literal%", Value="(" }
  #  21: { Field="%funknown%", Type="%literal%", Value="uid" }
  #  22: { Field="%funknown%", Type="%literal%", Value="=" }
  #  23: { Field="%funknown%", Type="%integer%", Value="0" }
  #  24: { Field="%funknown%", Type="%literal%", Value=")" }

Benchmark

Usage:
  sequence bench [flags]

 Available Flags:
  -c, --cpuprofile="": CPU profile filename
  -h, --help=false: help for bench
  -i, --infile="": input file, required
  -d, --patdir="": pattern directory,, all files in directory will be used
  -p, --patfile="": pattern file, required
  -w, --workers=1: number of parsing workers

The following command will benchmark the parsing of two files. First file is a bunch of sshd logs, averaging 98 bytes per message. The second is a Cisco ASA log file, averaging 180 bytes per message.

$ ./sequence bench -p ../../patterns/sshd.txt -i ../../data/sshd.all
Parsed 212897 messages in 1.69 secs, ~ 126319.27 msgs/sec

$ ./sequence bench -p ../../patterns/asa.txt -i ../../data/allasa.log
Parsed 234815 messages in 2.89 secs, ~ 81323.41 msgs/sec

Performance can be improved by adding more cores:

GOMAXPROCS=2 ./sequence bench -p ../../patterns/sshd.txt -i ../../data/sshd.all -w 2
Parsed 212897 messages in 1.00 secs, ~ 212711.83 msgs/sec

$ GOMAXPROCS=2 ./sequence bench -p ../../patterns/asa.txt -i ../../data/allasa.log -w 2
Parsed 234815 messages in 1.56 secs, ~ 150769.68 msgs/sec

< Getting Started Configuration >