Quick Analysis of Apache Logs for Visited Page Count Statistics Using Shell Tools

A small chart using boxes made in gnuplot. It shows decreasing boxes representing page visit counts generated from Apache. Actual page names were redacted.

Sometimes Apache logs need to be analyzed to find frequently visited pages. This can be done with complex tools, or with a simple shell one-liner that handles the one required sub-task quickly and without fuss. In this case, I wanted to see the most frequently visited pages of a web application. I did this using a combination of zcat, awk, grep, sort, uniq and head, with gnuplot to visualize the results. This is the script:

zcat access_log*.gz              \
  | awk '{print $7}'             \
  | awk -F'?' '{print $1}'       \
  | grep -v -F -f ignore.txt     \
  | sort | uniq -c | sort -n -r  \
  | head -n 20                   \
  | gnuplot -e "set xtics rotate; plot '-' using 0:1:xtic(2) with boxes notitle" -p

Of course, gnuplot and head may be left out in order to get more results out of this small peek into the logs. The ignore.txt file contains one string per line. Because grep is called with -F, each line is treated as a fixed string rather than a regular expression, and -v drops every path that contains one of them. This is very useful for .css, .js, and other static files that should be left out of the statistics.
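
To make the role of ignore.txt more concrete, here is a minimal sketch of the text-only variant (no head, no gnuplot) run against a tiny made-up log. The log lines, the temporary directory, the `top_pages` helper name and the three ignore strings are all assumptions for illustration, not taken from the real setup. It also assumes the usual Apache common/combined log format, where the request path is the seventh whitespace-separated field.

```shell
#!/bin/sh
# Work in a throwaway directory so no real logs are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample requests in Apache combined log format.
cat > access_log.1 <<'EOF'
192.0.2.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html?x=1 HTTP/1.1" 200 512 "-" "curl"
192.0.2.2 - - [01/Jan/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl"
192.0.2.1 - - [01/Jan/2024:10:00:02 +0000] "GET /style.css HTTP/1.1" 200 128 "-" "curl"
192.0.2.3 - - [01/Jan/2024:10:00:03 +0000] "GET /about.html HTTP/1.1" 200 256 "-" "curl"
EOF
gzip access_log.1

# Hypothetical ignore list: one fixed string per line.
printf '%s\n' '.css' '.js' '.png' > ignore.txt

# Same pipeline as above, without head and gnuplot.
top_pages() {
  zcat access_log*.gz            \
    | awk '{print $7}'           \
    | awk -F'?' '{print $1}'     \
    | grep -v -F -f ignore.txt   \
    | sort | uniq -c | sort -n -r
}

result=$(top_pages)
printf '%s\n' "$result"
```

With this sample data, the list contains /index.html with a count of 2 (the query string is stripped, so both requests are counted together) and /about.html with a count of 1, while /style.css is filtered out. One thing to keep in mind: since -F matches a fixed string anywhere in the path, an entry like .js would also exclude a path such as /data.json.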