Parsing Apache Logs with tail, cut, sort, and uniq

A client experienced some intermittent website down time last week during the final few days of April 2021, and sent over that month’s Apache logs for me to see if there is anything out of the ordinary – excessive crawling, excessive probing, brute force password attacks, things of that nature. Below are a few commands I have used that I thought would be nice to keep handy for future uses. I am currently using Ubuntu 20.04 LTS.

While unrelated, just to form a complete picture, my client sent the logs to me in gz-compressed format. If you are not familiar on how to uncompress it, it is fairly straight forward:

gunzip 2021-APR.gz

Back on topic… I ended on parsing the file in three separate ways for me to get an overall view of things. I found that the final few days of April are represented in the roughly final 15,000 lines of the log file, so I decided to use the tail command as my main tool.

First, I did following command below to find which IP addresses hit the server the most:

tail -n 15000 filename.log | cut -f 1 -d ' ' | sort | uniq -c | sort -nr | more

Quick explanation:

  • The tail command pulls the final 15,000 lines from the log file (final few days of the month)
  • The cut command parses each line using a space delimiter and returns the first field (the IP addres)
  • The sort command sorts the results thus far
  • The uniq command groups the results thus far and provides a count
  • The second sort command reverse the sort so the highest result is on top
  • Finally, the more command creates screen-sized pagination so it’s easier to read

There is always more than one way to do something in Linux, of course. Just as an aside, the following functions very similarly:

cat filename.log | awk '{print $1}' | sort | uniq -c | sort -nr | tail -15000 | more

Then, I thought it would be nice to get an idea of how many requests were made per hour. This can be achieved with the command below.

tail -n 15000 filename.log | cut -f 4 -d ' ' | cut -f 1,2 -d ':' | sort | uniq -c | more

The main difference here is that I opted for the 4th (rather than 1st) result of the cut command, which gets me the timestamp element (rather than IP address), and then a second cut command parses it on the colon symbol and returns the first (date) and second (hour) for further grouping.

Finally, I tweaked it a little bit more so I get an idea of whether there was excessive requests within any minute-span. This can be achieved by expanding the second cut command slightly, as per below.

tail -n 15000 filename.log | cut -f 4 -d ' ' | cut -f 1,2,3 -d ':' | sort | uniq -c | more

Performing 301 Redirect with Apache, IIS, PHP, and others

Apache

If you use Apache with the mod-rewrite module installed, 301 Redirect may be set up by editing your .htaccess file found at the root directory of your website. The following is a sample of what you might add to your .htaccess file; in this example, we are redirecting any visits to /blah/pictures.html to its new location, http://ww2db.com/photo.php.

Redirect 301 /blah/pictures.html http://ww2db.com/photo.php

Sometimes you may encounter a situation where you changed technology of your platform, for example, from static HTML pages to dynamic PHP pages. With the sample code below, you can redirect all HTML pages to PHP pages by the same name.

RedirectMatch 301 (.*).html$ http://ww2db.com$1.php

Looking to use .htaccess to redirect an entire site because you have moved from one domain to another? This is what you want to do:

Redirect 301 / http://ww2db.com/

PHP

If your site’s pages are made of PHP files, redirection is fairly easy. If you had moved the /old/pictures.php page to the new location of http://ww2db.com/photo.php, modify the /old/pictures.php page by adding the following PHP snipped on the top of the code, ie. before any HTML code is written to screen.

header("HTTP/1.1 301 Moved Permanently");
header("Location: http://ww2db.com/photo.php");

IIS

If you using Microsoft’s Internet Information Services (IIS), perform the following steps to do a 301 Redirect.

  • Right click on “My Computer”, go to “Manage”, “Services and Applications”, and then “Internet Information Services (IIS) Manager”.
  • Under “Web Sites”, look for the particular folder or file you wish to redirect. Right click and select “Properties”.
  • Click the “A redirection to a URL” radio button. In the bottom section that had just appeared, and enter the new URL in the “Redirect to:” text box, and make sure to check “A permanent redirection for this resource” checkbox.
  • Click OK to complete the process.

ASP and ASP.NET

Classic Active Server Pages (ASP) web pages can be redirected in a method very similar to PHP above.

Response.Status="301 Moved Permanently"
Response.AddHeader "Location","http://ww2db.com/photo.php"

ASP.NET redirect code is very similar to its ASP predecessor’s.

private void Page_Load(object sender, System.EventArgs e) {
  Response.Status="301 Moved Permanently";
  Response.AddHeader("Location","http://ww2db.com/photo.php");
}

JSP

Redirection for Java Server Pages (JSP), again, is similar to the other scripting languages we have seen above.

response.setStatus(301);
response.setHeader("Location","http://ww2db.com/photo.php");
response.setHeader("Connection", "close");

CGI/Perl

If you are using Perl-based CGI code, this is the code you may wish to deploy to perform a 301 Redirect.

$q=new CGI;
print $q->redirect("http://ww2db.com/photo.php");

Cold Fusion

ColdFusion pages can be redirect also by working with the header.

<.cfheader statuscode="301" statustext="Moved permanently">
<.cfheader name="Location" value="http://ww2db.com/photo.php">

Ruby on Rails

Not surprisingly, Ruby on Rails pages are redirected in a very similar manner.

def old_action
headers["Status"]="301 Moved Permanently"
redirect_to "http://ww2db.com/photo.php"
end

Using .htaccess file to control web directory access

Naturally, if the .htaccess (ht.acl in Windows) does not already exist in the directory we wish to protect, we must create it first. It is a plain text file, so you may use any text editor to create/modify this file, such as pico, emacs, Notepad, or TextEdit.

Our first step is to add these lines below to the .htaccess file.

AuthName "This is a restricted area, please log in first."
AuthType Basic
AuthUserFile /directory/path/passwdfile

AuthName is the text that will appear in the browser pop-up when the user is challenged. AuthType value of “Basic” means we are using basic HTTP authentication. AuthUserFile is the path and file name of our password file; more on that later.

Also in the .htaccess file, we add a list of user names we wish to allow to access the web directory we are locking down. For example:

require user jdoe
require user spannu

We are now done with the .htaccess file. Now we just have to create the password file. In the Apache bin, there is an executable called “htpasswd”. The first example below is used to create a new password file with the user “jdoe”; note that when using the -c parameter to create a new file, we will overwrite any password file that exists in the same directory, so be careful. To add a new user to an existing file, we should run the second example, the difference being the lack of the -c parameter.

htpasswd -c -b /directory/path/passwdfile jdoe secUr3Pwd

htpasswd -b /directory/path/passwdfile spannu an0therPwd

The -b parameter allows us to type in the password in the command line, which is helpful when you are setting up a script that creates a large number of users at once. If having the password in the command line cache is a concern, just remove the -b parameter, and we will be prompted to enter a password for each user.

We should now be all set. The next web visitor that reaches the directory where the .htaccess file resides should be challenged with a password prompt.

To remove a user from a certain password file:

htpasswd -D /directory/path/passwdfile jdoe

For our reference, below is the help text for the htpasswd command.

Usage:
        htpasswd [-cmdpsD] passwordfile username
        htpasswd -b[cmdpsD] passwordfile username password

        htpasswd -n[mdps] username
        htpasswd -nb[mdps] username password
 -c  Create a new file.
 -n  Don't update file; display results on stdout.
 -m  Force MD5 encryption of the password (default).
 -d  Force CRYPT encryption of the password.
 -p  Do not encrypt the password (plaintext).
 -s  Force SHA encryption of the password.
 -b  Use the password from the command line rather than prompting for it.
 -D  Delete the specified user.
On Windows, NetWare and TPF systems the '-m' flag is used by default.
On all other systems, the '-p' flag will probably not work.