Applications log events so that they can be reviewed later. Nginx access logs, in particular, record details of each request the web server handles, such as the client IP address, the requested URL, and the HTTP status code. This blog post looks at a Python script that generates a sample Nginx access log for testing and debugging.

The script


import random
import time
import argparse

# create an ArgumentParser object and set the description and default values for the optional arguments
parser = argparse.ArgumentParser(description='Generate a sample access log.')
parser.add_argument('--filename', type=str, help='Name of the log file', default='access.log')
parser.add_argument('--lines', type=int, help='Number of log lines to generate', default=2000)

# create a list of IP addresses to use in the log lines
ip_list = ['192.168.1.1', '192.168.1.2', '192.168.1.3', '192.168.1.4', '192.168.1.5',
           '192.168.1.6', '192.168.1.7', '192.168.1.8', '192.168.1.9', '192.168.1.10',
           '192.168.1.11', '192.168.1.12', '192.168.1.13', '192.168.1.14', '192.168.1.15',
           '192.168.1.16', '192.168.1.17', '192.168.1.18', '192.168.1.19', '192.168.1.20',
           '192.168.1.21', '192.168.1.22', '192.168.1.23', '192.168.1.24', '192.168.1.25',
           '192.168.1.26', '192.168.1.27', '192.168.1.28', '192.168.1.29', '192.168.1.30',
           '192.168.1.31', '192.168.1.32', '192.168.1.33', '192.168.1.34', '192.168.1.35',
           '192.168.1.36', '192.168.1.37', '192.168.1.38', '192.168.1.39', '192.168.1.40',
           '192.168.1.41', '192.168.1.42', '192.168.1.43', '192.168.1.44', '192.168.1.45',
           '192.168.1.46', '192.168.1.47', '192.168.1.48', '192.168.1.49', '192.168.1.50',
           '2001:db8:0:1:0:0:0:1', '2001:db8:0:1:0:0:0:2', '2001:db8:0:1:0:0:0:3',
           '2001:db8:0:1:0:0:0:4', '2001:db8:0:1:0:0:0:5', '2001:db8:0:1:0:0:0:6',
           '2001:db8:0:1:0:0:0:7', '2001:db8:0:1:0:0:0:8', '2001:db8:0:1:0:0:0:9',
           '2001:db8:0:1:0:0:0:a', '2001:db8:0:1:0:0:0:b', '2001:db8:0:1:0:0:0:c',
           '2001:db8:0:1:0:0:0:d', '2001:db8:0:1:0:0:0:e']

# create a list of user agent strings to use in the log lines
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 OPR/45.0.2552.898',
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
]

args = parser.parse_args()

# open the log file in write mode and create log lines using the specified number of lines
with open(args.filename, 'w') as f:
    for i in range(args.lines):
        ip = random.choice(ip_list)
        timestamp = time.strftime('%d/%b/%Y:%H:%M:%S %z', time.localtime())
        method = random.choice(['GET', 'POST', 'PUT', 'DELETE'])
        url = '/' + ''.join(random.choices('abcdefghijklmnopqrstuvwxyz', k=random.randint(1, 10)))
        refurl = random.choice(['https://www.example.com/','https://www.google.com/','https://www.ibm.com/','https://www.msn.com/'])
        protocol = 'HTTP/1.1'
        status_code = random.choice([200, 201, 204, 301, 302, 400, 401, 403, 404, 500])
        size = random.randint(100, 10000)
        user_agent = random.choice(user_agents)
        # create the log line with the chosen values and write it to the file
        line = f"{ip} - - [{timestamp}] \"{method} {url} {protocol}\" {status_code} {size} \"{refurl}\" \"{user_agent}\"\n"
        f.write(line)

Usage

You can run this script with the following command:


python generate_logs.py --filename mylog.log --lines 5000

This will generate 5000 log lines in a file named mylog.log. If you omit the --filename and --lines arguments, the defaults of access.log and 2000 lines will be used.
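A quick way to sanity-check a generated file is to count its lines and tally the status codes. The sketch below is one possible approach, not part of the original script: `summarize_log` is a hypothetical helper, and the two sample lines are hand-written in the same field layout so the example runs without a real log file.

```python
import os
import tempfile
from collections import Counter


def summarize_log(path):
    # Hypothetical helper: count lines and tally the status code field.
    statuses = Counter()
    total = 0
    with open(path) as f:
        for line in f:
            total += 1
            # In the layout the script writes, the status code is the first
            # token after the closing quote of the request string.
            after_request = line.split('" ', 1)[1]
            statuses[after_request.split()[0]] += 1
    return total, statuses


# Hand-written sample lines in the script's layout (not real traffic).
sample_lines = [
    '192.168.1.1 - - [01/Jan/2024:00:00:00 +0000] "GET /abc HTTP/1.1" '
    '200 512 "https://www.example.com/" "Mozilla/5.0"\n',
    '192.168.1.2 - - [01/Jan/2024:00:00:01 +0000] "POST /xyz HTTP/1.1" '
    '404 128 "https://www.example.com/" "Mozilla/5.0"\n',
]

# Write the samples to a temporary file so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.log')
    with open(sample_path, 'w') as f:
        f.writelines(sample_lines)
    total, statuses = summarize_log(sample_path)

print(total, dict(statuses))
```

After running the command above, calling `summarize_log('mylog.log')` should report a total of 5000 lines.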

How It Works

The Python script uses the built-in random and time modules to generate the data for each log line, and the argparse module to read the log file name and the number of lines from the command line. Here’s a breakdown of the key parts of the script:

Parsing Command-Line Arguments: The argparse module is used to parse the --filename and --lines optional arguments, which specify the name of the log file to be generated and the number of log lines to be generated, respectively.
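To see how the defaults behave without touching the command line, a minimal sketch can pass an explicit argument list to `parse_args`. The `build_parser` helper is a name introduced here for illustration; the original script builds its parser at module level.

```python
import argparse


def build_parser():
    # Hypothetical helper: wraps the same two options the script defines.
    parser = argparse.ArgumentParser(description='Generate a sample access log.')
    parser.add_argument('--filename', type=str, default='access.log',
                        help='Name of the log file')
    parser.add_argument('--lines', type=int, default=2000,
                        help='Number of log lines to generate')
    return parser


# An empty list stands in for "no arguments given", so the defaults apply.
defaults = build_parser().parse_args([])
print(defaults.filename, defaults.lines)

# An explicit list stands in for the command line shown in the Usage section.
custom = build_parser().parse_args(['--filename', 'mylog.log', '--lines', '5000'])
print(custom.filename, custom.lines)
```

Because `--lines` is declared with `type=int`, the string `'5000'` arrives in the script as the integer `5000`, which is what `range()` needs.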

Generating Log Lines: The for loop runs args.lines times and builds one log line per iteration. Each iteration picks a random IP address from ip_list, formats the current local time with time.strftime, picks a random HTTP method, generates a URL path of 1 to 10 random lowercase letters, picks a random referer URL, sets the protocol to HTTP/1.1, picks a random HTTP status code, draws a response size between 100 and 10000 bytes, and picks a random user agent string. These values are combined into a single log line, which is written to the file. Because the timestamp is always the current time, lines produced in one run carry nearly identical timestamps.
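One way to check that a generated line has the expected field layout is to build a single line and match it against a regular expression. This is a sketch under assumptions: `make_log_line`, the shortened `IPS` and `AGENTS` lists, and the `LINE_RE` pattern are all introduced here for illustration, and the pattern describes only the layout this script writes, not every possible nginx log format.

```python
import random
import re
import time

# Shortened, illustrative value pools; the script above uses longer lists.
IPS = ['192.168.1.1', '2001:db8:0:1:0:0:0:1']
AGENTS = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0']


def make_log_line():
    # Hypothetical helper that mirrors one iteration of the script's loop.
    ip = random.choice(IPS)
    timestamp = time.strftime('%d/%b/%Y:%H:%M:%S %z', time.localtime())
    method = random.choice(['GET', 'POST', 'PUT', 'DELETE'])
    url = '/' + ''.join(random.choices('abcdefghijklmnopqrstuvwxyz',
                                       k=random.randint(1, 10)))
    refurl = random.choice(['https://www.example.com/', 'https://www.google.com/'])
    status_code = random.choice([200, 201, 204, 301, 302, 400, 401, 403, 404, 500])
    size = random.randint(100, 10000)
    user_agent = random.choice(AGENTS)
    return (f'{ip} - - [{timestamp}] "{method} {url} HTTP/1.1" '
            f'{status_code} {size} "{refurl}" "{user_agent}"\n')


# Illustrative pattern for the field layout used above.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

line = make_log_line()
match = LINE_RE.match(line.rstrip('\n'))
print(match.groupdict() if match else 'line did not match')
```

The named groups make it easy to pull individual fields out of a line when testing a log parser against the generated file.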

Writing Log Lines to a File: The with statement opens the log file in write mode, which replaces any existing file of the same name, and closes the file automatically when the block ends. Inside the block, the write() method writes each log line to the file.
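The effect of write mode is easy to overlook when the script is run repeatedly against the same file name. The following sketch, which uses placeholder text and a temporary directory rather than the script's real output, contrasts `'w'` with append mode `'a'`; switching the script to `'a'` would be a change to the original, shown here only for comparison.

```python
import os
import tempfile

# Work in a temporary directory so no real log file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'access.log')

    # First run: write three placeholder lines.
    with open(path, 'w') as f:
        for n in range(3):
            f.write(f'placeholder line {n}\n')

    # Second run in 'w' mode: the earlier three lines are discarded.
    with open(path, 'w') as f:
        f.write('only line\n')
    with open(path) as f:
        after_write_mode = f.readlines()

    # A run in 'a' mode keeps existing content and adds to the end.
    with open(path, 'a') as f:
        f.write('appended line\n')
    with open(path) as f:
        after_append_mode = f.readlines()

print(len(after_write_mode), len(after_append_mode))
```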

Generated Nginx Log Screenshot

Conclusion

We looked at a Python script that makes it quick and easy to generate sample Nginx access logs for testing and debugging. The script uses the built-in random and time modules to produce the data and the argparse module to parse command-line arguments. Each line in the generated log file describes one request to a web server, including the IP address of the client, the URL requested, and the HTTP status code returned by the server.

Link to GitHub:
https://github.com/linuxinsights/Generating-Nginx-Access-Logs

Got any queries or feedback? Feel free to drop a comment below!
