Download AWS S3 Logs with Python & boto
I’ve started moving the static content for some of my websites to Amazon Web Services, using S3 and CloudFront for delivery. I’ve enabled logging for my CloudFront distributions as well as my public S3 buckets, and wanted a cron job on my server to download the logs automatically for processing with AWStats.
To make this happen I’ve written a script in Python with the boto module that downloads all generated log files to a local folder and then deletes them from the Amazon S3 Bucket when done. The log files downloaded to the local folder can then be further processed with logresolvemerge and AWStats.
You need to have the boto module installed for this to work. Personally I’m working with Ubuntu 10.04, where boto can be easily installed by executing:
sudo apt-get install python-boto
The script takes a number of command line arguments, which are listed in its docstring. The bucket, prefix, access key, secret key and local path can each be given a default value at the top of the get_logs class. If you set defaults in the script, the command line arguments are still useful whenever you need to override one of them.
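As a rough sketch of that defaults-plus-override idea on its own, here is a small stand-alone example built on getopt. The DEFAULTS dictionary, its values and the parse_options name are made up for illustration and are not part of the script.

```python
import getopt

# Hypothetical defaults, standing in for the values set at the top of the class.
DEFAULTS = {"bucket": "my-log-bucket", "prefix": "logs/", "local": "/tmp/aws-logs/"}


def parse_options(argv):
    """Return a copy of the defaults, overridden by any options found in argv."""
    settings = dict(DEFAULTS)
    opts, _args = getopt.getopt(argv, "b:p:l:", ["bucket=", "prefix=", "local="])
    for opt, arg in opts:
        if opt in ("-b", "--bucket"):
            settings["bucket"] = arg
        elif opt in ("-p", "--prefix"):
            settings["prefix"] = arg
        elif opt in ("-l", "--local"):
            settings["local"] = arg
    return settings


if __name__ == "__main__":
    # Only the prefix is overridden; bucket and local path keep their defaults.
    print(parse_options(["-p", "logs/cdn.example.com/"]))
```

Anything not given on the command line simply falls through to the default, which is the same behaviour the script relies on.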
get-aws-logs.py
#! /usr/bin/env python
"""Download and delete log files for AWS S3 / CloudFront

Usage: python get-aws-logs.py [options]

Options:
  -b ..., --bucket=...    AWS Bucket
  -p ..., --prefix=...    AWS Key Prefix
  -a ..., --access=...    AWS Access Key ID
  -s ..., --secret=...    AWS Secret Access Key
  -l ..., --local=...     Local Download Path
  -h, --help              Show this help
  -d                      Show debugging information while parsing

Examples:
  get-aws-logs.py -b eqxlogs
  get-aws-logs.py --bucket=eqxlogs
  get-aws-logs.py -p logs/cdn.example.com/
  get-aws-logs.py --prefix=logs/cdn.example.com/

This program requires the boto module for Python to be installed.
"""

__author__ = "Johan Steen (http://www.artstorm.net/)"
__version__ = "0.5.0"
__date__ = "28 Nov 2010"

import boto
import getopt
import os
import sys

_debug = 0
class get_logs:
    """Download log files from the specified bucket and path and then delete them from the bucket.

    Uses: http://boto.s3.amazonaws.com/index.html
    """
    # Set default values
    AWS_BUCKET_NAME = '{bucket}'
    AWS_KEY_PREFIX = '{prefix}'
    AWS_ACCESS_KEY_ID = '{access key}'
    AWS_SECRET_ACCESS_KEY = '{secret key}'
    LOCAL_PATH = '{local path}'
    # Don't change below here
    s3_conn = None
    bucket_list = None

    def __init__(self):
        # Assign through self; bare names here would only create unused locals
        self.s3_conn = None
        self.bucket_list = None

    def start(self):
        """Connect, get file list, copy and delete the logs"""
        self.s3Connect()
        self.getList()
        self.copyFiles()

    def s3Connect(self):
        """Creates a S3 Connection Object"""
        self.s3_conn = boto.connect_s3(self.AWS_ACCESS_KEY_ID, self.AWS_SECRET_ACCESS_KEY)

    def getList(self):
        """Connects to the bucket and then gets a list of all keys available with the chosen prefix"""
        bucket = self.s3_conn.get_bucket(self.AWS_BUCKET_NAME)
        self.bucket_list = bucket.list(self.AWS_KEY_PREFIX)

    def copyFiles(self):
        """Creates the local folder if it doesn't exist, then downloads all keys and deletes them from the bucket"""
        # Using makedirs as it's recursive
        if not os.path.exists(self.LOCAL_PATH):
            os.makedirs(self.LOCAL_PATH)
        for key_list in self.bucket_list:
            key = str(key_list.key)
            # Get the log filename (L[-1] can be used to access the last item in a list).
            filename = key.split('/')[-1]
            # Skip folder placeholder keys (names ending in '/'), which have no filename
            if not filename:
                continue
            # os.path.join works whether or not LOCAL_PATH ends with a slash
            local_file = os.path.join(self.LOCAL_PATH, filename)
            # check if file exists locally, if not: download it
            if not os.path.exists(local_file):
                key_list.get_contents_to_filename(local_file)
                if _debug:
                    print("Downloaded from bucket: " + filename)
            # check so file is downloaded, if so: delete from bucket
            if os.path.exists(local_file):
                key_list.delete()
                if _debug:
                    print("Deleted from bucket: " + filename)
def usage():
    print(__doc__)


def main(argv):
    global _debug
    try:
        opts, args = getopt.getopt(argv, "hb:p:l:a:s:d", ["help", "bucket=", "prefix=", "local=", "access=", "secret="])
    except getopt.GetoptError:
        usage()
        sys.exit(2)
    logs = get_logs()
    for opt, arg in opts:
        if opt in ("-h", "--help"):
            usage()
            sys.exit()
        elif opt == '-d':
            _debug = 1
        elif opt in ("-b", "--bucket"):
            logs.AWS_BUCKET_NAME = arg
        elif opt in ("-p", "--prefix"):
            logs.AWS_KEY_PREFIX = arg
        elif opt in ("-a", "--access"):
            logs.AWS_ACCESS_KEY_ID = arg
        elif opt in ("-s", "--secret"):
            logs.AWS_SECRET_ACCESS_KEY = arg
        elif opt in ("-l", "--local"):
            logs.LOCAL_PATH = arg
    logs.start()


if __name__ == "__main__":
    main(sys.argv[1:])
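The step in copyFiles() that turns a key name into a local file path can be pulled out and tried without touching S3 at all. The following is only a sketch: the helper name local_path_for and the sample key name are invented for the example. It assumes that keys ending in a slash are folder placeholders that should be skipped, and it uses os.path.join so the local path works with or without a trailing slash.

```python
import os


def local_path_for(key_name, local_dir):
    """Map an S3 key name to a path inside local_dir, or None for folder keys."""
    # The log filename is whatever follows the last slash in the key name.
    filename = key_name.split("/")[-1]
    if not filename:
        # Keys ending in "/" are folder placeholders, not log files.
        return None
    return os.path.join(local_dir, filename)


if __name__ == "__main__":
    # Hypothetical key name, only to show the mapping.
    print(local_path_for("logs/cdn.example.com/example-log.gz", "/var/aws-logs"))
    print(local_path_for("logs/cdn.example.com/", "/var/aws-logs"))
```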
Keep in mind that I’m pretty new to Linux and to Python, so I bet things can be solved in a better, easier or more beautiful way than what I’ve done, and the script can surely be made more fail-safe.
Feel free to suggest improvements that can be made to the code.
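One possible step towards making it more fail-safe, offered as a sketch rather than a tested change to the script: only delete a log from the bucket once the size of the local copy matches the size reported for the key. This assumes the key object exposes a size attribute, as the keys returned by a boto bucket listing do. The fetch_then_delete function and the FakeKey class are hypothetical, and FakeKey exists only so the example runs without AWS credentials.

```python
import os
import tempfile


def fetch_then_delete(key, local_file):
    """Download key to local_file; delete it remotely only if the sizes match."""
    if not os.path.exists(local_file):
        key.get_contents_to_filename(local_file)
    if os.path.getsize(local_file) != key.size:
        # Incomplete download: drop the local copy and keep the remote one.
        os.remove(local_file)
        return False
    key.delete()
    return True


class FakeKey(object):
    """Hypothetical stand-in for a boto key, so the sketch runs without AWS."""

    def __init__(self, data, truncate=False):
        self.data = data
        self.size = len(data)
        self.truncate = truncate
        self.deleted = False

    def get_contents_to_filename(self, filename):
        # Optionally write one byte too few to simulate a broken download.
        with open(filename, "wb") as handle:
            handle.write(self.data[:-1] if self.truncate else self.data)

    def delete(self):
        self.deleted = True


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    good = FakeKey(b"log line\n")
    bad = FakeKey(b"log line\n", truncate=True)
    print(fetch_then_delete(good, os.path.join(folder, "good.log")), good.deleted)
    print(fetch_then_delete(bad, os.path.join(folder, "bad.log")), bad.deleted)
```

With a check like this, a truncated file is removed locally and left in the bucket, so the next cron run gets another chance to fetch it.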


karthick, December 3, 2010
This works, cool. Thanks for the script.
Jamie, January 5, 2011
Can you give more details on the full setup? I’m trying to do the exact same thing with S3/Ubuntu/AWStats. A walkthrough would be awesome.
Johan, January 5, 2011
I sure can write a follow up post to this one and do that. Thanks for the interest.
Anything in particular you would like me to include? Otherwise I can just wing it and describe how I’ve set it up.
Cheers,
Johan
Jamie, January 5, 2011
A full guide aimed at newbs would be great. I use Ubuntu in a command line environment, so I’m not a total goof Linux-wise, but I’ve never really used boto and Python scripting, so I’ve had to dig around some to get it working right.
I’ve dug all over for an S3 log solution and this is the best I’ve found, so a guide on a full setup from start to finish would, I think, be useful for tons of people. I know there have to be many others in the same boat.
Johan, January 7, 2011
That I can do.
There’s not much more to it, just a script to tie it all together, so I hope it will make sense. Keep an eye out and I’ll have a follow-up post up within the next few days, which I hope will give you the answers you are looking for.
Cheers,
Johan
Johan, January 13, 2011
Check out this one:
http://wpstorm.net/2011/01/awstats-amazon-s3-cloudfront/
Enjoy,
Johan