I have an AWS account. I created it less than a year ago, so I get some free stuff with the account. In the billing dashboard, I was somewhat concerned to see this:
There were only 5 days left in the month, but I didn't think I had uploaded that many objects to S3, and I wondered where they were coming from.
A few weeks ago, I was exploring AWS CloudTrail, which logs every API call, and I had set it up to store data for some later analysis. Now seemed like a good time to do so. I also wanted to explore AWS Athena, which is AWS’ implementation of Presto as a service, and allows you to run SQL queries directly on your S3 content (without having to import it into expensive Redshift, or elephantine EMR).
The CloudTrail console has a handy link to Athena: it’s like they WANT you to use it:
Clicking that link starts a simple wizard that creates the “External Table” in Athena pointing to the S3 bucket where your CloudTrail logs are stored. This table tells Athena how to find the CloudTrail data, how to interpret it, how to read it, etc. The syntax is familiar to anyone with a background in SQL, and the details are familiar if you have dug into the depths of Apache Hive (SQL-on-Hadoop).
SELECT count(useragent) AS QTY, eventname, sourceipaddress, awsregion
FROM cloudtrail_logs  -- placeholder: use the table name the wizard created for you
WHERE requestparameters LIKE '%my-cloudtrail-logs-bucket-%'
GROUP BY sourceipaddress, eventname, awsregion
ORDER BY QTY DESC
LIMIT 5
This code gives the top 5 combinations of event-name, sourceIP and region. Here are the results:
We learn several things from this exercise:
- The culprit, the reason I went over my Free Tier limit for S3, is in fact CloudTrail. It has put over 13,000 objects into an S3 bucket.
- I should also be concerned with the top row: Athena doing 173,000 “GetObject” calls to parse my data. That count increased to 202,989 the next time I ran the query. Row 3 has similar behaviour.
- The source IP addresses in rows 4 and 5 are my own. No concerns there.
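For anyone who would rather check this without Athena, here is a rough Python sketch of the same aggregation over CloudTrail records that have already been downloaded and parsed from JSON. The function name, the sample records and the bucket name are made up for illustration; only the field names (`eventName`, `sourceIPAddress`, `awsRegion`, `userAgent`, `requestParameters`) come from the CloudTrail record format.

```python
from collections import Counter


def top_combinations(records, bucket_fragment, limit=5):
    """Count records per (eventName, sourceIPAddress, awsRegion).

    Mirrors the GROUP BY / ORDER BY / LIMIT of the Athena query, but on
    already-parsed CloudTrail records (plain dicts).
    """
    counts = Counter()
    for rec in records:
        # Athena sees requestparameters as a string, so stringify the
        # parsed value before doing the substring match.
        if bucket_fragment not in str(rec.get("requestParameters")):
            continue
        # count(useragent) ignores NULLs, so skip records without one.
        if rec.get("userAgent") is None:
            continue
        key = (rec.get("eventName"), rec.get("sourceIPAddress"), rec.get("awsRegion"))
        counts[key] += 1
    return counts.most_common(limit)


def make_record(event, source, bucket):
    """Build a made-up record for the example below (hypothetical helper)."""
    return {
        "eventName": event,
        "sourceIPAddress": source,
        "awsRegion": "us-east-1",
        "userAgent": source,
        "requestParameters": {"bucketName": bucket} if bucket else None,
    }


# Illustrative data only, not my real logs.
sample_records = [
    make_record("GetObject", "athena.amazonaws.com", "my-cloudtrail-logs-bucket-demo"),
    make_record("GetObject", "athena.amazonaws.com", "my-cloudtrail-logs-bucket-demo"),
    make_record("PutObject", "cloudtrail.amazonaws.com", "my-cloudtrail-logs-bucket-demo"),
    make_record("ListBuckets", "203.0.113.7", None),  # no bucket, filtered out
]

for combo, qty in top_combinations(sample_records, "my-cloudtrail-logs-bucket-"):
    print(qty, combo)
```

On real data you would feed it the `Records` array from each gzipped log file in the bucket, which is exactly the work Athena is doing with all those GetObject calls.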
The big money question: how much did the Athena query cost to run?
- Athena charges: $5.00 per TB scanned. The Athena console says it scanned 35 MB, so the Athena cost is about $0.00017.
- S3 charges: $0.0004 per 1000 GET requests, which is about $0.08.
- Incidentally, those PUT requests from CloudTrail cost a further $0.07.
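The arithmetic behind those three figures can be sketched in a few lines of Python. Some inputs are my assumptions rather than numbers quoted above: a decimal terabyte, the 202,989 GetObject count as the GET total, a round 13,000 for the PUTs, and a PUT price of $0.005 per 1000 requests (the S3 Standard list price, which may differ by region).

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6


def athena_scan_cost(bytes_scanned, usd_per_tb=5.00):
    """Cost of one Athena query, ignoring any per-query minimum."""
    return bytes_scanned / TB * usd_per_tb


def s3_request_cost(requests, usd_per_thousand):
    """Cost of a number of S3 requests billed per 1000."""
    return requests / 1000 * usd_per_thousand


athena_usd = athena_scan_cost(35 * MB)
get_usd = s3_request_cost(202_989, 0.0004)  # assumption: GET count from the second run
put_usd = s3_request_cost(13_000, 0.005)    # assumption: PUT price and rounded count

print(f"Athena scan: ${athena_usd:.5f}")
print(f"S3 GETs:     ${get_usd:.2f}")
print(f"S3 PUTs:     ${put_usd:.3f}")
```

The takeaway is that the S3 requests Athena makes cost several hundred times more than the Athena scan itself, at least for lots of tiny log files like these.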
Closer inspection of the log files suggests that CloudTrail is logging its own API calls: each time it writes a log file to S3, that write is itself logged, which produces another write to S3, and so on. CloudTrail added over 170 new objects by the next time I ran this query! Time to modify my CloudTrail configuration.
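One way to gauge how bad the loop is would be to measure what share of the logged events are CloudTrail's own writes to the log bucket. The sketch below works on already-parsed records and rests on two assumptions about the record format: that calls made by an AWS service carry the service's DNS name in `sourceIPAddress`, and that S3 object-level events put the bucket name in `requestParameters.bucketName`. The function names and the bucket name are hypothetical.

```python
def is_cloudtrail_self_write(rec, log_bucket):
    """True if the record looks like CloudTrail writing a log file to log_bucket."""
    params = rec.get("requestParameters")
    if not isinstance(params, dict):
        return False
    return (
        rec.get("eventName") == "PutObject"
        # assumption: service-made calls show the service DNS name here
        and rec.get("sourceIPAddress") == "cloudtrail.amazonaws.com"
        and params.get("bucketName") == log_bucket
    )


def self_write_share(records, log_bucket):
    """Fraction of records that are CloudTrail's own log deliveries."""
    records = list(records)
    if not records:
        return 0.0
    hits = sum(1 for rec in records if is_cloudtrail_self_write(rec, log_bucket))
    return hits / len(records)


# Illustrative data only; the bucket name is made up.
demo_bucket = "my-cloudtrail-logs-bucket-demo"
demo_records = [
    {"eventName": "PutObject", "sourceIPAddress": "cloudtrail.amazonaws.com",
     "requestParameters": {"bucketName": demo_bucket, "key": "a.json.gz"}},
    {"eventName": "PutObject", "sourceIPAddress": "203.0.113.7",
     "requestParameters": {"bucketName": "some-other-bucket", "key": "photo.jpg"}},
    {"eventName": "ListBuckets", "sourceIPAddress": "203.0.113.7",
     "requestParameters": None},
    {"eventName": "GetObject", "sourceIPAddress": "athena.amazonaws.com",
     "requestParameters": {"bucketName": demo_bucket, "key": "a.json.gz"}},
]

print(self_write_share(demo_records, demo_bucket))
```

A high share would point to the log bucket being included in the trail's S3 object-level logging, which is the setting I would look at first.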
Justin – 26 March 2019