Export CloudWatch Logs to AWS S3 – Deploy using SAM

Credit is due to the blog that pointed me in the right direction: the Tensult blog article "Exporting of AWS CloudWatch logs to S3 using Automation". At some points, though, I have deviated from the original author's suggestions.

Some of the changes are purely my preference and others follow suggested best practices. I do agree that beginners are better off setting IAM policies with '*' in the Resource field, but when you move things into production it is recommended to grant the least required permissions. Some critical policies were also missing from the assume role policy. Another unnecessary activity was checking whether the S3 bucket exists, and attempting to create it if not, on every execution; for this alone the Lambda role needed the create-bucket permission. All of this kept nagging at me, and the outcome is this article.

If you need CloudWatch logs exported to S3 for whatever reason, this could save you a lot of time, though the stack has to be deployed separately in every region where you need it. Please note that the whole article expects aws-cli and sam-cli to be pre-installed.

First, let's create the bucket and apply a policy that permits the CloudWatch Logs service to write into it. The policy also makes sure the bucket owner has full control of any created object.

aws s3 mb s3://<bucketname> --region <AWS_REGION>
aws s3api put-bucket-policy --bucket <bucketname> --policy file://policy.json

The first line creates the bucket and the second adds the policy. For the sake of explanation, the file is reproduced below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.ap-south-1.amazonaws.com"
      },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::<bucketname>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logs.ap-south-1.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<bucketname>/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}

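Since the service principal in the policy carries the region name and the stack has to be deployed per region, generating policy.json can save some hand editing. The following is only a minimal sketch: the function names export_bucket_policy and write_policy are my own, and the bucket and region values are placeholders.

```python
import json


def export_bucket_policy(bucket: str, region: str) -> dict:
    """Build the bucket policy from this article for one bucket/region pair."""
    # The CloudWatch Logs service principal is region specific.
    principal = {"Service": f"logs.{region}.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


def write_policy(path: str, bucket: str, region: str) -> None:
    """Write the policy to a file that put-bucket-policy can read."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(export_bucket_policy(bucket, region), fh, indent=2)
```

Calling write_policy("policy.json", "<bucketname>", "<AWS_REGION>") before the put-bucket-policy command would produce the file for that region.
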
Now we can deploy the template using

sam deploy -g

Then answer the guided prompts as they appear.

To cut the story short, the template creates two roles: one for the state machine and one for the Lambda. They are separate because the state machine requires the lambda:InvokeFunction permission on the Lambda, while the Lambda itself does not need it. The Lambda function, which actually does the whole job, is mostly derived from the function provided in the original article, with the parts that did the bucket existence check and the like removed.
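
To give an idea of what such a function boils down to, here is a minimal sketch and not the function from the template. It assumes the export goes through the boto3 CloudWatch Logs call create_export_task; the helper names, the 24 hour window, the task name and the prefix layout are all my assumptions.

```python
import time
from typing import Optional


def export_task_params(log_group: str, bucket: str, hours: int = 24,
                       now_ms: Optional[int] = None) -> dict:
    """Build keyword arguments for logs.create_export_task.

    fromTime and to are milliseconds since the epoch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    prefix = log_group.strip("/")
    return {
        "taskName": "export-" + prefix.replace("/", "-"),  # assumed naming
        "logGroupName": log_group,
        "fromTime": now_ms - hours * 3600 * 1000,
        "to": now_ms,
        "destination": bucket,          # the bucket created earlier
        "destinationPrefix": prefix,    # assumed layout inside the bucket
    }


def start_export(log_group: str, bucket: str) -> str:
    """Start one export task and return its task id."""
    import boto3  # imported here so the helper above works without boto3

    logs = boto3.client("logs")
    response = logs.create_export_task(**export_task_params(log_group, bucket))
    return response["taskId"]
```

Note that nothing here checks or creates the bucket, which is why the Lambda role no longer needs the create-bucket permission.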

The whole template and its accessories can be downloaded here: Export CloudWatch Logs to AWS S3 SAM Template.

Low-Code/No-Code path to Business Applications – AWS Scores again

Introducing Honeycode, a new, fully managed low-code/no-code development tool that aims to make it easy for anybody in a company to build their own applications. All of this, of course, is backed by a database in AWS and a web-based, drag-and-drop interface builder.

Developers can build applications for up to 20 users for free. After that, they pay per user and for the storage their applications take up. There is no wait for applications to be approved on the Play Store or App Store, as the applications are not deployed directly but run through a pre-deployed player (interpreter).

Like similar tools, Honeycode provides users with a set of templates for common use cases like to-do list applications, customer trackers, surveys, schedules and inventory management. Traditionally, AWS argues, a lot of businesses have relied on shared spreadsheets to do these things.

Honeycode allows AWS clients to build interactive mobile and web applications with no programming required. Honeycode has a simple visual application builder customers can use to, in Amazon’s words, “create applications that range in complexity from a task-tracking application for a small team to a project management system that manages a complex workflow for multiple teams or departments.”

The company is hoping that Honeycode can eliminate the need to resort to spreadsheets and emails to schedule events, create to-do lists, track personnel progress and track content and inventory, among other business functions. Honeycode apps will make it easier for clients to sort, filter and link data together and will also give them a way to create data dashboards that are updated in real time. Clients don't even have to worry about managing and maintaining any hardware or software; Amazon will take care of those.

Honeycode has pre-built templates clients can use, but they can also build apps from scratch using the visual spreadsheet-like interface to manually add elements like lists, buttons and input fields onto app screens. Apps with up to 20 users are free, and clients will be able to pay for more users and storage if they need to.

During a test drive, I felt that this is going to bring a big change.

Take advantage of AI/ML to do your Code Reviews and Profiling

Get application performance recommendations and automated code reviews through Amazon CodeGuru, a machine learning service. It finds the most expensive lines of code, the ones that hurt application performance and make troubleshooting frustrating, and recommends how to fix them or write better code.

Powered by machine learning, best practices and hard-learned lessons across millions of code reviews and thousands of applications profiled on open source projects and internally at Amazon, CodeGuru is ready to face any challenge. Use it to find and fix code issues such as resource leaks, potential concurrency race conditions and wasted CPU cycles. With moderate, on-demand pricing, it is affordable enough to use for almost every code review and application one might need. Java applications are currently supported, with support for more languages in the works. With CodeGuru you can catch and resolve problems earlier and more efficiently, so that you can build and run better software.

The machine learning models are trained on Amazon's code bases comprising hundreds of thousands of internal projects, as well as tens of thousands of open source projects on GitHub. Tens of thousands of Amazon developers have contributed to CodeGuru's training based on years of experience in code review and application profiling. For example, CodeGuru Reviewer is trained using rule mining and supervised machine learning models that use a combination of logistic regression and neural networks.

CodeGuru Profiler is always searching for application performance optimizations, identifying your most “expensive” lines of code and recommending ways to fix them to reduce CPU utilization, cut compute costs, and improve application performance. CodeGuru Profiler provides specific recommendations so you can take action immediately on issues such as excessive recreation of expensive objects, expensive deserialization, usage of inefficient libraries, and excessive logging. CodeGuru Profiler runs continuously in production, consuming minimal CPU capacity so it does not significantly impact application performance. You can begin profiling your application by installing a small agent using code that CodeGuru provides and configuring it in the CodeGuru console.

CodeGuru is currently in “Preview” and is expected to be GA soon.

EBS Provisioning VS Performance – Confusions cleared

For almost a decade (since 2009), I never worried about EBS performance figures. I used to create a single volume and attach it to an instance as and when required. Today, just out of curiosity and to entertain myself, I ran a couple of tests. Thanks to aws-cli, without which this would have taken much longer.

Straight to a short summary of what I found. Note that the values are in bytes per second (Bps).

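         Test 1   Test 2   Test 3   Test 4   Test 5   Test 6   hdparm -T
Raid 0   631M     671M     740M     1.3G     366K     631M     8851.47
Raid 5   336M     250M     332M     1.2G     9.9k     315M     8306.52
Performance across different combinations of EBS volumes (columns assumed to follow the order of the dd tests listed below, then hdparm -T)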
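
To compare the two rows without squinting at the suffixes, the values can be normalised to plain numbers. This is just a small sketch: it assumes the K/M/G suffixes are decimal multiples and that the first six figures of each row line up test by test; the function name is my own.

```python
UNITS = {"k": 1e3, "m": 1e6, "g": 1e9}  # assumed decimal multiples


def to_bytes_per_sec(value: str) -> float:
    """Convert a figure such as '631M' or '9.9k' to bytes per second."""
    value = value.strip()
    suffix = value[-1].lower()
    if suffix in UNITS:
        return float(value[:-1]) * UNITS[suffix]
    return float(value)


# The first six figures of each row in the summary (the dd results).
raid0 = ["631M", "671M", "740M", "1.3G", "366K", "631M"]
raid5 = ["336M", "250M", "332M", "1.2G", "9.9k", "315M"]

ratios = [to_bytes_per_sec(r5) / to_bytes_per_sec(r0)
          for r0, r5 in zip(raid0, raid5)]

for number, ratio in enumerate(ratios, start=1):
    print(f"test {number}: RAID 5 reaches {ratio:.0%} of RAID 0")
```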

I spun up an EC2 instance and mounted a 200 GB EBS volume to run a series of tests, thanks to the nixCraft article titled "Linux and Unix Test Disk I/O Performance With dd Command".


dd if=/dev/zero of=/data/test1.img bs=1G count=1 oflag=dsync
rm -f /data/test1.img
dd if=/dev/zero of=/data/test2.img bs=64M count=1 oflag=dsync
rm -f /data/test2.img
dd if=/dev/zero of=/data/test3.img bs=1M count=256 conv=fdatasync
rm -f /data/test3.img
# no sync flag here, so this one mostly goes through the page cache
dd if=/dev/zero of=/data/test4.img bs=8k count=10k
rm -f /data/test4.img
# many small synchronous writes
dd if=/dev/zero of=/data/test5.img bs=512 count=1000 oflag=dsync
rm -f /data/test5.img
dd if=/dev/zero of=/data/test6.img bs=1G count=1 conv=fdatasync
rm -f /data/test6.img

hdparm -T /dev/<device>
With single disk of 200G

After that I unmounted the single EBS volume, detached it and deleted it. Then I created 12 EBS volumes of 20 GB each. The listing in text mode was dumped into a text file, and a device id of the pattern xvd[h-s] was added against each volume. This was done just to make looping commands easier.
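
Adding the device ids by hand is fine for 12 volumes, but it can be scripted as well. A small sketch, assuming the volume ids are already available as a list (for example from describe-volumes); the function name is hypothetical.

```python
import string


def pair_volumes_with_devices(volume_ids, first_letter="h"):
    """Return 'volume-id device' lines, with devices xvdh, xvdi, ... in order."""
    start = string.ascii_lowercase.index(first_letter)
    letters = string.ascii_lowercase[start:]
    if len(volume_ids) > len(letters):
        raise ValueError("more volumes than available device letters")
    return [f"{vol} xvd{letter}" for vol, letter in zip(volume_ids, letters)]


def write_vols_file(path, volume_ids):
    """Write the pairs to a file, one per line, ready for 'while read vol dev'."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(pair_volumes_with_devices(volume_ids)) + "\n")
```

With 12 volume ids this yields the devices xvdh through xvds, matching the pattern above.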

Then 10 of the 20 GB disks were attached to the instance and assembled into /dev/md0 using RAID level 0. The same tests were run again and the output is as below.

With 10 x 20GB in raid level 0

It looks good, but we are allocating too much: most of our major projects would not take more than 100 GB, and this was way too much. So I thought about playing with it further.

The filesystem was unmounted, the RAID stopped and the superblock erased using the dd command. The next test was conducted with the same configuration; the only change was that I added just 5 disks this time.

Ha! Not much of a degradation. I am still confused by this one; it might be because the VM has only 2 vCPUs. Sometime later I should attempt this on different hardware.

But then I thought about another option: why not try RAID 5? So I did the cleanup again, added 6 of the volumes back to the same instance and ran the same tests.

Aw! As expected, the performance dropped 🙁, possibly due to the parity write overhead.

As per the EBS volume types document, the baseline is roughly 3 IOPS per GiB of volume size, with a minimum of 100 IOPS. This means the 200 GB volume gets about 600 IOPS, while each 20 GB volume falls back to the 100 IOPS minimum: RAID 0 with 10 x 20 GB gives 1000 IOPS, RAID 0 with 5 x 20 GB gives 500, and RAID 5 with 6 x 20 GB has 600 IOPS.
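
The arithmetic can be written down in a few lines. This is a sketch of the rule as quoted here (3 IOPS per GiB, minimum 100); it ignores any upper cap and simply sums the per-volume figures, and the function names are mine.

```python
def baseline_iops(size_gib: int) -> int:
    """3 IOPS per GiB of volume size, but never less than 100."""
    return max(100, 3 * size_gib)


def array_iops(volume_count: int, size_gib: int) -> int:
    """Sum of the baseline IOPS of identical volumes in one array."""
    return volume_count * baseline_iops(size_gib)


for label, count, size in [
    ("single 200G", 1, 200),
    ("raid0 10 x 20G", 10, 20),
    ("raid0 5 x 20G", 5, 20),
    ("raid5 6 x 20G", 6, 20),
]:
    print(f"{label}: {array_iops(count, size)} IOPS")
```

The small volumes win on paper only because every 20G volume is lifted to the 100 IOPS minimum instead of the 60 that 3 IOPS per GiB would give.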

For reference, the commands I used are listed below. The AMI id is from the Ubuntu Amazon EC2 AMI Finder, for region ap-south-1, focal hvm.

# create instance
aws ec2 run-instances --image-id ami-06d66ae4e25be4617 --security-group-ids <sg-id> --instance-type m5.large --count 1 --subnet-id <subnet-id> --key-name <keyname>

# create volume attach and then finally cleanup
aws ec2 create-volume --availability-zone ap-south-1c --size 200
aws ec2 attach-volume --device xvdf --volume-id vol-058b551d8ce21e37d --instance-id i-04373f3985b1a13e6
aws ec2 detach-volume --volume-id vol-058b551d8ce21e37d --instance-id i-04373f3985b1a13e6
aws ec2 delete-volume --volume-id vol-058b551d8ce21e37d

# create 12 identical volumes 
seq 1 12 | while read i; do aws ec2 create-volume --availability-zone ap-south-1c --size 20; done

# find those volumes which are available
aws ec2 describe-volumes --output text | grep available

# the volume ids were saved to a file named vols, one per line, with a device name added:
# vol-071e9b06c24627554 xvdh
# vol-0097f6ee4b1f0f614 xvdi
# vol-05a882cefed9a8c13 xvdj
# vol-0cf57aab66e51c68b xvdk
# vol-075ab760f2df2270c xvdl
# vol-073b272e1f84b4450 xvdm
# vol-0c98e527a16d34764 xvdn
# vol-0603e3976cd6f0c34 xvdo
# vol-0c488b59b353d51bd xvdp
# vol-05a04ea90f18d52ff xvdq
# vol-0a8726c93947641e2 xvdr
# vol-08903b57f0d5518d0 xvds

# take 10 volumes from the list and attach them
head -10 vols | while read vol dev; do aws ec2 attach-volume --device "$dev" --instance-id i-04373f3985b1a13e6 --volume-id "$vol"; done
# create raid, create partition, format and mount
mdadm -C /dev/md0 -l raid0 -n 10 <list of devices>
fdisk /dev/md0    # interactive: n, p, Enter three times, then w
mkfs.ext4 /dev/md0p1
mount /dev/md0p1 /data

# run tests

# unmount, stop raid, write zeros into the first 12M (clears partition and super-block)
umount /data
mdadm --stop /dev/md0 
seq 1 10 | while read f; do dd if=/dev/zero of=/dev/nvme${f}n1 bs=12M count=1; done

# detach volumes
head -10 vols | while read vol dev; do aws ec2 detach-volume --instance-id i-04373f3985b1a13e6 --volume-id $vol ; done

# final cleanup
cat vols | while read vol dev; do aws ec2 delete-volume --volume-id $vol ; done
aws ec2 terminate-instances --instance-ids i-04373f3985b1a13e6

Podcasting Solution on AWS

Podcasting on AWS can be damn cheap while being ready for a big-billion hit…


  1. Amazon S3
  2. AWS Lambda (S3 events [and Lambda@Edge, if not authenticating with Cognito])
  3. Amazon CloudFront
  4. Amazon Route 53
  5. Amazon Cognito ( optional if social login is required )
  6. A pinch of html and some javascript ( will be provided by me )

S3 stores the raw files in one bucket and triggers Lambda to do the transcoding (for mobile uploads, from any format to MP3). Meta information should be uploaded to the same bucket as a flat file. Files in multiple quality levels will also be generated. The interface uploads the meta.json and pod.raw files to the S3 bucket.

Lambda converts these into multiple MP3 files as well as the pod index. It also updates the RSS feeds and resubmits them to podcast directories.
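
The solution itself is written in Node.js, so the following Python sketch is only an illustration of the S3 event side and not the actual code. It reads the bucket and key from the S3 event records and plans the output file names; the bitrate ladder and the naming scheme are assumptions of mine, and the transcoding itself is left out.

```python
import posixpath
from urllib.parse import unquote_plus

BITRATES_KBPS = (64, 128)  # assumed quality levels


def uploaded_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 events.
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def output_keys(raw_key):
    """Plan one MP3 key per quality level next to the raw upload."""
    stem = posixpath.splitext(raw_key)[0]
    return [f"{stem}-{kbps}k.mp3" for kbps in BITRATES_KBPS]


def handler(event, context=None):
    """Map every uploaded .raw file to the MP3 keys that should be produced."""
    plan = {}
    for bucket, key in uploaded_objects(event):
        if key.endswith(".raw"):
            plan[f"s3://{bucket}/{key}"] = output_keys(key)
    return plan
```

The meta.json upload is ignored by this handler on purpose; only the raw audio needs transcoding.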

If there is a sole author, the admin app can be local or a private mobile app. The local option needs Node.js (>= 10) and npm installed; any further dependencies can be pulled in using npm install. To run it, use "npm run", which shows the URL in the command line interface. Some environment settings are required for this.

Comments by Disqus, analytics by Google and custom search are some recommendations for better SEO and placement in search results.

The solution is already built and available with me. It uses Node.js for almost all purposes, and the public interface is simply open and does not use Cognito federated social login. It was an academic project and security was not a concern. It could be converted to a commercial one if there are takers.

Piwik Analytics Custom Variables Bug

After a long gap, I had the opportunity to dig into Piwik Analytics, the latest version (2.13.1), which has many new features compared to the last one we were using. While implementing it, I wanted to apply some custom variables, which showed the logged-in user, the internal reference numbers and some other parameters. Whatever I did according to the documentation, the custom variables were not showing up.

SQL – Always use aggregate functions if you can

One would wonder what the title means. It was a thunderbolt for me when I was trying to optimize some headless scripts. We at Saturn heavily use headless scripts written in PHP. Some use MySQL, some use XML and some others use memcache; in fact pretty much all of them use memcache. But that is not the topic now.

In an attempt to multi-thread a cron job, as part of optimizations done a year back, we had switched from sequential processing to single-row processing. This required a method getNextRow, which was passed one parameter: the primary key of the table. The cron starts with the value 0 (zero), and after each row is processed, the last processed value is supplied. The getNextRow method had

select ID from [table] WHERE ID > [LAST_ID] ORDER BY ID ASC LIMIT 1;
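
To make the single-row processing concrete, here is a runnable sketch of such a loop. It uses Python and an in-memory SQLite table as stand-ins for our PHP and MySQL, and the table name jobs is hypothetical. The second function, which uses MIN, is only my reading of the title for comparison and an assumption about where this is heading.

```python
import sqlite3


def get_next_row(conn, last_id):
    """The query as quoted above: next ID after last_id, or None."""
    row = conn.execute(
        "SELECT ID FROM jobs WHERE ID > ? ORDER BY ID ASC LIMIT 1", (last_id,)
    ).fetchone()
    return row[0] if row else None


def get_next_row_min(conn, last_id):
    """Aggregate variant (assumption): MIN returns NULL when nothing is left."""
    row = conn.execute(
        "SELECT MIN(ID) FROM jobs WHERE ID > ?", (last_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO jobs (ID) VALUES (?)", [(3,), (7,), (12,)])

# The cron starts with 0 and feeds back the last processed ID each time.
processed = []
last_id = 0
while (next_id := get_next_row(conn, last_id)) is not None:
    processed.append(next_id)
    last_id = next_id
```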


Javascript API Credentials – Just a port

There is not much to write other than to attribute the logic to a post on an online code generator. Since that was written in PHP and I needed the same in JavaScript, some corners were cut, and finally the attached script came up.



The vars defined hold the two arrays of strings, and the function generates the key. Calling genKey(16, access_salt) will generate a 16-character random string from the defined array access_salt.
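
For readers who just want the idea, the same logic looks roughly like this in Python. This is not the JavaScript port itself: the contents of ACCESS_SALT are an assumption (letters and digits), and the secrets module is my choice of random source.

```python
import secrets
import string

# Assumed character set; the real arrays are defined in the script.
ACCESS_SALT = list(string.ascii_letters + string.digits)


def gen_key(length, alphabet):
    """Pick `length` random entries from `alphabet` and join them."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


key = gen_key(16, ACCESS_SALT)
print(key)
```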


Script to redirect based on date or time or day of the week

I recently made the mistake of accepting a project before the initial milestone was deposited. Though I completed the project, the client did not pay, and instead commented SCAM and deleted the project. So I am posting this here, hoping that the beggar will pick it up from here, or that it will be of use to someone.

There is not much magic or PHP expertise in this. The configuration area requires a default configuration with the tag (index) '0'; the other indexes have meanings. A four-digit index is treated as a time, e.g. '1200' is taken as 12 noon and applies to any time after that unless another time is configured. If you put '1170', that will also be taken as 12 noon and continue. If you put 'Tuesday', any day that falls on a Tuesday will use that configuration. A blank configuration means not to show anything.
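
The selection logic can be sketched as follows. The script itself is PHP; this Python version is only an illustration under assumptions of mine: a day-name index wins over time indexes, the latest time index not after the current time applies, and out-of-range values such as '1170' are not handled. The URLs are placeholders.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

CONFIG = {
    "0": "https://example.com/default",       # mandatory default
    "0900": "https://example.com/morning",
    "1200": "https://example.com/afternoon",
    "Tuesday": "https://example.com/tuesday",
    "Sunday": "",                              # blank: show nothing
}


def pick_target(config, now):
    """Return the configured target for the given moment."""
    day = DAYS[now.weekday()]
    if day in config:
        return config[day]
    current = now.strftime("%H%M")
    chosen = "0"
    # Four-digit indexes are times; the latest one not after now applies.
    for index in sorted(k for k in config if len(k) == 4 and k.isdigit()):
        if index <= current:
            chosen = index
    return config[chosen]
```

For example, pick_target(CONFIG, datetime.now()) returns the target for the current moment, or an empty string when nothing should be shown.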