Generic Reference Architecture for massive Data Lake

Organised data archival and analysis, for prediction and for improving productivity, is a challenge for many industries. The reference architecture described here can be used for live data ingestion and for breaking that data down into reports, with machine-learning capabilities. Migrating a traditional on-premises system to storage and processing at this scale requires some modifications to the architecture, which will also be explored as case studies.

Reference Architecture

The ETL edge is depicted with an AWS Snowball Edge device, but this could be swapped for IoT sensors and scanners polled through a Raspberry Pi that has the AWS IoT Core libraries or uses the MQTT messaging protocol to send telemetry data dynamically. That data is ingested into the warehouse and processed by machine-learning jobs in Amazon SageMaker. This is only a reference architecture, and it needs further polishing before it can be adapted to any real-world situation. We can explore some use cases.
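As a rough illustration of the edge side, here is a minimal sketch of how a Raspberry Pi might package one sensor reading as a JSON telemetry message before handing it to an MQTT client. The topic layout, the field names and the `build_telemetry` helper are all assumptions made for this example and are not part of the architecture above. The actual publish step is left out on purpose, since it depends on which client library is installed.

```python
import json
import time


def build_telemetry(device_id, sensor, value, timestamp=None):
    """Return a (topic, payload) pair for a single sensor reading.

    The topic scheme "telemetry/<device>/<sensor>" is hypothetical.
    """
    if timestamp is None:
        timestamp = int(time.time())
    topic = f"telemetry/{device_id}/{sensor}"
    payload = json.dumps(
        {
            "device_id": device_id,
            "sensor": sensor,
            "value": value,
            "ts": timestamp,
        },
        sort_keys=True,
    )
    return topic, payload


# Example reading with a fixed timestamp so the output is repeatable.
topic, payload = build_telemetry("pi-01", "temperature", 22.5, timestamp=1700000000)
print(topic)
print(payload)
```

The topic and payload would then be passed to whatever MQTT client the device uses. Keeping message building separate from publishing makes the format easy to test without a broker.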


Low-Code/No-Code path to Business Applications – AWS Scores again

Introducing Honeycode, a new, fully managed low-code/no-code development tool that aims to make it easy for anybody in a company to build their own applications. All of this, of course, is backed by a database in AWS and a web-based, drag-and-drop interface builder.

Developers can build applications for up to 20 users for free. After that, they pay per user and for the storage their applications take up. There is no wait for applications to be approved on the Play Store or App Store, because the applications are not deployed directly; they run through a pre-deployed player (an interpreter).

Like similar tools, Honeycode provides users with a set of templates for common use cases like to-do list applications, customer trackers, surveys, schedules and inventory management. Traditionally, AWS argues, a lot of businesses have relied on shared spreadsheets to do these things.


EBS Provisioning VS Performance – Confusions cleared

For almost the last decade (since 2009), I never worried about EBS performance figures. I used to create a single volume and attach it to an instance as and when required. Today, just out of curiosity and to entertain myself, I ran a couple of tests. Thanks to aws-cli, without which this would have taken much longer.

Straight to what I found, in a short summary. Note that the values are in Bps.

          T1      T2      T3      T4      T5      T6      T7
Single    272M    492M    268M    1.3G    393K    272M    8954.02M
Raid 0    631M    671M    740M    1.3G    366K    631M    8851.47
Raid 5    336M    250M    332M    1.2G    9.9k    315M    8306.52

Performance across different combinations of EBS volumes
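To make the table easier to compare, the sketch below works out how each RAID layout performed relative to the single volume for tests T1 to T6. It assumes the K/M/G suffixes are decimal multipliers, and it leaves T7 out because two of its values carry no unit in the table. The helper names are made up for this example.

```python
# Decimal multipliers are an assumption. The ratios below do not depend on
# it, because each pair being compared uses the same suffix.
UNITS = {"K": 10**3, "M": 10**6, "G": 10**9}


def to_bps(text):
    """Convert a value such as '272M' or '9.9k' to a plain number."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def ratios(layout, baseline):
    """Throughput of each test relative to the baseline, rounded to 2 places."""
    return [round(to_bps(a) / to_bps(b), 2) for a, b in zip(layout, baseline)]


# Values copied from the table above (T1..T6 only).
single = ["272M", "492M", "268M", "1.3G", "393K", "272M"]
raid0 = ["631M", "671M", "740M", "1.3G", "366K", "631M"]
raid5 = ["336M", "250M", "332M", "1.2G", "9.9k", "315M"]

raid0_ratio = ratios(raid0, single)
raid5_ratio = ratios(raid5, single)

for i, (r0, r5) in enumerate(zip(raid0_ratio, raid5_ratio), start=1):
    print(f"T{i}: RAID 0 x{r0}, RAID 5 x{r5}")
```

A ratio above 1 means the RAID layout was faster than the single volume in that test, and a ratio below 1 means it was slower.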

I spun up an EC2 instance and mounted a 200 GB EBS volume to run a series of tests. Thanks to the nixCraft article titled “Linux and Unix Test Disk I/O Performance With dd Command”.


First taste of bbpress – was sweet but getting sour

Hey hey.. when we had to implement a bulletin board for an institution, the first thought was phpBB, but the time limit, as well as the lack of resources to hack through a completely new codebase, prompted us to take a new course: the bbPress way. At least it was written in the same language, and used in the same way, as what we had been handling for more than a year. Well, _ck_ should be thanked for all the goodies. And when we find bugs or errors, we should all help each other to make the system better.

When I was using the bbPress Attachments plugin, trying to delete a post, even as admin or moderator, would throw an SQL error. The same error was reported by Joseph, and _ck_ responded by suggesting a new version from the trunk. For me, the error persisted even after the new version came in.

Configure two mysql server and connecting php

One might wonder what there is to configuring two MySQL servers, or even what the purpose of doing so would be. Well, there are different requirements, and these requirements can lead us through various possibilities. For instance, certain projects may need the advanced features of MySQL 5.2, whereas others could even run on MySQL 4.12. My case was rather peculiar: about half of our projects used transactional tables, and the other half could do without them. We preferred to have these two groups configured on two different MySQL servers. When the system was explained and the need described to the management, they ruled out a separate dedicated server for the projects that did not use transactional tables. So I thought about configuring multiple MySQL servers on the same hardware and operating system.


12 PHP optimization tips

  1. If a method can be static, declare it static. Speed improvement is by a factor of 4.
  2. Avoid magic like __get, __set, __autoload
  3. require_once() is expensive
  4. Use full paths in includes and requires, less time spent on resolving the OS paths.
  5. If you need to find out the time when the script started executing, $_SERVER['REQUEST_TIME'] is preferred to time()
  6. See if you can use strncasecmp, strpbrk and stripos instead of regex
  7. str_replace is faster than preg_replace, but strtr is faster than str_replace by a factor of 4
  8. If the function, such as string replacement function, accepts both arrays and single characters as arguments, and if your argument list is not too long, consider writing a few redundant replacement statements, passing one character at a time, instead of one line of code that accepts arrays as search and replace arguments.
  9. Error suppression with @ is very slow.
  10. $row['id'] is 7 times faster than $row[id]
  11. Error messages are expensive
  12. Do not call functions in the condition of a for loop, such as for ($x=0; $x < count($array); $x++). The count() function gets called on every iteration.

As blogged by alexmoskalyuk

Hit the WDDX Bug twice

In Saturn we had a handful of XUL projects where the XUL part was for backend administration. We developed most of the packages in 2004, when we were on PHP 4.2. There was heavy use of WDDX serialization, since we found it easy and did not yet know about SOAP and SOAP implementations. By the end of 2005, most of the projects had been wound up, and our main XUL developer had also quit. Since then we had been drifting away from XUL development, and by mid 2006 we had almost dropped any further XUL support and development.

Recently, the management decided to revamp things and pull out one of the old projects to be reworked as a new product with a solid backend. It was then that I got bitten by the WDDX bug [#38839], and I overcame it by applying the latest hourly patch.

Later on, our COO needed the same project deployed on his laptop, so we downloaded TSW, an easy, modular and flexible WAMP bundle that includes Apache2/SSL, MySQL4, PHP4, Perl5.8/ASP, Python2.3, Tomcat5, FirebirdDB, FileZilla, Mail/News-Server, phpMyAdmin, Awstats, WordPress, etc. It also includes a web GUI to control and manipulate all the bundled services. But its PHP 4.3.4 also seemed to have the WDDX bug, though in a different way: when the XUL application sent an AJAX request whose output was expected as a WDDX-serialized string, the Apache server started crashing.

Finally, for TSW too, I downloaded a CVS snapshot and patched it, after which the error went away.

Rent A Coder – Resume Page Ratings to RSS

I am more or less active at RAC and wanted to show off my buyers' comments on a page to promote myself, so I wrote this script, rac2rss. Download from here.

The main script is rac2rss.php. Change the line [php] $coder_id = "1242159"; [/php] in rac2rss.php to get your own resume. The line [php] $myFileMgr = new fileMgr('./cache', 7 * 24 * 60 * 60); [/php] defines the cache. The cache folder should be world-writable or writable by the web server, and it can sit outside the web path for security reasons, as long as its absolute or relative path is provided. The lifetime is given in seconds; in this script I have set it to one week.

As a last note, thanks to all members and maintainers of RAC, for making it a wonderful place for all of us.