AWSAthena – JijuTM.COM

In any web application project, selecting the optimal database is crucial. Each project comes with unique requirements, and the final decision often depends on the data characteristics, the application’s operational demands, and future scaling expectations. For my most recent project, choosing a database meant evaluating a range of engines, each with strengths and trade-offs. Here, I’ll walk through the decision-making process and the architecture chosen to meet the application’s unique needs using AWS services.

Initial Considerations

When evaluating databases, I focused on several key factors:

Data Ingestion and Retrieval Patterns: What type of data will be stored, and how will it be accessed or analyzed?
Search and Select Complexity: How complex are the queries, and do we require complex joins or aggregations?
Data Analysis Needs: Will the data require post-processing or machine learning integration for tasks like sentiment analysis?

The database engines I considered were MariaDB, PostgreSQL, and Amazon DynamoDB. MariaDB and PostgreSQL are widely adopted relational databases known for reliability and extensive features, while DynamoDB is a managed key-value and document store designed for high-throughput applications on AWS, which made it a strong candidate.

The Project’s Data Requirements

This project required the following data structure:

Data Structure: Each record was stored as a JSON document, with a maximum payload of approximately 1,541 bytes (the sum of the field lengths below, assuming single-byte characters).
Attributes: Each record included an asset ID (20 chars), user ID (20 chars), a rating (1 digit), and a review of up to 1,500 characters.
Scale Expectations: Marketing projections suggested rapid growth, with up to 100,000 assets and 50,000 users within six months, resulting in a peak usage of about 5,000 transactions per second.

Mock Benchmarks and Testing
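As a rough sizing check before any benchmarking, the record layout and peak rate from the requirements above can be turned into DynamoDB capacity numbers. The sketch below is only an estimate under stated assumptions: the attribute names are hypothetical, every character is assumed to take one byte, and all 5,000 peak transactions are treated as writes (a worst case, since the read/write mix was not specified). It relies on the documented capacity rules: one write capacity unit covers a write of up to 1 KB, and one read capacity unit covers a strongly consistent read of up to 4 KB.

```python
import math

# Field lengths from the requirements; names are hypothetical,
# and each character is assumed to occupy one byte.
FIELD_BYTES = {"asset_id": 20, "user_id": 20, "rating": 1, "review": 1500}

PEAK_TPS = 5_000           # projected peak transactions per second
WCU_UNIT_BYTES = 1024      # one write unit: a write of up to 1 KB
RCU_UNIT_BYTES = 4 * 1024  # one read unit: a strongly consistent read of up to 4 KB


def max_item_bytes(field_bytes, include_names=True):
    """Worst-case item size: values, plus attribute names if requested."""
    size = sum(field_bytes.values())
    if include_names:
        # DynamoDB counts attribute names toward the item size.
        size += sum(len(name.encode("utf-8")) for name in field_bytes)
    return size


def write_units_per_item(item_bytes):
    """Write units consumed by one write, rounded up to the next 1 KB."""
    return math.ceil(item_bytes / WCU_UNIT_BYTES)


def read_units_per_item(item_bytes, strongly_consistent=True):
    """Read units consumed by one read; eventually consistent reads cost half."""
    units = math.ceil(item_bytes / RCU_UNIT_BYTES)
    return units if strongly_consistent else units / 2


item_bytes = max_item_bytes(FIELD_BYTES)
peak_write_units = PEAK_TPS * write_units_per_item(item_bytes)
peak_read_units = PEAK_TPS * read_units_per_item(item_bytes)

print(f"item size: {item_bytes} bytes")
print(f"peak write units if all traffic is writes: {peak_write_units}")
print(f"peak read units if all traffic is strongly consistent reads: {peak_read_units}")
```

Because a full-length review pushes the item past 1 KB, each write costs two write units under these assumptions, so the size of the review field matters more for cost than the three short attributes combined.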

To ensure scalability, I conducted a benchmarking exercise using Docker containers to simulate real-world performance for each database engine:

MariaDB and PostgreSQL: Both performed well with moderate loads, but resource consumption spiked sharply under simultaneous requests, capping at around 50 transactions per second before exhausting resources.
Amazon DynamoDB: Even on constrained resources, DynamoDB managed up to 24,000 requests per second. This performance, combined with its fully managed, serverless nature and built-in horizontal scaling capability, made DynamoDB the clear choice for this project’s high concurrency and low-latency requirements.

Amazon DynamoDB – The Core Database

DynamoDB emerged as the best fit for several reasons:

High Availability and Scalability: DynamoDB can scale capacity up or down automatically with traffic, and AWS manages the underlying infrastructure, replicating data across multiple Availability Zones within a Region. Multi-Region availability is possible as well, through global tables.
Serverless Architecture Compatibility: Since our application was API-first and serverless, built with AWS Lambda in Node.js and Python, DynamoDB’s seamless integration with AWS services suited this architecture perfectly.
Flexible Data Model: DynamoDB’s schema-less, JSON-compatible structure aligned with our data requirements.
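To make the fit with a Lambda-based API more concrete, here is a minimal sketch of how a Python handler might validate a review and shape it as a DynamoDB item. It is not the project's actual code: the table name, the attribute names, and the choice of asset ID as partition key with user ID as sort key are all assumptions for illustration. The write function is passed in so the example runs without AWS credentials; the event is assumed to carry the JSON payload in a "body" field, as API Gateway proxy events do.

```python
import json

TABLE_NAME = "AssetReviews"  # hypothetical table name


def build_put_request(asset_id, user_id, rating, review):
    """Validate one review and return keyword arguments for a PutItem call."""
    if not (0 < len(asset_id) <= 20 and 0 < len(user_id) <= 20):
        raise ValueError("asset_id and user_id must be 1-20 characters")
    if isinstance(rating, bool) or not isinstance(rating, int) or not 0 <= rating <= 9:
        raise ValueError("rating must be a single digit")
    if len(review) > 1500:
        raise ValueError("review must be at most 1,500 characters")
    return {
        "TableName": TABLE_NAME,
        "Item": {
            "asset_id": {"S": asset_id},   # assumed partition key
            "user_id": {"S": user_id},     # assumed sort key
            "rating": {"N": str(rating)},  # the low-level API sends numbers as strings
            "review": {"S": review},
        },
    }


def make_handler(put_item):
    """Build a Lambda-style handler around an injected write function."""
    def handler(event, context):
        try:
            body = json.loads(event["body"])
            request = build_put_request(
                body["asset_id"], body["user_id"], body["rating"], body["review"]
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed JSON, missing fields, or failed validation.
            return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
        put_item(**request)
        return {"statusCode": 201, "body": json.dumps({"saved": True})}
    return handler


# Local usage with a stand-in writer that just records the request.
written = []
handler = make_handler(lambda **request: written.append(request))
event = {"body": json.dumps(
    {"asset_id": "A" * 20, "user_id": "U" * 20, "rating": 5, "review": "Great asset."}
)}
print(handler(event, None)["statusCode"])
```

In a deployed function, the stand-in writer would be replaced by the real client call, for example `make_handler(boto3.client("dynamodb").put_item)`. With the assumed key layout, a second review from the same user for the same asset overwrites the first, which may or may not be the desired behavior.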

Choosing the Right Database for High-Performance Web Applications on AWS