Designing scalable systems
Q: How do you design scalable applications?
A: “I apply hexagonal architecture and domain-driven design, as seen in the NAV Station and PDS projects. I’ve used design patterns like Adapter and Chain of Responsibility to improve flexibility and maintainability.”
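To make the hexagonal style and the Adapter idea from the answer a bit more concrete, here is a minimal sketch. All names (`OrderRepository`, `InMemoryOrderRepository`, `OrderService`) are hypothetical and are not taken from the NAV Station or PDS projects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Port: the domain depends only on this interface (hypothetical name).
interface OrderRepository {
    void save(String orderId, int quantity);
    Optional<Integer> findQuantity(String orderId);
}

// Adapter: one interchangeable implementation of the port.
// A JDBC or REST adapter could replace it without touching the domain.
class InMemoryOrderRepository implements OrderRepository {
    private final Map<String, Integer> store = new HashMap<>();

    @Override
    public void save(String orderId, int quantity) {
        store.put(orderId, quantity);
    }

    @Override
    public Optional<Integer> findQuantity(String orderId) {
        return Optional.ofNullable(store.get(orderId));
    }
}

// Domain service: business rules live here and talk only to the port.
class OrderService {
    private final OrderRepository repository;

    OrderService(OrderRepository repository) {
        this.repository = repository;
    }

    void placeOrder(String orderId, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        repository.save(orderId, quantity);
    }

    int quantityOf(String orderId) {
        return repository.findQuantity(orderId).orElse(0);
    }
}
```

Because `OrderService` only sees the port, unit tests can use the in-memory adapter while production wires in a database-backed one.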
Microservices: Break the system down into small domain modules, each built in the microservices style.
Load distribution: Spread traffic with load balancers, and use circuit breakers so a failing dependency does not drag its callers down.
Event driven: Implement an event-driven architecture, using Kafka or another messaging framework.
Caching: Implement a caching mechanism, such as Redis or Caffeine.
Virtual threads: Build multithreaded applications on virtual threads.
Database optimization:
- Write optimized queries to achieve higher performance.
- Use HikariCP for database connection pooling.
- Use indexing, partitioning, and sharding.
Java code optimization: Apply multithreading practices such as parallel streams, and prefer concurrency-aware collections (for example ConcurrentHashMap instead of HashMap) for shared state.
Batching and pagination:
- Batch inserts/updates to reduce transaction overhead.
- Use pagination for large result sets to avoid memory issues.
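The circuit breaker mentioned in the list above can be sketched roughly as follows. This is a simplified, single-threaded illustration with a hypothetical class name (`SimpleCircuitBreaker`); a production service would more likely rely on an established resilience library than on hand-rolled code.

```java
import java.util.function.Supplier;

// Minimal count-based circuit breaker (illustrative only, not thread-safe).
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    // Open means calls are short-circuited until openMillis has elapsed.
    boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }

    <T> T call(Supplier<T> action, T fallback) {
        if (isOpen()) {
            return fallback; // fail fast without touching the dependency
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // a success closes the circuit again
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // trip the breaker
            }
            return fallback;
        }
    }
}
```

Once the open period has elapsed, the next call acts as a trial: a success resets the breaker, and a failure trips it again.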
Optional
Code quality:
Hexagonal architecture: Structure each module in the hexagonal (ports and adapters) style.
Clean code: Write clean code, cover it with unit tests, and aim for more than 90% coverage.
Implement CI/CD with pipelines that run the tests in parallel.
Set up Git hooks that run the ArchUnit tests, to make sure we are maintaining the architecture, and that also run the Sonar coverage checks.
For documentation we have been using devdocs.
Full detail:
Improving performance in a highly scalable Java application and its database involves a combination of architectural decisions, code-level optimizations, and infrastructure tuning. Here's a breakdown of strategies for both:
Java Application Performance
1. Concurrency and Thread Management
- Use non-blocking I/O. We used reactive programming (Project Reactor) in NAV Station and prisym.
- Leverage ExecutorService or ForkJoinPool for parallel processing.
- Leverage parallel streams and CompletableFuture (supplyAsync or runAsync).
- Avoid thread contention and excessive synchronization.
ExecutorService executor = Executors.newFixedThreadPool(4);
List<List<String>> batches = ...; // divide your data into batches
for (List<String> batch : batches) {
    executor.submit(() -> processBatch(batch));
}
executor.shutdown(); // no new tasks are accepted; submitted batches still run
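When the results of each batch are needed, the same idea can be expressed with CompletableFuture.supplyAsync. The sketch below assumes a hypothetical `processBatch` that simply returns the batch size as a stand-in for real work.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AsyncBatches {
    // Hypothetical stand-in for real work: returns the size of the batch.
    static int processBatch(List<String> batch) {
        return batch.size();
    }

    // Runs every batch on the pool and sums the results once all complete.
    static int processAll(List<List<String>> batches) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Start all batches first so they run concurrently...
            List<CompletableFuture<Integer>> futures = batches.stream()
                    .map(b -> CompletableFuture.supplyAsync(() -> processBatch(b), pool))
                    .collect(Collectors.toList());
            // ...then wait for each result and combine them.
            return futures.stream().mapToInt(CompletableFuture::join).sum();
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures into a list before calling `join` matters: joining inside the first stream would process the batches one at a time.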
2. JVM Tuning
- Profile with tools like JVisualVM, YourKit, or Flight Recorder, or use Datadog to monitor JVM usage, and tune performance accordingly.
- Tune garbage collection (GC) based on workload (e.g., G1GC for short, predictable pause times).
- Set appropriate heap sizes (-Xms, -Xmx) and thread stack sizes (-Xss).
3. Efficient Data Structures and Algorithms
- Use appropriate collections (ConcurrentHashMap, ArrayList, etc.).
- For a simple thread-safe list, wrap a list in a synchronized view:
  List<String> list = Collections.synchronizedList(new ArrayList<>());
- Avoid unnecessary object creation and boxing/unboxing.
- Minimize memory footprint by reusing expensive objects (e.g., object pools).
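As a small illustration of choosing a concurrency-aware collection, the sketch below counts words from a parallel stream with ConcurrentHashMap. The class name `WordCounter` is made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WordCounter {
    // Counts words in parallel; merge() updates each key atomically,
    // so no external synchronization is needed.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        words.parallelStream().forEach(w -> counts.merge(w, 1, Integer::sum));
        return counts;
    }
}
```

With a plain HashMap the same code could lose updates or corrupt the map, because HashMap is not safe for concurrent writes.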
4. Caching
- Use in-memory caches like Caffeine, Ehcache, or Guava Cache.
- For distributed caching, use Redis or Hazelcast.
- Cache expensive computations and frequently accessed data.
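To show what an in-memory cache with eviction does under the hood, here is a tiny least-recently-used cache built only on the JDK. It is a teaching sketch with a hypothetical name (`LruCache`), not a replacement for Caffeine or Ehcache, and it is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Tiny LRU cache built on LinkedHashMap's access order (not thread-safe).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insert; returning true evicts
    // the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

For example, with a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was used more recently.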
5. Asynchronous Processing
- Use message queues (Kafka, RabbitMQ) for decoupling and async workflows.
- Offload heavy tasks to background workers.
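The producer/worker split can be tried out in a single process before introducing a broker. The sketch below uses a JDK BlockingQueue as a stand-in for Kafka or RabbitMQ; the class name and the `__STOP__` marker are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BackgroundWorkerDemo {
    private static final String STOP = "__STOP__"; // hypothetical poison pill

    // The producer enqueues jobs; a single worker thread drains the queue.
    static List<String> run(List<String> jobs) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                for (String job = queue.take(); !job.equals(STOP); job = queue.take()) {
                    processed.add(job.toUpperCase()); // stand-in for heavy work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (String job : jobs) {
                queue.put(job); // unbounded queue: the producer is not held up by the work
            }
            queue.put(STOP);
            worker.join(); // after join(), the worker's results are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return processed;
    }
}
```

A real message queue adds what this sketch lacks: durability, delivery across processes, and consumers that scale independently.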
6. Microservices and Load Distribution
- Break monoliths into microservices for better scalability.
- Use API gateways, service registries, and load balancers.
- Use virtual threads, parallel streams, and CompletableFuture for concurrent work inside each service.
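Virtual threads suit workloads with many blocking calls. The following sketch assumes Java 21 or later, where `Executors.newVirtualThreadPerTaskExecutor()` is available; `fetch` is a hypothetical stand-in for a blocking call such as an HTTP request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadDemo {
    // Simulates a blocking call such as an HTTP request (hypothetical).
    static String fetch(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + id;
    }

    static List<String> fetchAll(int count) {
        // One virtual thread per task; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                futures.add(executor.submit(() -> fetch(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results come back in submission order
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because blocked virtual threads do not hold a platform thread, the tasks can sleep concurrently without a large thread pool.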
Database Performance
1. Indexing
- Create indexes on frequently queried columns.
- Use composite indexes for multi-column filters.
- Monitor and remove unused indexes to reduce write overhead.
- Creating an index in Oracle Database helps improve query performance by allowing the database engine to quickly locate rows. Here's how you can create a basic index:
Basic index
CREATE INDEX index_name
ON table_name (column1, column2, ...);
2. Query Optimization
- Use EXPLAIN PLAN to analyze query performance.
- Avoid SELECT *; fetch only needed columns.
- Use prepared statements to reduce parsing overhead.
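In JDBC, the last two points look roughly like the sketch below. The `customers` table, its columns, and the class name are hypothetical; the `Connection` would normally come from the connection pool.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class CustomerQueries {
    // Hypothetical table and columns; only the needed columns are selected.
    static final String FIND_BY_STATUS =
            "SELECT id, name FROM customers WHERE status = ?";

    static List<String> findNamesByStatus(Connection conn, String status) throws SQLException {
        List<String> names = new ArrayList<>();
        // The placeholder lets the database reuse the parsed statement
        // and keeps user input out of the SQL text.
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_STATUS)) {
            ps.setString(1, status);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
            }
        }
        return names;
    }
}
```

Binding the value with `setString` instead of concatenating it into the query also protects against SQL injection.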
3. Connection Pooling
- Use tools like HikariCP or Apache DBCP for efficient connection management.
- Tune pool size based on application load.
Why Use HikariCP?
High performance: Minimal overhead and fast connection acquisition.
Lightweight: Small footprint and low latency.
Reliable: Actively maintained and widely adopted (the default connection pool in Spring Boot 2.x and later).
Advanced tuning: Offers fine-grained control over pool behavior.
<dependency>
    <groupId>com.zaxxer</groupId>
    <artifactId>HikariCP</artifactId>
    <version>5.0.1</version> <!-- Check for latest version -->
</dependency>
spring.datasource.url=jdbc:mysql://localhost:3306/mydb
spring.datasource.username=user
spring.datasource.password=password
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.connection-timeout=30000
4. Partitioning and Sharding
- Partition large tables to improve query performance.
- Use sharding for horizontal scaling in distributed databases.
Partitioning in databases is a technique used to divide large tables into smaller, more manageable pieces while maintaining the logical integrity of the data. It improves performance, scalability, and maintenance, especially in high-volume transactional or analytical systems.
1. Horizontal Partitioning (Sharding)
Splits rows across multiple tables or databases. When the pieces live on separate database servers, this is usually called sharding.
Common in distributed systems.
Example: Users from different regions stored in separate databases.
2. Vertical Partitioning
Splits columns into separate tables.
Useful when some columns are accessed more frequently than others.
Example: Separating user profile info from login credentials.
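For the horizontal case, the application (or a proxy in front of the databases) has to decide which shard holds a given row. A minimal hash-based router might look like this; the shard names and the `ShardRouter` class are assumptions for illustration.

```java
import java.util.List;

class ShardRouter {
    private final List<String> shards; // hypothetical shard names or JDBC URLs

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shards = List.copyOf(shards);
    }

    // Maps a key to a shard; floorMod keeps the index non-negative
    // even when hashCode() is negative.
    String shardFor(String userId) {
        int index = Math.floorMod(userId.hashCode(), shards.size());
        return shards.get(index);
    }
}
```

Plain modulo hashing is easy to reason about, but changing the number of shards remaps most keys. Consistent hashing or a lookup table is a common alternative when shards are added often.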
Benefits of Partitioning
- ✅ Improved Query Performance: Queries scan only relevant partitions.
- ✅ Faster Maintenance: Easier to archive, delete, or back up partitions.
- ✅ Parallelism: Queries and operations can run concurrently on different partitions.
- ✅ Scalability: Supports large datasets without degrading performance.
5. Caching Layer
- Use Redis or Memcached to cache frequent queries.
- Implement read-through or write-through caching strategies.
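The two strategies differ in when the cache is populated. In the sketch below a plain Map stands in for both the cache (Redis or Memcached in practice) and the database; the class and field names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class ReadWriteThroughCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database; // stand-in for the real store
    int databaseReads = 0; // exposed only to make the behavior observable

    ReadWriteThroughCache(Map<String, String> database) {
        this.database = database;
    }

    // Read-through: on a miss, load from the store and remember the value.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.get(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    // Write-through: update the store and the cache in the same operation.
    void put(String key, String value) {
        database.put(key, value);
        cache.put(key, value);
    }
}
```

Read-through keeps repeated reads away from the database, while write-through keeps the cache consistent at the cost of slightly slower writes.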
6. Replication and Read Scaling
- Use read replicas to distribute read load.
- Ensure replication lag is monitored and managed.
7. Batching and Pagination
- Batch inserts/updates to reduce transaction overhead.
- Use pagination for large result sets to avoid memory issues.
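Both ideas come down to working on bounded slices of data. The helpers below show the slicing in memory, using a hypothetical `Paging` class; in a real application each batch would typically go through JDBC `addBatch`/`executeBatch`, and pagination would be pushed into the SQL query itself.

```java
import java.util.ArrayList;
import java.util.List;

class Paging {
    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(from, to)));
        }
        return result;
    }

    // Returns one zero-based page; an out-of-range page is empty.
    static <T> List<T> page(List<T> items, int pageNumber, int pageSize) {
        long from = (long) pageNumber * pageSize; // long avoids int overflow
        if (pageNumber < 0 || pageSize <= 0 || from >= items.size()) {
            return List.of();
        }
        int to = (int) Math.min(from + pageSize, items.size());
        return new ArrayList<>(items.subList((int) from, to));
    }
}
```

For example, five items with a batch size of 2 yield three batches, the last holding a single item.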
Monitoring & Profiling Tools
- Java: JProfiler, JMC, Prometheus + Grafana, Micrometer
- Database: pg_stat_statements (PostgreSQL), Performance Schema (MySQL), AWR (Oracle)