Zerodha, India’s largest stock broker, has harnessed PostgreSQL to handle the immense scale and complexity of its data. Through a series of deliberate design decisions and continuous tuning, they have shaped their PostgreSQL deployment for performance and reliability. Here’s a look at how they achieved this and the lessons they learned along the way.
Indexing Wisely
Zerodha understands the importance of indexing but cautions against overdoing it. While indexes can dramatically speed up reads, every index must be updated on each write, so maintaining too many of them slows inserts and updates. They strike a balance by indexing selectively, which keeps data retrieval fast without adding unnecessary write overhead.
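As a minimal sketch of selective indexing (the table and column names here are illustrative, not Zerodha’s actual schema), a single composite index can often serve a hot query path that might otherwise tempt one into several overlapping single-column indexes:

```sql
-- Hypothetical schema: one composite index covers filtering on user_id
-- and ordering by placed_at, replacing separate indexes on each column.
CREATE INDEX idx_orders_user_placed_at
    ON orders (user_id, placed_at DESC);

-- A query like this can use the index for both the WHERE and the ORDER BY:
SELECT *
FROM   orders
WHERE  user_id = 42
ORDER  BY placed_at DESC
LIMIT  50;
```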
Leveraging Partial Indexing
Partial indexes cover only a filtered subset of rows rather than an entire table. Because such an index stays small, it is cheaper to maintain and faster to scan, which directly benefits the queries that target exactly that subset.
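A hedged example of the idea, again with hypothetical names: if hot queries only ever touch open orders, indexing just those rows keeps the index compact no matter how much closed history accumulates.

```sql
-- Index only the rows that hot queries actually filter on.
CREATE INDEX idx_orders_open
    ON orders (user_id)
    WHERE status = 'open';

-- Matches queries such as:
SELECT * FROM orders WHERE user_id = 42 AND status = 'open';
```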
Denormalization for Performance Gains
To boost performance, Zerodha denormalized substantial portions of their datasets. Although this increased the database size, it reduced the complexity of joins, resulting in faster query responses. This trade-off between storage space and performance was deemed worthwhile for their operational needs.
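A small sketch of what such denormalization can look like, using hypothetical tables: a frequently joined attribute is copied onto the large table so hot reads skip the join entirely, at the cost of duplicated storage.

```sql
-- Copy the instrument symbol onto trades so reads avoid a join.
ALTER TABLE trades ADD COLUMN instrument_symbol text;

UPDATE trades t
SET    instrument_symbol = i.symbol
FROM   instruments i
WHERE  i.id = t.instrument_id;

-- Reads no longer need to join against instruments:
SELECT instrument_symbol, price, quantity
FROM   trades
WHERE  user_id = 42;
```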
Utilizing Materialized Views and CTEs
Materialized views and Common Table Expressions (CTEs) are extensively used to handle complex queries and improve performance. Materialized views store the results of expensive queries, making subsequent accesses faster. CTEs help in breaking down complex queries into more manageable parts, improving readability and maintainability.
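A brief, hypothetical illustration of both techniques: a materialized view stores the result of an expensive aggregate so it is computed once and read many times, while a CTE names an intermediate step so a complex query stays readable.

```sql
-- Precompute an expensive daily aggregate once and serve reads from it.
CREATE MATERIALIZED VIEW daily_volume AS
SELECT trade_date, instrument_id, SUM(quantity) AS total_qty
FROM   trades
GROUP  BY trade_date, instrument_id;

-- Refresh on a schedule; CONCURRENTLY needs a unique index on the view.
-- REFRESH MATERIALIZED VIEW CONCURRENTLY daily_volume;

-- A CTE breaks a complex query into a named, readable step.
WITH recent_trades AS (
    SELECT user_id, quantity
    FROM   trades
    WHERE  traded_at > now() - interval '1 day'
)
SELECT user_id, COUNT(*) AS trades_today, SUM(quantity) AS qty_today
FROM   recent_trades
GROUP  BY user_id;
```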
Thoughtful Schema Design
A deep understanding of the data and a well-thought-out schema design are critical for efficient database management. Zerodha emphasizes the importance of comprehensively understanding the data to design a schema that ensures optimal performance and meets business requirements.
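As a hypothetical sketch of what “understanding the data” turns into at the schema level: precise types, constraints, and defaults encode what is known about the data, so the database enforces it and the planner can exploit it. All names below are illustrative.

```sql
-- Illustrative table: the schema states what the data is allowed to be.
CREATE TABLE orders (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id   bigint      NOT NULL REFERENCES users (id),
    status    text        NOT NULL
              CHECK (status IN ('open', 'filled', 'cancelled')),
    quantity  integer     NOT NULL CHECK (quantity > 0),
    placed_at timestamptz NOT NULL DEFAULT now()
);
```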
Continuous Database and Table Tuning
Tuning PostgreSQL parameters and tables is an ongoing process at Zerodha, not a one-time exercise. Regular vacuuming, including manual vacuuming and analysis after bulk imports (covered in more detail below), is part of this routine. It prevents database bloat and keeps the system responsive and efficient.
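One form such per-table tuning can take (the table name and thresholds here are assumptions for illustration, not Zerodha’s actual settings) is overriding autovacuum’s defaults on hot, frequently updated tables so maintenance kicks in sooner:

```sql
-- Make autovacuum more aggressive on a heavily updated table.
ALTER TABLE trades SET (
    autovacuum_vacuum_scale_factor  = 0.01,   -- vacuum after ~1% dead rows
    autovacuum_analyze_scale_factor = 0.005   -- analyze after ~0.5% changes
);
```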
Efficient Query Planning and Execution
Optimizing the query planner and ensuring efficient query execution are crucial. Zerodha fine-tunes their queries to reduce database load and improve execution times. This involves analyzing query plans and adjusting queries to ensure they run as efficiently as possible.
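The standard tool for this kind of analysis is EXPLAIN; the query below is illustrative. If the plan shows, say, a sequential scan where an index scan was expected, the indexes, statistics, or the query itself need revisiting.

```sql
-- Show the actual plan, timings, and buffer usage for a hot query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   orders
WHERE  user_id = 42
ORDER  BY placed_at DESC
LIMIT  50;
```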
Regular Vacuuming Practices
While autovacuum helps maintain the database, it’s not always sufficient. Zerodha performs regular manual vacuuming, especially after bulk imports, to keep the database optimized. This practice ensures that the database remains performant and responsive.
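In practice this boils down to running something like the following after a large load, rather than waiting for autovacuum to catch up (the table name is illustrative):

```sql
-- Reclaim dead space and refresh planner statistics after a bulk import.
VACUUM (ANALYZE, VERBOSE) trades;
```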
Managing Data Imports with Parallelism
Handling large data imports efficiently is a challenge. Zerodha uses parallelism to manage data imports, distributing the load across multiple processes to prevent bottlenecks. This approach ensures that large data imports do not adversely affect system performance.
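Plain SQL has no construct for spawning parallel sessions, so purely as a hedged sketch of the idea: split the source data into chunks and run one COPY per chunk from separate connections, letting the loads proceed concurrently. File paths and table names are illustrative.

```sql
-- Session 1:
COPY trades FROM '/data/trades_part1.csv' WITH (FORMAT csv, HEADER true);

-- Session 2, run concurrently from a separate connection:
COPY trades FROM '/data/trades_part2.csv' WITH (FORMAT csv, HEADER true);
```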
Using PostgreSQL as a Caching Layer
Zerodha employs PostgreSQL itself as a caching layer with the help of their open-source library, sql-jobber. In this setup, the results of expensive queries are cached in PostgreSQL as small result tables that support efficient sorting and searching. By creating millions of such tables daily, they keep data retrieval quick and efficient.
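A hedged sketch of this results-as-tables pattern (the table name, job id, and query are illustrative, not sql-jobber’s actual internals): an expensive report is materialized once into a throwaway table, and subsequent reads, sorts, and pagination hit that small table instead of re-running the query.

```sql
-- Cache an expensive report's output under a per-job table.
CREATE UNLOGGED TABLE results_job_8f3a AS
SELECT trade_date, instrument_id, SUM(quantity) AS total_qty
FROM   trades
WHERE  user_id = 42
GROUP  BY trade_date, instrument_id;

-- Later reads are cheap and support sorting and pagination.
SELECT * FROM results_job_8f3a ORDER BY trade_date DESC LIMIT 20;
```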
Ensuring Resilience and Quick Recovery
High availability and quick recovery are vital. Zerodha ensures that their PostgreSQL database can be quickly restored using S3 file backups. Running two instances simultaneously adds to the resilience and availability of their system.
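The post doesn’t spell out the mechanism, but one common way to keep S3-based recovery possible is continuous WAL archiving. Purely as an illustrative sketch (the bucket name and archive command are assumptions, and production setups usually rely on a dedicated backup tool):

```sql
-- Ship WAL segments to S3 as they are produced.
ALTER SYSTEM SET archive_mode = 'on';   -- requires a server restart
ALTER SYSTEM SET archive_command = 'aws s3 cp %p s3://example-backups/wal/%f';
SELECT pg_reload_conf();                -- picks up the new archive_command
```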
Scaling with Sharding
Sharding is a key strategy for managing large-scale data. Zerodha distributes data across multiple shards to improve query performance and scalability. Alongside this, a hard limit on query execution time forces schemas and queries to be kept optimized, ensuring that the system remains responsive.
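Such a hard limit maps directly onto PostgreSQL’s statement_timeout setting; the database name and the 5-second cap below are illustrative.

```sql
-- Cap query runtime for every connection to this database.
ALTER DATABASE reports SET statement_timeout = '5s';

-- Or per session:
SET statement_timeout = '5s';
```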
Maintaining a Lean Engineering Setup
A lean engineering setup is crucial for Zerodha. They focus on finding organic solutions and maintaining consistent database management practices. While PostgreSQL may not solve every problem, it is the right tool for their current needs, providing a robust and efficient database solution.
Avoiding Data Overload
Zerodha recognizes the importance of not overloading PostgreSQL with unnecessary data and work. Heavy queries are placed behind an asynchronous setup so they do not drag down interactive workloads. This practice keeps the database efficient and responsive.
Conclusion
Zerodha’s success with PostgreSQL showcases their strategic approach to database management. By balancing indexing, leveraging partial indexing, denormalizing datasets, using materialized views and CTEs, optimizing schema design, and continuously tuning the database, they have created a robust system that handles their big data needs efficiently. Their experience highlights the importance of a tailored approach, ensuring that the right tools and strategies are used to meet specific business requirements.