While creating a projection for a specific use-case, we faced the problem
that there was no more room for optimization with indexes on the database.
So we had to step back and re-think the whole data model which consisted of
4 tables each having more than 50 million rows. Until this point, we hat cheap
inserts and expensive
gets on the data model.
After the analysis of the data, we figured out that the approach we took, in the beginning, was a bit too naive. Well at a certain point when your data grows
to an amount you have barely experience how to handle it or until now
you could easily handle all data, and then it’s time to re-define the use case and optimize from both sides:
It turned out that not all the data is relevant and the data was connected in certain ways, so we already reduced the number of tables from 4 down to 2. This improvement already reduced the cost for joining significantly, but on the other hand, the inserts were now attached to a certain logic, because the data had to be combined. Nevertheless, the service and database could handle this amount of requests to cope with the load.
On the other hand, we wanted the projection to be lightning fast. On a projection, there should usually be no
joins if it is meant to be fast. So we took an intermediate
step to update the projections (there were 2 in total) on each
of the intermediate tables. Furthermore, we computed the eligibility of a data record for the projection. Additionally, we partitioned the projection tables. Fortunately
Postgres 10 does much work on partioning
already for you. You still have to create the partitions on your own but a
CREATE TABLE IF EXISTS does the magic before each insert. In Postgres 9
you still had to write triggers.
You still can in Postgres 10 as well if you want full flexibility. On AWS RDS the latest
Postgres version available is currently 10, meaning if you cannot to the lovely
INSERT ... ON CONFLICT DO UPDATE, but you have to implement it on your own. With our experience
, we already knew how to achieve this. (BTW
will support those
Domain-driven Design the question rose where to put which logic: put the
CREATE TABLE IF EXISTS for the partition as a stored procedure or in
the infrastructure layer of the service? We decided on the infrastructure layer, although
it might be slower than the stored procedure, but it is better testable and
extendable and the code can be put under version control.
It was a good decision to split logic between
update of the projection.
The next step will be the optimization of the UI which is another topic. But the database layer is now at a stable state.
There will be a point in time when all your data cannot be handled any by a single data source. Projections are a neat solution but they should be optimized as much as possible or kept at a specific amount of data, meaning old data should be deleted or archived.