You never know which of your marketing channels works best in terms of real gross profit. For instance, if some track gives you the most sales, that doesn't mean it produces the most yield. To get accurate information, you should consider many factors like total shipping costs at your expense, pick & pack fees, payment fees, cost of goods, and more. And here, the platform we developed plays the game.
While developing this service, we encountered some technical challenges.
The first challenge we encountered was the necessity of providing acceptable delays in online order processing in real-time with ever-increasing loads.
The nearest Black Friday has shown that the orders can come in at the speed of 500 a second.
800ms was taken as the maximum allowable delay, 300-800ms - satisfactory, with 300ms being the desired delay.
In the first versions of the platform, Server GTM did not yet exist. We had to accept the order data, store it, and, staying within the frame of an acceptable delay, return in response the calculated data. We had to make the correct selections from hundreds of millions of records to do so.
The fact that the calculations were also quite complex added to the task's difficulty.
This task was decomposed into many separate subtasks, and each of them was full of technical nuances.
By separating the logic for calculating economic metrics into a separate micro-service, we have achieved greater fault tolerance due to the dynamic scalability of the required cloud resources.
Fine optimization of queries in the storage helped satisfy the query execution time limit.
We have optimized the calculation execution time by making computationally intensive processes asynchronous and separated some specific operations into separate microservices implemented in R, which was better suited for this.
One of the valuable features of the service is the ability to apply calculations not only to orders received in real time but also to existing ones.
Thus, the service begins to become useful, even before the store is connected - you can make a retrospective analysis of orders from previous years and instantly apply the insights received.
On the technical side, the solution to this problem required the implementation of importing a large number of rows (hundreds of thousands or even millions) into existing order tables with the calculation of marketing parameters for each row.
This is a blocking process that takes a lot of time. We allocated the import logic to the service with cloud resources separately dedicated for it, dynamically scaling on demand.
The distribution of the process in time to reduce the peak load has become a separate non-trivial queuing task, which we solved by batching the entire imported volume into and organizing them into a queue.
While testing the service, it turned out that data containing orders is sometimes of insufficient quality (values of essential parameters are partially or entirely missing). Because this significantly increased the error in the calculated economic metrics. The next task arose by itself - to reduce this error.
In this case, the solution was to develop and carefully manually analyze a large number of such data to identify patterns. Without this, it would be difficult to understand the correct requirements for the problem-solving algorithm.
As a result of the analysis, we understood how to correctly supplement the algorithm so that it "smartly" skipped the low-quality data when processing millions of records. Of course, this was not without subtle optimizations of queries to the storage.
Our next task was to export data for sending to third-party analytics systems, exchange with colleagues, etc.
We have reached the existing rate limits for the number of requests to Athena very quickly.
Of course, you could write a support request to increase the limits, but we would quickly reach the following, which would be significantly more expensive.
From here, a new challenge appeared. We had to find a way to solve the task without exceeding the allowable AWS request limits and optimize infrastructure costs.
Plunging deeper into AWS services, we sought suitable opportunities for this task. We divided a single database in Athena / S3 into Data Lake of raw data, Data Lake of cleaned data, and Data Frames. The solution was to separate storage by “projects” within the system.
The widespread use of caching intermediate and final query results has significantly reduced the load on Athena. S3 pre-signed URLs (including those to cached files) shifted the task of delivering data from the system to the user and from the user to the system to Amazon cloud services. As a result, we reduced the load on the API, increased the system's reliability, and lowered costs.
In the platform's first version, calculated profit data was returned in response to an order tracking request and then sent to GA via GTM.
In the second version, to ensure the privacy of the calculated marketing parameters, this issue had to be solved by bypassing the front-end.
To solve this problem, we used the new Server GTM service.
Having studied the documentation, we moved the request for calculated marketing data to the sGTM level, not to receive data on the client side.
Now sGTM itself requests data from the back-end of our platform, and they are no longer available on the front-end of the e-commerce site - the data privacy is kept, and the problem is solved.
An exciting challenge we encountered later was updating tens of millions of records stored in RDS. Due to the peculiarities in relations, we had to continue storing some data in RDS and did not have the opportunity to consider migrating data to other storages. There was quite a lot of data, and the JSON file available via URL was the source for updates.
The problem was solved using S3, Athena, and RDS. S3 was used as temporary storage, Athena - for data analysis and comparing new and old datasets (adding new records, deleting irrelevant ones, updating changed ones).
This approach made it possible to perform such operations optimally, and RDS remained the main storage for the necessary data.
Sync with Magento was another obstacle we faced. It was necessary to pull out a relatively large amount of data, process it, supplement it with new calculation attributes, and write it to the database without overloading RDS with requests to read the data necessary for calculations.
Magento API gives a maximum of 100 records per request. We have implemented a mechanism that reads data in stream mode in 100 rows, collecting a larger batch. As soon as enough data for the batch is collected, it starts a separate operation that processes this larger batch, supplements it with the necessary data, and writes it to RDS with minor optimizations for the number of write requests.
At the same time, as soon as a new large batch is launched, the reading stream from the Magento API continues to work. When a new large batch is collected, it will also be launched as soon as the required data is gathered. Thus, synchronizing large amounts of data from Magento was implemented without overloading RDS. Analysis, supplementing data with calculated values, and the calculations were delegated to the Athena / S3 bundle, which also removed the extra load from the Node.js service.
A few days after testing the platform on major e-commerce sites, we discovered a rapid increase in storage volumes and subsequent fetch speed issues.
In this way, we revealed another challenge. We had to ensure an acceptable fetch speed rate for any amount of data in the storage.
We found a way out in organizing data partitioning by calendar dates.
The system must buffer the orders received through the real-time API. Otherwise, each order will be placed in a separate file, increasing the data processing cost.
We used Kinesis Firehose for buffering. We solved one problem - we got another: if you deliver data through the Firehose, Athena herself does not create new partitions.
We applied the S3Events->SQS->Lambda pipeline to solve this problem. Lambda listens to changes in storage events and creates new partitions. A separate queue in SQS transfers records from Data Lake with dirty data to Data Lake with clean data.
The continuous flow of data to the storage is accompanied by automated partitioning, which ensures an acceptable speed of subsequent fetches on almost any storage volume.
Later, we had to return to the calculation parameters privacy issue. Some of them that were sensitive had to remain private and needed to be sent to Google Analytics in some alternate way.
To make this happen, we had to figure out a new technology - Measurement Protocol.
After high-quality research and reading the documentation, we implemented integration with GA through the Measurement Protocol, making secret calculation parameters inaccessible on the front-end side of e-commerce websites.
Another difficult task was the import of orders from Google Analytics. At first glance, it sounds like a simple task.
However, the documentation of the GA API, AppFlow, and the unexpected and undocumented behavior of GA turned this task into a real challenge.
Here we had to move almost blindly and empirically find the correct query parameters to integrate with GA through AppFlow eventually.
It took a couple of weeks of very close work, including working with the client on an experimental study of the GA API behavior, after which everything worked as it should.
The solution to this problem was the skill to move forward even in the face of uncertainty. Despite the lack of clarity, it was very nice when we could still get everything to work together correctly.
As a result, we managed to build, deliver and launch a working, sophisticated enterprise analytics service and profit optimization platform that exceeded the MVP requirements
We have built a platform that can receive all the real-time data necessary for calculating profitability. Data that is usually hidden and inaccessible to analytics systems such as Google Analytics. Then we supplemented the data in these systems with calculated parameters, collecting a complete and accurate picture of profitability, making it possible to optimize advertising costs effectively.
Benefits of profit optimization:
Although the main functionality of the platform has already been built and is up and running, there is still a large field for upgrading the mechanisms of analysis, calculations, and predictions. In the near future, we plan to consistently improve the efficiency of profit optimization mechanisms and expand the list of integrations with third-party services. Work is also planned to refine the architecture and optimize the platform's algorithms for optimal use of resources, increase in speed, and the ability to handle higher loads.
Give clients the power to get exactly what they asked for, make it easier to evolve APIs over time, and enable powerful developer tools.