Modern serverless data ingestion solution on AWS
Aug 29, 2022 • 4 min read

Introduction
A meal kit company specializing in healthy, organic, ready-made meal kits was about to start its journey to the cloud. The company had recently been acquired by a leader in the food and beverage industry that intended to merge the business under one umbrella, with the same brand, customer management, operations, and technologies, all managed by the parent company. The client engaged Grid Dynamics to integrate data into its ecosystem through the development of an effective data ingestion solution providing data model reconciliation and data backfill.
The challenge
During the pandemic, the client grew substantially, expanded into new markets, and made several acquisitions, which called for a new approach to managing the business, running operations, and maintaining technical solutions. This tremendous growth demanded continuous operational improvements to stay competitive, while multiple IT operations, platform solutions, technical departments, and integrations across acquisitions made the sophisticated technical landscape hard to manage.
Grid Dynamics focused specifically on integrating the acquired business into the parent company's technology ecosystem. The biggest challenge of any acquisition is merging businesses that have more differing components than common ones. Unification of business processes for this client involved:
- Unification of customer audience;
- Unification of marketing strategies: building a marketing strategy for each brand complementary to the other brands; and
- Technical architecture and solutions unification.
Further considerations for integrating acquisitions into the parent architecture included:
- An integration strategy for different technical stacks;
- Recommendations on how the parent architecture should be adapted in order to expose the integration API; and
- A data management strategy.
Other derived use cases, such as unified customer 360, marketing campaigns, and customer acquisition and retention policies, were out of scope for the engagement.
For this case study, we’ll focus on the unification of the technical architecture, including the approaches we used, and the solutions we built on top of AWS. We also tackle the other major goal of the integration, which was to create a defined technical roadmap for future acquisition integrations.
With these defined requirements, Grid Dynamics developed a lightweight solution hosted on AWS. Below we explain why certain AWS services were beneficial for this particular integration use case.
Solution expectations
At the beginning of our engagement, the client was running an on-premises platform, with some infrastructure components already migrated to AWS. Coming from an on-premises world, where supporting hardware, infrastructure, services, and applications is a prerequisite, the client wanted to build a serverless platform that required close to zero infrastructure support.
Serverless considerations
Integration between the two businesses required transforming data and exposing it to the parent company. When we evaluated serverless options, AWS Glue, a serverless data integration service, stood out for features that make it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Furthermore, AWS Glue provides all the capabilities needed for data integration out of the box, enabling greater speed to market.
There are three AWS Glue components:
- AWS Glue Data Catalog – A central repository for your metadata, built to hold information in metadata tables, with each table pointing to a single data store. In other words, it acts as an index to your data's schema, location, and runtime metrics, which are then used to identify the sources and targets of your ETL (Extract, Transform, Load) jobs.
- Job Scheduling System – The job scheduling system, on the other hand, is intended to help you automate and chain your ETL pipelines. It comes in the form of a flexible scheduler that’s capable of setting up event-based triggers and job execution schedules.
- ETL Engine – The component that handles ETL code generation. It automatically generates the code in Python or Scala, and also gives you the option of customizing it.
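As an illustration of the job scheduling component, the sketch below builds the request body for a scheduled trigger that could then be passed to the Glue CreateTrigger API via boto3. The trigger name, job name, and cron expression are hypothetical examples, not values from the client's actual pipelines.

```python
def scheduled_trigger(name: str, job_name: str, cron: str) -> dict:
    """Build a request body for glue_client.create_trigger(**...).

    The dict shape follows the AWS Glue CreateTrigger API; all
    concrete values passed in are illustrative assumptions.
    """
    return {
        "Name": name,
        "Type": "SCHEDULED",
        # Glue expects schedules wrapped in the cron(...) syntax.
        "Schedule": f"cron({cron})",
        # Each action names a Glue job to start when the trigger fires.
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }

# Hypothetical example: run the ingestion job every night at 02:00 UTC.
trigger = scheduled_trigger("nightly-ingest", "mongo-to-s3-job", "0 2 * * ? *")
```

In a real deployment this dict would be submitted with `boto3.client("glue").create_trigger(**trigger)`; event-based (CONDITIONAL) triggers follow the same request shape with a `Predicate` instead of a `Schedule`.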
Architectural solution
Grid Dynamics opted for a serverless AWS solution to help the client achieve their data integration goal quickly. Using scalable, on-demand services like AWS Glue and Amazon Redshift enabled us to optimize both operating costs and development expenses.
The analytics platform that we built on AWS Glue capabilities ingests data from MongoDB into the data lake, with an intermediate data lake in Amazon S3. Glue ETL jobs, based on Apache Spark, handle ingestion and transformation. Following best practices, the intermediate data lake was split into several logical layers:
- S3 Landing Zone – holds the source data as is, with no transformations.
- S3 Consumption Zone – holds landing data transformed into the corresponding data models; this is ready-to-use data for analytics.
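To make the two zones concrete, here is a minimal sketch of how object keys in such a layered lake are often laid out. The prefixes, partition scheme, source names, and file formats are illustrative assumptions, not the client's actual conventions.

```python
from datetime import date

def landing_key(source: str, table: str, run_date: date) -> str:
    """Key for raw, untransformed data in the landing zone (hypothetical layout)."""
    return f"landing/{source}/{table}/dt={run_date.isoformat()}/data.json"

def consumption_key(domain: str, table: str, run_date: date) -> str:
    """Key for transformed, analytics-ready data in the consumption zone."""
    return f"consumption/{domain}/{table}/dt={run_date.isoformat()}/data.parquet"

# A daily MongoDB export lands raw, then is rewritten to the consumption zone.
raw = landing_key("mongodb", "orders", date(2022, 8, 1))
# → "landing/mongodb/orders/dt=2022-08-01/data.json"
curated = consumption_key("sales", "orders", date(2022, 8, 1))
# → "consumption/sales/orders/dt=2022-08-01/data.parquet"
```

Date-based `dt=` partitioning like this is what lets Glue crawlers and Athena prune partitions when querying either zone.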
The data ingestion process can be summarized as follows:
- All data in the intermediate data lake is cataloged in the AWS Glue Data Catalog and, when needed, can be queried with Amazon Athena.
- The data ingestion pipeline writes the final data to Amazon S3, Amazon RDS for PostgreSQL, and Amazon Redshift.
- The data ingestion pipeline is orchestrated with AWS Glue Workflows, which create and visualize complex ETL activities involving multiple crawlers, jobs, and triggers.
- Finally, the credentials needed for service intercommunication are stored in AWS Secrets Manager.
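The Secrets Manager step can be sketched as follows. The helper accepts any client exposing the Secrets Manager get_secret_value interface (such as boto3.client("secretsmanager")); the stub client, secret name, and field names below are hypothetical so the example runs without AWS access.

```python
import json

def get_db_credentials(secrets_client, secret_id: str) -> dict:
    """Fetch a JSON secret and parse it into a credentials dict.

    `secrets_client` is any object with a get_secret_value(SecretId=...)
    method, e.g. boto3.client("secretsmanager") in a Glue job.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])

# Stub standing in for boto3 so the sketch runs locally without AWS.
class _StubSecretsClient:
    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps(
            {"username": "etl_user", "password": "s3cret"}  # hypothetical fields
        )}

creds = get_db_credentials(_StubSecretsClient(), "rds/postgres/etl")
# creds["username"] → "etl_user"
```

In the pipeline itself, a Glue job would call this with the real boto3 client and pass the parsed credentials to its RDS or Redshift connection, so no secrets ever appear in job parameters or code.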
The results
The project timeline was aggressive: the integration needed to go live within three months, including production infrastructure, pipelines, data quality, monitoring and support runbooks. Grid Dynamics completed the project within the timeline, providing the client with:
- Fully automated infrastructure provisioning;
- CI/CD and version control;
- Data ingestion and transformation pipelines;
- Data quality checks and schema enforcement;
- Data catalog and self-service access to the data.
The solution was built on serverless AWS components, and since all data pipelines are batch in nature, there is no need to run infrastructure constantly – all services are provisioned on demand and released after pipeline completion. This approach drastically reduced infrastructure costs, eliminated the need for dedicated infrastructure support engineers, and provides greater scalability as the client grows.