Declarative Lakehouse Pipelines: Oracle & MongoDB Integration

October 16, 2025

Nabin Kundu

Senior Software Engineer

Modern data systems demand real-time synchronization, reliability, and scalability, qualities that traditional ETL pipelines often fail to deliver. To address these limitations, my team and I turned to the Declarative Lakehouse Pipeline approach powered by Delta Lake on Databricks.

Delta Lake is an open-source storage framework that brings ACID transactions, schema enforcement, and data reliability to data lakes. Built on top of Parquet files and compatible with tools like Apache Spark, it enables both batch and streaming workloads on the same data.

When combined with tools like Debezium, Delta Lake's capabilities extend even further. Debezium makes it easier to stream change data capture (CDC) events into raw Delta tables in the bronze layer of a medallion architecture. This ensures freshness and consistency across systems without relying on periodic bulk refreshes.

If you are a data engineer tasked with building or modernizing ELT pipelines, this walkthrough provides practical implementation details and demonstrates the architecture and design principles needed to scale efficiently while keeping data trustworthy.

The problem

Traditional ETL pipelines struggle with:

Real-time synchronization gaps: Traditional ETL jobs run on the full dataset rather than on incremental changes and are often scheduled at fixed intervals, which makes it difficult to support near-real-time dashboards.
Schema drift in semi-structured sources: Pipelines fail to handle schema evolution as fields are added, removed, or nested over time, which breaks jobs and requires frequent fixes.
Complex backfills during pipeline initialization: Setting up a new pipeline requires replaying historical data, custom logic, large batch jobs, and downtime.
Data reliability: When data comes from multiple systems, inconsistencies and conflicting records often arise, creating multiple sources of truth and reducing trust in the data.

We needed a pipeline that could solve all four of these problems.

Trivial solution

A common but naive approach is to maintain additional datasets in Tableau or use federated queries to combine data from different systems. While this can provide a temporary view of unified data, it introduces several issues:

Manual maintenance and reconciliation of datasets are error-prone.
Performance can suffer when querying across multiple sources.
Data inconsistencies persist, and multiple sources of truth remain.

Final solution

We designed a Delta Lake-based ingestion flow with a medallion architecture:

1. Oracle & MongoDB → Kafka (CDC + Glue): The one-time load is done via AWS Glue jobs, which write the data to S3 in Delta format. Incremental changes are captured with Debezium CDC connectors and written to Kafka. We use Confluent Kafka, run the Glue jobs on AWS, and run Kafka Connect on an EC2 instance, where the Debezium connectors are deployed.

Oracle debezium connector: https://gist.github.com/nabink-dev/3ad46ec38d41b7117933365836184319

MongoDB debezium connector: https://gist.github.com/nabink-dev/dd71101fcd90ad5fd67a7bf3010c4fd9

Note: Debezium connectors can handle both the one-time load and CDC. Because the Oracle dataset was very large, we wrote a Glue job for the one-time sync instead.
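To make the connector step more concrete, here is a minimal sketch of how a Debezium Oracle connector registration payload could be assembled before posting it to the Kafka Connect REST API. It is not the configuration from the linked gist: the function name, host, credentials, table names, and topic prefix are all hypothetical, and the security settings a Confluent cluster would need are left out. The `snapshot.mode` value reflects the assumption that Glue already handled the initial load; the name of that mode differs between Debezium versions, so check the documentation for the version you run.

```python
import json


def oracle_connector_config(name, host, port, user, password, dbname,
                            topic_prefix, tables, bootstrap_servers):
    """Build the JSON body for POST /connectors on a Kafka Connect worker.

    All argument values are supplied by the caller; nothing here is taken
    from the original pipeline.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.oracle.OracleConnector",
            "tasks.max": "1",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Prefix for every topic this connector writes to.
            "topic.prefix": topic_prefix,
            # Only these tables are captured.
            "table.include.list": ",".join(tables),
            # Debezium stores DDL history in its own Kafka topic.
            "schema.history.internal.kafka.bootstrap.servers": bootstrap_servers,
            "schema.history.internal.kafka.topic": f"{topic_prefix}.schema-history",
            # Assumption: skip the data snapshot because Glue did the bulk load.
            # Newer Debezium releases call this mode "no_data".
            "snapshot.mode": "schema_only",
        },
    }


def to_request_body(config):
    """Serialise the payload exactly as it would be sent to Kafka Connect."""
    return json.dumps(config, indent=2)
```

The resulting string can be sent with any HTTP client to the Connect worker running on the EC2 instance. Keeping the payload in a function makes it easy to generate one connector per source database with consistent settings.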

2. Kafka → S3 (Spark Streaming): A Databricks job uses Spark Streaming to consume from the Kafka topics and append the raw data to Delta tables on S3.
Sample jobs:
https://gist.github.com/nabink-dev/30efe4b194ca50d9bef451fbce2c00c0
https://gist.github.com/nabink-dev/801a85b94ff6063b5280a12979e139ab
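For readers who want the shape of such a job without opening the gists, the following is a rough sketch rather than the author's code. It assumes an existing `SparkSession` (as on a Databricks cluster) with the Kafka source and Delta Lake available. The function names, topic names, and paths are hypothetical, and authentication options for Confluent are omitted.

```python
def kafka_source_options(bootstrap_servers, topics, starting_offsets="earliest"):
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": ",".join(topics),
        "startingOffsets": starting_offsets,
    }


def start_bronze_stream(spark, options, target_path, checkpoint_path):
    """Append raw Kafka records to a Delta location and return the running query.

    `spark` is an existing SparkSession; `target_path` and `checkpoint_path`
    are S3 URIs chosen by the caller.
    """
    raw = spark.readStream.format("kafka").options(**options).load()

    # Kafka delivers key and value as binary; keep them as strings and carry
    # the topic/partition/offset metadata so every row stays traceable.
    events = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic",
        "partition",
        "offset",
        "timestamp",
    )

    return (
        events.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint lets the job resume from the last committed offsets.
        .option("checkpointLocation", checkpoint_path)
        .start(target_path)
    )
```

A call might look like `start_bronze_stream(spark, kafka_source_options("broker:9092", ["oracle.INVENTORY.ORDERS"]), "s3://my-bucket/bronze/orders", "s3://my-bucket/_checkpoints/orders")`, where the bucket and topic are placeholders. Storing the payload unparsed keeps the bronze layer faithful to the source; parsing is deferred to the silver step.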

3. S3 → Databricks Delta Lake (Bronze): Create external Delta tables in Databricks Unity Catalog that point to the S3 locations. This data contains both the initial load from Glue and the CDC data from Kafka, so it is essentially SCD Type 2 data.
https://gist.github.com/nabink-dev/46da843ee7a0ce8237b0065fbdf99115
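As an illustration of the registration step, the helper below builds the kind of `CREATE TABLE ... USING DELTA LOCATION` statement that registers an existing Delta directory as an external table. It is a sketch under assumptions: the catalog, schema, table, and bucket names are hypothetical, and it presumes a Unity Catalog external location already covers the S3 path.

```python
def external_delta_table_ddl(catalog, schema, table, s3_location):
    """Return DDL that registers an existing Delta directory as an external table."""
    if not s3_location.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {s3_location!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS `{catalog}`.`{schema}`.`{table}` "
        f"USING DELTA LOCATION '{s3_location}'"
    )
```

On a cluster the statement would be run with `spark.sql(external_delta_table_ddl(...))`. Because the table is external, dropping it from the catalog leaves the files on S3 untouched.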

4. Bronze → Silver: Because MongoDB, unlike Oracle, is schemaless, we created a job that reads the bronze tables, generates a schema, and stores it in S3. A continuous Lakeflow Declarative Pipeline then reads the bronze tables, applies the schema, converts the data from SCD Type 2 to SCD Type 1, and saves it as a streaming Delta table in an intermediate silver layer in the catalog.

Generate Schema: https://gist.github.com/nabink-dev/c8c2d86ab76f4e21eaa870fbe2412034#file-generate_schema-py
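The idea behind schema generation can be shown in plain Python, independent of Spark. The sketch below is not the linked job: it is a simplified illustration that walks a sample of JSON documents, records a type name per field, and widens the type when documents disagree. The type names and the widening rules (long plus double becomes double, any other conflict becomes string) are assumptions made for this example.

```python
import json


def infer_type(value):
    """Map one JSON value to a simple type description."""
    if value is None:
        return None  # unknown until another document supplies a value
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return {k: infer_type(v) for k, v in value.items()}
    if isinstance(value, list):
        element = None
        for item in value:
            element = merge_types(element, infer_type(item))
        return [element]
    return "string"


def merge_types(a, b):
    """Combine two type descriptions into one that accepts both."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        # Union of fields: a field missing on one side keeps the other's type.
        return {k: merge_types(a.get(k), b.get(k)) for k in sorted(set(a) | set(b))}
    if isinstance(a, list) and isinstance(b, list):
        return [merge_types(a[0], b[0])]
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"long", "double"}:
        return "double"
    return "string"  # incompatible types fall back to string


def infer_schema(json_docs):
    """Fold a collection of JSON strings into one merged schema."""
    schema = None
    for doc in json_docs:
        schema = merge_types(schema, infer_type(json.loads(doc)))
    return schema
```

Because the merge only ever adds fields or widens types, re-running it as new documents arrive yields a schema that still accepts the older ones, which is the property that matters for handling drift.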

Create intermediate layer of silver: https://gist.github.com/nabink-dev/c8c2d86ab76f4e21eaa870fbe2412034#file-generate-silver-1-py
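The SCD Type 2 to SCD Type 1 conversion boils down to keeping only the most recent version of each key and dropping keys whose latest event is a delete. The plain-Python sketch below shows that logic on a handful of rows. It is not the pipeline code from the gist, and the column names (`id`, `op`, `ts_ms`) are assumptions modelled on Debezium's change-event fields, where `op` is `c`, `u`, `d`, or `r`.

```python
def collapse_to_latest(change_rows, key="id", order_by="ts_ms",
                       op="op", delete_op="d"):
    """Reduce a change history to the current state of each key."""
    latest = {}
    for row in change_rows:
        current = latest.get(row[key])
        # Keep the row with the highest ordering value, so arrival order
        # does not matter.
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row

    # Drop deleted keys and strip the CDC bookkeeping columns.
    return {
        k: {f: v for f, v in row.items() if f not in (op, order_by)}
        for k, row in latest.items()
        if row[op] != delete_op
    }
```

In a declarative pipeline the same intent is normally stated as configuration (keys, a sequencing column, and a delete condition) rather than written by hand, and the engine takes care of late and out-of-order events.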

We then do dimensional modelling to create dimension and fact tables, which form our final silver layer.

Sample fact table: https://gist.github.com/nabink-dev/4d5b5246eadfd0af38e67abd9cc7a926

5. Silver → Gold: We then read the required silver tables and aggregate them using our business logic to create the gold layer, with KPIs for downstream analytics.
Sample gold table: https://gist.github.com/nabink-dev/6104f430f33a23aeda4a22f2634564fe
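To give a feel for what a gold-layer aggregation does, here is a small plain-Python sketch that rolls fact rows up into daily KPIs. The fact columns (`order_date`, `amount`) and the KPIs themselves are hypothetical and chosen only for illustration; the real business logic lives in the linked gist.

```python
from collections import defaultdict


def daily_kpis(fact_rows):
    """Aggregate fact rows into per-day order count, revenue, and average order value."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in fact_rows:
        bucket = totals[row["order_date"]]
        bucket["orders"] += 1
        bucket["revenue"] += row["amount"]

    # Derive the average once the totals are complete, sorted by day.
    return {
        day: {**metrics, "avg_order_value": metrics["revenue"] / metrics["orders"]}
        for day, metrics in sorted(totals.items())
    }
```

Because the silver tables already hold one current row per key, an aggregation like this can be recomputed or incrementally refreshed without first deduplicating change history.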

6. Final architecture diagram:

Conclusion

The final solution effectively tackles the identified problems by establishing a real-time, resilient, and reliable data pipeline. It solves real-time synchronization gaps by using Debezium CDC and Spark Streaming to ingest incremental data continuously, enabling near-real-time analytics.

For schema drift, especially from semi-structured sources like MongoDB, the approach generates and applies schemas within continuous pipelines (DLT), allowing for flexible schema evolution. Complex backfills are streamlined through an initial AWS Glue load combined with ongoing CDC, eliminating the need for extensive custom historical data processing.

Finally, data reliability is ensured by leveraging Delta Lake’s ACID properties, schema enforcement, and the progressive refinement of data through a Medallion Architecture (Bronze, Silver, Gold), establishing a single source of truth for downstream consumption.