rdf:type → Person

Tathagata Ghosh

Data Engineer | Knowledge Graphs, ETL, Python, SQL, Cloud & Analytics. Building reliable, scalable, and well-governed data systems.

Open to Work — Data Engineer / Cloud Data Engineer / Analytics Engineer · Germany / EU (Hybrid or Remote)

Graph · Key triples

Subject · predicate → object (semantic relations)

Tathagata · worksAt → OWIPL
Tathagata · worksAt → HZDR
Tathagata · studiedAt → Otto-von-Guericke Universität
Tathagata · created → DRACO DataMaster
Tathagata · created → OttoBot
Tathagata · knowsAbout → Knowledge Graphs
Tathagata · locatedIn → Magdeburg, Germany
75% faster screening · Cognitive RPA HR toolkit (GCP)
FAIR research data · DRACO DataMaster @ HZDR
M.Sc. Digital Engineering · OVGU Magdeburg
EU · Open to work · Germany (Hybrid or Remote)

ex:ProfileGraph

Instance of the profile ontology — central node and relations.

[Graph: central node Tathagata (Person), linked to HZDR (worksAt), OVGU (studiedAt), OWIPL (worksAt), DRACO (created), and OttoBot (created)]
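
The subject · predicate → object relations of this profile graph can be held as plain data and queried by pattern. The following is a minimal sketch using only the Python standard library; the `Triple` type and the `match` helper are illustrative names and not part of any RDF library.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The key triples of the profile graph, written out as plain data.
TRIPLES = [
    Triple("Tathagata", "worksAt", "OWIPL"),
    Triple("Tathagata", "worksAt", "HZDR"),
    Triple("Tathagata", "studiedAt", "Otto-von-Guericke Universität"),
    Triple("Tathagata", "created", "DRACO DataMaster"),
    Triple("Tathagata", "created", "OttoBot"),
    Triple("Tathagata", "knowsAbout", "Knowledge Graphs"),
    Triple("Tathagata", "locatedIn", "Magdeburg, Germany"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard."""
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


created = [t.obj for t in match(TRIPLES, predicate="created")]
print(created)  # ['DRACO DataMaster', 'OttoBot']
```

Leaving a position as `None` is the same idea as a variable in a triple pattern: `match(TRIPLES, predicate="worksAt")` returns both employers.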

ex:About

I'm a Data Engineer at heart, focused on building reliable, scalable, and well-governed data systems that turn complex, unstructured information into usable products.

My core background is in data engineering, cloud platforms, and metadata-driven architectures—working with Python, SQL, distributed processing, and modern storage layers. At HZDR, I built a FAIR-aligned research data platform (DRACO DataMaster) using ontologies, knowledge graphs, MongoDB, and object storage (MinIO) to structure and serve large-scale scientific data for high-energy physics workflows.

Alongside this, I work as a freelance engineer for a stealth SaaS startup, contributing to an ERP-style platform where data modeling, visualization, workflow automation, and DevOps converge—including system design, analytics dashboards, API-driven data services, and containerized deployments with Docker.

What drives me is building end-to-end data products: from ingestion and modeling, to governance, visualization, and operational reliability. I'm seeking opportunities in the data and cloud engineering domain where I can keep growing while contributing to impactful, production-grade platforms.

Focus areas: Data Engineering & ETL · Python & SQL · Metadata systems & Knowledge Graphs · Cloud & DevOps (Docker, CI/CD, GCP) · Analytics & Visualization

ex:Experience

Full-Stack Developer (Freelance) · OWIPL · Oct 2024 – Present · Visakhapatnam, India (Remote)

Architecting and developing a multi-tenant SaaS platform for workforce and operations management (~70% complete). Designed scalable microservices with React, Node.js, FastAPI, Flask, MongoDB, and Docker. Built an AI-assisted document processing pipeline using PaddleOCR and local LLM inference for classification and structured extraction. Implemented JWT auth with refresh tokens, RBAC, rate-limiting, audit logging, and user-isolated storage. Developed multi-role dashboards (user, organization, admin) with Ant Design, Mantine UI, Recharts, and Framer Motion. Built job/workflow pipelines with multi-state tracking and orchestrated 4+ services with Docker Compose and health-check monitoring.

Data Science and Visualization Student Assistant · Helmholtz-Zentrum Dresden-Rossendorf (HZDR) · Apr 2024 – Mar 2025 · Dresden, Germany

Built DRACO DataMaster, a FAIR-compliant, open-source research data infrastructure for a laser experiment. Migrated PostgreSQL to MongoDB with an ontology-driven schema; stored terabytes of data in Ceph/MinIO. Developed a knowledge graph with RDF/JSON-LD and Protégé (OWL) for semantic interoperability. Implemented ETL pipelines in Python (ThreadPoolExecutor) and built InfluxDB + Grafana dashboards for real-time monitoring. Presented RDM solutions at OUTPUT2024 (TU Dresden).
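
A thread-pooled ETL step of the kind mentioned above might look roughly like the sketch below. It is an illustration only, not the DRACO DataMaster code: the `extract` and `transform` functions and the `shot_id` and `energy` field names are hypothetical stand-ins, and a real pipeline would load into MongoDB or object storage rather than a Python list.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def extract(shot_id: int) -> Dict[str, object]:
    """Hypothetical extract step: stands in for reading one raw record (I/O-bound)."""
    return {"shot_id": shot_id, "energy_raw": str(shot_id * 10)}


def transform(record: Dict[str, object]) -> Dict[str, object]:
    """Normalize one raw record: parse the numeric field and drop the raw one."""
    return {"shot_id": record["shot_id"], "energy": float(record["energy_raw"])}


def run_etl(shot_ids: Iterable[int], max_workers: int = 4) -> List[Dict[str, object]]:
    """Extract records concurrently, then transform and collect them in order."""
    loaded: List[Dict[str, object]] = []  # stand-in for the load target
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map runs extract() in worker threads and yields results
        # in the same order as the input ids.
        for raw in pool.map(extract, shot_ids):
            loaded.append(transform(raw))
    return loaded


print(run_etl([1, 2, 3]))
```

Threads suit this pattern when the extract step waits on disk or network; CPU-heavy transforms would not speed up the same way under Python's global interpreter lock.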

Student RPA Developer (GCP focus) · Otto-von-Guericke-Universität Magdeburg · Feb 2024 – Nov 2024 · Magdeburg, Germany

Architected a Cognitive RPA proof of concept: a Flask-based portal where candidates submit forms and CVs and get instant role matching via Google Cloud. Used Document AI and Vertex AI NLP to extract and normalize skills and education (≈40% better parsing than typical ATS). Built BigQuery pipelines with TF-IDF and cosine similarity; a Python Cloud Function ranks the top-5 applicant–job matches. Cloud Composer (Airflow) orchestrated the flow form → Document AI → Vertex AI → TF-IDF → BigQuery. Used Terraform, Docker, and Cloud Run for IaC and CI/CD, a Looker Studio dashboard for application volumes and fit-score distributions, and Stackdriver for pipeline monitoring. Result: a 75% reduction in initial screening time.

Research Assistant (Research Data Management) · Otto-von-Guericke-Universität Magdeburg · Mar 2023 – Sep 2023 · Magdeburg, Germany

Drafted and implemented Data Management Plans (DMPs) following FAIR and CARE principles. Evaluated RDM tools (RDMO, RADAR) and revamped bilingual RDM websites for improved accessibility. Supported legal and ethical data handling for scientific projects.

IT Technical Staff (Data Integration Center, MIRACUM) · Universitätsmedizin Magdeburg · May 2022 – Dec 2022 · Magdeburg, Germany

Worked with healthcare data, standardized documentation, and supported clinical research for the MIRACUM project. Deployed software in the local environment and on the intranet; researched and implemented a FHIR server; implemented the OMOP data model for medical informatics students. Wrote Python ETL scripts and used PostgreSQL for data transformation to collect, store, and retrieve medical data for analysis and reporting.

Accenture Google Cloud Winter School · Magdeburg · Feb 2023 – Mar 2023 · Germany

Theoretical foundations for cloud operation and use; hands-on with Google Cloud technologies and complex application examples.

Real-Time Analyst · Fraud Prevention Representative · TTEC · Jan 2020 – Sep 2021 · Ahmedabad, India

Real-Time Analyst (Nov 2020 – Sep 2021): workforce scheduling and planning with IEX NICE and Aspect WFM; real-time management, absenteeism tracking, PTO, and intraday service-level (SL) monitoring; managed teams in remote locations. Fraud Prevention Representative (Jan 2020 – Oct 2020): Airbnb fraud prevention, covering customer chats and emails, fraud identification and resolution, productivity and quality targets, and customer experience.

Trainee Mechanical Engineer · Pooja Priya Construction · Aug 2019 – Dec 2019 · Visakhapatnam, India

Compliance of mechanical execution per contract; client follow-up; safety standards; work breakdown structure; site supervision.

Executive Trainee · Brandix Apparel Solutions Ltd. · Feb 2019 – Aug 2019 · Visakhapatnam, India

End-to-end redesign of garment production workflow using Lean (Kanban), Six Sigma (DMAIC), and JIT. Dynamic line segregation by destination country; color-coded Kanban and batch labels; real-time dashboards (WIP, throughput, staging). Cutting-to-stitching handover ≤30 mins. ~45% defect reduction in 4 months; near-zero missing-garment incidents; ~30% improvement in on-time shipment. QA hub consolidation, barcode scanners, and shipment-ID staging.

Co-Founder & President · GITAM Aeromodelling Club · Sep 2016 – Apr 2019 · Visakhapatnam, India

Started Aeromodelling Club at GITAM University; national-level Ornithopter workshop (150+ participants from 3 states); president until 2019.

Summer Intern (Quality Assurance) · Tata Motors · May 2018 – June 2018 · Jamshedpur, India

Process validation of CNC machine as part of internship project in QA department.

ex:Projects

Three selected projects from my work at Otto-von-Guericke Universität Magdeburg: an educational chatbot, a slide recommendation system, and a team-formation tool.

OttoBot — Transformer-based Educational Chatbot

Nov 2023 – Feb 2024 · OVGU Magdeburg

OttoBot is a university guide at your fingertips: a Transformer-based educational chatbot for Otto-von-Guericke University that answers questions about OVGU policies and procedures using retrieval-augmented generation (RAG). We integrated LangChain with Llama-2, HuggingFace embeddings, and FAISS for semantic search, plus web crawlers and Unstructured URL Loader for ingesting live content. The system combines document loaders, character text splitting, vector stores, and a Streamlit UI—all with open-source tools—to deliver tailored, context-aware answers for students and staff.

  • Llama-2
  • LangChain
  • HuggingFace
  • FAISS
  • RAG
  • Streamlit
  • Python
Read paper (PDF)

OCR Strategy for Keyword Extraction & Slide Recommendation

Nov 2022 – Feb 2023 · OVGU Magdeburg

A recommendation subsystem for SQLValidator that delivers automatic instructional feedback during online exercise sessions. We used optical character recognition (Tesseract) to extract keywords from lecture slides and exercise sheets, then applied TF-IDF and cosine similarity to map SQL tasks to the most relevant course slides. When a student submits an incorrect solution, the system recommends specific lecture slides to review. The pipeline includes preprocessing (cropping, logo masking), NLTK stop-word removal, keyword dictionaries for German and English, and a 0.2 similarity threshold—achieving 72% precision for English slides.

  • Tesseract
  • OCR
  • TF-IDF
  • Cosine Similarity
  • Python
  • NLTK
  • PyQt
Read paper (PDF)
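
The slide-matching step described for this project (TF-IDF vectors, cosine similarity, and a 0.2 cut-off) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the project's code: the smoothed IDF formula, the `recommend` helper, and the tiny token lists are chosen for the example, and the OCR and stop-word steps are assumed to have already produced the tokens.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed IDF (an assumption of this sketch).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(
    task_tokens: List[str], slides: Dict[str, List[str]], threshold: float = 0.2
) -> List[Tuple[str, float]]:
    """Return (slide_id, score) pairs at or above the threshold, best first."""
    ids = list(slides)
    vectors = tfidf_vectors([task_tokens] + [slides[i] for i in ids])
    task_vec, slide_vecs = vectors[0], vectors[1:]
    scored = [(i, cosine(task_vec, v)) for i, v in zip(ids, slide_vecs)]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


slides = {
    "joins": ["inner", "join", "on", "table"],
    "grouping": ["group", "by", "having", "count"],
}
print(recommend(["select", "join", "table"], slides))
```

With these toy inputs only the "joins" slide shares terms with the task, so it is the only one that clears the threshold.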

Project Partner Recommendation System (Big Five)

May 2022 – Oct 2022 · IEEE WCCCT 2023

A team-formation recommendation system for university course projects that addresses the challenge of project breakdowns due to mismatched personalities and preferences. We used a Big Five personality questionnaire to elicit collaboration-relevant traits (neuroticism, agreeableness, conscientiousness, extraversion, openness), then combined collaborative filtering—grouping students with similar personality profiles—with utility-based recommendation so that team academic scores fall within a chosen threshold. The result is academically balanced teams better suited to productive collaboration. Co-authored with Chukwuka Victor Obionwu, Damanpreet Singh Walia, Taruna Tiwari, David Broneske, and Gunter Saake.

  • Big Five
  • Collaborative Filtering
  • Recommendation Systems
  • KNN
  • PHP
  • Python
Read paper (PDF)
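
One way to read the combination described above is: keep only candidates whose academic score lies within a chosen gap of the student's, then rank them by closeness of their Big Five profiles. The sketch below follows that reading and is not the published method; the `recommend_partners` helper, the 1 to 5 trait scale, the grade values, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math
from typing import Dict, List

# Trait order assumed throughout this sketch:
# (neuroticism, agreeableness, conscientiousness, extraversion, openness)


def recommend_partners(
    student: Dict[str, object],
    candidates: List[Dict[str, object]],
    k: int = 2,
    max_score_gap: float = 1.0,
) -> List[str]:
    """Return up to k candidate names: academically close, nearest in trait space."""
    # Utility-style constraint: academic scores must lie within the chosen gap.
    eligible = [
        c for c in candidates
        if abs(c["score"] - student["score"]) <= max_score_gap
    ]
    # Nearest neighbours by Euclidean distance between trait profiles.
    eligible.sort(key=lambda c: math.dist(student["traits"], c["traits"]))
    return [c["name"] for c in eligible[:k]]


student = {"name": "S", "traits": (3, 4, 4, 3, 5), "score": 2.0}
candidates = [
    {"name": "A", "traits": (3, 4, 4, 3, 4), "score": 2.3},
    {"name": "B", "traits": (3, 4, 4, 3, 5), "score": 4.0},  # same profile, score too far
    {"name": "C", "traits": (1, 1, 1, 1, 1), "score": 1.8},
]
print(recommend_partners(student, candidates))  # ['A', 'C']
```

Candidate B has an identical personality profile but is filtered out by the score gap, which is the balancing effect the threshold is meant to have.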

Other projects

Credit Card Fraud Detection (BQML, GCP, Google Data Studio) · Apr–Sep 2021

ex:Education

M.Sc. Digital Engineering

Otto-von-Guericke Universität Magdeburg

Apr 2021 – Mar 2025 · Magdeburg, Germany

Specializations: Databases, In-Memory Technology, System Architecture, DevOps. Master's thesis: DRACO DataMaster — FAIR-compliant research data infrastructure and knowledge graph (supervised by HZDR).

B.Tech Mechanical Engineering

GITAM University

Sep 2015 – Apr 2019 · Visakhapatnam, India

Specializations: Operations Management, Material Technology, Mechanics, Statistics.

ex:Thesis

DRACO DataMaster: A Metadata-Driven Approach Utilizing Ontologies and Knowledge Graphs for the Laser Particle Acceleration

Otto-von-Guericke Universität Magdeburg, February 2025. Supervised by Prof. Dr.-Ing. Bernhard Preim (OVGU) and Dr. Oliver Knodel (HZDR).

Abstract

DRACO (Dresden Laser Acceleration Source) is a state-of-the-art high-power ultra-short pulse laser experiment at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). This thesis develops a DRACO DataMaster extension for advanced data handling: an automated pipeline that builds knowledge graphs from unsorted tabular data, enriched with metadata via ontologies tailored for DRACO experiments. The approach aligns with FAIR principles (Findable, Accessible, Interoperable, Reusable), enabling deeper scientific insight through improved data integration, structuring, and visualization—and a robust toolset for data-driven research at HZDR.
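
The core idea of turning tabular rows into graph triples via an ontology mapping can be illustrated in a few lines. This is a hedged sketch, not the thesis pipeline: the column names, the `ex:` predicates, the `ex:Shot` class, and the `rows_to_triples` helper are hypothetical, and a plain dictionary stands in for the ontology.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical column-to-predicate mapping standing in for an ontology.
COLUMN_TO_PREDICATE: Dict[str, str] = {
    "device": "ex:recordedBy",
    "energy_mj": "ex:hasEnergy",
}


def rows_to_triples(csv_text: str, id_column: str = "shot_id") -> List[Tuple[str, str, str]]:
    """Convert each CSV row into (subject, predicate, object) triples."""
    triples: List[Tuple[str, str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"ex:shot/{row[id_column]}"
        # Every row becomes one typed node in the graph.
        triples.append((subject, "rdf:type", "ex:Shot"))
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = row.get(column)
            if value:  # skip empty or missing cells instead of emitting blank objects
                triples.append((subject, predicate, value))
    return triples


table = "shot_id,device,energy_mj\n1,cam_a,12.5\n2,,9.0\n"
for triple in rows_to_triples(table):
    print(triple)
```

Keeping the mapping in data rather than in code means new columns can be covered by extending the mapping, which is the main appeal of a metadata-driven approach.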

Key contributions

  • Knowledge graph–based exploration — Force-directed layout (Barnes-Hut, O(N log N)) to visualize relationships between experimental entities (devices, shots, measurements).
  • Interactive visualization — Real-time filtering, parameterized measurement plots, and device-activity views for anomaly detection and validation.
  • Ontology-driven data management — RDF/OWL ontology (Protégé, RDFLib); MongoDB and MinIO for structured storage; semantic consistency and efficient retrieval.
  • UX & accessibility — Shneiderman’s mantra, colorblind-safe palettes (Paul Tol), and reproducible pipelines (Git, Docker).

Research questions addressed

  • RQ1: How can knowledge graphs enhance the interpretability of complex experimental datasets?
  • RQ2: What role does interactive visualization play in anomaly detection and validation?
  • RQ3: How can an ontology-based framework improve data integration, retrieval, and interoperability?

Tech stack

Python 3.9 · MongoDB · MinIO · Protégé (OWL) · RDFLib · Streamlit · PyVis (Vis.js) · Plotly · Pandas · NumPy · NetworkX · Docker

Download full thesis (PDF)

ex:Skills

Top skills

JavaScript · PHP · Data Warehouse Architecture · Python · SQL · ETL · Knowledge Graphs

Scientific SW & Data

RDF/OWL, FAIR/CARE principles, ontology engineering, semantic traceability, knowledge graphs, Protégé

Languages & Backend

Python (Pandas, NumPy), SQL, Bash · Flask, FastAPI · Git/GitLab CI, Docker, GCP, MongoDB, MinIO

Frontend & Visualization

React, PyVis (Vis.js), Streamlit · Looker Studio, Power BI, Grafana · Ant Design, Mantine UI, Recharts

Data & AI

Vertex AI, Document AI, PaddleOCR, LLM inference · Reproducible research, CI/CD, automated testing

Certifications

A Hands-on Introduction to Engineering Simulations · SQL · Introduction to programming with MATLAB · HTML Essential Training · Introduction to CSS

ex:References

… fulfilled the student assistance tasks to the fullest satisfaction. His behavior towards colleagues and research partners was always exemplary.

— Annette Strauch-Davey, OVGU (RDM website & Egotech training)

We thank Mr. Ghosh for his performance and wish him all the best for the future.

— Prof. Dr. J. Bernarding & Dr.-Ing. T. Herrmann, IBMI / DIZ, Otto-von-Guericke-Universität Magdeburg

ex:Contact

Open to Data Engineer / Cloud Data Engineer / Analytics Engineer roles in Germany or EU (Hybrid or Remote). Work authorization available. Say hello or share an idea.

Magdeburg, Saxony-Anhalt, Germany