rdf:type foaf:Person

Tathagata Ghosh

Research data · FAIR stewardship · Knowledge graphs

I work at the intersection of research data management, metadata and ontologies, and knowledge graphs—making scientific data easier to integrate, interpret, and reuse. My deepest experience is in FAIR-oriented infrastructure for experimental workflows (DRACO at HZDR); I also ship robust data and software where implementation quality matters.

Open to research data management, data stewardship, research software engineering, and PhD-related opportunities · Germany & EU · Hybrid or remote

Graph · Key triples

A slice of the career graph: phases, study and research-support roles (OVGU RDM, MIRACUM, HZDR), how they connect, and outputs such as DRACO DataMaster and the thesis. Same facts as the interactive graph below—structured like triples for clarity.

tg:Person pursued Bachelor phase
tg:Person pursued Master phase
tg:Person pursued Current phase
bach:phase studiedAt GITAM · B.Tech
ms:phase studiedAt OVGU · M.Sc.
tg:Person transitionedTo Master phase
tg:Person transitionedTo Current phase
ms:phase parallel work @ OVGU / UMMD / HZDR
tg:Person workedAt HZDR
tg:Person worksAt OWIPL
tg:Person locatedIn Magdeburg
tg:Person built DRACO DataMaster
tg:Person authored M.Sc. thesis
th:Thesis about DRACO DataMaster
th:Thesis relatedTo HZDR · Knodel
tg:Person created OttoBot
tg:Person developed OCR · Partner rec.
tg:Person strengthened Data Eng. · ops roots
DRACO RDM platform Ontology-driven · HZDR
FAIR stewardship DMPs · CARE · OVGU RDM
M.Sc. Digital Engineering Thesis · knowledge graphs
EU Open to opportunities RDM · stewardship · research software · PhD paths

ex:About

I focus on research data infrastructure and data stewardship: how metadata, standards, and semantics help scientific data stay findable, interoperable, and reusable—not only at publication time, but throughout active experiments.

At HZDR, I built DRACO DataMaster as a student assistant: a FAIR-aligned stack for the DRACO laser experiment, with ontology-driven modeling, RDF/OWL in Protégé, automated Python ETL, MongoDB and MinIO at scale, and interactive exploration for validation—so heterogeneous experimental data can be integrated and interpreted with clear provenance. I presented this research data management work at OUTPUT2024 (TU Dresden).

Earlier at OVGU, I supported Research Data Management as a Hilfskraft: Data Management Plans, FAIR and CARE, evaluation of RDMO and RADAR, and bilingual RDM web content—work that sits close to what universities expect from stewardship and policy-facing roles. At Universitätsmedizin Magdeburg (MIRACUM), I worked on clinical data integration with FHIR, OMOP, and PostgreSQL—exposure to standardized health data that complements my later scientific RDM focus.

Alongside that trajectory, I have delivered end-to-end software and data systems in other settings (including a multi-tenant SaaS build with Docker-based services and document pipelines). That implementation experience supports the same goal: dependable, documented systems researchers and operators can actually run.

I am looking for roles where I can contribute to metadata-driven infrastructures, semantic interoperability, and reproducible workflows—in research groups, RDM teams, or doctoral projects that need both conceptual clarity and solid engineering.

ex:ResearchDataInterests

Areas I want to grow in and contribute to—aligned with my HZDR, OVGU, and thesis work, not a separate wish list.

ex:ProfileGraph

This is an interactive career graph, not a static org chart. It traces how earlier engineering and operations experience feeds into a path centered on research data, stewardship, and semantic systems—HZDR, OVGU, MIRACUM, thesis work, and supporting software delivery. Zoom, pan, and click nodes to explore connections; use filters or expand a phase for detail.

Interactive Cytoscape.js graph: phases, organizations, roles, projects, thesis, skills, and motivation nodes with semantic relationships. Use filters and click nodes for details.

ex:Experience

Full-Stack Developer (Freelance)

OWIPL Visakhapatnam, India (Remote)

Context

Freelance engineering on a multi-tenant SaaS product (workforce and operations). It shows full-stack delivery experience alongside, rather than in place of, my research data focus. The platform is in active development (~70% complete).

Systems & data

  • Microservices stack (React, Node.js, FastAPI, Flask, MongoDB) with Docker Compose, health checks, tenant-isolated storage, audit logging, and RBAC—patterns relevant to operational research software and multi-user data boundaries.
  • JWT with refresh, rate limiting, and documented APIs—emphasis on maintainable services and clear data access paths.
  • Workflow-centric dashboards; document pipelines with PaddleOCR and local LLM steps for classification and extraction, turning unstructured inputs into structured information.

Tools

Ant Design, Mantine, Recharts, Framer Motion.

Data Science and Visualization Student Assistant

Helmholtz-Zentrum Dresden-Rossendorf (HZDR) Dresden, Germany

Impact

Primary research-data role: built DRACO DataMaster—FAIR-oriented infrastructure for the DRACO laser experiment—so experimental data, metadata, and semantics stay aligned for integration, retrieval, and reuse across the workflow.

RDM · semantics · infrastructure

  • Ontology-driven schema design; migration from PostgreSQL to MongoDB; object storage on Ceph/MinIO at terabyte scale, supporting the scientific data lifecycle and large-file handling.
  • Knowledge graph (RDF/JSON-LD, OWL in Protégé) for semantic interoperability and experiment-wide linking of entities.
  • Python ETL (including ThreadPoolExecutor), reproducible service layout, InfluxDB and Grafana for monitoring—transparent, operable pipelines for the group.
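
The threaded ETL pattern named in the last bullet can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the DRACO DataMaster code itself: the `transform` step, the record fields, and the `example-run` provenance tag are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(record: dict) -> dict:
    """Hypothetical transform step: normalise keys and attach a provenance tag."""
    cleaned = {key.strip().lower(): value for key, value in record.items()}
    cleaned["source"] = "example-run"  # illustrative provenance marker
    return cleaned


def run_etl(records: list, max_workers: int = 4) -> list:
    """Transform records concurrently; executor.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(transform, records))


# Made-up input records with untidy keys.
raw = [{" Shot_ID ": 1, "Energy": 2.5}, {" Shot_ID ": 2, "Energy": 2.7}]
loaded = run_etl(raw)
```

Threads mainly pay off when the per-record step is I/O-bound (reading files, calling a database or object store); the trivial transform here only stands in for such a step.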

Community

Presented RDM work at OUTPUT2024 (TU Dresden).

Student RPA Developer — GCP focus

Otto-von-Guericke-Universität Magdeburg Magdeburg, Germany

Impact

University proof-of-concept for structured document intelligence in HR workflows: forms and CVs ingested, entities extracted, ranked matches returned. Project documentation reported a 75% reduction in initial screening time for the evaluated flow, which is relevant to metadata extraction, semi-structured data, and orchestrated pipelines in institutional settings.

Pipelines · cloud · governance hooks

  • Flask portal with Google Document AI and Vertex AI NLP for skills/education extraction (project documentation cites ~40% better parsing than typical ATS for evaluated inputs).
  • BigQuery, TF-IDF/cosine similarity, Cloud Functions; Airflow (Composer) for end-to-end orchestration—patterns transferable to reproducible ETL and scheduled research jobs.
  • Terraform, Docker, Cloud Run; Looker Studio and Stackdriver—observable, deployable services.

Tools

GCP (Document AI, Vertex AI, BigQuery, Composer, Cloud Run), Terraform, Python, Looker Studio.

Wissenschaftliche Hilfskraft (Research Data Management)

Otto-von-Guericke-Universität Magdeburg Magdeburg, Germany

Impact

Direct research data stewardship support: clearer bilingual RDM presence for researchers, practical Data Management Plans, and alignment with FAIR and CARE expectations.

Stewardship · documentation · tooling

  • DMP drafting and iteration; evaluation of RDMO and RADAR for institutional research data workflows.
  • Revamped bilingual RDM web content; supported legal and ethical data handling questions alongside scientific teams.

Technischer Mitarbeiter IT (Datenintegrationszentrum, MIRACUM)

Universitätsmedizin Magdeburg Magdeburg, Germany

Impact

Contributed to clinical research data integration (Project MIRACUM): standards-based exchange, reproducible ETL, and deployment in trusted environments—foundational exposure to interoperable health data that complements scientific RDM.

Standards · ETL · deployment

  • FHIR server research and implementation; OMOP for medical informatics support—semantic and structural conventions for multi-site data.
  • Python ETL and PostgreSQL for transform, storage, and reporting; local and intranet deployment.

Accenture Google Cloud Winter School

Magdeburg, Germany

Intensive GCP foundations—useful background for cloud-hosted research services, managed pipelines, and operational monitoring in RDM contexts.

Real-Time Analyst · Fraud Prevention Representative

TTEC Ahmedabad, India

Real-Time Analyst

Nov 2020 – Sep 2021: workforce scheduling and planning with IEX NICE and Aspect WFM; intraday service-level monitoring; coordination across remote teams.

Fraud Prevention Representative

Jan – Oct 2020: Airbnb fraud prevention via chat and email; investigation, resolution, and quality targets.

Trainee Mechanical Engineer

Pooja Priya Construction Visakhapatnam, India

Mechanical execution against contract; client coordination; safety standards; work breakdown structure; site supervision.

Executive Trainee

Brandix Apparel Solutions Ltd. Visakhapatnam, India

Impact

Production workflow redesign using Lean (Kanban), Six Sigma (DMAIC), and JIT: roughly 45% defect reduction in four months, ~30% improvement in on-time shipment, and near-zero missing-garment incidents; cutting-to-stitching handover held to ≤30 minutes.

Tools & methods

Dynamic line segregation by destination; color-coded Kanban; real-time WIP and throughput dashboards; QA hub consolidation with barcode scanners and shipment staging.

Co-Founder & President

GITAM Aeromodelling Club Visakhapatnam, India

Founded the club; ran a national-level ornithopter workshop with 150+ participants from three states; served as president through 2019.

Summer Intern (Quality Assurance)

Tata Motors Jamshedpur, India

Process validation for CNC equipment as part of the QA department internship project.

ex:Education

M.Sc. Digital Engineering

Otto-von-Guericke-Universität Magdeburg

Apr 2021 – Mar 2025 · Magdeburg, Germany

Focus: databases, system architecture, DevOps. Master’s thesis: DRACO DataMaster at HZDR—FAIR-aligned research data infrastructure, ontology-driven modeling, and knowledge graphs (supervision: OVGU & HZDR).

B.Tech Mechanical Engineering

GITAM University

Sep 2015 – Apr 2019 · Visakhapatnam, India

Focus: operations management, materials, mechanics, statistics.

ex:Thesis

Master’s thesis · Core research contribution

DRACO DataMaster

A metadata-driven approach using ontologies and knowledge graphs for laser particle acceleration research

Otto-von-Guericke-Universität Magdeburg, February 2025 · Supervisors: Prof. Dr.-Ing. Bernhard Preim (OVGU), Dr. Oliver Knodel (HZDR)

Problem

DRACO at HZDR generates rich, heterogeneous experimental data. Without strong metadata and semantics, integration, retrieval, and reuse lag behind what the science demands.

Contribution

An automated path from messy tabular inputs to knowledge graphs enriched with experiment-specific ontologies, aligned with FAIR so teams can integrate, structure, and visualize data for data-driven research.
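
To make the idea of going from tabular inputs to graph triples concrete, here is a small sketch under stated assumptions. It uses plain Python tuples instead of RDFLib to stay dependency-free, and the `ex:` prefix, the column names, and the sample rows are invented for illustration; they are not taken from the thesis data model.

```python
import csv
import io

# Hypothetical namespace prefix and column names, chosen for illustration only.
EX = "ex:"

SAMPLE_CSV = """shot_id,device,measurement,value
shot-001,camera-A,beam_energy,2.5
shot-002,camera-A,beam_energy,
"""


def rows_to_triples(text: str) -> list:
    """Turn each table row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        shot = EX + row["shot_id"].strip()
        triples.append((shot, "rdf:type", EX + "Shot"))
        triples.append((shot, EX + "recordedBy", EX + row["device"].strip()))
        value = row["value"].strip()
        if value:  # messy input: a missing value yields no measurement triple
            triples.append((shot, EX + row["measurement"].strip(), float(value)))
    return triples


triples = rows_to_triples(SAMPLE_CSV)
```

Each shot becomes a typed node linked to its device, and a measurement triple is only emitted when the cell actually holds a value, so gaps in the table do not turn into misleading facts in the graph.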

Methods

  • Graph exploration — Force-directed layout (Barnes–Hut, O(N log N)) linking devices, shots, and measurements.
  • Interactive analytics — Filtering, parameterized measurement plots, device-activity views for validation and anomaly review.
  • Ontology-driven data management — RDF/OWL in Protégé with RDFLib; MongoDB and MinIO for structured and object storage.
  • UX & reproducibility — Shneiderman-style interaction patterns, colorblind-safe palettes (Paul Tol), Git and Docker for repeatable runs.

Research questions

  • RQ1 — How do knowledge graphs improve interpretability of complex experimental datasets?
  • RQ2 — What role does interactive visualization play in anomaly detection and validation?
  • RQ3 — How does an ontology-based framework improve integration, retrieval, and interoperability?

Tools

  • Python 3.9
  • MongoDB
  • MinIO
  • Protégé (OWL)
  • RDFLib
  • Streamlit
  • PyVis (Vis.js)
  • Plotly
  • Pandas
  • NumPy
  • NetworkX
  • Docker
Download full thesis (PDF)

ex:Projects

The thesis section above is the authoritative write-up of DRACO DataMaster. Below, the same flagship is summarized as a project card, followed by further OVGU work on knowledge access, retrieval, and applied data systems.

DRACO DataMaster — FAIR research data & knowledge graph

Apr 2024 – Mar 2025 · HZDR · Master’s thesis

Problem DRACO at HZDR produces large, heterogeneous experimental datasets; without strong metadata and semantics, integration and reuse lag behind what the science needs.

Approach Metadata-driven pipelines, experiment-specific ontologies (Protégé / OWL), RDF and RDFLib, MongoDB and MinIO, and interactive graph and chart tooling so teams can validate, explore, and link data across the workflow.

  • FAIR
  • OWL
  • RDFLib
  • MongoDB
  • MinIO
  • Python
  • Streamlit
  • PyVis
  • Docker

Thesis Full treatment, methods, and research questions are in the Thesis section; PDF available for download there.

OttoBot — Transformer-based Educational Chatbot

OVGU Magdeburg

Problem Students and staff need fast, accurate answers about OVGU policies without hunting PDFs and portals—an institutional knowledge access problem.

Approach RAG-style system: LangChain, Llama-2, HuggingFace embeddings, FAISS, crawlers and Unstructured URL Loader, Streamlit UI—grounding answers in ingested university content (open-source end to end).

  • Llama-2
  • LangChain
  • HuggingFace
  • FAISS
  • RAG
  • Streamlit
  • Python

Outcome Context-aware answers grounded in ingested university content (see paper).

Read paper (PDF)

OCR Strategy for Keyword Extraction & Slide Recommendation

OVGU Magdeburg

Problem Learners using SQLValidator need feedback tied to the right lecture material—linking exercise errors to relevant slides.

Approach OCR (Tesseract) to recover text from slides, keywording, TF-IDF and cosine similarity for information retrieval-style ranking; bilingual stop-word handling and tuned thresholds.
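
The ranking step described above (TF-IDF weighting, cosine similarity, a stop-word list, and a threshold) can be sketched in plain Python. This is an illustrative approximation rather than the project's code: the stop-word set, the smoothed IDF formula, the `0.1` threshold, and the sample slide texts are all assumptions, and the OCR stage is left out entirely.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "in", "to"}  # tiny illustrative list


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


def tfidf_vectors(docs: list) -> list:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)  # smoothed IDF
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_slides(query: str, slides: list, threshold: float = 0.1) -> list:
    """Return slide indices ordered by similarity, keeping scores >= threshold."""
    vectors = tfidf_vectors(slides + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return [i for score, i in sorted(scored, reverse=True) if score >= threshold]


slides = [
    "join syntax in select statements",
    "group by and aggregate functions",
    "primary key constraints",
]
recommended = rank_slides("error in join syntax", slides)
```

With these sample texts only the first slide shares terms with the query, so it is the only index returned; the threshold is what keeps unrelated slides out of the recommendation.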

  • Tesseract
  • OCR
  • TF-IDF
  • Cosine Similarity
  • Python
  • NLTK
  • PyQt

Outcome 72% precision on English slide recommendations in the evaluated setup.

Read paper (PDF)

Project Partner Recommendation System (Big Five)

May 2022 – Oct 2022 · IEEE WCCCT 2023

Problem Course projects suffer when team composition ignores preferences and performance constraints—an applied data and decision-support question in an academic setting.

Approach Big Five questionnaire plus collaborative filtering and utility-based recommendation so teams respect score bands while matching collaboration style.

  • Big Five
  • Collaborative Filtering
  • Recommendation Systems
  • KNN
  • PHP
  • Python

Publication Co-authored with Chukwuka Victor Obionwu, Damanpreet Singh Walia, Taruna Tiwari, David Broneske, and Gunter Saake.

Read paper (PDF)

Other projects

Credit Card Fraud Detection (BQML, GCP, Google Data Studio) · Apr–Sep 2021

ex:Skills

Grouped for research data and stewardship contexts; technical depth reflects HZDR, OVGU RDM, MIRACUM, thesis, and supporting software work—see the career graph for how they connect.

Research data & stewardship

FAIR and CARE principles · Data Management Plans · metadata modeling and documentation · bilingual RDM web content · evaluation of tools (e.g. RDMO, RADAR) · research data lifecycle support · presentation of RDM work (e.g. OUTPUT2024)

Semantic technologies

RDF, RDFS, OWL · ontology engineering in Protégé · RDFLib · JSON-LD · knowledge graphs · semantic interoperability for scientific data

Data engineering & infrastructure

Python · SQL · ETL (batch and threaded) · MongoDB · MinIO · PostgreSQL · Docker · Docker Compose · Git / GitLab CI · GCP (Document AI, Vertex AI, BigQuery, Composer, Cloud Run, Terraform)

Visualization & research support

Streamlit · PyVis (Vis.js) · Plotly · NetworkX · Grafana · Looker Studio · interactive exploration and validation of complex datasets

Collaboration & coordination

Stakeholder communication · clear documentation and process notes · project coordination · cross-functional collaboration · support mindset alongside researchers and operators · earlier experience in operations, QA, and team leadership (student club)

Certifications

A Hands-on Introduction to Engineering Simulations · SQL · Introduction to programming with MATLAB · HTML Essential Training · Introduction to CSS

ex:References

… fulfilled the student assistance tasks to the fullest satisfaction. His behavior towards colleagues and research partners was always exemplary.

— Annette Strauch-Davey, OVGU (RDM website & Egotech training)

We thank Mr. Ghosh for his performance and wish him all the best for the future.

— Prof. Dr. J. Bernarding & Dr.-Ing. T. Herrmann, IBMI / DIZ, Otto-von-Guericke-Universität Magdeburg

ex:Contact

Open to research data management, data stewardship, research software / infrastructure, and PhD or project opportunities in Germany or the EU (hybrid or remote). Happy to discuss how DRACO DataMaster, OVGU RDM, and MIRACUM experience maps to your group. Work authorization available.

Magdeburg, Saxony-Anhalt, Germany