Senior AI Data Engineer
Software Engineering, Data Science
York, UK
Responsibilities:
- Implement JSON-LD-based semantic models designed by the ontologist into production data systems
- Build and maintain knowledge graph structures that reflect canonical domain models
- Develop and manage graph database schemas, queries, and data ingestion pipelines
- Ensure semantic consistency between ontology definitions and downstream data products
- Design and implement embedding pipelines that represent Comply’s financial and regulatory data in vector space
- Build and operate vector database infrastructure for semantic search and similarity retrieval
- Implement RAG (Retrieval-Augmented Generation) architectures that ground LLM outputs in Comply’s proprietary data
- Evaluate and integrate LLM tooling and frameworks appropriate to Comply’s use cases
- Build reliable, observable data pipelines that feed the semantic layer from upstream broker and regulatory data sources
- Apply DataOps practices including testing, monitoring, lineage tracking, and SLAs
- Work with Data Engineers and Backend Engineers to embed semantic models into APIs and data contracts
- Ensure the semantic layer scales with data volume and platform growth
- Partner closely with the Ontologist to ensure implemented models faithfully reflect domain intent
- Support consuming application teams in understanding and adopting AI-ready data products
- Contribute to resolving cross-domain data integration challenges
Skills and Qualifications:
- Strong hands-on experience in data engineering, with a focus on semantic or AI data infrastructure
- Experience building and operating knowledge graphs or graph databases (e.g. Jena Fuseki, Neo4j, Amazon Neptune, or equivalent)
- Experience with vector databases and embedding pipelines (e.g. Pinecone, Weaviate, Qdrant, pgvector)
- Practical experience implementing RAG architectures or LLM-integrated data pipelines
- Familiarity with semantic web standards — JSON-LD, RDF, OWL, or SKOS
- Strong Python skills and experience with data pipeline frameworks
- Experience with cloud-native data platforms (AWS, Azure, or GCP)
- Exposure to domain-driven design (DDD) and bounded contexts is desirable
- Experience working directly with ontologists or knowledge engineers is a plus
- Familiarity with data contracts and data product frameworks is a plus
- Experience with DataOps tooling, data reliability, or data observability platforms is desirable
- Background in financial services, RegTech, or compliance data is a plus