We are seeking a Senior Data Engineer who combines deep data engineering expertise with hands-on experience in AI agent orchestration, automated data pipelines, and intelligent data hygiene and enrichment systems. The role owns the architecture, quality, and scalability of Vequity’s data ecosystem, from ingestion through enrichment to output generation. You will ensure robust data pipelines, reliable AI-assisted enrichment, and accurate, well-governed data feeds that power our platform’s buyer intelligence, collaborating with AI, product, and engineering teams to deliver data APIs and feeds while advancing our data reliability and scalability goals.
Vequity is building the world’s most robust, contextualized buyer intelligence network for investment banks, private equity firms, and strategic acquirers. Our platform currently houses over 1.5 million buyer profiles with approximately 100 structured and inferred data fields per profile, and we leverage proprietary AI agents to continuously enrich, infer, and structure buyer intelligence at scale. Your work will directly shape data reliability, operational efficiency, and the precision of the buyer attributes delivered to our customers.
- Data Architecture & Pipelines: Design and optimize large-scale ETL/ELT pipelines that handle structured and semi-structured data from diverse sources (APIs, web scraping agents, LLM-generated JSON outputs, internal datasets). Implement robust data validation, normalization, reconciliation, and automated ingestion/transformation using orchestration frameworks to ensure high data integrity (see the validation sketch after this list).
- AI Agent Orchestration & Integration: Develop, train, and refine LLM-based data agents that collect, clean, update, and infer new attributes for buyer profiles. Create robust prompting architectures, JSON schema validation, and feedback loops to ensure high-quality structured outputs (see the agent-output sketch below). Collaborate with engineering to optimize agent reasoning, reduce hallucinations, and integrate embeddings for context-aware inference.
- Data Governance & Quality: Establish standards for data versioning, lineage, and observability. Build quality-control layers for agent-generated data, including confidence scoring, human-in-the-loop validation, and automated correction mechanisms (see the confidence-routing sketch below). Ensure compliance with data governance, privacy, and security requirements.
- Collaboration & Product Integration: Work cross-functionally to deliver data APIs and feeds into the platform. Align with leadership to prioritize data reliability, scalability, and innovation. Lead continuous improvement of the data infrastructure roadmap.
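To make the pipeline work concrete, here is a minimal sketch of the kind of validate-and-normalize step described in the first responsibility, written in Python with pydantic. The `BuyerRecord` fields and the quarantine behavior are illustrative assumptions, not Vequity’s actual schema or pipeline code.

```python
"""Minimal sketch: validating and normalizing semi-structured buyer records
before load. The fields below are a hypothetical subset of a buyer profile."""
from typing import Optional

from pydantic import BaseModel, ValidationError


class BuyerRecord(BaseModel):
    name: str
    sector: Optional[str] = None
    employee_count: Optional[int] = None  # pydantic coerces "250" -> 250
    website: Optional[str] = None


def normalize(raw: dict) -> Optional[BuyerRecord]:
    """Return a validated record, or None so the caller can quarantine the row."""
    cleaned = {key.strip().lower(): value for key, value in raw.items()}
    website = cleaned.get("website")
    if isinstance(website, str):
        cleaned["website"] = website.rstrip("/").lower()
    try:
        return BuyerRecord(**cleaned)
    except ValidationError as exc:
        print(f"quarantined: {exc}")  # a real pipeline would log and route this
        return None


rows = [
    {"Name": "Acme Capital", "Sector": "industrial", "Employee_Count": "250"},
    {"Name": None, "Sector": "tech"},  # fails validation and is quarantined
]
validated = [record for row in rows if (record := normalize(row)) is not None]
print(validated)
```

In production this step would run inside an orchestration framework (Airflow, Dagster, or similar), with quarantined rows routed to a dead-letter table rather than printed.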
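The same idea extends to agent outputs. Below is a hedged sketch of the JSON-schema feedback loop mentioned in the agent orchestration responsibility; `call_llm`, the prompt, and the `InferredAttributes` fields are all hypothetical placeholders for whatever client and schema a team actually uses.

```python
"""Sketch: enforcing a schema on LLM agent output, feeding validation errors
back to the model so it can self-correct. All names here are illustrative."""
import json
from typing import Optional

from pydantic import BaseModel, ValidationError


class InferredAttributes(BaseModel):
    acquisition_appetite: str  # e.g. "active", "opportunistic", "dormant"
    target_ebitda_min_usd: Optional[int] = None


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any LLM client call returning raw text."""
    raise NotImplementedError


def infer_attributes(profile_text: str, max_retries: int = 2) -> Optional[InferredAttributes]:
    prompt = f"Extract acquisition attributes as JSON:\n{profile_text}"
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            return InferredAttributes(**json.loads(raw))
        except (json.JSONDecodeError, ValidationError) as exc:
            # The feedback loop: show the agent its own error and retry.
            prompt += f"\n\nYour previous output was invalid ({exc}). Return only valid JSON."
    return None  # exhausted retries: hand off to human review
```

The design point is that structure is enforced at the boundary rather than trusted; retrieval context and confidence scoring layer on top of this loop.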
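Finally, a sketch of the human-in-the-loop routing mentioned under governance. The thresholds and queue semantics are assumptions; in practice they would be tuned per field and backed by real review tooling.

```python
"""Sketch: routing agent-generated values by confidence score. Thresholds
and field names are illustrative placeholders."""
from dataclasses import dataclass

AUTO_ACCEPT = 0.90  # assumed cutoffs, tuned per field in practice
AUTO_REJECT = 0.40


@dataclass
class AgentValue:
    profile_id: str
    field: str
    value: str
    confidence: float  # e.g. derived from logprobs or cross-source agreement


def route(v: AgentValue) -> str:
    """Accept, reject, or queue a value for human review."""
    if v.confidence >= AUTO_ACCEPT:
        return "accept"        # write to the profile and record lineage
    if v.confidence <= AUTO_REJECT:
        return "reject"        # discard, but keep as an agent feedback signal
    return "human_review"      # enqueue for an analyst to confirm


print(route(AgentValue("b-123", "sector", "industrial automation", 0.95)))  # accept
print(route(AgentValue("b-123", "hq_city", "Austin", 0.62)))  # human_review
```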
Preferred experience includes familiarity with LangChain or similar agent frameworks (e.g., LlamaIndex, Haystack), tool-calling, and embedding-based retrieval. Exposure to data quality scoring, schema evolution, and metadata management tooling (e.g., dbt, Great Expectations) is also valuable. A background in investment data, market intelligence, or deal-sourcing platforms is a plus, and demonstrated experience leading small data or engineering teams is desirable.