Forward-Deployed Data Engineer at SkyPoint Cloud
Bangalore, IN

Skypoint is a HITRUST r2–certified Agentic AI platform for healthcare operations, designed to accelerate productivity and operational efficiency across healthcare organizations. Our platform enables healthcare providers, payers, and senior care organizations to unify fragmented data, model industry-specific ontologies, and deploy AI agents that automate workflows and support better, faster decision-making.

Founded in 2020 in Portland, Oregon, Skypoint has grown to a team of over 75 employees and now serves more than 100 customers. We are proud to be recognized on Deloitte’s 2024 and 2025 Technology Fast 500™, celebrating the fastest-growing technology companies in North America, and to be featured on the Inc. 5000 list in 2025, reflecting our strong and sustained revenue growth over the past three years.

About the Role

We are looking for a Forward-Deployed Data Engineer who thrives at the intersection of technical craftsmanship and client impact. This is a hands-on engineering role embedded within our customer-facing delivery team, working directly with healthcare clients — across payer, provider, and health system environments — to design, build, and optimize the data infrastructure that powers their most critical analytics and AI initiatives.

You are a builder at heart, but you understand that the best data pipelines are ones that serve real people making real decisions. You are fluent in SQL and dbt, meticulous about data modeling, and energized by the challenge of turning messy, complex healthcare data into clean, reliable, well-governed data products.

You also bring an AI-first mindset to your craft. You reach for AI-assisted coding tools instinctively, you think about how the pipelines you build today can power agentic workflows tomorrow, and you are genuinely excited about what it means to build data infrastructure for a world where AI agents are first-class consumers of data.

What You'll Do

Data Engineering & Pipeline Development

    Design, build, and maintain scalable ELT/ETL pipelines that ingest, transform, and serve healthcare data across cloud platforms including Databricks and Snowflake

    Develop robust dbt projects — models, tests, documentation, macros, and packages — that serve as the transformation layer for client data platforms

    Build and manage data pipelines handling complex healthcare data types: claims, clinical, eligibility, provider, and financial datasets

    Implement data quality frameworks, testing strategies, and observability tooling to ensure pipeline reliability and data trustworthiness

    Optimize query performance, warehouse configurations, and pipeline orchestration for cost-efficiency and speed

Data Modeling & Warehouse Design

    Design dimensional models and star schema architectures that are clean, well-documented, and optimized for downstream analytical and AI consumption

    Build and maintain semantic and conformed data layers that serve as the authoritative source for reporting, ML features, and agentic workflows

    Establish and enforce data modeling standards, naming conventions, and layering patterns (raw, staging, intermediate, mart) within client environments

    Work closely with analytics engineers and data consumers to ensure models meet business requirements without sacrificing technical rigor

Client Engagement & Technical Communication

    Work directly with client data and engineering teams throughout project delivery — translating requirements, reviewing existing architectures, and aligning on technical approaches

    Participate in client working sessions and technical discussions, clearly communicating data modeling decisions, trade-offs, and recommendations

    Produce clean technical documentation — data dictionaries, lineage diagrams, architecture overviews — that clients can actually use and maintain

    Act as a reliable, knowledgeable partner to client teams, building credibility through consistent delivery and clear communication

Agentic AI & AI-First Engineering

    Build the data foundations that make agentic AI systems reliable: clean, well-governed data products with clear semantics and dependable freshness SLAs

    Collaborate with AI engineers and analytics leads to ensure data pipelines meet the requirements of LLM-powered and agentic applications — including vector-ready outputs, structured tool-use schemas, and streaming data patterns where applicable

    Use AI-assisted coding tools (GitHub Copilot, Cursor, or equivalent) as a core part of your development workflow — not occasionally, but as a default

    Stay current on how agentic AI systems consume and interact with data, and apply that understanding to how you design and document data products


What You Bring

Healthcare Domain Knowledge

    4+ years of data engineering experience, with meaningful exposure to healthcare data environments — payer, provider, and/or health system experience strongly preferred

    Working familiarity with healthcare data concepts and standards: claims (medical, pharmacy, dental), eligibility, HL7/FHIR, EHR/EMR data structures, HEDIS, and encounter data

    Understanding of healthcare data sensitivity and compliance considerations, including HIPAA-compliant data handling and de-identification patterns

Core Technical Skills

    Advanced SQL proficiency — you write complex, performant queries and understand how to optimize them across both Snowflake and Databricks environments

    Expert-level proficiency in Power BI — including complex DAX, data modeling, deployment pipelines, row-level security, and enterprise governance

    Deep, hands-on dbt expertise — you have built and maintained production dbt projects and are comfortable with advanced features: macros, packages, incremental models, snapshots, and test frameworks

    Proven experience designing star schema and dimensional models — you know the difference between a fact and a dimension table in your sleep, and you know when to break the rules

    Strong experience with Databricks — Delta Lake, Unity Catalog, Spark SQL, notebook-based development, and workflow orchestration

    Strong experience with Snowflake — including performance optimization, Snowpark, data sharing, and cost governance

    Proficiency in Python for pipeline development, data transformation scripting, and automation

    Experience with pipeline orchestration tools such as Airflow, Prefect, Dagster, or equivalent

AI-First Tooling & Mindset

    Demonstrated adoption of AI-assisted coding tools (GitHub Copilot, Cursor, Amazon CodeWhisperer, or equivalent) as a daily productivity standard — not an occasional experiment

    Enthusiasm for agentic AI and a clear understanding of what it means to build data products for AI agents as consumers, not just human analysts

    Comfort with the data requirements of AI systems: structured schemas, embedding-ready outputs, retrieval-friendly data products, and reliable freshness guarantees

    Curiosity and initiative in applying new AI tooling to engineering challenges — you look for ways to move faster and build better with the tools available

Communication & Collaboration

    Clear, confident technical communication — you can explain a data model to a data analyst and a pipeline architecture to a platform engineer without losing either audience

    Experience working in client-facing or cross-functional delivery environments where your work is visible and your decisions have direct business impact

    Strong documentation habits — you treat docs as part of the deliverable, not an afterthought

    Comfort with ambiguity and evolving requirements, common in healthcare data environments where source systems are messy and specifications change


Nice to Have

    Experience with Microsoft Fabric — Fabric Lakehouses, Dataflows Gen2, Fabric Notebooks, or OneLake

    Exposure to vector databases (Pinecone, pgvector, Azure AI Search) and RAG pipeline patterns for AI-powered applications

    Experience building data pipelines that feed agentic or LLM-powered workflows — tool schemas, structured outputs, or real-time data serving

    Familiarity with healthcare interoperability platforms (Redox, Health Gorilla, Rhapsody) or FHIR API integrations

    Exposure to population health, risk stratification, or quality measure (HEDIS, Star Ratings) reporting data

    dbt, Databricks, or Snowflake certifications

    Experience with streaming data platforms (Kafka, Kinesis, or Databricks Structured Streaming) for near-real-time pipeline patterns

    Azure cloud experience (Azure Data Factory, Azure Synapse, Azure Health Data Services)


Why This Role

    Do real engineering work that matters — the pipelines you build directly power healthcare decisions that affect real patients and populations

    Work at the cutting edge of healthcare data modernization alongside engineers who take craft seriously

    Be part of a team where AI-first is a genuine operating principle, not a buzzword — you will be expected and supported to build with the best tools available

    Grow your exposure to agentic AI and the infrastructure patterns that will define the next generation of data systems

    Competitive compensation, comprehensive benefits, and a flexible remote-first culture


Skypoint is an Equal Opportunity Employer. We do not discriminate based on race, color, religion, sex, national origin, age, disability, veteran status, or any other protected characteristic.