Talent.com
Data Extraction Engineer

PST.AG • Cebu, Cebu, PH
30+ days ago
Job description
Role Overview: The Data Extraction Engineer designs extraction systems, not just scripts. They build and maintain a next-generation data acquisition platform that treats web scraping as a declarative, specification-driven discipline. Instead of hard-coding XPaths for every site, the Data Extraction Engineer defines what data is needed (using schemas, natural language descriptions, or visual blueprints) and lets intelligent pipelines figure out how to get it.

Key Responsibilities:

Specification-Driven Extraction Engineering
- Design and maintain declarative extraction specifications, using Pydantic models, JSON schemas, or domain-specific languages, that describe exactly which fields to capture, their types, and validation rules.
- Implement pipelines that translate these specifications into executable extraction plans, leveraging both classical (Scrapy, Playwright) and AI-augmented (LLM-based semantic parsing) backends.
- Build reusable specification libraries for recurring data types (product prices, tariff codes, regulatory texts) to accelerate onboarding of new sources.

Autonomous & Self-Healing Systems
- Deploy self-healing spiders that automatically detect website layout changes and repair themselves using Model Context Protocol (MCP) servers (e.g., Scrapy MCP Server, Playwright MCP).
- Integrate semantic extraction (Scrapy-LLM, custom LLM pipelines) to eliminate selector brittleness, so that spiders rely on field descriptions, not fragile XPaths.
- Orchestrate complex, multi-step browsing workflows with agentic frameworks (BMAD/TEA, AutoGPT-like agents) that reason about page state, adapt to anti-bot measures, and correct their own behaviour in real time.

Platform Thinking & Reusability
- Move beyond one-off scrapers: build a component-based extraction platform where selectors, login handlers, and pagination logic are shared, versioned, and tested.
- Implement monitoring, alerting, and automatic rollback for failed extraction runs.
- Champion ethical crawling by design: rate limiting, robots.txt respect, and compliance with GDPR/CCPA are built into the specification layer, not retrofitted.

Collaboration & Continuous Innovation
- Partner with data scientists and domain experts to refine extraction specifications for complex, unstructured domains (e.g., legal texts, tariff classifications).
- Evaluate and pilot emerging tools to push automation coverage beyond 90%.
- Document and evangelise specification-driven best practices across the engineering organisation.

Candidate Profile:

Education and Experience
- Bachelor's degree in Computer Science
- 3+ years of experience in web scraping or data extraction

Skills and Competences
- Specification-Driven Extraction: Experience defining extraction requirements via schemas (Pydantic, JSON Schema) and executing them through both traditional crawlers and LLM-based semantic parsers.
- Self-Healing & Semantic Extraction: Hands-on use of Scrapy-LLM, Scrapy MCP Server, or similar systems that decouple field definitions from page structure.
- Agentic Workflows: Familiarity with frameworks that give LLMs browser control (Playwright + MCP, BMAD/TEA) to handle complex, non-deterministic crawling tasks.
- Classical Scraping Fundamentals: You still know how to write a Scrapy spider or a Playwright script when needed, but you actively seek to replace that work with reusable, specification-driven components.
- Data Validation & Storage: Ability to define validation rules within specifications and land clean data into SQL/NoSQL databases or data lakes.
- Python proficiency: The focus is on an extraction engineer who happens to use Python.
- HTTP, DOM, XPath, CSS.
- Basic API integration and authentication flows.

Preferred / Nice-to-Have Skills:
- Contributions to open-source scraping or AI-automation projects.
- Experience training or fine-tuning small LLMs for domain-specific extraction.
- Familiarity with data privacy engineering (GDPR, CCPA) baked into specification design.
- Light DevOps: Docker and CI/CD for testing extraction specifications.

Mindset & Approach (Non-Negotiable):
- Strong belief that the future of scraping is declarative, not imperative.
- You'd rather write a schema that says "extract the price" than debug an XPath when a website redesigns.
- Looking to shift from "code that scrapes" to "systems that understand extraction".