Hadoop PySpark Data Pipeline Build Engineer
- Category: Et cetera
- Deadline: 02 May 2023 (2023-05-02T01:00:00-0700)
- Location: Pennsylvania
RESPONSIBILITIES: Kforce's client seeks a team of Hadoop PySpark Data Pipeline Build Engineers in Charlotte (NC), Dallas (TX), Philadelphia (PA), and Palo Alto (CA). This is a 1-year contract with the opportunity for full-time employment.

Summary: In this role, the Hadoop PySpark Data Pipeline Build Engineer will own and drive the technical delivery of big data technologies and features. This person is an expert engineer with strong PySpark experience who is deeply knowledgeable in big data concepts, the Hadoop ecosystem (Hadoop is key), and its APIs. The Hadoop PySpark Data Pipeline Build Engineer will be part of successful big data implementations for large data integration initiatives.

Responsibilities:
- Design and develop sophisticated, resilient, and secure engineering solutions for modernizing our data ecosystem; these typically span multiple disciplines, including big data architecture, data management, and data modeling specific to consumer use cases
- Develop self-service, multitenant capabilities on the cybersecurity data lake, including custom/off-the-shelf services integrated with the Hadoop platform; use APIs and messaging to communicate across services; integrate with distributed data processing frameworks and data access engines built on the cluster; integrate with enterprise services for data governance and automated data controls; and implement policies to enforce fine-grained data access
- Certify and deploy highly automated services and features for data management (registering, classifying, collecting, loading, formatting, cleansing, structuring, transforming, reformatting, distributing, and archiving/purging) through the Data Ingestion, Processing, and Consumption stages of the analytical data lifecycle
- Design, code, test, debug, and document programs using Agile development practices

REQUIREMENTS:
- 5+ years of big data platform (data lake) and data warehouse engineering experience demonstrated through prior work
- Hands-on experience developing modern data pipeline services, including movement, collection, integration, and transformation of structured/unstructured data with built-in automated data controls, built-in logging/monitoring/alerting, and pipeline orchestration managed to operational SLAs (a minimal sketch of such a pipeline follows this list)
- Hands-on experience developing big data solutions leveraging the spectrum of Hadoop-compatible platform features such as Atlas, PySpark, Flink, Kafka, Sqoop, Cloudera Manager, Airflow, Impala, Hive, HBase, Tez, Hue, and a variety of source data connectors
- Experience automating DQ validation in data pipelines
- Experience implementing automated data change management, including code and schema versioning, QA, CI/CD, and rollback processing
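To illustrate the kind of pipeline work this role involves, here is a minimal PySpark sketch of an ingest-transform-validate-write flow with built-in DQ checks and logging. It assumes a Spark 3.x environment; all paths, column names, and thresholds are hypothetical placeholders, not details from this posting.

```python
# Minimal sketch only: ingest, transform, run automated DQ checks with logging,
# and write partitioned Parquet. Paths and column names are hypothetical.
import logging

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

spark = SparkSession.builder.appName("dq-pipeline-sketch").getOrCreate()

# Ingestion stage: read structured source data (hypothetical HDFS path).
raw = spark.read.option("header", True).csv("hdfs:///data/raw/events/")

# Processing stage: standardize types and derive a partition column.
events = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts"))
       .withColumn("event_date", F.to_date("event_ts"))
)

# Automated DQ validation: log metrics and fail fast on violations.
total = events.count()
null_keys = events.filter(F.col("event_id").isNull()).count()
log.info("rows=%d null_event_ids=%d", total, null_keys)
if total == 0 or null_keys > 0:
    raise ValueError("DQ check failed: empty load or null event_id values")

# Consumption stage: partitioned Parquet for downstream engines (Hive/Impala).
events.write.mode("overwrite").partitionBy("event_date").parquet(
    "hdfs:///data/curated/events/"
)
```

In practice, a job like this would be scheduled by an orchestrator such as Airflow, with the DQ metrics feeding the pipeline's monitoring/alerting so SLA breaches surface automatically.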
Desired Qualifications:
- Hands-on experience developing and managing technical and business metadata
- Hands-on experience creating/managing time-series data from full data snapshots or incremental data changes
- Hands-on experience implementing fine-grained access controls, such as attribute-based access control, using Apache Ranger
- Advanced understanding of SQL and NoSQL database schemas
- Advanced understanding of partitioned Parquet, ORC, Avro, and various compression formats
- Experience with any one of Ansible, Chef, Puppet, Python, or Linux scripting
- Development of automation around DevOps-style data pipeline deployments and rollbacks
- Development of containerized microservices and APIs
- Understanding of data governance policies and standards
- Google Cloud data services experience (bonus)
- Familiarity with key concepts implemented by Apache Hudi, Apache Iceberg, or Databricks Delta Lake (bonus)

The pay range is the lowest to highest compensation we reasonably and in good faith believe we would pay for this role at the time of posting. We may ultimately pay more or less than this range. Employee pay is based on factors such as relevant education, qualifications, certifications, experience, skills, seniority, location, performance, union contract, and business needs. This range may be modified in the future.

We offer comprehensive benefits including medical/dental/vision insurance, HSA, FSA, 401(k), and life, disability & AD&D insurance to eligible employees. Salaried personnel receive paid time off. Hourly employees are not eligible for paid time off unless required by law. Hourly employees on a Service Contract Act project are eligible for paid sick leave.

Note: Pay is not considered compensation until it is earned, vested, and determinable. The amount and availability of any compensation remains in Kforce's sole discretion unless and until paid, and may be modified in its discretion consistent with the law.

This job is not eligible for bonuses, incentives, or commissions.

Kforce is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.