Skills
Technical Skills:
Programming: Scala, Core Java
Scripting: Shell, Python
Hadoop Ecosystem: Hadoop, HDFS, MapReduce (Java)
Spark: Spark (Scala), Spark SQL, PySpark, Spark Streaming
Streaming: Spark Streaming, Kinesis
Batch Processing: MapReduce, Spark
Databases: SQL Server 2008 R2, Oracle, Greenplum, SAS EG, PostgreSQL, Cassandra, MarkLogic, Snowflake, Redshift, BigQuery
Source Control: Git, Bitbucket
CI/CD: Jenkins, Bamboo
About
Professional Summary:
• 13+ years of experience in the IT industry, with a demonstrated history of working in the US healthcare and semiconductor industries; skilled in big data platforms on AWS and GCP, including Hive, HBase, Hudi, Kinesis, Redshift, Snowflake, BigQuery, Spanner, Spark/Spark Streaming, Kafka, and Airflow.
• Experience with AWS and GCP services such as EMR, S3, EC2, Step Functions, SNS, Lambda, CloudWatch, CloudTrail, CloudFormation templates, Dataflow, Dataproc, Cloud Functions, Cloud SQL, Pub/Sub, CodePipeline, CodeBuild, and CodeDeploy.
• Experience in writing Kafka producers and consumers to stream real-time messages to Kafka topics, processing them with Spark Streaming, and writing the results to a NoSQL database; a hedged sketch of this pattern follows this summary.
• Implemented data partitioning strategies and Kafka topic configurations to support parallel processing and workload balancing.
• Experience working with Snowflake components such as Snow Machine, Snow Cat, and Streams.
• Experience in Snowflake query performance tuning (clustering keys, result caching, and micro-partition pruning).
• Designed and implemented Snowflake-based data warehousing solutions, improving query performance by 30%.
• Developed Spark (Scala/Python) programs to read a variety of file formats and RDBMS sources, transform the data, and load it into Hive, Snowflake, Redshift, BigQuery, and cloud storage buckets (S3, GCS); see the batch-load sketch after this summary.
• Strong hands-on experience with programming languages such as Spark Scala and Core Java, and with scripting languages such as Python and Shell.
• Worked with different file formats (ORC, text, Parquet, JSON, XML, KLARF, TIFF, and Hudi) and compression codecs (Gzip and Snappy).
• Experience working with HBase for storing and retrieving reference data and metadata, and for data reconciliation and reporting.
• Experience in developing complex Hive queries to extract data from the Hadoop environment, applying the necessary business rules.
• Designed Airflow workflows that extract data from the Hadoop data lake environment and transform it to XML/JSON.
• Sound knowledge of databases and SQL, with first-hand experience in MySQL, Oracle 9i, SQL Server 2008 R2, and SAS EG.
• Good interpersonal communication skills, analytical and logical abilities, and a demonstrated ability to work as a team member as well as independently.
• A thorough professional who adapts quickly to new technologies, willing to accept new responsibilities, hardworking, and flexible.
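As a minimal illustration of the Kafka-to-NoSQL streaming pattern mentioned above, the sketch below reads JSON messages from a Kafka topic with PySpark Structured Streaming and writes each micro-batch to Cassandra. The broker address, topic, schema, keyspace, and table names are hypothetical placeholders, and the Cassandra write assumes the DataStax spark-cassandra-connector is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical message schema for the JSON payload on the topic.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("payload", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the raw stream from Kafka (broker and topic are placeholders).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

# Parse the Kafka value bytes into typed columns.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

def write_batch(batch_df, epoch_id):
    # Write each micro-batch to Cassandra; keyspace/table are hypothetical
    # and require the spark-cassandra-connector package.
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="analytics", table="events")
     .mode("append")
     .save())

query = (events.writeStream
         .foreachBatch(write_batch)
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .start())
query.awaitTermination()
```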
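The batch-load pattern from the summary (read a source file format, transform, and load to a warehouse or bucket) might look like the hedged sketch below; the paths, column names, and connector options are illustrative assumptions rather than actual project code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-load-sketch").getOrCreate()

# Read a source dataset (ORC/Parquet/JSON/CSV all follow the same read pattern).
src = spark.read.parquet("s3://source-bucket/raw/orders/")  # hypothetical path

# Example transformation: normalise a column name and stamp the load date.
curated = (src
           .withColumnRenamed("ORDER_ID", "order_id")
           .withColumn("load_date", F.current_date()))

# Write back to a curated bucket, partitioned for downstream pruning.
(curated.write
 .mode("overwrite")
 .partitionBy("load_date")
 .parquet("s3://curated-bucket/orders/"))  # hypothetical path

# Loading into Snowflake would typically go through the Spark-Snowflake connector, e.g.:
# curated.write.format("net.snowflake.spark.snowflake") \
#     .options(**sf_options).option("dbtable", "ORDERS").mode("append").save()
```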
Key Day-to-Day Responsibilities:
• Conducting and participating in technical discussions with the team on solution design and technical decisions.
• Developing, testing, and deploying data pipelines covering ingestion, processing, and reporting.
• Automating applications to run on scheduled intervals, satisfying all input dependencies to meet agreed business SLAs; see the Airflow sketch after this list.
• Creating CI/CD pipelines for automated code deployments to test and production servers using Git and Jenkins.
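A sketch of the scheduled-pipeline automation described above, assuming Airflow 2.x; the DAG name, schedule, and spark-submit commands are hypothetical placeholders, not actual job definitions.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_ingest_pipeline",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",       # run daily at 02:00 to meet the SLA window
    catchup=False,
    default_args=default_args,
) as dag:

    # Each task shells out to a Spark job; commands are placeholders.
    ingest = BashOperator(task_id="ingest", bash_command="spark-submit ingest_job.py")
    transform = BashOperator(task_id="transform", bash_command="spark-submit transform_job.py")
    publish = BashOperator(task_id="publish", bash_command="spark-submit publish_job.py")

    # Input dependencies are expressed as task ordering.
    ingest >> transform >> publish
```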