Skills
Technical Skills:
ETL Tools: AWS Glue, Azure Data Factory, Airflow, Spark, Sqoop, Flume, Apache Kafka, Spark Streaming, Informatica
Hadoop Tools: HDFS, HBase, Hive, MapReduce, Pig, Sqoop, Oozie, Spark
Programming Languages: Python, Scala, SQL, PL/SQL, Linux Shell Scripts
SQL Databases: Oracle DB, Microsoft SQL Server, IBM DB2, PostgreSQL, Teradata, Azure SQL Database, Amazon RDS
Visualization Tools: Power BI, Tableau
Version Control: GitHub, Bitbucket
Software Methodologies: Agile, Waterfall
Azure: Data Factory, Synapse, Databricks, Blob Storage
AWS: EC2, S3, Glacier, Redshift, RDS, EMR, Lambda, Glue, CloudWatch, CodePipeline, EKS, Athena
Professional Summary:
• Experienced Data Engineer with around 6 years of IT expertise, specializing in Big Data and data analytics; skilled in data modeling, ETL pipeline development, and data visualization, with a strong command of major cloud platforms such as AWS and Azure.
• Proficient in navigating the complex landscape of Big Data technologies, including Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Scala, Airflow, Flume, Kafka, Oozie, and ZooKeeper.
• Skillfully managed workflow orchestration through Apache Airflow and Oozie, ensuring efficient coordination and scheduling of diverse Hadoop tasks (see the Airflow sketch after this list).
• Designed and deployed comprehensive data pipelines incorporating Data Lake architecture, Databricks, and Apache Airflow, facilitating seamless processing and management of data.
• Utilized Databricks as a robust platform for data processing, implementing scalable, distributed data transformations to extract valuable insights from raw data.
• Demonstrated a strong grasp of data streaming, using Kafka for data ingestion, Apache Spark for real-time processing, and Hive for data warehousing and advanced analytics (see the Kafka ingestion sketch after this list).
• Optimized data storage for large-scale datasets, leveraging cloud platforms such as AWS S3 and Azure Data Lake (see the partitioned-write sketch after this list).
• Showcased expertise in crafting Hive queries to process structured, semi-structured, and unstructured datasets, using Sqoop to load data into HDFS and store it in Hive tables (see the Hive query sketch after this list).
• Experienced in handling large datasets using partitioning, Spark's in-memory capabilities, broadcast variables, and efficient joins and transformations during the ingestion process itself (see the broadcast join sketch after this list).
• Integrated dbt with modern data warehouses such as Snowflake and Redshift, leveraging their features for scalable, efficient data processing (see the dbt model sketch after this list).
• Demonstrated proficiency with AWS services including CodePipeline, CodeCommit, CodeBuild, and CodeDeploy, in addition to strong knowledge of Microsoft Azure cloud services.
• Proficient in SQL, with experience in querying, data extraction and transformation, and developing queries for a wide range of applications.
• Proficient in leveraging data visualization tools, including Power BI and Tableau Desktop, to transform complex datasets into interactive, visually compelling reports and dashboards.
• Skillfully loaded processed data into data warehouses such as AWS Redshift, facilitating further analysis and reporting (see the Redshift load sketch after this list).
• Documented Power BI processes and best practices, providing clear guidelines for report development, data modeling, and performance optimization.
• Integrated dbt with business intelligence tools such as Tableau and Power BI to ensure that transformed data was readily available for reporting and analysis.
• Experienced in Snowflake data warehousing, providing stable infrastructure and architecture, a secure environment, reusable generic frameworks, technology expertise, best practices, and automated SCBD.
• Worked with Snowflake features such as clustering, time travel, cloning, logical data warehouses, and caching (see the Snowflake sketch after this list).
• Leveraged Apache Spark for real-time data processing and analysis; developed Spark Streaming applications to process and transform data in near-real-time, providing valuable insights to stakeholders (see the windowed aggregation sketch after this list).
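
Illustrative Code Sketches:
The sketches below illustrate the techniques referenced in the summary. All DAG names, topics, tables, paths, hosts, and credentials are hypothetical placeholders, not details from actual engagements.

Airflow sketch: a minimal DAG (Airflow 2.x API assumed) that runs a Sqoop import before a Spark job.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_orders_ingest",      # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                 # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    sqoop_import = BashOperator(
        task_id="sqoop_import",
        bash_command=(
            "sqoop import --connect jdbc:mysql://db-host/sales "
            "--table orders --target-dir /data/raw/orders"
        ),
    )
    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /jobs/transform_orders.py",
    )
    sqoop_import >> spark_transform    # enforce ordering: import first, then transform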
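
Kafka ingestion sketch: reading a hypothetical Kafka topic with Spark Structured Streaming and landing raw events in S3.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "clickstream")                 # placeholder topic
    .load()
    .select(col("value").cast("string").alias("event"))
)

# Continuously append raw events to S3 as Parquet; the checkpoint tracks progress across restarts
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/raw/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()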
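
Partitioned-write sketch: writing curated data to S3 partitioned by date so downstream queries can prune partitions; the paths and partition column are illustrative.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("optimize-storage").getOrCreate()

df = spark.read.json("s3a://example-bucket/raw/events/")   # placeholder source

(
    df.write
    .mode("overwrite")
    .partitionBy("event_date")     # partition pruning keeps downstream scans cheap
    .parquet("s3a://example-bucket/curated/events/")
)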
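
Hive query sketch: aggregating a hypothetical partitioned Hive table through Spark's Hive support; the database, table, and columns are placeholders.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-query")
    .enableHiveSupport()     # lets Spark resolve tables in the Hive metastore
    .getOrCreate()
)

daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total
    FROM sales.orders                 -- placeholder Hive table
    WHERE order_date >= '2024-01-01'  -- partition filter
    GROUP BY order_date
""")
daily_totals.show()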
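
Broadcast join sketch: shipping a small dimension table to every executor so the large fact table is never shuffled; the paths and join key are illustrative.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join").getOrCreate()

facts = spark.read.parquet("s3a://example-bucket/facts/")   # large fact table
dims = spark.read.parquet("s3a://example-bucket/dims/")     # small lookup table

# broadcast() hints Spark to replicate dims to executors instead of shuffling facts
enriched = facts.join(broadcast(dims), on="product_id", how="left")
enriched.write.mode("overwrite").parquet("s3a://example-bucket/enriched/")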
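
dbt model sketch: a Python model as supported in dbt-core 1.3+ on warehouses such as Snowflake and Databricks; the upstream model name and columns are hypothetical.

# models/customer_order_counts.py (hypothetical dbt Python model)
def model(dbt, session):
    # dbt.ref() resolves an upstream model into a warehouse-native DataFrame
    orders = dbt.ref("stg_orders")
    # The aggregation is pushed down and executed inside the warehouse engine
    return orders.group_by("customer_id").count()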
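
Redshift load sketch: appending a processed DataFrame to Redshift over JDBC; the cluster endpoint, table, and credentials are placeholders (a secrets manager would hold the password in practice).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("load-redshift").getOrCreate()
curated = spark.read.parquet("s3a://example-bucket/curated/daily_totals/")

(
    curated.write
    .format("jdbc")
    .option("url", "jdbc:redshift://example-cluster.redshift.amazonaws.com:5439/analytics")
    .option("dbtable", "public.daily_totals")   # placeholder target table
    .option("user", "etl_user")
    .option("password", "***")                  # fetch from a secrets manager in practice
    .option("driver", "com.amazon.redshift.jdbc42.Driver")
    .mode("append")
    .save()
)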
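
Snowflake sketch: time travel and zero-copy cloning through the snowflake-connector-python package; the account, credentials, and table names are placeholders.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="etl_user",
    password="***",              # fetch from a secrets manager in practice
    warehouse="ANALYTICS_WH",
    database="SALES",
    schema="PUBLIC",
)
cur = conn.cursor()

# Time travel: query the table as it looked one hour ago
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Zero-copy clone: an instant, storage-free copy for safe experimentation
cur.execute("CREATE TABLE orders_clone CLONE orders")
conn.close()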
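
Windowed aggregation sketch: counting events in five-minute tumbling windows with a watermark for late data, building on the Kafka ingestion sketch above; the schema and topic are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, from_json, window
from pyspark.sql.types import StringType, StructType, TimestampType

spark = SparkSession.builder.appName("stream-agg").getOrCreate()

schema = StructType().add("page", StringType()).add("ts", TimestampType())

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "clickstream")                 # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Count page views per 5-minute window, tolerating 10 minutes of late-arriving data
views = (
    events.withWatermark("ts", "10 minutes")
    .groupBy(window(col("ts"), "5 minutes"), col("page"))
    .agg(count("*").alias("views"))
)

views.writeStream.outputMode("update").format("console").start().awaitTermination()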