Skills
Big Data & Processing:
Apache Spark, Hadoop (MapReduce, HDFS), Hive, Pig, Kafka, Flink, NiFi, Storm, Samza
Data Warehousing & Lakes:
Snowflake, Amazon Redshift, Azure Data Lake, Google BigQuery
Data Engineering & ETL Tools:
Azure Data Factory, Informatica PowerCenter, SSIS, Talend, AWS Data Pipeline, Apache Beam
Programming Languages & Libraries:
Python, SQL, PySpark, PyTorch, TensorFlow, Scikit-learn, Pandas, NumPy, R
Databases & Storage:
PostgreSQL, Azure SQL Database, Cosmos DB, MongoDB, DynamoDB, Cassandra, MySQL
DevOps & Infrastructure:
Docker, Kubernetes, Terraform, Jenkins, GitLab CI/CD, Git, SVN
Cloud Platforms:
AWS (S3, Glue, Lambda, IAM, RDS, EMR), Azure (Stream Analytics, Key Vault, AD), GCP
BI & Visualization:
Power BI, Tableau, Qlik Sense, D3.js, Matplotlib, Seaborn
Security & Governance:
Azure Active Directory, Apigee, OAuth 2.0, Apache Ranger, HashiCorp Vault
Other Tools & Frameworks:
Apache Airflow, ELK Stack (Elasticsearch, Logstash, Kibana), Apache Tika, Apache Camel, Splunk, Grafana, Prometheus, RSA Encryption, Apache Solr, Blockchain, IBM Watson
About
I am a seasoned Senior Data Engineer with over 8 years of experience building scalable, enterprise-grade big data solutions across cloud and on-premises environments. I specialize in optimizing data pipelines, automating ETL workflows, and enabling real-time analytics with tools such as Apache Spark, Kafka, Hadoop, Azure Data Factory, and Snowflake. I have worked extensively with both structured and unstructured data, deployed machine learning models, enforced data quality, and applied DevOps practices to support agile data operations. My industry experience spans finance, healthcare, and e-commerce, with hands-on expertise across AWS, Azure, and GCP.