SENIOR PARTNER'S STATEMENT
COVER LETTER:
We have extensive experience in Big Data Cloud logical and physical design of the data lake, staging area, and warehouse, with clients such as Bank of America, the State of Minnesota, Royal Caribbean, Lockheed, and the corporate customer warehouse. This has involved creating and supporting the expense distribution work file tables as well as Business Objects and ad-hoc workloads.
A primary responsibility has been performance tuning of the warehouse, which includes running EXPLAINs on SQL code and running a macro we wrote to gather the important performance metrics and report on compliance, looking at both global and system-specific issues. As the primary analyst, architect, and modeler, both at the bank and on other projects, we have been directly involved with the design of the enterprise data warehouse (EDW). ALPHA (ACT) has been deeply involved with the design of the new Bank Data Warehouse (BDW) at BoA and many other clients.
The EDW was our single point of truth at the bank; its design included business intelligence (BI) and encompassed activities such as Know Your Customer (KYC) and anti-money laundering (AML) intelligence. Across ALPHA's many projects we defined and developed DBA and EDW standards, including database view design and extract, transform, and load (ETL) processes (from the operational data store (ODS), with experience in DataStage, to the raw data warehouse, then to staging via ETL, and finally to the target). With both the view (logical data view) and the ETL we were heavily involved with performance; for example, an enterprise view with 32 joins across 12 tables, churning through 2.5 billion rows. We tuned this view from nearly 120 minutes of run time to 0.40 seconds, primarily through collection of statistics (something that had been overlooked for a long time) and some redesign. We are experts with ERwin and the LDM/PDM modeling process.
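A minimal sketch of that tuning workflow, assuming the open-source teradatasql Python driver (the host, credentials, and all object names below are hypothetical placeholders; this is not the actual macro described above):

    # Hypothetical sketch: inspect the optimizer's plan, then collect missing statistics.
    import teradatasql

    with teradatasql.connect(host="edw.example.com", user="dba", password="...") as con:
        cur = con.cursor()
        # Step 1: EXPLAIN the slow enterprise view to see join order and spool estimates.
        cur.execute("EXPLAIN SELECT * FROM edw.enterprise_view")  # hypothetical view
        for (step,) in cur.fetchall():
            print(step)
        # Step 2: collect statistics the optimizer was missing on a heavily joined column.
        cur.execute("COLLECT STATISTICS COLUMN (customer_id) ON edw.account_fact")  # hypothetical table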
We have extensive experience creating the warehouse from scratch on new hubs, setting up users through Azure and/or AWS console management, establishing role relationships (IAM) for authorization, and granting access to database (DB) objects such as tables, views, macros, functions, and procedures.
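As an illustrative sketch of that IAM setup, assuming the boto3 SDK (the role name is hypothetical; the policy ARN is a standard AWS managed policy):

    # Hypothetical sketch: create a role the warehouse can assume, then grant
    # read access to the data lake via a managed policy.
    import json
    import boto3

    iam = boto3.client("iam")

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    iam.create_role(
        RoleName="warehouse-loader",  # hypothetical role name
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    iam.attach_role_policy(
        RoleName="warehouse-loader",
        PolicyArn="arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
    )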
Certifications/Expertise:
· AZURE for Microsoft SaaS, PaaS & IaaS
o AZURE WITH VISUAL STUDIO.
o AZURE RESOURCE GROUPS, VIRTUAL MACHINE, RDBMS, WEB SERVER.
o SECURITY & DEVOPS.
o AZURE DATALAKE, BLOB STORAGE, TABLE STORAGE
o Streamlined the software development lifecycle with PaaS.
o Agile & expanded developer resources: microservices (not monolithic).
o Supported mobile and multichannel initiatives.
o Accelerated integration (SaaS and legacy modernization).
o Delivered on the promise of enterprise-grade implementations.
· AWS – Amazon Web Services in the Cloud
o Extensive use of:
§ AWS:
· IAM security through AWS Identity and Access Management (IAM)
· Apache Ranger: data security across the Hadoop platform
· NameNode heap memory management
· S3 buckets (Simple Storage Service) at CAS and the State of MN
· Data lake architecture/design with DMS (Database Migration Service) and Direct Connect
· HDFS & Hive ingest into S3 buckets (Glue ETL into the data lake) and on into Redshift (warehouse); see the sketch below
· Redshift, 4+ years, at Chemical Abstract Services & the State of MN
· EMR (Elastic MapReduce) and EC2 (Elastic Compute Cloud) clusters
· VPC (Virtual Private Cloud) utilization
· RDS (Relational Database Service), metastore
· ELB (Elastic Load Balancing)
· AWS Data Pipeline/Data Lake/Enterprise Warehouse Architect
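As a sketch of the S3-to-Redshift leg of the ingest flow noted above, assuming the boto3 Redshift Data API (the cluster, database, bucket, and IAM role ARN are placeholders):

    # Hypothetical sketch: COPY curated Parquet files from the data lake into Redshift.
    import boto3

    rsd = boto3.client("redshift-data")

    copy_sql = """
        COPY warehouse.account_fact
        FROM 's3://datalake-curated/account_fact/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/warehouse-loader'
        FORMAT AS PARQUET;
    """

    resp = rsd.execute_statement(
        ClusterIdentifier="edw-cluster",  # hypothetical cluster
        Database="analytics",
        DbUser="loader",
        Sql=copy_sql,
    )
    print(rsd.describe_statement(Id=resp["Id"])["Status"])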
o Functions Performed
§ 1. Worked directly with IT and business executives and principal engineers to align the data team’s strategy with company goals
§ 2. Led data engineering teams to design and implement data stores, pipelines, ETL routines, and API access
§ 3. Led data product teams to consume data and make it available to both internal and external customers for analysis, troubleshooting, BI, predictive use cases, etc.
§ 4. Designed and implemented data warehouses for large-scale, high-volume data loads for customer action and churn analysis
§ 5. Designed and implemented features built on machine learning, such as customer behavior, churn, next best action, and next best offer
o Experience & Skills
§ Expert level solution architecture skills in the following:
· 1. Metadata Management
· 2. Data Governance
· 3. Data Security
· 4. Big Data
· 5. Data Quality and Recovery
§ Expert level skills with hands-on experience in the following:
· 1. Migrated on-prem databases to AWS S3 with AWS Database Migration Service (DMS)
· 2. Set up AWS Glue to prepare data for analysis through automated extract, transform, and load (ETL) processes and to load the enterprise warehouse on EMR EC2 instances
· 3. Set up AWS Kinesis to process hundreds of terabytes per hour of high-volume streaming data from various sources
· 4. Developed event-driven data processing pipeline code and executed it on AWS Lambda
· 5. Developed interactive queries with AWS Athena to analyze the data in AWS S3 (see the sketch after this list)
· 6. Set up AWS Elastic MapReduce (EMR) EC2 instances with Hadoop (and optionally Hive and/or Pig) installed and configured on them, to process big data and perform analysis
· 7. Built and trained machine learning models for predictive or analytical applications in AWS SageMaker, including creating notebook instances, preparing data, training models from the data, and deploying and evaluating model performance
· 8. Set up data warehouses with Amazon Redshift, including creating Redshift clusters, uploading data sets, and performing data analysis queries
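As a sketch of item 5 above, an interactive Athena query over data in S3, assuming boto3 (the database, table, and results bucket are placeholders):

    # Hypothetical sketch: run an Athena query, poll for completion, print rows.
    import time
    import boto3

    athena = boto3.client("athena")

    start = athena.start_query_execution(
        QueryString="SELECT action, COUNT(*) FROM churn_events GROUP BY action",
        QueryExecutionContext={"Database": "datalake"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},
    )
    qid = start["QueryExecutionId"]

    # Poll until the query leaves the queued/running states.
    while athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
        time.sleep(1)

    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])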
§ Years of Expertise
· 8+ years of data architecture/admin experience
· 3+ years of experience working with AWS DMS, S3, Glue, Kinesis, Lambda, Athena, EMR, SageMaker, Redshift, RDS
· 5+ years on Big Data (Hadoop, Spark, Java, Scala, Python)
· 7+ years on metadata management, data governance, quality, and security
· 2+ years of Python development experience
· 3+ years of Redshift experience, with a strong understanding of relational data models
· 3+ years of experience working with big data architectures in high-volume environments
· Extensive experience building and managing ETL pipelines on cloud-based platforms from inception to production rollout
· Cloudera CDH 5.7, 5.8, 5.9, 5.10 & 5.15 Architect & Administration
o Cloudera Navigator 2.9
o Cloudera Director (Installation in cloud AWS)
§ Kerberos Specialist
§ Hortonworks 2.0 Architect certified
§ MapR Administration
§ Hadoop/MapR certified
§ Enterprise NoSQL
§ SPSS v23 Scientific Data
· Microsoft SQL SSIS Deliveries:
o Support for Microsoft SQL Server Integration Services (SSIS) (2005, 2008, 2012, 2014, 2016 & 2017)
o Microsoft Business Intelligence Development Studio
o STARTTLS security protocol extension support
o Utilized strong 3DES encryption, message integrity checking, and secure secret key exchange
o Provided for SOCKS firewall and proxy support
o Enabled 256-bit (strong) encryption
· UNIX ADMINISTRATION- SCRIPTS/CRONTABS
· TALEND MDM ADMIN 6.1, Configuration/Installation
· IBM certified DB2 Mainframe & Distributed Unix
· Teradata certified
My primary Big Data platform (12 years overall) is AWS (6 yrs), along with substantial Cloudera (4 yrs) and Azure (4 yrs) Big Data architecture/design work: not only data lake design and development but also a total solution-architecture approach to Big Data implementation. I have 10+ years' experience in Hadoop, both development and architecture, with 6+ years as an architect. My initial work included work at Berkeley in 2003, then the initial release of Hadoop, and next AWS (Amazon Web Services), Cloudera Navigator 2.9, Cloudera CDH 5.7-5.10, Impala/Kudu, Cloudera Director, and Hortonworks/Azure. Additional experience: Talend MDM, 4 yrs; ETL processes including scripting, 14+ years; Zookeeper, HMaster, HBase database, HFile; Apache Flume (log file ingest), 2 years; Oozie (workflow scheduling), 3 years; Sqoop (data transfer), 3 years; Python (2.7 & 3.6, with SPSS Statistics 23), 5 years; dev tools such as Spark (with performance tuning and caching), 2 years; HBase, 4 years; Pig, 4 years; analysis with Drill (SQL), 2 years; Hive (HQL), 4 years; Mahout (clustering, classification, collaborative filtering), 6 mos.; additionally C & C++ and shell script, 5 years. I have extensive use of MDM tools and ERwin, and additionally PowerDesigner and IBM's ER tool. I have extensive work with Apache Hadoop, a highly scalable storage platform designed to process very large data sets across hundreds to thousands of computing nodes operating in parallel. Hadoop provides a cost-effective storage solution on commodity hardware for large data volumes with no format requirements. Additionally, I have extensive work with MapReduce, the programming paradigm that allows for this massive scalability and is the heart of Hadoop. Note that the term MapReduce actually refers to two separate and distinct tasks that Hadoop programs perform: the map task, which converts input data into intermediate key/value pairs, and the reduce task, which aggregates those pairs into the final result. Hadoop has two main components: HDFS (storage) and YARN (resource management).
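As a minimal illustration of those two tasks, here is the classic word-count example written for Hadoop Streaming (the file names are arbitrary; this is an illustration, not a production job):

    # mapper.py - the "map" task: emit one key/value pair per word.
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py - the "reduce" task: sum the counts for each word.
    # Hadoop Streaming delivers mapper output to the reducer sorted by key.
    import sys
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

A job like this is submitted with the standard streaming jar, e.g. hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out; HDFS supplies the storage and YARN schedules the map and reduce containers.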
· I utilized Ansible and Red Hat Ansible Tower to scale automation, manage complex deployments, and speed up productivity at the CAS client site. I further used its workflow features to streamline jobs, and its simple sharing tools to share solutions with the CAS team. With Ansible, we were able to automate away the drudgery of daily tasks. This automation freed admins to focus on efforts that deliver more value to the business by speeding time to application delivery and building a culture of success. I was able to give teams the one thing they can never get enough of: time, allowing smart people to focus on smart things.
· I used StreamSets Data Collector (SDC), an open-source, lightweight engine that streams data in real time. It allowed us to configure data flows as pipelines through a web UI in a few minutes. Among its many features, it makes it possible to view real-time statistics and inspect data as it passes through the pipeline.
· 25+ years of experience in IT systems or applications development
· 15+ years of experience architecting or delivering large scale systems on multiple platforms, with a focus on Big Data Hadoop
· Talend (4 years) was utilized on several projects to simplify and automate big data integration with graphical tools and wizards that generate native code. This allowed teams to start working with Apache Hadoop, Apache Spark, Spark Streaming, and NoSQL databases right away. The Talend Big Data Integration platform was utilized to deliver high-scale, in-memory fast data processing as part of the Talend Data Fabric solution, allowing the project's enterprise systems to bring more data into real-time decisions. It provided blazing-fast speed and scale with Spark and Hadoop, allowed anyone to access and cleanse big data while governing its use, and allowed for optimization of big data performance in the cloud on several projects.
· Graphics and statistics implementations with R programming and RStudio, a free and open-source integrated development environment (IDE) for R.
· Agile development experience as development team leader.
· Experience working in a network operations center (NOC), where administrators supervise, monitor, and maintain a telecommunications network
· Extensive data warehousing (Teradata, DB2, SQL Server, MySQL & Oracle), including building/implementing
· Microsoft Azure Cloud Technologies (Dashboard & performance with web problem identification and drill-down capability)
· Hortonworks 2.0
· AWS Eco-System
· Cloudera 5.7-5.10 CDH w/Cloudera Manager
· Cloudera Navigator 2.9
· Extensive Java coding experience.
· Azure Data Factory to manage Batch, HDInsight, Machine Learning
· Administration tools such as Apache Zookeeper, MapReduce, YARN
· Maintain cluster configuration info in Cassandra, Zookeeper
· Monitor Cluster Heartbeats
· Apache Flume (log files), Oozie (workflow scheduling), Sqoop (transfers data for relational DBs); languages: Python, Scala, Java
· Dev Tools: Spark (Perf, w/Caching), HBase, Pig, Shell, MongoDB
· Analysis with: Drill (SQL), Hive (HQL), Mahout (Clustering, Classification, Collaborative filtering)
· Tableau (dashboards) & Talend (MDM, mapping & data lineage)
· MongoDB: One of the most popular document stores; a document-oriented database. All data in MongoDB is handled in JSON/BSON format. It is a schemaless database that scales over terabytes of data. It also supports master-slave replication for making multiple copies of data across servers, making data integration in certain types of applications easier and faster. MongoDB combines the best of relational databases with the innovations of NoSQL technologies, enabling engineers to build modern applications. MongoDB maintains the most valuable features of relational databases: strong consistency, an expressive query language, and secondary indexes. As a result, developers can build highly functional applications faster than with other NoSQL databases. MongoDB provides the data model flexibility, elastic scalability, and high performance of NoSQL databases, so engineers can continuously enhance applications and deliver them at almost unlimited scale on commodity hardware, with full index support for high performance.
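As a brief sketch of that document model, assuming the PyMongo driver (the connection string, database, collection, and field names are placeholders):

    # Hypothetical sketch: schemaless insert, secondary index, expressive query.
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017/")
    customers = client["warehouse"]["customers"]  # hypothetical database/collection

    # Documents are stored as BSON; no schema declaration is required.
    customers.insert_one({"name": "Acme Corp", "segment": "enterprise", "churn_risk": 0.12})

    # Secondary index supporting queries on a non-key field.
    customers.create_index([("segment", ASCENDING)])
    for doc in customers.find({"segment": "enterprise"}):
        print(doc["name"], doc["churn_risk"])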
· Integrations & Migrations
· Collaboration as a consultant with Teradata Professional Services
· Collaboration as a consultant with IBM Professional Services
· Advanced Analytical solutions – IBM, Teradata, HCL
· PhD in Business Psychology / Computer Science (Machine Learning and Artificial Intelligence), with excellent verbal and written communication and persuasion skills; able to collaborate and engage effectively with technical and non-technical resources, speaking the language of the business
· Have proven experience solving complex problems in a multi-platform systems environment
· Cloud/XaaS solutions
· Demonstrated comprehensive expert knowledge and exceptional insight into the information technology industry
· Expertise in application and information architecture / design artifacts and mechanisms
· TOGAF and Zachman, with practical experience in the use of these common architecture frameworks
· Experience with high-level conceptual models and the development, implementation, and management of Enterprise Data Models, Data Architecture Strategies, Delivery Roadmaps, Information Lifecycle Management, and Data Governance capabilities
· PhD Psychology with minor in Computer Science
· Encryption tools such as Protegrity, and an in-depth understanding of the security legislation that affects our businesses, including, but not limited to, Sarbanes-Oxley, Payment Card Industry regulations, customer data protection regulations, and contemporary security legislation activity that may impact future plans
· Significant experience with three or more of the following technologies: Teradata, Tableau, Cognos, Oracle, SAS, Hadoop, Hive, SQL Server, DB2, SSIS, Essbase, Microsoft Analysis Services
Databases/Platforms:
Hadoop 1.0 & 2.0 Impala/KUDU, HBase, MongoDB, MySQL, SQLObject, SQLAlchemy, PostgreSQL, TERADATA v2r6.2 thru v15.00.01, MAINFRAME DB2 1.3 to 7.0 & UDB DB2 & CONNECT with SHARED DATA CONCEPTS, SQL SERVER, MS ACCESS, ORACLE 6, 7, 8, 8i, 9i, 10g & 11g with install experience and RAC, INFORMIX, SYBASE, DB2/400, UNIX DB2/6000, OS/2 DB2/2, DB2/UDB, MVS/DB2, IMS, IDMS, M204, NATURAL, NOMAD, DBASE III, FOCUS
Column: Hbase, Document: MongoDB, Key-value: Berkeley DB
Technical Experience:
Languages: Shell script, Scala, Python, JavaScript, Java, C++, and many others.
Operating Systems: Linux, BSD Unix variants, Macintosh, OpenVMS
Sign-on: LDAP/OpenLDAP
Linux: Debian, Linux Mint, Ubuntu, Red Hat RHEL, Fedora, CENTOS
Desktop GUI design: Java, GTK+/GNOME, QT/KDE
Custom-tailored Linux kernels for: Alpha, PowerPC, Intel
OS configuration: filesystem layouts, packaging systems (Debian)
Version control and build: CVS, Subversion, Git
Web API development / DevOps: Postman, REST, SOAP, JSON-RPC; Repository: Confluence
Parallel APIs: MPI, PVM (from C, FORTRAN, Python)
Threading: Pthreads from C and compiled languages, Python and Ruby threads.
Network protocols: TCP/IP suite (e.g., UDP, ARP), MIDI.
Primary Databases: MySQL, SQLObject, SQLAlchemy, Postgres SQL, Teradata, DB2, Oracle
Web Frameworks: Express (ExpressJS), Django, Flask
GUI Toolkits: Java/Swing, TK, GTK, GTK+, GLADE, GNOME, PyQt, QT/KDE, Wx
Amazon Web Services: AWS, EC2
Organizational and Human Factors- Management Skills:
· Tech Support: Strong skills working with technical customers to resolve issues
· Leadership: Led teams of 17, 9, and 21 people on various projects; managed 20 at EDS & Westinghouse
· Managing outsourced development: Managed teams of 10+ in India, England & the US.
ALPHA CONSULTANTS ARE READY TO GET THE JOB DONE FOR YOU WITH REAL ROI.