Don’t you think the number of big data technologies in Azure is very confusing? I’ve distilled some information from the official Microsoft Azure blog to make it easier to read. Before we begin, here is a potted Hadoop history:
This cheat sheet contains high-level descriptions of the tools, APIs, SDKs, and technologies that you’ll see in Azure. They are used together in big data solutions and include both proprietary Azure and open-source technologies.
I hope this cheat sheet helps you identify which tools and technologies to investigate for each function.
| Function | Description | Tools and technologies |
|---|---|---|
| Data consumption | Extracting and consuming the results from Hadoop-based solutions. | Azure Intelligent Systems Service (ISS), Azure SQL Database, LINQ to Hive, Power BI, SQL Server Analysis Services (SSAS), SQL Server Database Engine, SQL Server Reporting Services (SSRS) |
| Data ingestion | Extracting data from data sources and loading it into Hadoop-based solutions. | Aspera, Avro, AzCopy, Azure Intelligent Systems Service (ISS), Azure Storage Client Libraries, Azure Storage Explorer, Casablanca, CloudBerry Explorer, CloudXplorer, Cross-platform Command Line Interface (X-plat CLI), FileCatalyst, Flume, Hadoop command line, HDInsight SDK and Microsoft .NET SDK for Hadoop, Kafka, PowerShell, Reactive Extensions (Rx), Signiant, SQL Server Data Quality Services (DQS), SQL Server Integration Services (SSIS), Sqoop, Storm, StreamInsight, Visual Studio Server Explorer |
| Data processing | Processing, querying, and transforming data in Hadoop-based solutions. | Azure Intelligent Systems Service (ISS), HCatalog, Hive, LINQ to Hive, Mahout, MapReduce, Phoenix, Pig, Reactive Extensions (Rx), Samza, Solr, SQL Server Data Quality Services (DQS), Storm, StreamInsight |
| Data transfer | Transferring data between Hadoop and other data stores, such as databases and cloud storage. | Falcon, SQL Server Integration Services (SSIS) |
| Data visualization | Visualizing and analyzing the results from Hadoop-based solutions. | Azure Intelligent Systems Service (ISS), D3.js, Microsoft Excel, Power BI, Power Map, Power Query, Power View, PowerPivot |
| Job submission | Submitting processing jobs to Hadoop-based solutions. | HDInsight SDK and Microsoft .NET SDK for Hadoop |
| Management | Managing and monitoring Hadoop-based solutions. | Ambari, Azure Storage Client Libraries, Azure Storage Explorer, Cerebrata Azure Management Studio, Chef, Chukwa, CloudXplorer, Ganglia, Hadoop command line, Knox, Azure Management Portal, Azure SDK for Node.js, Puppet, Remote Desktop Connection, REST APIs, System Center management pack for HDInsight, Visual Studio Server Explorer |
| Workflow | Creating workflows and managing multi-step processing in Hadoop-based solutions. | Azkaban, Cascading, Hamake, Oozie, SQL Server Integration Services (SSIS) |
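The table is organized by function, but it can also be read in reverse: many tools appear under more than one function. SSIS, for example, shows up under ingestion, transfer, and workflow. The short Python (3.9+) sketch below inverts a few rows of the table into a tool-to-functions index. It is purely illustrative: `CHEAT_SHEET` and `functions_by_tool` are hypothetical names, and the data is an abridged transcription of the table, not an Azure API.

```python
"""Invert the cheat sheet: for a given tool, list the functions it serves."""
from collections import defaultdict

# Hypothetical, abridged transcription of a few rows: function -> tools.
CHEAT_SHEET: dict[str, list[str]] = {
    "Data ingestion": ["AzCopy", "Flume", "Kafka", "Sqoop", "Storm",
                       "SQL Server Integration Services (SSIS)"],
    "Data processing": ["HCatalog", "Hive", "Mahout", "Pig", "Storm"],
    "Data transfer": ["Falcon", "SQL Server Integration Services (SSIS)"],
    "Workflow": ["Azkaban", "Cascading", "Hamake", "Oozie",
                 "SQL Server Integration Services (SSIS)"],
}


def functions_by_tool(sheet: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build a tool -> functions index from a function -> tools mapping.

    Functions are listed in the order they appear in `sheet`.
    """
    index: dict[str, list[str]] = defaultdict(list)
    for function, tools in sheet.items():
        for tool in tools:
            index[tool].append(function)
    return dict(index)


if __name__ == "__main__":
    index = functions_by_tool(CHEAT_SHEET)
    # Storm is listed under both ingestion and processing.
    print(index["Storm"])
```

Extending `CHEAT_SHEET` with the remaining rows would give a quick way to see which single tool covers the most functions you need.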
If you have any questions, please get in touch at email@example.com.