An Introduction to Hadoop, Hive, Sqoop, HBase, and Hue


1. Hadoop: an open-source framework for distributed storage and computing

1.1 Core Components

1.1.1 HDFS: Distributed storage

1.1.2 YARN: Resource management

1.1.3 MapReduce: Distributed computing
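How these components divide the work is easiest to see with the classic word-count job. The sketch below is a single-process Python illustration of the MapReduce programming model (map, shuffle/sort, reduce), assuming small in-memory input; it does not use Hadoop's Java API, and the function names are hypothetical. On a real cluster each input split would typically correspond to an HDFS block, and YARN would schedule the map and reduce tasks in containers.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical helper names: a single-process illustration, not Hadoop's API.

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop hands keys to each reducer in sorted order; mimic that here.
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)

def word_count(splits: list[list[str]]) -> dict[str, int]:
    # Each inner list stands in for one input split that a separate
    # map task would process in parallel on a real cluster.
    intermediate = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    splits = [["hadoop stores data", "hive queries data"], ["hbase serves data"]]
    print(word_count(splits))
```

Running it prints `{'data': 3, 'hadoop': 1, 'hbase': 1, 'hive': 1, 'queries': 1, 'serves': 1, 'stores': 1}`. The map and reduce functions only ever see one line or one key at a time, which is what lets the framework spread them across many machines.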

2. Hive: a data warehouse tool for query and analysis (SQL on Hadoop)

2.1 Data in a data warehouse has two characteristics:

2.1.1 Complete history: it holds the most complete historical data, so the volume is massive.

2.1.2 Relatively stable: unlike a business system's database, where data is updated frequently, data in a warehouse is rarely updated or deleted once it has been loaded; it is mainly queried in bulk.

2.2 Tables in Hive are purely logical: a table is only a definition, that is, table metadata. Hive itself does not store data; it relies on HDFS for storage and on MapReduce for computation. This lets you map structured data files onto database tables and query them with SQL, and Hive converts each SQL statement into MapReduce jobs for execution. As a result, Hive queries have relatively high latency, and Hive is not suitable for real-time queries.
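The idea that a Hive table is only metadata laid over files can be sketched in a few lines of Python. In this sketch the "table" is just a column list and a field delimiter, the "data" is raw delimited text as it might sit in HDFS, and the schema is applied only when a query reads the text (schema on read). The GROUP BY is evaluated as a map step followed by a reduce step, echoing how Hive turns SQL into MapReduce jobs. The table name, columns, and rows are hypothetical; real Hive keeps table definitions in its metastore and plans queries with its own compiler.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TableDef:
    """Table metadata only (hypothetical): a definition that owns no data."""
    name: str
    columns: tuple[str, ...]
    delimiter: str = ","

# Hypothetical raw file contents, as they might be stored in HDFS.
RAW_FILE = """2020-10-01,beijing,120
2020-10-01,shanghai,80
2020-10-02,beijing,30"""

ORDERS = TableDef(name="orders", columns=("dt", "city", "amount"))

def read_rows(table: TableDef, raw: str) -> list[dict[str, str]]:
    """Apply the schema at read time: split each line into named columns."""
    return [dict(zip(table.columns, line.split(table.delimiter)))
            for line in raw.splitlines() if line.strip()]

def sum_by(table: TableDef, raw: str, key_col: str, value_col: str) -> dict[str, int]:
    """Roughly: SELECT key_col, SUM(value_col) FROM table GROUP BY key_col."""
    # Map: emit (group key, value) for every row.
    pairs = [(row[key_col], int(row[value_col])) for row in read_rows(table, raw)]
    # Reduce: add up the values of each group.
    totals: dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

print(sum_by(ORDERS, RAW_FILE, "city", "amount"))
```

This prints `{'beijing': 150, 'shanghai': 80}`. Changing `ORDERS` (for example, renaming a column) changes how the same file is interpreted without touching the file itself, which is the sense in which the table is purely logical.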

3. Sqoop: a data synchronization tool for big data

Sqoop is mainly used to move data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (such as MySQL, Oracle, or PostgreSQL) into HDFS, or export data from HDFS back to a relational database.
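Sqoop parallelizes an import by running several map tasks, each copying one slice of the source table. By default it splits on the table's primary key (another column can be chosen with `--split-by`), queries the column's minimum and maximum, and builds one key range per mapper (`--num-mappers`, default 4). The sketch below imitates that split logic against an in-memory SQLite table from Python's standard library and produces comma-delimited lines, standing in for the text files Sqoop would write to HDFS. It is a simplified illustration, not Sqoop's code: the table, columns, and helper functions are hypothetical, and Sqoop's own boundary arithmetic differs in detail.

```python
import sqlite3

# Hypothetical helpers that mimic Sqoop's split-by-range idea; not Sqoop's code.

def split_ranges(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into up to num_mappers slices."""
    size = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    ranges = []
    start = lo
    while start <= hi:
        end = min(start + size - 1, hi)
        ranges.append((start, end))
        start = end + 1
    return ranges

def import_table(conn: sqlite3.Connection, table: str, split_col: str,
                 num_mappers: int = 4) -> list[list[str]]:
    """Return one list of comma-delimited lines per simulated map task.

    Table and column names are interpolated directly, so they must be trusted.
    """
    # Boundary query: find the min and max of the split column.
    lo, hi = conn.execute(f"SELECT MIN({split_col}), MAX({split_col}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return []
    outputs = []
    for start, end in split_ranges(lo, hi, num_mappers):
        # Each simulated mapper reads only its own slice of the key range.
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_col} BETWEEN ? AND ? ORDER BY {split_col}",
            (start, end),
        ).fetchall()
        outputs.append([",".join(str(value) for value in row) for row in rows])
    return outputs

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, f"user{i}") for i in range(1, 11)])
    # File names echo Hadoop's part-m-NNNNN naming for map-task output.
    for n, part in enumerate(import_table(conn, "users", "id")):
        print(f"part-m-{n:05d}", part)
```

With ten rows and four mappers the key range 1..10 is cut into the slices 1-3, 4-6, 7-9, and 10, so each "mapper" copies a disjoint part of the table and no row is read twice. Splitting on a column with skewed values would give uneven slices, which is why the choice of split column matters.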

4. HBase: a distributed, scalable big data store built on Hadoop. Unlike Hive, HBase provides random, real-time read/write access, so it can be used for real-time queries.
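Part of what makes real-time lookups possible is HBase's data model: rows are kept sorted by row key, so a read can go straight to one row (a Get) or to a contiguous key range (a Scan) instead of scanning the whole dataset. The toy model below imitates that layout in memory (row key, then `family:qualifier` column, then timestamped versions) with a sorted key index searched via `bisect`. The class and method names are hypothetical; this is not the HBase client API, and it leaves out regions, the write-ahead log, and flushing to HDFS.

```python
import bisect
import time
from typing import Optional

class ToyHTable:
    """Hypothetical in-memory sketch of HBase's data model (not the real client API)."""

    def __init__(self) -> None:
        self._row_keys: list[str] = []  # kept sorted, like HBase row keys
        self._rows: dict[str, dict[str, list[tuple[int, str]]]] = {}

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        """Store a new version of a cell; column is 'family:qualifier'."""
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        ts = ts if ts is not None else time.time_ns()
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row: str) -> dict[str, str]:
        """Point lookup: the latest value of each column in one row."""
        cells = self._rows.get(row, {})
        return {col: versions[0][1] for col, versions in cells.items()}

    def scan(self, start_row: str, stop_row: str) -> list[tuple[str, dict[str, str]]]:
        """Range read over [start_row, stop_row) in row-key order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, stop_row)
        return [(key, self.get(key)) for key in self._row_keys[lo:hi]]

if __name__ == "__main__":
    table = ToyHTable()
    table.put("user#001", "info:name", "alice", ts=1)
    table.put("user#001", "info:name", "alicia", ts=2)  # newer version of the same cell
    table.put("user#002", "info:name", "bob", ts=1)
    table.put("user#010", "info:name", "carol", ts=1)
    print(table.get("user#001"))
    print(table.scan("user#000", "user#010"))
```

Here `get("user#001")` returns `{'info:name': 'alicia'}` because the newest version wins, and the scan returns only `user#001` and `user#002` since the stop row is exclusive. Because lookups depend on row-key order, row-key design largely determines which queries HBase can answer quickly.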

5. Hue: a web-based manager for CDH, consisting of the Hue UI, the Hue Server, and the Hue DB. Hue provides a web interface to all CDH components: with it you can write MapReduce jobs, browse and modify HDFS files, manage Hive metadata, run Sqoop jobs, and write Oozie workflows.

Official Hue website: http://gethue.com/

Main features of Hue:
- By default, a lightweight SQLite database stores session data, user authentication, and authorization; this can be switched to MySQL, PostgreSQL, or Oracle.
- File Browser for accessing HDFS.
- Hive editor for developing and running Hive queries.
- Support for Solr-based search applications, with visual data views and dashboards.
- Support for interactive queries in Impala-based applications.
- Spark editor and dashboard.
- Pig editor, with the ability to submit script jobs.
- Oozie editor: workflows, coordinators, and bundles can be submitted and monitored through dashboards.
- HBase browser for visualizing, querying, and modifying HBase tables.
- Metastore browser for accessing Hive metadata, with HCatalog support.
- Job browser for accessing MapReduce jobs (MR1 and MR2 on YARN).
- Job designer.
- Sqoop 2 editor and dashboard.
- ZooKeeper browser and editor.
- Query editors for MySQL, PostgreSQL, SQLite, and Oracle databases.


If anything here is inaccurate, corrections and suggestions are welcome.
