Before the code moves, the environment goes first

I'm a Docker fan. When I was learning big data technologies, an idea struck me:

  • Build a big data development environment with Docker!

What's the benefit?

As long as I have the docker-compose.yml orchestration file, I can bring up my big data environment on any machine that has Docker installed. Isn't that the kind of write-once, reuse-everywhere convenience we programmers chase every day?

How do we do it?

I searched blogs and forum posts all over, but found no suitable answer, so I had to do it myself.

Docker Hub

First I went to Docker Hub, which is to Docker images what GitHub is to code. I searched for Hadoop, Spark, and so on, and found an organization:

This organization packages almost every big data component as a Docker image, and it does so at fine granularity, one image per role, which is really great. For example, the image shown here is the one it built for the namenode role in Hadoop. Doing some encapsulation and customization on top of it is especially easy.
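For instance, trying one of these images on its own might look like this (a minimal sketch; the tag is the one used in the compose file later, and the interactive bash run is just my own assumption about how you'd poke around inside it):

# Pull the bde2020 namenode image and open a shell inside it to look around.
docker pull bde2020/hadoop-namenode:1.0-hadoop2.8-java8
docker run --rm -it bde2020/hadoop-namenode:1.0-hadoop2.8-java8 bash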

So I searched the registry for the big data components I wanted:

  • Hadoop
  • Hive
  • Spark

Easy, found them all.

The virtual machine

We need to install Docker inside a virtual machine. Why a virtual machine? Because doing this directly on Windows is inconvenient in all kinds of ways. (Mac friends can skip ahead.)

For the virtual machine I use VirtualBox with Ubuntu installed. Then I installed Docker, along with its companion tool, docker-compose.
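Roughly, the installation looks like this (a minimal sketch assuming Ubuntu; the docker-compose release version in the URL is only an example, pick whatever is current):

# Install Docker from the Ubuntu repositories and let the current user run it.
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker
sudo usermod -aG docker $USER   # log out and back in for this to take effect

# docker-compose is a separate binary; one common way is to download a release.
sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version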

docker-compose.yml

docker-compose makes it easy to orchestrate Docker containers, and docker-compose.yml records how they are orchestrated. It's just a description file! Here is the docker-compose.yml for my big data environment:

version: '2'
services:
  namenode:
    image: bde2020/hadoop-namenode:1.0-hadoop2.8-java8
    container_name: namenode
    volumes:
      - ./data/namenode:/hadoop/dfs/name
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50070:50070"
      - "8020:8020"
  datanode:
    image: bde2020/hadoop-datanode:1.0-hadoop2.8-java8
    depends_on:
      - namenode
    volumes:
      - ./data/datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop-hive.env
    ports:
      - "50075:50075"
  hive-server:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-server
    env_file:
      - ./hadoop-hive.env
    environment:
      - "HIVE_CORE_CONF_javax_jdo_option_ConnectionURL=jdbc:postgresql://hive-metastore/metastore"
    ports:
      - "10000:10000"
  hive-metastore:
    image: bde2020/hive:2.1.0-postgresql-metastore
    container_name: hive-metastore
    env_file:
      - ./hadoop-hive.env
    command: /opt/hive/bin/hive --service metastore
    ports:
      - "9083:9083"
  hive-metastore-postgresql:
    image: bde2020/hive-metastore-postgresql:2.1.0
    ports:
      - "5432:5432"
    volumes:
      - ./data/postgresql/:/var/lib/postgresql/data
  spark-master:
    image: bde2020/spark-master:2.1.0-hadoop2.8-hive-java8
    container_name: spark-master
    ports:
      - "8080:8080"
      - "7077:7077"
    env_file:
      - ./hadoop-hive.env
  spark-worker:
    image: bde2020/spark-worker:2.1.0-hadoop2.8-hive-java8
    depends_on:
      - spark-master
    environment:
      - SPARK_MASTER=spark://spark-master:7077
    ports:
      - "8081:8081"
    env_file:
      - ./hadoop-hive.env
  mysql-server:
    image: mysql:5.7
    container_name: mysql-server
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=zhangyang517
    volumes:
      - ./data/mysql:/var/lib/mysql
  elasticsearch:
    image: elasticsearch:6.5.3
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - es_network
  kibana:
    image: kibana:6.5.3
    ports:
      - "5601:5601"
    networks:
      - es_network
networks:
  es_network:
    external: true
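One thing to note: es_network is declared with external: true, so docker-compose will not create it for you. Create it once on the Docker host before starting the Elasticsearch and Kibana services:

# The compose file marks es_network as external, so it must already exist.
docker network create es_network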

Later I needed Elasticsearch and Kibana, so I just added them to the file. It's really convenient. Best of all, you can easily share the file with your buddies.

Next we need a start script and a stop script, so the whole environment can be started and stopped with one command. run.sh

#!/bin/bash
docker-compose -f docker-compose.yml up -d namenode hive-metastore-postgresql
docker-compose -f docker-compose.yml up -d datanode hive-metastore
sleep 5
docker-compose -f docker-compose.yml up -d hive-server
docker-compose -f docker-compose.yml up -d spark-master spark-worker
docker-compose -f docker-compose.yml up -d mysql-server
#docker-compose -f docker-compose.yml up -d elasticsearch
#docker-compose -f docker-compose.yml up -d kibana
my_ip=`ip route get 1 | awk '{print $NF; exit}'`
echo "Namenode: http://${my_ip}:50070"
echo "Datanode: http://${my_ip}:50075"
echo "Spark-master: http://${my_ip}:8080"

stop.sh

#!/bin/bash
docker-compose stop
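To use them, make both scripts executable once, and from then on starting or stopping everything is a single command:

chmod +x run.sh stop.sh
./run.sh    # bring the whole environment up in dependency order
./stop.sh   # stop it again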

Take a look at the effect:

It started successfully. Time to verify.
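A few quick checks run on the Docker host (a sketch; the beeline path and JDBC URL are the ones the bde2020 Hive image normally uses, adjust if yours differ):

docker-compose ps                        # every service should show "Up"
curl -s http://localhost:50070 | head    # the namenode web UI responds
# Open a Hive session against hive-server on port 10000.
docker exec -it hive-server /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000 -e 'show databases;'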