Analysis of two main schemes for SparkSQL nodes accessing dual-master metabases

Open-source Spark SQL does not provide high availability, yet high availability matters a great deal to users in real applications. DAP, ZTE's big data platform, implements high availability for Spark SQL in ZDH.

Spark SQL high availability works by having the two Spark SQL services register themselves in ZooKeeper when they come online. The JDBC URL that clients connect with specifies the ZooKeeper list. When connecting, the client obtains Spark SQL node information from the ZooKeeper cluster and then connects to a Spark SQL service node.
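As a minimal sketch of what such a JDBC URL looks like, the snippet below assembles one from a list of ZooKeeper hosts. The URL layout follows the Beeline URL used later in the test steps. The function name build_discovery_url is hypothetical, and running ZooKeeper on both test hosts at port 2181 is an assumption (the test steps only show 10.43.183.121:2181).

```python
def build_discovery_url(zk_hosts, namespace, port=2181):
    """Build a JDBC URL that points at a ZooKeeper list instead of one SparkSQL node.

    zk_hosts and port are assumptions for illustration; the parameter names
    serviceDiscoveryMode and zooKeeperNamespace come from the URL in the test steps.
    """
    quorum = ",".join(f"{host}:{port}" for host in zk_hosts)
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


# Assumed ZooKeeper list: both test hosts on port 2181.
url = build_discovery_url(["10.43.183.121", "10.43.183.122"], "sparkThriftServer")
print(url)
```

Because the client only knows the ZooKeeper list, it does not need to be reconfigured when one SparkSQL node goes offline.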

The Spark SQL metadata dual master is implemented mainly in MySQL. MySQL supports one-way, asynchronous replication, in which one server acts as the primary server and one or more other servers act as slave servers. The primary server writes updates to binary log files and maintains an index of those files to track log rotation. A slave server receives any updates that have occurred since its last successful synchronization, then blocks and waits for the primary server to notify it of the next update.
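The log-position idea can be illustrated with a toy model. This is not how MySQL is implemented: the classes Primary and Replica are hypothetical, the binary log is a plain Python list, and a real slave would block and wait where this sketch simply returns 0.

```python
class Primary:
    """Toy primary server: every update is appended to an in-memory binary log."""

    def __init__(self):
        self.binlog = []

    def write(self, statement):
        self.binlog.append(statement)


class Replica:
    """Toy slave server: remembers how far into the primary's log it has read."""

    def __init__(self, primary):
        self.primary = primary
        self.position = 0   # index of the next log entry to read
        self.applied = []   # statements already replayed locally

    def pull(self):
        """Apply every update logged since the last pull and return how many there were."""
        new_events = self.primary.binlog[self.position:]
        self.applied.extend(new_events)
        self.position += len(new_events)
        return len(new_events)


primary = Primary()
replica = Replica(primary)
primary.write("CREATE TABLE test (id INT)")
primary.write("INSERT INTO test VALUES (1)")

first_pull = replica.pull()    # picks up both statements
second_pull = replica.pull()   # nothing new since the last position
```

In a dual-master setup, each of the two servers plays both roles at once: it is the primary for its peer and a slave of its peer.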

In the actual project, MySQL is installed on two hosts in different locations. The two servers act as active and standby for each other: when one machine fails, the other can take over the applications on it. This requires the data in the two databases to stay consistent in real time. Here, MySQL's replication feature is used to keep the two machines synchronized.

Implementation plan

Currently, two options are mainly considered for how SparkSQL nodes access the dual-master metabase:

The SparkSQL node connects directly to the MySQL node:

In the following figure, each SparkSQL node is connected to a single MySQL node. Changes that different SparkSQL nodes make to the metabase are synchronized between the MySQL nodes.

[Figure: SparkSQL nodes connected directly to MySQL nodes]

The SparkSQL node connects to the metabase through the MetaStore node:

In the following figure, each SparkSQL node is connected to multiple MetaStore nodes, and each MetaStore node is connected to its corresponding MySQL node. Changes that different SparkSQL nodes make to the metabase are synchronized between the MySQL nodes.

[Figure: SparkSQL nodes connected to MySQL nodes through MetaStore nodes]

In both schemes, the client obtains the SparkSQL service in the same way, mainly through the following methods:

Beeline connection

Program access through the JDBC port

With Beeline, the client first obtains SparkSQL node information from the ZooKeeper cluster and then connects to a SparkSQL service node. If the connected SparkSQL node is abnormal, the client can still obtain the SparkSQL service by retrying several times.

When a program connects to a SparkSQL node through the JDBC port and the connection fails with an exception, the program can reacquire the SparkSQL service by catching the exception in code and connecting again.
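A minimal sketch of this exception-capture pattern is shown below. The connect argument is a hypothetical callable standing in for whatever JDBC or Thrift client call the program actually uses; the attempt count and delay are illustrative assumptions, not values from the test.

```python
import time


def connect_with_retry(connect, max_attempts=5, delay_seconds=1.0):
    """Call connect() until it succeeds or max_attempts is reached.

    connect is a hypothetical zero-argument callable that returns a session
    and raises an exception when the chosen SparkSQL node is unavailable.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception as exc:  # capture the connection failure and try again
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise ConnectionError(
        f"no SparkSQL node reachable after {max_attempts} attempts"
    ) from last_error


# Stand-in for a real client: fails twice, then succeeds.
attempts = []


def flaky_connect():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("node unavailable")
    return "session"


session = connect_with_retry(flaky_connect, max_attempts=5, delay_seconds=0)
```

If each attempt resolves the node through ZooKeeper again, a retry can land on a different, healthy SparkSQL node rather than the one that just failed.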

The following tests verify the functional feasibility of the two schemes and their behavior under abnormal conditions.

Test environment

MySQL: two hosts, 10.43.183.121 and 10.43.183.122

SparkSQL: two hosts, 10.43.183.121 and 10.43.183.122

Hive MetaStoreServer: two hosts, 10.43.183.121 and 10.43.183.122

Test scenarios

Scenario 1: High availability verification with SparkSQL nodes connecting directly to MySQL

Each SparkSQL node is directly connected to one MySQL node. Verify that metadata is synchronized successfully and that failover happens automatically when a MySQL node fails.

The test steps are as follows:

1. Modify the configuration

The SparkSQL configuration is modified as follows:

[Figure: modified SparkSQL configuration]

On 10.43.183.121, the JDBC connection is configured to point to MySQL on 10.43.183.121.

On 10.43.183.122, the JDBC connection is configured to point to MySQL on 10.43.183.122.

2. Connect Beeline to SparkSQL on 10.43.183.121.

3. Create a table named test, then query the tbls table in the hiveomm database on both MySQL nodes. The test record appears on both, which indicates that metadata synchronization succeeded.

4. Stop the MySQL node that SparkSQL is currently connected to.

5. Run the "show tables" command in the Beeline session. The query fails with an exception.

6. Disconnect Beeline and try several times to reconnect to SparkSQL on node 10.43.183.121. The connection fails.

7. Connect to the SparkSQL service with the ZooKeeper discovery URL, using the Beeline command below. After several retries the connection to the SparkSQL service succeeds, and the test table can be seen with the "show tables" command.

!connect jdbc:hive2://10.43.183.121:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkThriftServer

8. Start the MySQL node again. Beeline can now reconnect to SparkSQL on node 10.43.183.121, and the "show tables" command returns the test table information.
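The synchronization check in step 3 can be scripted. The sketch below is only an illustration: it uses in-memory sqlite3 databases as stand-ins for the two MySQL hiveomm databases, and the helper names are hypothetical. TBLS and TBL_NAME are the table and column the Hive metastore schema uses for table records. Against real MySQL nodes the connections would come from a MySQL driver, which typically uses "%s" rather than "?" as the placeholder.

```python
import sqlite3


def table_registered(conn, table_name):
    """Return True if the metastore TBLS table contains a row for table_name."""
    cur = conn.cursor()
    # sqlite3 placeholder syntax; a MySQL driver would typically use "%s" here.
    cur.execute("SELECT COUNT(*) FROM TBLS WHERE TBL_NAME = ?", (table_name,))
    return cur.fetchone()[0] > 0


def synced_everywhere(conns, table_name):
    """True only if every metabase copy already has the table record."""
    return all(table_registered(conn, table_name) for conn in conns)


def make_stand_in(table_names):
    """Create an in-memory stand-in for one hiveomm database (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT)")
    conn.executemany(
        "INSERT INTO TBLS (TBL_NAME) VALUES (?)", [(name,) for name in table_names]
    )
    return conn


node_121 = make_stand_in(["test"])
node_122 = make_stand_in(["test"])
in_sync = synced_everywhere([node_121, node_122], "test")

# A replica that has not received the record yet is reported as out of sync.
lagging = synced_everywhere([node_121, make_stand_in([])], "test")
```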

Test conclusion:

Metadata can be synchronized between the MySQL nodes.

Stopping the MySQL node causes Beeline queries to fail.

When Beeline reconnects, it cannot connect to the corresponding SparkSQL node.

With the ZooKeeper discovery URL, Beeline can connect to an available SparkSQL node after a certain number of attempts.

Scenario 2: High availability verification with SparkSQL nodes connecting to MySQL through HiveMetaStoreServer nodes

The MetaStoreServer nodes mainly provide fault tolerance when a MySQL node fails. Each MetaStoreServer node corresponds to one MySQL node, and each SparkSQL node is configured with multiple MetaStoreServer nodes. Verify that metadata is synchronized successfully and that failover happens automatically when a MySQL node fails.
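The fault-tolerance idea can be sketched as a client walking an ordered list of MetaStoreServer addresses and using the first one that responds. This is an illustration of the idea, not the actual client logic in Spark or Hive: pick_metastore and is_alive are hypothetical names, and port 9083 (the usual Hive metastore default) is an assumption about the test environment. In a Hive client, such a list is normally supplied as a comma-separated value of the hive.metastore.uris property.

```python
def pick_metastore(uris, is_alive):
    """Return the first MetaStoreServer URI for which is_alive(uri) is true.

    is_alive is a hypothetical health check supplied by the caller,
    for example an attempt to open a connection to the URI.
    """
    for uri in uris:
        if is_alive(uri):
            return uri
    raise RuntimeError("no MetaStoreServer reachable")


# Assumed addresses: both test hosts on the default metastore port 9083.
METASTORE_URIS = [
    "thrift://10.43.183.121:9083",
    "thrift://10.43.183.122:9083",
]

# Simulate the first MetaStoreServer being unusable because its MySQL node is down.
down = {"thrift://10.43.183.121:9083"}
chosen = pick_metastore(METASTORE_URIS, lambda uri: uri not in down)
```

With this arrangement, the failure of one MySQL node only takes out the MetaStoreServer in front of it, and the SparkSQL node can continue through the other one.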

The test steps are as follows:

1. Modify the configuration
