OpenLooKeng, Make Big Data Simplified

openLooKeng is an open-source, efficient data virtualization analytics engine. While using openLooKeng, a community member ran into a problem where query data was not kept in sync. He dug into the issue and contributed a solution back to the community. We appreciate his support and hope this blog post is helpful to others.

Welcome to the openLooKeng official website: openlookeng.io

Community code repository gitee.com/openlookeng

In this issue we share the following problem and its solution.

The problem

openLooKeng queries return data that is out of sync, and this needs to be fixed.

After analysis, the query reads from a cache. In Active/Active (AA) mode, when one node modifies the metastore, the other node is not notified, so its cache is not invalidated. The caches in AA mode are therefore not synchronized.

The solution

Currently, openLooKeng only has the local Guava caching mode, and in AA mode the cached data is not synchronized between nodes. Possible solutions:

  1. Use Redis. Redis works well as a distributed cache: it supports many data types and a cluster mode. However, this approach introduces a new component that makes deployment harder, and we want to minimize deployment complexity and avoid introducing third-party services.
  2. Since the service already ships with Hazelcast, consider using Hazelcast as the cache. Hazelcast is now used as the distributed cache, while the old Guava cache is retained.

Users can choose between two sets of caches, one local and one distributed. The architectural pattern is as follows:

The read/write policy works as follows:

Cache-Aside Pattern: when writing to the database, delete the corresponding cache entry. The Cache-Aside pattern effectively avoids concurrency problems.

When a write operation occurs, assuming that flushing (deleting) the cache is the common way of keeping it consistent, there are two choices: (1) write the database first, then flush the cache; (2) flush the cache first, then write the database.

Consider choice (2) with two concurrent operations, an update and a query. After the update operation deletes the cache entry, but before it has written the database, the query misses the cache, reads the old data from the database, and puts it back into the cache. The cache now holds stale data and stays dirty, so this design is wrong and not recommended.

Choice (1) works differently: the cache entry is not deleted up front. The database is updated first, and during that window the cache entry is still valid, so concurrent queries simply read the not-yet-updated data. As soon as the update completes, the cache entry is invalidated, and subsequent queries pull the fresh data from the database.
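As a minimal sketch of choice (1) in Java, assuming a Guava Cache in front of a metastore-like store (the map-backed "database" and the class name are hypothetical stand-ins, not openLooKeng code):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideExample
{
    // Stand-in for the real database / metastore.
    private final Map<String, String> database = new ConcurrentHashMap<>();
    private final Cache<String, String> cache = CacheBuilder.newBuilder().maximumSize(10_000).build();

    // Write path: update the database first, then invalidate the cache entry,
    // so the next read reloads fresh data.
    public void update(String key, String newValue)
    {
        database.put(key, newValue);
        cache.invalidate(key);
    }

    // Read path: try the cache; on a miss, load from the database and repopulate.
    public String get(String key)
    {
        String value = cache.getIfPresent(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }
}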

Hazelcast learning

Hazelcast is a distributed mechanism, and Hazelcast's IMap can be used as a distributed cache. The important thing to note here is that since HetuMetastore keeps six caches, each cache needs to be instantiated as its own map; you cannot use a single one for all of them.

IMap<Integer, List<String>> clusterMap1 = instance.getMap("MyMap1");
IMap<Integer, List<String>> clusterMap2 = instance.getMap("MyMap2");
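As a minimal, self-contained sketch of where such maps come from (assuming the Hazelcast 4.x API, where IMap lives in com.hazelcast.map; the map names are just examples):

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

import java.util.Arrays;
import java.util.List;

public class HazelcastMapExample
{
    public static void main(String[] args)
    {
        // Start (or join) a Hazelcast cluster member with the default configuration.
        HazelcastInstance instance = Hazelcast.newHazelcastInstance();

        // Each logical cache gets its own named IMap; members calling getMap with
        // the same name share the same distributed map.
        IMap<Integer, List<String>> clusterMap1 = instance.getMap("MyMap1");
        IMap<Integer, List<String>> clusterMap2 = instance.getMap("MyMap2");

        clusterMap1.put(1, Arrays.asList("a", "b"));
        System.out.println(clusterMap1.get(1));
        System.out.println(clusterMap2.size());

        instance.shutdown();
    }
}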

To learn more about Hazelcast, see another article, "Hazelcast is really an interesting thing": www.chkui.com/article/haz…

If that web address does not work, refer to my.oschina.net/chkui/blog/…

The related GitHub repository: github.com/dragonetail…

That project uses an older Hazelcast version and needs to be updated to the current one, but you can run it locally to see how Hazelcast works. Along the way we also learned about Inject, an interesting injection mechanism that is quite handy when an interface has multiple implementation classes. More on that injection approach later.

The development itself is relatively simple: just add another set of caches. The caching model keeps the same interface as the Guava cache, and so does the code, which would become quite redundant if a Redis cache were added later.

Inject learning

How do you specify dynamically which implementation to use when an interface has multiple implementation classes? There are three approaches to this problem. (1) @Service injection: specify the name of the bean, like this:

@Service("s1")
public class TestServiceImpl1 implements ITestService {
    @Override
    public void test(a) {
        System.out.println(Interface 1 implementation class...); }}Copy the code
@Service("s2")
public class TestServiceImpl2 implements ITestService {
    @Override
    public void test(a) {
        System.out.println(Interface 2 implementation class...); }}Copy the code
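At the injection point, the caller then selects the desired bean by name, for example with @Qualifier (a hedged Spring sketch; TestCaller is an illustrative class, not from the project):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

@Component
public class TestCaller {
    private final ITestService testService;

    // Picks the implementation registered as "s1"; switch to "s2" for the other class.
    @Autowired
    public TestCaller(@Qualifier("s1") ITestService testService) {
        this.testService = testService;
    }

    public void run() {
        testService.test();
    }
}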

(2) Strategy design pattern: define a Map, put all the implementation classes into it, and then perform different operations according to the current type. For reference code see: blog.csdn.net/qq_42087460…

public class DisCountStrageService {
    Map<String, DiscountStrategy> discountStrategyMap = new HashMap<>();

    // Constructor: when the parameter is a collection of an interface type, Spring
    // injects every bean implementing that interface; put them into the map by type.
    public DisCountStrageService(List<DiscountStrategy> discountStrategys) {
        for (DiscountStrategy discountStrategy : discountStrategys) {
            discountStrategyMap.put(discountStrategy.getType(), discountStrategy);
        }
    }

    public double disCount(String type, Double fee) {
        DiscountStrategy discountStrategy = discountStrategyMap.get(type);
        return discountStrategy.disCount(fee);
    }
}
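For completeness, the strategy interface and one implementation might look like this (a hypothetical sketch consistent with the reference code above, not taken from it):

import org.springframework.stereotype.Component;

public interface DiscountStrategy {
    String getType();

    double disCount(double fee);
}

@Component
class StudentDiscountStrategy implements DiscountStrategy {
    @Override
    public String getType() {
        return "student";
    }

    @Override
    public double disCount(double fee) {
        // Example policy: students pay 80% of the fee.
        return fee * 0.8;
    }
}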

(3) Inject with Guice: Guice is a small injection framework that uses a Binder to bind the concrete implementation classes; you implement the Module interface and override its configure method.

public class JdbcMetastoreModule
        implements Module
{
    @Override
    public void configure(Binder binder)
    {
        configBinder(binder).bindConfig(JdbcMetastoreConfig.class);
        configBinder(binder).bindConfig(HetuMetastoreCacheConfig.class);

        binder.bind(HetuMetastore.class).annotatedWith(ForHetuMetastoreCache.class)
                .to(JdbcHetuMetastore.class).in(Scopes.SINGLETON);

        binder.bind(HetuMetastore.class).to(HetuMetastoreCache.class).in(Scopes.SINGLETON);

        newExporter(binder).export(HetuMetastore.class)
                .as(generator -> generator.generatedNameOf(HetuMetastoreCache.class));
    }
}
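To make the annotated-binding pattern in that module concrete, here is a self-contained toy Guice example (Store, JdbcStore, CachingStore and @ForStoreCache are hypothetical stand-ins for HetuMetastore, JdbcHetuMetastore, HetuMetastoreCache and @ForHetuMetastoreCache; this is not openLooKeng code):

import com.google.inject.AbstractModule;
import com.google.inject.BindingAnnotation;
import com.google.inject.Guice;
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.google.inject.Scopes;

import java.lang.annotation.Retention;
import java.lang.annotation.Target;

import static java.lang.annotation.ElementType.FIELD;
import static java.lang.annotation.ElementType.METHOD;
import static java.lang.annotation.ElementType.PARAMETER;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

public class GuiceAnnotatedBindingDemo {
    // Toy stand-in for the metastore interface.
    interface Store {
        String get(String key);
    }

    // The backing implementation, analogous to JdbcHetuMetastore.
    static class JdbcStore implements Store {
        @Override
        public String get(String key) {
            return "value-from-jdbc:" + key;
        }
    }

    // Binding annotation, analogous to @ForHetuMetastoreCache.
    @Retention(RUNTIME)
    @Target({FIELD, PARAMETER, METHOD})
    @BindingAnnotation
    @interface ForStoreCache {}

    // The caching wrapper, analogous to HetuMetastoreCache; it asks Guice for the
    // delegate that was bound under the @ForStoreCache annotation.
    static class CachingStore implements Store {
        private final Store delegate;

        @Inject
        CachingStore(@ForStoreCache Store delegate) {
            this.delegate = delegate;
        }

        @Override
        public String get(String key) {
            // A real cache lookup would go here; the sketch just delegates.
            return delegate.get(key);
        }
    }

    static class StoreModule extends AbstractModule {
        @Override
        protected void configure() {
            bind(Store.class).annotatedWith(ForStoreCache.class).to(JdbcStore.class).in(Scopes.SINGLETON);
            bind(Store.class).to(CachingStore.class).in(Scopes.SINGLETON);
        }
    }

    public static void main(String[] args) {
        Injector injector = Guice.createInjector(new StoreModule());
        Store store = injector.getInstance(Store.class); // resolves to CachingStore
        System.out.println(store.get("t1"));
    }
}

Running main prints the value produced by JdbcStore, reached through CachingStore, which mirrors how the cache wrapper is intended to sit in front of the JDBC metastore in the module above.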

Refactoring

Refactoring roadmap: a cache interface needs to be exposed externally, and the concrete implementation is enabled and switched via configuration items. This cache interface can be implemented with Guava or with Hazelcast.

Overall structural changes:

Refactoring difficulty 1:

Two different sets of caches have to be served through the single HetuMetastoreCache interface. Solution: use generics, use generics, use generics.

Refactoring difficulty 2:

The externally exposed interface needs to be HetuMetastoreCache, and the declared variables need to be initialized. Solution: make good use of inheritance (extends). The Guava cache and the Hazelcast cache both inherit from the HetuMetastoreCache interface, and the code works against the cache variables through HetuCache, which has two modes: Guava's Cache and Hazelcast's IMap.

The actual implementation structure is as follows:

The advantage of this structure is that only HetuMetastoreCache is exposed externally. It is easy to extend: if a Redis cache is needed later, only a new implementation has to be added.
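As a rough, hypothetical sketch of what such a generic abstraction over the two back ends could look like (HetuCache, GuavaHetuCache and HazelcastHetuCache are illustrative names, not the actual openLooKeng classes):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

// A minimal generic cache abstraction that both back ends can sit behind.
interface HetuCache<K, V> {
    V getIfPresent(K key);

    void put(K key, V value);

    void invalidate(K key);
}

// Local mode: wrap a Guava Cache.
class GuavaHetuCache<K, V> implements HetuCache<K, V> {
    private final Cache<K, V> cache = CacheBuilder.newBuilder().maximumSize(10_000).build();

    @Override
    public V getIfPresent(K key) { return cache.getIfPresent(key); }

    @Override
    public void put(K key, V value) { cache.put(key, value); }

    @Override
    public void invalidate(K key) { cache.invalidate(key); }
}

// Distributed mode: wrap a Hazelcast IMap shared by all coordinators.
class HazelcastHetuCache<K, V> implements HetuCache<K, V> {
    private final IMap<K, V> map;

    HazelcastHetuCache(HazelcastInstance instance, String mapName) {
        this.map = instance.getMap(mapName);
    }

    @Override
    public V getIfPresent(K key) { return map.get(key); }

    @Override
    public void put(K key, V value) { map.put(key, value); }

    @Override
    public void invalidate(K key) { map.remove(key); }
}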

Solutions to some problems

  1. Explicit bindings are required; this came from a misunderstanding of how Inject is used. With Java Guice, if you want to use @Inject, every argument of the constructor needs to have a binding.

  2. Hazelcast 4.0.3 relies on serialization, so a custom serializer has to be defined; this involves serializing Optional, and Optional serialization has some problems. Fortunately, the official 4.2 release improved support for Optional, and we can look at its source code for the implementation. An illustrative sketch of such a serializer is shown below.
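Below is a minimal sketch of a custom Optional serializer and its registration, assuming Hazelcast's StreamSerializer API; it is illustrative only and not the code that went into openLooKeng:

import com.hazelcast.config.Config;
import com.hazelcast.config.SerializerConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.nio.ObjectDataInput;
import com.hazelcast.nio.ObjectDataOutput;
import com.hazelcast.nio.serialization.StreamSerializer;

import java.io.IOException;
import java.util.Optional;

// Writes Optional as a "present" flag followed by the wrapped object, if any.
class OptionalSerializer implements StreamSerializer<Optional> {
    @Override
    public void write(ObjectDataOutput out, Optional object) throws IOException {
        out.writeBoolean(object.isPresent());
        if (object.isPresent()) {
            out.writeObject(object.get());
        }
    }

    @Override
    public Optional read(ObjectDataInput in) throws IOException {
        boolean present = in.readBoolean();
        if (!present) {
            return Optional.empty();
        }
        return Optional.of(in.readObject());
    }

    @Override
    public int getTypeId() {
        return 1001; // any positive id not used by another custom serializer
    }
}

public class OptionalSerializerExample {
    public static void main(String[] args) {
        // Register the custom serializer for Optional before starting the instance.
        Config config = new Config();
        config.getSerializationConfig().addSerializerConfig(
                new SerializerConfig()
                        .setTypeClass(Optional.class)
                        .setImplementation(new OptionalSerializer()));
        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);

        instance.getMap("demo").put("k", Optional.of("v"));
        System.out.println(instance.getMap("demo").get("k"));

        instance.shutdown();
    }
}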

Conclusion

Read the source code and consult good friends and colleagues; their angle on solving a problem can be suddenly enlightening. Refactoring really requires a high level of language mastery, and through this project I have shored up some of my weaker areas of Java knowledge.

– END –

Author: Liu Shihong

Please contact the openLooKeng assistant to reprint this article.