I'm Cai Cai 🥬, an Internet practitioner with a serious food addiction 🎉

Success comes not from working furiously, but from working consistently 🎈

After finishing this article 📖

You will:

  • Understand how data from multiple sources can be structured into a uniform format
  • Gain a deeper understanding of the template method pattern
  • Get a better feel for the simple factory (not one of the 23 design patterns, but widely used)
  • See how design patterns are used in combination

Preface

The purpose of this article is to record and share how I structured and unified data from various sources while working on an ETL pipeline. It covers two of the 23 classic design patterns: the factory pattern and the template method pattern. If you are not familiar with them, check out my previous articles:

๐Ÿ‘‰ Template method mode ๐Ÿ‘ˆ ๐Ÿ‘‰ Factory mode ๐Ÿ‘ˆ

Business scenario

The message queue contains material (media asset) data from various platforms, and each platform's data may differ slightly. We need to process this data and output a unified format for downstream computation. The design should also be easy to extend, so that new material sources can be added later.

I drew the following diagram to help you understand the business scenario

Data extraction

First, note that the result of the extraction/transformation step is always the same structured object, while the inputs come from a variety of sources. This is a good fit for the factory pattern, which decouples the caller from the concrete source types. I'll use the simple factory here; it is not one of the 23 design patterns, but it is straightforward and widely used.

Although the data comes from different sources, the records share a number of common attributes. We can populate these common attributes in one place and handle the source-specific attributes separately, which is exactly the application scenario of the template method pattern.

The concrete class design is as follows:

Preparation

Introducing Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.22</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.78</version>
    </dependency>
</dependencies>

This article focuses on using design patterns to solve a real business problem, so other dependencies such as Kafka and Flink are not introduced.

Function implementation

MediaEnum

package enums;

import lombok.Getter;

/**
 * @author Cai Cai
 * @date 2022/1/9
 * @description Material source enumeration
 */
@Getter
public enum MediaEnum {
    UNKNOWN("Unknown", 0, "unknown"),
    FIRST_TRANSFORM("Source one", 1, "first_transform"),
    SECOND_TRANSFORM("Source 2", 2, "second_transform"),
    THIRD_TRANSFORM("Source 3", 3, "third_transform");

    private String name;
    private int code;
    private String alias;

    MediaEnum(String name, int code, String alias) {
        this.name = name;
        this.code = code;
        this.alias = alias;
    }

    /**
     * Look up an enum constant by name; fall back to UNKNOWN.
     */
    public static MediaEnum create(String name) {
        for (MediaEnum media : values()) {
            if (media.name.equals(name)) {
                return media;
            }
        }
        return UNKNOWN;
    }
}

KafkaData

package entity;

import com.alibaba.fastjson.annotation.JSONField;
import lombok.Data;

import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:00
 * @description Kafka raw data
 */
@Data
public class KafkaData {
    /**
     * title
     */
    private String title;

    /**
     * description
     */
    private String description;

    /**
     * author
     */
    private String author;

    /**
     * category
     */
    private List<String> tags;

    /**
     * media name, see {@link enums.MediaEnum}
     */
    private String mediaName;

    /**
     * material path
     */
    private String ossPath;

    /**
     * crawl time
     */
    @JSONField(name = "spider_time", format = "yyyy-MM-dd HH:mm:ss")
    private LocalDateTime spiderTime;

    /**
     * the original (raw) fields
     */
    @JSONField(name = "raw_data")
    private Map<String, Object> rawData;

}

The rawData attribute holds the original, source-specific fields of the material, while the other attributes are extracted from rawData. Data from every source shares these common attributes, which is why they are promoted to top-level fields.
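For concreteness, a hypothetical message from the queue might look like the JSON below. All field values are made up for illustration; spider_time and raw_data use the JSON names mapped by @JSONField, while the other fields use their Java names:

```json
{
  "title": "Sample video",
  "description": "A short demo clip",
  "author": "alice",
  "tags": ["demo", "video"],
  "mediaName": "Source one",
  "ossPath": "oss://bucket/materials/abc.mp4",
  "spider_time": "2022-01-09 22:00:00",
  "raw_data": {
    "video_seconds": 30,
    "cover_url": "oss://bucket/covers/abc.jpg"
  }
}
```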

Material

package entity;

import lombok.Data;

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * @author Cai Cai
 * @date 2022/1/9
 * @description Unified structured data
 */
@Data
public class Material {
    /**
     * title
     */
    private String title;

    /**
     * description
     */
    private String description;

    /**
     * author
     */
    private String author;

    /**
     * tags
     */
    private List<String> tags;

    /**
     * media code, see {@link enums.MediaEnum}
     */
    private Integer media;

    /**
     * material path
     */
    private String ossPath;

    /**
     * crawl time
     */
    private LocalDateTime spiderTime;

    /**
     * md5
     */
    private String md5;

    /**
     * cover image path
     */
    private String cover;

    /**
     * material duration
     */
    private Integer duration;

    /**
     * material width
     */
    private Integer width;

    /**
     * material height
     */
    private Integer height;

    /**
     * material format
     */
    private String format;

    /**
     * extension fields
     */
    private Map<String, Object> expand = new HashMap<>();
}

TransformMetaData

package meta;

import entity.KafkaData;
import entity.Material;
import enums.MediaEnum;
import lombok.Data;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:32
 * @description Transform metadata
 */
@Data
public abstract class TransformMetaData {
    /**
     * Get the unified data object.
     */
    public final Material getMaterial(KafkaData source) {
        if (source != null) {
            Material material = new Material();
            material.setTitle(source.getTitle());
            material.setDescription(source.getDescription());
            material.setAuthor(source.getAuthor());
            material.setTags(source.getTags());
            MediaEnum mediaEnum = MediaEnum.create(source.getMediaName());
            material.setMedia(mediaEnum.getCode());
            material.setSpiderTime(source.getSpiderTime());
            material.setOssPath(source.getOssPath());
            // Populate the source-specific data
            this.fill(source, material);
            return material;
        }
        return null;
    }

    /**
     * Fill in the source-specific data.
     */
    public abstract void fill(KafkaData source, Material material);
}


Here the template method pattern is used: the code shared by all sources lives in the getMaterial method of the abstract parent class, and the source-specific code lives in the fill method of each subclass.
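To make that division of labor concrete, here is a minimal, self-contained sketch of the same template-method structure. The class names, map keys, and the video_seconds field are all illustrative, not from the real project:

```java
import java.util.HashMap;
import java.util.Map;

public class TemplateDemo {

    // Abstract parent: the final template method copies the common fields,
    // then delegates the source-specific fields to fill().
    static abstract class MiniTransform {
        public final Map<String, Object> getMaterial(Map<String, Object> source) {
            Map<String, Object> material = new HashMap<>();
            material.put("title", source.get("title")); // common field, same for every source
            fill(source, material);                     // source-specific fields
            return material;
        }

        protected abstract void fill(Map<String, Object> source, Map<String, Object> material);
    }

    // Concrete subclass: only knows how to extract its own extra fields.
    static class MiniFirstTransform extends MiniTransform {
        @Override
        protected void fill(Map<String, Object> source, Map<String, Object> material) {
            material.put("duration", source.get("video_seconds"));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("title", "demo");
        source.put("video_seconds", 30);

        // The template method combines common and source-specific extraction.
        Map<String, Object> material = new MiniFirstTransform().getMaterial(source);
        System.out.println(material.get("title") + " / " + material.get("duration"));
    }
}
```

Declaring getMaterial as final is the key point: subclasses can vary the fill step but can never bypass or reorder the common extraction.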

FirstTransform

package meta;

import entity.KafkaData;
import entity.Material;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:52
 * @description Source one
 */
public class FirstTransform extends TransformMetaData {
    @Override
    public void fill(KafkaData source, Material material) {
    }
}

SecondTransform

package meta;

import entity.KafkaData;
import entity.Material;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:52
 * @description Source 2
 */
public class SecondTransform extends TransformMetaData {
    @Override
    public void fill(KafkaData source, Material material) {
    }
}

ThirdTransform

package meta;

import entity.KafkaData;
import entity.Material;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:52
 * @description Source 3
 */
public class ThirdTransform extends TransformMetaData {
    @Override
    public void fill(KafkaData source, Material material) {
    }
}

TransformFactory

package meta;

import entity.KafkaData;
import entity.Material;
import enums.MediaEnum;
import lombok.extern.slf4j.Slf4j;

import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/**
 * @author Cai Cai
 * @date 2022/1/9 22:56
 * @description Data transformation factory
 */
@Slf4j
public class TransformFactory {
    private static Map<String, TransformMetaData> map = new HashMap<>();

    private final static String PROPERTIES_NAME = "transform.properties";

    /**
     * Initialize the factory class.
     */
    static {
        Properties p = new Properties();
        InputStream is = TransformFactory.class.getClassLoader().getResourceAsStream(PROPERTIES_NAME);
        try {
            p.load(is);
            // Iterate over the Properties entries
            Set<Object> keys = p.keySet();
            for (Object key : keys) {
                // The value for each key is a fully qualified class name
                String className = p.getProperty((String) key);
                // Load the class and instantiate the transform object
                Class clazz = Class.forName(className);
                TransformMetaData obj = (TransformMetaData) clazz.newInstance();
                map.put((String) key, obj);
            }
        } catch (Exception e) {
            log.error("Data transform class loading failed", e);
        }
    }

    /**
     * Get the transformed object based on the media name.
     *
     * @return the unified Material, or null if the source is unknown
     */
    public static Material getMaterial(KafkaData source) {
        if (source == null || source.getMediaName() == null) {
            log.error("Source is empty or has no media name, data=[{}]", source);
            return null;
        }
        MediaEnum mediaSource = MediaEnum.create(source.getMediaName());
        TransformMetaData transformMetaData = map.get(mediaSource.getAlias());
        if (transformMetaData == null) {
            log.error("No transform class found for media name, data=[{}]", source.getMediaName());
            return null;
        }
        return transformMetaData.getMaterial(source);
    }
}

We then need to create a transform.properties file on the classpath to initialize the factory class:

first_transform=meta.FirstTransform
second_transform=meta.SecondTransform
third_transform=meta.ThirdTransform
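The reflective loading in the factory's static block can be exercised in isolation. In this self-contained sketch the entries are built in memory instead of being read from a file, and java.util classes stand in for the Transform classes (the keys and class names here are purely illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ReflectiveLoadingDemo {
    public static void main(String[] args) throws Exception {
        // In the real project these entries come from transform.properties.
        Properties p = new Properties();
        p.setProperty("first_transform", "java.util.ArrayList");
        p.setProperty("second_transform", "java.util.HashMap");

        Map<String, Object> registry = new HashMap<>();
        for (Object key : p.keySet()) {
            // The property value is a fully qualified class name...
            String className = p.getProperty((String) key);
            // ...which is loaded and instantiated via its no-arg constructor.
            Object obj = Class.forName(className).getDeclaredConstructor().newInstance();
            registry.put((String) key, obj);
        }

        System.out.println(registry.get("first_transform").getClass().getName());
        // prints "java.util.ArrayList"
    }
}
```

One small note: this sketch uses getDeclaredConstructor().newInstance() rather than the Class.newInstance() seen in the factory above, because Class.newInstance() has been deprecated since Java 9 (it swallows checked constructor exceptions).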

Client

import entity.KafkaData;
import entity.Material;
import meta.TransformFactory;

/**
 * @author Cai Cai
 * @date 2022/1/9 23:23
 */
public class Client {
    public static void main(String[] args) {
        // Field values are omitted here for brevity
        KafkaData kafkaData = new KafkaData();
        Material material = TransformFactory.getMaterial(kafkaData);
        // material now holds the unified data format
    }
}

For materials from different sources, we handle the source-specific fields in the fill method of the corresponding Transform class. Extending the system is easy: add an entry to the transform.properties file whose key matches the alias defined in MediaEnum (the factory looks transforms up by alias), and add a new Transform class that extends TransformMetaData.
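For example, hooking up a hypothetical fourth source (all names below are illustrative) would take three small steps: add a constant such as FOURTH_TRANSFORM("Source 4", 4, "fourth_transform") to MediaEnum, create a meta.FourthTransform class extending TransformMetaData with its own fill override, and register it in transform.properties:

```properties
fourth_transform=meta.FourthTransform
```

No existing code needs to change, which is the payoff of combining the simple factory with the template method pattern.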