The business scenario

The project uses MongoDB to store data uploaded by monitoring equipment, at second- or minute-level frequency. A single device produces about 60 * 24 * 30 = 43,200 minute-level records per month. A project typically connects 500 to 1,000 devices, so the monthly data volume is roughly 20 to 40 million records. As the project keeps running, the single-collection data volume grows too large and business operations hit a performance bottleneck.

The plan

Coarse-grained data

For coarse-grained data such as hourly and daily data, we estimate the single-collection volume and can keep it within the hundred-million level over the project's life cycle, so this data is not split into sub-tables. Instead, a redundant time-field design is adopted: each record carries pre-formatted hour, day, month, and year fields. For the hourly collection, create a compound index on the device ID and day. The index information is as follows:

```
# index information
{ "deviceId": 1, "day": 1 }
```

With this design, the number of distinct index keys equals the hourly collection's document count divided by 24, which greatly reduces MongoDB's memory overhead for the index. To query the data of device device01 for hour 18 on 2021-08-12, we derive day = 2021-08-12 from the hour value. The query conditions are as follows:

```
# query condition
{ "deviceId": "device01", "day": "2021-08-12", "hour": "2021-08-12 18" }
```

Because the query condition hits the index, the response time is under 10 ms.
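The redundant time fields described above can be generated once on the write path. A minimal sketch (the class and method names are illustrative, not from the original; the field formats follow the queries above):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeFields {

    /** Builds the redundant time fields (hour/day/month/year) for a record timestamp. */
    public static Map<String, String> build(long epochMs, ZoneId zone) {
        ZonedDateTime t = Instant.ofEpochMilli(epochMs).atZone(zone);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("hour",  t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH")));
        fields.put("day",   t.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")));
        fields.put("month", t.format(DateTimeFormatter.ofPattern("yyyy-MM")));
        fields.put("year",  t.format(DateTimeFormatter.ofPattern("yyyy")));
        return fields;
    }
}
```

Writing these fields with each record costs a little storage but lets every query hit a plain string-equality index instead of a range scan over raw timestamps.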

Fine-grained data

As the project runs in production, the single-collection volume of fine-grained (minute-level) data grows too large. From the business-scenario estimate above, the monthly volume is 20 to 40 million records, and the annual volume for 1,000 devices is around 500 million. MongoDB also has historical locking problems: database-level locking in low versions and collection-level locking in middle versions. Even with sharding, building and querying indexes over such a collection is expensive. Our goal is therefore to keep the single-collection volume below the hundred-million level while supporting more device and project access, so we store the minute data in separate tables by month.

```mermaid
graph TD
  A[Protocol service] --> B[Data processing service] --> C[Storage service]
```
  • The protocol service receives data uploaded by devices, decodes and encapsulates it, and hands it to the data processing service.
  • The data processing service processes the data and adds the time fields (for example, hour, day, and so on).
  • The storage service routes the data to its target table by data type before writing, which implements the per-month sub-tables.

Table implementation

Table storage

The naming convention is minute_data_yyyyMM, for example minute_data_202108.

```java
/**
 * Data storage (pseudo-code reference implementation).
 */
public class SinkService {

    @Autowired
    private MongoTemplate mongoTemplate;

    public void sinkMinuteData(JSONObject data) {
        long ms = data.getLongValue("ms");
        // Resolve the monthly sub-table from the record timestamp, e.g. minute_data_202108
        String month = FastDateFormat.getInstance("yyyyMM").format(ms);
        String subTable = String.join("_", "minute_data", month);
        mongoTemplate.getCollection(subTable).insertOne(new Document(data));
    }
}
```

Table query

For precise queries, build the index on a coarser-grained field: for minute data, create an index on deviceId and hour. The query then targets the sub-table whose name is derived from the query time, as in the following example:

[Precise query] Query the data of the device with code device01 at 2021-08-12 00:01

```java
public class QueryDataService {

    @Autowired
    private MongoTemplate mongoTemplate;

    /**
     * @param deviceId device ID
     * @param ms       minute-level timestamp
     */
    public JSONObject findOneMinute(String deviceId, long ms) {
        // Resolve the monthly sub-table, e.g. minute_data_202108
        String month = FastDateFormat.getInstance("yyyyMM").format(ms);
        String subTable = String.join("_", "minute_data", month);
        String hour = FastDateFormat.getInstance("yyyy-MM-dd HH").format(ms);
        String minute = FastDateFormat.getInstance("yyyy-MM-dd HH:mm").format(ms);
        Document condition = new Document();
        condition.append("deviceId", deviceId).append("hour", hour).append("minute", minute);
        // Exclude _id from the result
        Document fields = new Document().append(Constants.MONGO_ID, 0);
        return mongoTemplate.findOne(new BasicQuery(condition, fields), JSONObject.class, subTable);
    }
}
```

The equivalent query in the Mongo shell:

```
db.minute_data_202108.find({"deviceId": "device01", "hour": "2021-08-12 00", "minute": "2021-08-12 00:01"}, {"_id": 0});
```

[Range query] Query the device with code device01 over the time range 2021-07-31 23:59 to 2021-08-01 00:59

Business analysis: minute data is partitioned and stored by month, so how do we query it at minimal cost when the time range spans months?
  • Scheme 1: paged loop query (control the amount of data per query by calling the precise-query method in a loop)
```java
public class RangeQueryService {

    @Autowired
    private QueryDataService queryDataService;

    /**
     * Calls the precise-query method once per minute in the range.
     */
    public List<JSONObject> findRangeMinute(String deviceId, long startMs, long endMs) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTimeInMillis(startMs);
        List<JSONObject> dataList = Lists.newArrayList();
        while (calendar.getTimeInMillis() <= endMs) {
            JSONObject data = queryDataService.findOneMinute(deviceId, calendar.getTimeInMillis());
            calendar.add(Calendar.MINUTE, 1);
            if (data != null) {
                dataList.add(data);
            }
        }
        return dataList;
    }
}
```
  • Scheme 2: split the range by month and query each sub-table

Query minute_data_202107 and minute_data_202108 once each; the code is omitted here. (Note: if the range spans more than two months, the data volume is too large; this should not occur in a real business scenario, and a single broad query should be avoided against any database.)
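Scheme 2 starts by working out which monthly sub-tables a time range touches, then issues one query per table. A minimal sketch of the table-resolution step (the class name is illustrative; the per-table query itself is omitted, as in the article):

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class SubTableResolver {

    private static final DateTimeFormatter MONTH_FMT = DateTimeFormatter.ofPattern("yyyyMM");

    /** Lists every monthly sub-table (minute_data_yyyyMM) covered by [startMs, endMs]. */
    public static List<String> tablesForRange(long startMs, long endMs, ZoneId zone) {
        YearMonth start = YearMonth.from(Instant.ofEpochMilli(startMs).atZone(zone));
        YearMonth end   = YearMonth.from(Instant.ofEpochMilli(endMs).atZone(zone));
        List<String> tables = new ArrayList<>();
        for (YearMonth m = start; !m.isAfter(end); m = m.plusMonths(1)) {
            tables.add("minute_data_" + m.format(MONTH_FMT));
        }
        return tables;
    }
}
```

For the example range above (2021-07-31 23:59 to 2021-08-01 00:59), this yields minute_data_202107 and minute_data_202108, and each table is queried exactly once.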

Summary: sub-table queries can be implemented according to your own business scenario. Plan the query patterns in advance and combine them with caching where appropriate, for example putting the latest minute data into a cache, so that not all of the load lands on MongoDB.
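As one possible shape for the caching idea above (a hypothetical in-process, keep-latest cache keyed by device; not from the original article, and in production this role would more likely be played by Redis):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LatestMinuteCache<V> {

    // deviceId -> most recent minute record; each write overwrites the older value
    private final Map<String, V> latest = new ConcurrentHashMap<>();

    /** Write path: called whenever a new minute record is stored. */
    public void put(String deviceId, V record) {
        latest.put(deviceId, record);
    }

    /** Read path: serves "latest value" queries without touching MongoDB. */
    public V getLatest(String deviceId) {
        return latest.get(deviceId);
    }
}
```

A cache like this absorbs the most frequent query ("what is the device reporting right now?") so that MongoDB only sees the historical and range queries.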