Background

In the previous article, "Ten-billion-level traffic real-time analysis and statistics: data structure design", we designed the log structure. Now we are ready to start writing code, which is the part I enjoy most. But a programmer who jumps straight into coding is not a good programmer, so let's design the flow chart first. Here we go!

The flow chart

Design 1

  1. Users perform operations on articles, which generates request logs.
  2. The SLB server load-balances these requests across the log hub servers.
  3. NAS stores the logs and acts as the log collection center; alternatively, rsync can be used to synchronize the logs on each node to the log center.
  4. The core ETL program extracts, transforms, and loads the data of all nodes in the log center.
  5. The role of Hbase in the figure is easy to understand, but why MySQL? Because we need finer-grained control over the point up to which logs have been written. MySQL is mainly used to store the offset that records the log time, which will be described in more detail later.
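The article leaves the details of offset tracking for later, so the following is only a minimal sketch of the general idea behind point 5, not the author's implementation. It assumes a byte offset per log file, and a plain dictionary (`offsets`) stands in for the MySQL table; the function name `read_new_lines` is hypothetical.

```python
import os
import tempfile

# Hypothetical stand-in for the MySQL offset table: {log file path: bytes already loaded}.
offsets = {}

def read_new_lines(path, offsets):
    """Return the complete lines appended to `path` since the last recorded offset."""
    start = offsets.get(path, 0)
    with open(path, "rb") as f:
        f.seek(start)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written line is retried next time.
    end = chunk.rfind(b"\n") + 1
    offsets[path] = start + end
    return chunk[:end].decode("utf-8").splitlines()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "10_01.log")
    with open(path, "wb") as f:
        f.write(b'{"time": 1}\n{"time": 2}\n{"ti')  # the last line is still being written
    first = read_new_lines(path, offsets)           # two complete lines
    with open(path, "ab") as f:
        f.write(b'me": 3}\n')                       # the writer finishes the third line
    second = read_new_lines(path, offsets)          # only the newly completed line
```

Because the offset advances only past fully written lines, the ETL can be stopped and restarted without loading a line twice or skipping one, which is the kind of fine-grained control the offset store is meant to provide.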

Design 2

  1. Users perform operations on articles, which generates request logs.
  2. The SLB server load-balances these requests across the log hub servers.
  3. Filebeat collects the node logs into Kafka, which mainly serves to smooth out log traffic peaks. Alternatively, use nginx to write the logs directly to Kafka, since nginx is also production grade.
  4. The ETL program consumes the Kafka data and writes it to Hbase.
  5. Same as point 5 of design 1.
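To make the peak-clipping role of Kafka in design 2 concrete, here is a toy simulation. It does not use Kafka at all: an in-memory `deque` stands in for the topic, and the function name `simulate`, the arrival numbers, and the consumer rate are all invented for illustration.

```python
from collections import deque

def simulate(arrivals_per_tick, consumer_rate):
    """Return (backlog after each tick, events written downstream per tick)."""
    buffer = deque()  # stand-in for a Kafka topic
    backlog, written = [], []
    for n in arrivals_per_tick:
        buffer.extend(range(n))                  # producers append a burst of events
        batch = min(consumer_rate, len(buffer))  # the ETL takes at most its capacity
        for _ in range(batch):
            buffer.popleft()
        backlog.append(len(buffer))
        written.append(batch)
    return backlog, written

# A burst of 900 events arrives in the second tick; the consumer handles 300 per tick.
backlog, written = simulate([100, 900, 50, 0, 0], consumer_rate=300)
```

The downstream write rate never exceeds 300 per tick even though 900 events arrive at once; the buffer absorbs the burst and drains over the following ticks.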

Log center

The log center storage will look like this

├── log
│   ├── 2019-03-21
│   │   ├── 111.12.32.11
│   │   │   ├── 10_01.log
│   │   │   └── 10_02.log
│   │   ├── 222.22.123.123
│   │   │   ├── 0_01.log
│   │   │   ├── 0_02.log
│   │   │   └── 0_03.log
│   │   └── 33.44.55.11
│   ├── 2019-03-22
│   └── 2019-03-23

  1. One file is generated per node every minute.
  2. One folder a day.
  3. This design makes it easy to check errors.
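The layout above can be sketched as a small path-building helper. This is an illustration under assumptions: the file name is read as `<hour>_<minute>.log` (the article only says one file per node per minute), timestamps are interpreted in UTC, and the function name `log_path` is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

def log_path(root, node_ip, time_ms):
    """Build <root>/<date>/<node ip>/<hour>_<minute>.log for a millisecond timestamp."""
    t = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc)  # UTC is an assumption
    day = t.strftime("%Y-%m-%d")                  # one folder a day
    minute_file = f"{t.hour}_{t.minute:02d}.log"  # one file per node every minute
    return PurePosixPath(root) / day / node_ip / minute_file

path = log_path("log", "111.12.32.11", 1553269361115)
```

With this naming, finding the logs for a given node and minute is a direct path lookup, which is what makes errors easy to track down.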

The log content is as follows

{"time": 1553269361115, "data": {"type": "read", "aid": "10000", "uid": "4229d691b07b13341da53f17ab9f2416", "tid": "49f68a5c8493ec2c0bf489821c21fc3b", "ip": "22.22.22.22"}}
{"time": 1553269371115, "data": {"type": "comment", "content": "666, give me your support.", "aid": "10000", "uid": "4229d691b07b13341da53f17ab9f2416", "tid": "49f68a5c8493ec2c0bf489821c21fc3b", "ip": "22.22.22.22"}}
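As a rough illustration of how an ETL step might read these lines, the sketch below parses each JSON line into a typed record and counts events per article and type. The class name `LogEvent`, the function `parse_line`, and the counting step are assumptions made for this example; only the field names come from the log format shown here.

```python
import json
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LogEvent:
    time_ms: int
    type: str
    aid: str
    uid: str
    tid: str
    ip: str
    content: Optional[str] = None  # only present on some event types, e.g. "comment"

def parse_line(line):
    """Parse one JSON log line into a LogEvent."""
    record = json.loads(line)
    data = record["data"]
    return LogEvent(
        time_ms=record["time"],
        type=data["type"],
        aid=data["aid"],
        uid=data["uid"],
        tid=data["tid"],
        ip=data["ip"],
        content=data.get("content"),
    )

lines = [
    '{"time": 1553269361115, "data": {"type": "read", "aid": "10000", "uid": "4229d691b07b13341da53f17ab9f2416", "tid": "49f68a5c8493ec2c0bf489821c21fc3b", "ip": "22.22.22.22"}}',
    '{"time": 1553269371115, "data": {"type": "comment", "content": "666, give me your support.", "aid": "10000", "uid": "4229d691b07b13341da53f17ab9f2416", "tid": "49f68a5c8493ec2c0bf489821c21fc3b", "ip": "22.22.22.22"}}',
]
events = [parse_line(line) for line in lines]
# Count events per (article id, event type), a typical first aggregation step.
counts = Counter((e.aid, e.type) for e in events)
```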

Final plan

I chose design 1, and the reason is point 5: it gives us finer-grained control over log writes. The online business has been running stably on it for a year, so this solution is feasible.

In the next article, we will actually start writing the code, all of which will be implemented in Scala. Do you have any questions for me? Four words: