What is LogMiner?

LogMiner is an Oracle tool that parses Redo logs and Archived Redo logs

What can LogMiner do?

The official documentation lists many use cases; take a look if you are interested.

Our current project uses the LogMiner-based Debezium Oracle Connector for data migration.

How does Oracle LogMiner data migration work?

First, there are a few concepts that you need to understand

  • Redo log: records all changes made to data blocks. Oracle requires at least two Redo log groups.
  • Archived Redo log: when a Redo log is full, a log switch occurs and subsequent changes are written to the next Redo log (which is why at least two Redo logs are required). If archive mode is enabled, Oracle archives the Redo log that has just been filled.
  • System Change Number (SCN): Oracle's internal logical timestamp.
  • Flashback: a Flashback query (SELECT ... AS OF SCN ...) lets you read the full set of data as of a given point in time.

Here’s the idea:

  1. First, query the current SCN
  2. Use a Flashback query to read the full data as of that SCN
  3. Start LogMiner with that SCN as the start point (START_SCN) to capture incremental changes (a minimal sketch follows this list)
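
Here is a minimal JDBC sketch of steps 1 and 2, just to make the idea concrete. The connection string, credentials, and the HR.JOBS table are placeholders of mine, not part of the original setup.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;

  public class FlashbackSnapshot {
      public static void main(String[] args) throws SQLException {
          // Placeholder connection details - adjust for your environment
          try (Connection conn = DriverManager.getConnection(
                  "jdbc:oracle:thin:@//localhost:1521/ORCLCDB", "logminer", "password")) {

              // Step 1: read the current SCN
              long currentScn;
              try (Statement st = conn.createStatement();
                   ResultSet rs = st.executeQuery("SELECT CURRENT_SCN FROM V$DATABASE")) {
                  rs.next();
                  currentScn = rs.getLong(1);
              }

              // Step 2: Flashback query - full snapshot of the table as of that SCN
              try (Statement st = conn.createStatement();
                   ResultSet rs = st.executeQuery(
                           "SELECT * FROM HR.JOBS AS OF SCN " + currentScn)) {
                  while (rs.next()) {
                      // hand each row to the downstream consumer here
                  }
              }

              // Step 3 would pass currentScn as STARTSCN to DBMS_LOGMNR.START_LOGMNR
          }
      }
  }

Everything changed after currentScn is then picked up by LogMiner in step 3.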

Installation and Configuration

If you are not familiar with Oracle, please refer to the documentation I have compiled:

  • Oracle install (Docker): github.com/TavenYin/da…
  • LogMiner: github.com/TavenYin/da…

A first look at LogMiner

With the environment ready, let's try out LogMiner.

  conn logminer/password

1. Build a data dictionary

LogMiner uses a data dictionary to translate internal object identifiers and data types into object names and readable data formats.

  -- Original statement:
  -- INSERT INTO HR.JOBS(JOB_ID, JOB_TITLE, MIN_SALARY, MAX_SALARY)
  --   VALUES('IT_WT', 'Technical Writer', 4000, 11000);

  -- Without a data dictionary, LogMiner reconstructs it as:
  insert into "UNKNOWN"."OBJ# 45522"("COL 1","COL 2","COL 3","COL 4")
    values (HEXTORAW('45465f4748'),
            HEXTORAW('546563686e6963616c20577269746572'),
            HEXTORAW('c229'),
            HEXTORAW('c3020b'));

There are three ways mentioned in the official document:

  • Online data dictionary: use this when you can access the source database that created the Redo logs and no table structure changes are expected. This is the simplest and most efficient option and the one Oracle recommends, because the online data dictionary always holds the latest structures. If a table's structure has changed and LogMiner then reads older Redo records for it, the reconstructed SQL looks like the code block above.

  • Extract the data dictionary into the Redo logs: run EXECUTE DBMS_LOGMNR_D.BUILD(OPTIONS => DBMS_LOGMNR_D.STORE_IN_REDO_LOGS); This operation consumes some database resources.

  • Extract the data dictionary to a flat file: Oracle keeps this option for backward compatibility. It is not used in this article.

When LogMiner is started with the DBMS_LOGMNR.DDL_DICT_TRACKING option, it automatically tracks DDL and updates its internal dictionary, so it can still reconstruct SQL correctly after table structure changes. Note: this option cannot be combined with the online data dictionary.
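
As a sketch only (this option is not used later in the article, and it assumes the dictionary has already been extracted into the Redo logs with DBMS_LOGMNR_D.BUILD), starting LogMiner with DDL tracking from JDBC could look like this:

  import java.sql.CallableStatement;
  import java.sql.Connection;
  import java.sql.SQLException;

  public class StartLogMinerWithDdlTracking {
      // DICT_FROM_REDO_LOGS + DDL_DICT_TRACKING is an allowed combination,
      // whereas DICT_FROM_ONLINE_CATALOG + DDL_DICT_TRACKING is not.
      static void startLogMiner(Connection conn, long startScn) throws SQLException {
          String plsql =
              "BEGIN DBMS_LOGMNR.START_LOGMNR(" +
              "  STARTSCN => ?," +
              "  OPTIONS  => DBMS_LOGMNR.DICT_FROM_REDO_LOGS + DBMS_LOGMNR.DDL_DICT_TRACKING);" +
              " END;";
          try (CallableStatement cs = conn.prepareCall(plsql)) {
              cs.setLong(1, startScn);
              cs.execute();
          }
      }
  }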

More details are in the Oracle documentation: docs.oracle.com/en/database…

For this step I choose the online data dictionary, which requires no extra work, so let's go straight to the next step.

2. Add log files

  SQL> select member from v$logfile;

  MEMBER
  --------------------------------------------------------------------------------
  /opt/oracle/oradata/ORCLCDB/redo03.log
  /opt/oracle/oradata/ORCLCDB/redo02.log
  /opt/oracle/oradata/ORCLCDB/redo01.log

  -- Add the redo logs
  EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
    LOGFILENAME => '/opt/oracle/oradata/ORCLCDB/redo03.log', -
    OPTIONS => DBMS_LOGMNR.NEW);

  EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
    LOGFILENAME => '/opt/oracle/oradata/ORCLCDB/redo02.log', -
    OPTIONS => DBMS_LOGMNR.ADDFILE);

  EXECUTE DBMS_LOGMNR.ADD_LOGFILE( -
    LOGFILENAME => '/opt/oracle/oradata/ORCLCDB/redo01.log', -
    OPTIONS => DBMS_LOGMNR.ADDFILE);

3. START_LOGMNR

  SQL> EXECUTE DBMS_LOGMNR.START_LOGMNR( -
    OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);

Then perform an INSERT on the test table (TEST_TAB).

4. Query result

The results captured by LogMiner are obtained by querying V$LOGMNR_CONTENTS. When this view is queried, LogMiner parses the Redo and Archived logs sequentially, so the query is a little slow.

  SELECT OPERATION, SQL_REDO, SQL_UNDO
  FROM V$LOGMNR_CONTENTS
  WHERE table_name='TEST_TAB';

The result is as follows, and you can see the SQL we just inserted

Putting it into practice

Now that we’ve seen the idea of migration and how Logminer works, we’re ready to do a demo.

Due to limited space, I will only discuss the overall approach and some of my own thoughts.

For the full code, see 👉 github.com/TavenYin/da…

1. The whole idea

The implementation approach here follows Debezium's.

We need to explain why LogMiner is restarted when a Redo log switch occurs in step 4:

  • After a Redo log switch, a new Archived log is generated, and we need to add it as a new log file
  • Ending and restarting LogMiner solves this problem, rather than keeping one LogMiner session open indefinitely. So the code needs to find a moment to restart LogMiner, and a Redo log switch is a good time to do it (a detection sketch follows this list)
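
A minimal sketch of how the program could detect the switch. Polling V$LOG for the CURRENT sequence number is my assumption of one way to do it, not necessarily what Debezium does.

  import java.sql.Connection;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;

  public class RedoSwitchDetector {
      private long lastSequence = -1;

      // Returns true when the CURRENT redo log's sequence number has changed
      // since the previous check, i.e. a log switch has happened.
      public boolean redoLogSwitched(Connection conn) throws SQLException {
          try (Statement st = conn.createStatement();
               ResultSet rs = st.executeQuery(
                       "SELECT SEQUENCE# FROM V$LOG WHERE STATUS = 'CURRENT'")) {
              rs.next();
              long currentSequence = rs.getLong(1);
              boolean switched = lastSequence != -1 && currentSequence != lastSequence;
              lastSequence = currentSequence;
              return switched;
          }
      }
  }

When this returns true, the program would end LogMiner, add the log files again (including the newly generated archive), and start LogMiner from the last processed SCN.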

As I write this, I suddenly have a question

As mentioned earlier, LogMiner only parses the Redo logs and generates the contents of V$LOGMNR_CONTENTS dynamically when the view is queried.

Refer to the figure above. Suppose that between steps 4 and 6 the program checks that no Redo log switch has happened and prepares to continue, but before V$LOGMNR_CONTENTS is actually queried a switch does occur and the current Redo log ends up being overwritten.

Since START_LOGMNR specifies the start and end SCNs, even if the new Archived log is added on the next iteration, that data will not be read, because those SCNs have already been passed over and the next mining range starts after them.

After testing, I found that in such an extreme situation the data is indeed lost.

So why didn’t Debezium consider this?

As I understand it, in a production environment Redo logs are not switched frequently and there are always multiple Redo groups, so it is almost impossible for a Redo log to be overwritten in such a short window.


2. Process the result set of V$LOGMNR_CONTENTS

When I first read the Debezium source code I did not pay much attention to this part; it was only when I implemented it myself that I found the logic here is a bit tricky.

Each row of V$LOGMNR_CONTENTS may be a COMMIT, ROLLBACK, DDL, or DML record belonging to some transaction.

What is the TransactionalBuffer mentioned above?

What happens when we read V$LOGMNR_CONTENTS is this: because each round we only read from startScn up to the current SCN, a transaction will often not yet have committed even though we have already received part of its DML. Since we cannot know yet whether it will commit or roll back, we need to temporarily cache this "half" of the transaction; that is what the TransactionalBuffer does (a minimal sketch follows).
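
A minimal sketch of such a buffer, assuming each V$LOGMNR_CONTENTS row has already been mapped to a small event object carrying its transaction id (XID), SCN, and redo SQL. The class and method names here are mine, not Debezium's.

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.function.Consumer;

  public class TransactionalBuffer {

      public static class DmlEvent {
          final long scn;
          final String redoSql;

          DmlEvent(long scn, String redoSql) {
              this.scn = scn;
              this.redoSql = redoSql;
          }
      }

      // Uncommitted transactions, keyed by transaction id (XID)
      private final Map<String, List<DmlEvent>> transactions = new HashMap<>();

      // DML row: buffer it until the matching COMMIT or ROLLBACK arrives
      public void registerDml(String xid, long scn, String redoSql) {
          transactions.computeIfAbsent(xid, k -> new ArrayList<>())
                      .add(new DmlEvent(scn, redoSql));
      }

      // COMMIT row: everything buffered for this XID can now go downstream
      public void commit(String xid, Consumer<DmlEvent> downstream) {
          List<DmlEvent> events = transactions.remove(xid);
          if (events != null) {
              events.forEach(downstream);
          }
      }

      // ROLLBACK row: discard the buffered events for this XID
      public void rollback(String xid) {
          transactions.remove(xid);
      }

      public boolean isEmpty() {
          return transactions.isEmpty();
      }
  }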

When calling DBMS_LOGMNR.START_LOGMNR you can specify the COMMITTED_DATA_ONLY option to read only committed transactions, which would remove the need to handle uncommitted data in the result set ourselves. So why not use COMMITTED_DATA_ONLY? With that option LogMiner holds back a transaction's rows until it commits before returning them to the client, which can easily lead to an "Out of Memory" error, so this strategy is not suitable for our program.


3. Handle a crash during migration

Data migration is bound to be a long process. If something goes wrong during execution and the Java process dies, do you have to start all over from scratch?

If we can be sure that all records prior to an SCN have been processed, we can start processing from that SCN on the next restart

There are two places in the code where we can confirm that everything before an SCN has been fully processed (a sketch follows the two rules):

A. There is no data in the current TransactionalBuffer, which means that all transactions prior to END_SCN have been committed

B. When a transaction commits, if its start SCN is older than the start SCN of every transaction still sitting in the TransactionalBuffer, then everything before that SCN has been handled and it can be recorded as the restart point
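
A minimal sketch of the two rules, building on the TransactionalBuffer sketch above. How the resume SCN is actually persisted (file, database table, Kafka offsets, and so on) is left out, and the method names are my own assumptions.

  import java.util.Collection;
  import java.util.OptionalLong;

  public class ScnCheckpoint {

      // Rule A: if the TransactionalBuffer is empty after a mining range has
      // been processed, every transaction before endScn has completed, so
      // endScn is a safe point to restart from.
      public static OptionalLong afterRangeProcessed(boolean bufferIsEmpty, long endScn) {
          return bufferIsEmpty ? OptionalLong.of(endScn) : OptionalLong.empty();
      }

      // Rule B: when a transaction commits, if its start SCN is older than the
      // start SCN of every transaction still in the buffer, everything before
      // that start SCN has been handled and it becomes the new restart point.
      public static OptionalLong onCommit(long committedTxStartScn,
                                          Collection<Long> bufferedTxStartScns) {
          boolean oldest = bufferedTxStartScns.stream()
                  .allMatch(scn -> scn > committedTxStartScn);
          return oldest ? OptionalLong.of(committedTxStartScn) : OptionalLong.empty();
      }
  }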

4. SQL parsing

If you want to synchronize Oracle data to other databases (including NoSQL), the best way is to parse SQL into structured objects that can be consumed by downstream services.

I have not yet looked into how Debezium does this. My current solution uses com.alibaba.druid.sql.SQLUtils, which can parse SQL into structured objects; after some processing, these objects can be handed to downstream services for consumption (a small sketch follows).
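
A minimal sketch of the idea, assuming the Druid library is on the classpath. The exact method names can differ between Druid versions, so treat this as illustrative rather than exact.

  import com.alibaba.druid.sql.SQLUtils;
  import com.alibaba.druid.sql.ast.SQLStatement;
  import com.alibaba.druid.sql.ast.statement.SQLInsertStatement;

  import java.util.List;

  public class RedoSqlParser {
      public static void main(String[] args) {
          // SQL_REDO text captured from V$LOGMNR_CONTENTS
          String redoSql = "insert into \"HR\".\"JOBS\"(\"JOB_ID\",\"JOB_TITLE\",\"MIN_SALARY\",\"MAX_SALARY\") "
                  + "values ('IT_WT','Technical Writer',4000,11000)";

          // Parse with the Oracle dialect
          List<SQLStatement> statements = SQLUtils.parseStatements(redoSql, "oracle");

          for (SQLStatement stmt : statements) {
              if (stmt instanceof SQLInsertStatement) {
                  SQLInsertStatement insert = (SQLInsertStatement) stmt;
                  System.out.println("table   : " + insert.getTableName());
                  System.out.println("columns : " + insert.getColumns());
                  System.out.println("values  : " + insert.getValues().getValues());
                  // Build a structured change event from these parts and publish it downstream
              }
          }
      }
  }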

DEMO

Run as follows

GitHub 👉 github.com/TavenYin/da…

References

  • Oracle Redo: docs.oracle.com/cd/B28359_0…
  • Oracle Archived: docs.oracle.com/cd/B28359_0…
  • Oracle Flashback: docs.oracle.com/cd/E11882_0…
  • LogMiner: docs.oracle.com/en/database…
  • Debezium Oracle Connector: debezium.io/documentati…