Abstract: This article shows you how to extract JSON fields directly from MongoDB to MaxCompute using DataWorks data integration.

Data and account preparation

First, upload the data to your MongoDB database. This example uses an Alibaba Cloud ApsaraDB for MongoDB instance whose network type is VPC. For a VPC instance, you must apply for a public IP address; otherwise, the default DataWorks resource group cannot communicate with it. The test data is as follows:

{
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees", "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh", "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien", "title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99}
        ],
        "bicycle": {"color": "red", "price": 19.95}
    },
    "expensive": 10
}
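Before uploading, it can help to confirm that the test document is valid JSON and that the dotted paths used later in the sync task resolve to the values you expect. The following is a minimal Python sketch using only the standard library. The `TEST_DATA` variable and the `get_path` helper are hypothetical; `get_path` only mimics how a dotted path such as store.bicycle.color addresses a nested field and is not part of DataWorks.

```python
import json

# The test document from above, as a string (hypothetical variable name).
TEST_DATA = """
{
    "store": {
        "book": [
            {"category": "reference", "author": "Nigel Rees",
             "title": "Sayings of the Century", "price": 8.95},
            {"category": "fiction", "author": "Evelyn Waugh",
             "title": "Sword of Honour", "price": 12.99},
            {"category": "fiction", "author": "J. R. R. Tolkien",
             "title": "The Lord of the Rings", "isbn": "0-395-19395-8",
             "price": 22.99}
        ],
        "bicycle": {"color": "red", "price": 19.95}
    },
    "expensive": 10
}
"""


def get_path(doc, path):
    """Follow a dotted path such as 'store.bicycle.color' through nested dicts.

    Hypothetical helper for illustration only; it does not handle array indexes.
    """
    value = doc
    for key in path.split("."):
        value = value[key]  # raises KeyError if a level is missing
    return value


doc = json.loads(TEST_DATA)  # raises json.JSONDecodeError if the JSON is malformed
print(get_path(doc, "store.bicycle.color"))  # red
print(get_path(doc, "expensive"))            # 10
```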

Log in to the Data Management (DMS) console for your MongoDB instance. In this example, the database is admin and the collection is userlog. Run db.userlog.find().limit(10) in the query window to view the uploaded data, as shown in the following figure.






You also need to create a database user in advance, which DataWorks uses when you add the data source. This example runs the following command to create a user named bookuser with the password 123456 and the root role:

db.createUser({user: "bookuser", pwd: "123456", roles: ["root"]})

Extract data into MaxCompute using DataWorks

  1. Add a MongoDB data source
    In the DataWorks Data Integration console, add a data source of the MongoDB type.






    Configure the parameters as shown in the following figure, then click the connectivity test button to check that DataWorks can reach the data source. Because the MongoDB instance in this example is deployed in a VPC, the data source must use the instance's public IP address.






    To get the address and port number, click the instance name in the MongoDB management console, as shown in the following figure.
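If the connectivity test fails, you may want to check from your own machine whether the public address and port are reachable, and build a connection string for a MongoDB client. The following is a minimal Python sketch using only the standard library. The helper names `is_reachable` and `build_mongo_uri` are hypothetical, and so is any host you pass in; substitute the address and port from your own console. A successful TCP check only shows that the port is open from where you run it, not that the DataWorks resource group can reach it or that the credentials are valid.

```python
import socket
from urllib.parse import quote_plus


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: checks network reachability only; it does not log in to MongoDB.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def build_mongo_uri(host, port, user, password, database):
    """Build a standard connection string: mongodb://user:password@host:port/database.

    Hypothetical helper: the user name and password are percent-encoded because
    the connection string format reserves characters such as '@' and ':'.
    """
    return "mongodb://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )
```

For example, build_mongo_uri("your-public-address.example.com", 27017, "bookuser", "123456", "admin") returns mongodb://bookuser:123456@your-public-address.example.com:27017/admin, where the host is a placeholder and 27017 is only the MongoDB default port. You can pass the resulting string to the mongo shell or a driver to confirm the credentials outside DataWorks.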






  2. Create a data synchronization task
    In DataWorks, create a node of the Data Synchronization type.






    At the same time, create a table in DataWorks to store the JSON data. In this example, the new table is named mqdata.






    You can set the table parameters in the graphical interface. In this example, the mqdata table has a single column named mqdata, of type STRING.






    After the table is created, set the data synchronization parameters in the GUI, as shown in the following figure. Select odps_first as the target data source and the newly created mqdata table as the target table. For the source, select the MongoDB data source mongodb_userlog created earlier. Then click Convert to Script to switch to script mode.






    The following is an example of the script-mode code.

    {
        "type": "job",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongodb_userlog",
                    "column": [
                        {"name": "store.bicycle.color", "type": "document.document.string"}
                    ],
                    "collectionName": "userlog"
                },
                "name": "Reader", "category": "reader"
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "",
                    "isCompress": false,
                    "truncate": true,
                    "datasource": "odps_first",
                    "column": ["mqdata"],
                    "emptyAsNull": false,
                    "table": "mqdata"
                },
                "name": "Writer", "category": "writer"
            }
        ],
        "version": "2.0",
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
        "setting": {
            "errorLimit": {"record": ""},
            "speed": {"concurrent": 2, "throttle": false, "dmu": 1}
        }
    }

    In the Reader step, datasource is the name of the MongoDB data source, name is the path of the JSON field to extract (here, the color value under store.bicycle), and collectionName is the MongoDB collection. The number of document prefixes in type must match the nesting depth of the field in name: store.bicycle.color has two parent levels (store and bicycle), so its type is document.document.string. For a first-level field such as expensive, simply enter string. In the Writer step, column lists the column names of the MaxCompute table.

    After completing the above configuration, click Run. The following figure shows the log of a successful run.
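When you extract more fields, keeping the type prefix in step with each path by hand is error-prone. One option is to generate the script-mode JSON. The following minimal Python sketch encodes the rule that the number of document prefixes in type matches the nesting depth of name. The function names `reader_column_type` and `build_job` are hypothetical. The data source names odps_first and mongodb_userlog are copied from this example. The printed JSON is meant to be pasted into script mode and is not an official DataWorks tool.

```python
import json


def reader_column_type(path, leaf_type="string"):
    """Derive the MongoDB Reader column type from a dotted field path.

    Hypothetical helper: one 'document.' prefix per parent level, so
    'store.bicycle.color' -> 'document.document.string' and 'expensive' -> 'string'.
    """
    depth = len(path.split("."))
    return "document." * (depth - 1) + leaf_type


def build_job(path, collection, table, column):
    """Assemble the script-mode job from this example as a Python dict (hypothetical helper)."""
    return {
        "type": "job",
        "steps": [
            {
                "stepType": "mongodb",
                "parameter": {
                    "datasource": "mongodb_userlog",
                    "column": [{"name": path, "type": reader_column_type(path)}],
                    "collectionName": collection,
                },
                "name": "Reader",
                "category": "reader",
            },
            {
                "stepType": "odps",
                "parameter": {
                    "partition": "",
                    "isCompress": False,
                    "truncate": True,
                    "datasource": "odps_first",
                    "column": [column],
                    "emptyAsNull": False,
                    "table": table,
                },
                "name": "Writer",
                "category": "writer",
            },
        ],
        "version": "2.0",
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
        "setting": {
            "errorLimit": {"record": ""},
            "speed": {"concurrent": 2, "throttle": False, "dmu": 1},
        },
    }


# json.dumps turns Python True/False into JSON true/false.
print(json.dumps(build_job("store.bicycle.color", "userlog", "mqdata", "mqdata"), indent=4))
```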






Results

Create an ODPS SQL node in your business flow.






Enter the statement SELECT * FROM mqdata; to view the data in the mqdata table. Alternatively, you can run the same statement directly in the MaxCompute client.