Preface

I suddenly realized I haven't posted in months, partly because I've been busy with a new job and partly because I felt I had nothing to share (or maybe it was just laziness).

Recently, a project at my company required translating a large amount of text for international release. Adding the text by hand, one string at a time, is unrealistic: it is time-consuming and error-prone. The department already had scripts that export the project's strings to an Excel file, which is sent to professional translators and then imported back into the project. After using them, however, I found too many problems and nobody maintaining them, so I rewrote the script from scratch. This article shares the result and briefly records the process.

Usage

Repository address

  1. Clone the project locally and modify the configuration in the Config class of the module_string_script module

  2. Run ExportStrings2ExcelScript.kt to export the string content of the target project; the exported result is shown in the figure below

  3. Lock the first column of the exported Excel file to prevent modification, then hand the file over for translation

  4. Run ImportFromExcelScript.kt to import the translated Excel content into the project
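
As a rough idea of what step 1 involves, here is a hypothetical sketch of such a Config object. Only BASE_LANG, DEFAULT_LANG and isBaseOnWord are names that appear in the script's code later in this post; the path constants and all values are assumptions for illustration and may not match the actual repository.

```kotlin
// Hypothetical sketch of the Config object; not copied from the repository.
object Config {
    // Assumed name: root directory of the target project to scan
    const val PROJECT_PATH = "/path/to/your/project"

    // Assumed name: where the Excel file is exported to and later imported from
    const val EXCEL_PATH = "/path/to/strings.xlsx"

    // Language directory whose content is used as the reference for comparison (value assumed)
    const val BASE_LANG = "values-zh-rCN"

    // Fallback language directory when BASE_LANG is not present (value assumed)
    const val DEFAULT_LANG = "values"

    // true: de-duplicate strings by content; false: only sort identical content together
    var isBaseOnWord = true
}

fun main() {
    println("base=${Config.BASE_LANG}, default=${Config.DEFAULT_LANG}, dedupe=${Config.isBaseOnWord}")
}
```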

Implementation process and problems encountered

Export

  1. Walk through each module of the project, use the dom4j library to parse strings.xml for each language, and store the result in a LinkedHashMap<language directory (e.g. values-zh-rCN), LinkedHashMap<name, word>>

    private const val colHead = "name"
    private const val ROOT_TAG = "resources"
    private const val TAG_NAME = "string"
    
    /** Parse the current multilingual content into <language directory (e.g. values-zh-rCN), <name, word>> */
    fun collectRes(res: File): LinkedHashMap<String, LinkedHashMap<String, String>> {
        val hashMap = LinkedHashMap<String, LinkedHashMap<String, String>>()
        hashMap[colHead] = LinkedHashMap()
        val saxReader = SAXReader()
        // listFiles() returns null when res is not a readable directory
        res.listFiles()?.forEach { langDir ->
            val stringFile = File(langDir, "strings.xml")
            if (!stringFile.exists()) return@forEach
            val data = LinkedHashMap<String, String>()
            // Collect all string names as the first column of Excel
            val names = hashMap.computeIfAbsent(colHead) { LinkedHashMap() }
            val doc = saxReader.read(stringFile)
            val root = doc.rootElement
            if (root.name == ROOT_TAG) {
                val iterator = root.elementIterator()
                while (iterator.hasNext()) {
                    val element = iterator.next()
                    if (element.name == TAG_NAME) {
                        val name = element.attribute("name").text
                        val word = element.text
                        names[name] = name
                        data[name] = word
                    }
                }
            }
            hashMap[langDir.name] = data
        }
        return hashMap
    }
  2. Convert the data structure from <language directory (e.g. values-zh-rCN), <name, word>> into <name, <language directory, word>>, which is more convenient for the subsequent operations
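
The conversion in step 2 is essentially a transposition of a nested map. A minimal sketch, assuming a helper named transposeByName (a hypothetical name, not necessarily what the script uses):

```kotlin
// Sketch: turn <language directory, <name, word>> into <name, <language directory, word>>.
// transposeByName is a hypothetical name chosen for illustration.
fun transposeByName(
    byLang: Map<String, Map<String, String>>
): LinkedHashMap<String, LinkedHashMap<String, String>> {
    val byName = LinkedHashMap<String, LinkedHashMap<String, String>>()
    byLang.forEach { (langDir, words) ->
        words.forEach { (name, word) ->
            // Create the row for this string name on first sight, then fill in the language column
            byName.getOrPut(name) { LinkedHashMap() }[langDir] = word
        }
    }
    return byName
}

fun main() {
    val byLang = linkedMapOf(
        "values" to linkedMapOf("app_name" to "Demo", "ok" to "OK"),
        "values-zh-rCN" to linkedMapOf("app_name" to "演示")
    )
    println(transposeByName(byLang))
}
```

After the transposition, each outer entry corresponds to one Excel row and each inner entry to one language column.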

  3. Process the parsed content according to the configured policy. Strings with the same content but different names can either be grouped together or de-duplicated. Config.isBaseOnWord controls whether strings are de-duplicated by content, and BASE_LANG specifies which language's content is used as the basis for comparison

    /**
     * Handle strings that have the same content but different keys
     * [source] <name, <language directory, word>>
     * @return <name, <language directory, word>>
     */
    fun processSameWords(source: LinkedHashMap<String, LinkedHashMap<String, String>>): LinkedHashMap<String, LinkedHashMap<String, String>> {
        // Since there may be strings with different keys but the same content, the export aggregates the strings with the same content together
        val haveCNKey = source.entries.first().value.containsKey(Config.BASE_LANG)
        val baseLang = if (haveCNKey) Config.BASE_LANG else Config.DEFAULT_LANG
        // De-duplicate by the base-language content if configured; otherwise sort rows with the same content together
        return if (Config.isBaseOnWord) {
            // De-duplicate
            source.entries.distinctBy {
                val baseWord = it.value[baseLang]
                return@distinctBy if (!baseWord.isNullOrBlank()) baseWord else it
            }.fold(linkedMapOf()) { acc, entry ->
                acc[entry.key] = entry.value
                acc
            }
        } else {
            // Sort rows with the same content together
            source.entries.sortedBy {
                val baseWord = it.value[baseLang]
                return@sortedBy if (!baseWord.isNullOrEmpty()) baseWord else null
            }.fold(linkedMapOf()) { acc, entry ->
                acc[entry.key] = entry.value
                acc
            }
        }
    }
  4. Write Map data to an Excel file
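
The post does not show the Excel-writing code, so the following is only a sketch of the preparatory step under stated assumptions: flattening <name, <language directory, word>> into a header row plus one row per string. The helper name toRows is hypothetical, and the input is assumed not to contain the "name" column yet. Each resulting row could then be written cell by cell with the Excel library.

```kotlin
// Sketch: flatten <name, <language directory, word>> into a grid of cell values.
// toRows is a hypothetical helper; the script's real Excel-writing code is not shown in this post.
fun toRows(byName: Map<String, Map<String, String>>): List<List<String>> {
    // Column order: "name" first, then every language directory in first-seen order
    val langs = LinkedHashSet<String>()
    byName.values.forEach { langs.addAll(it.keys) }

    val rows = mutableListOf<List<String>>()
    rows.add(listOf("name") + langs)
    byName.forEach { (name, words) ->
        // Missing translations become empty cells so the translator can fill them in
        rows.add(listOf(name) + langs.map { words[it] ?: "" })
    }
    return rows
}

fun main() {
    val byName = linkedMapOf(
        "ok" to mapOf("values" to "OK", "values-zh-rCN" to "确定"),
        "cancel" to mapOf("values" to "Cancel")
    )
    toRows(byName).forEach { println(it.joinToString(" | ")) }
}
```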

Import

  1. Use POI to read the contents of the Excel file and store them in a Map<sheet name, <name, <language directory, value>>>
    /** Read the Excel data into <sheet name, <name, <language directory, word>>> */
    fun getSheetsData(filePath: String): LinkedHashMap<String, LinkedHashMap<String, LinkedHashMap<String, String>>> {
        val inputStream = FileInputStream(filePath)
        val excelWBook = XSSFWorkbook(inputStream)
        val map = linkedMapOf<String, LinkedHashMap<String, LinkedHashMap<String, String>>>()
        excelWBook.forEach {
            val dataMap = LinkedHashMap<String, LinkedHashMap<String, String>>()
            val head = ArrayList<String>()
            // Get the worksheet
            val excelWSheet = excelWBook.getSheet(it.sheetName)
            excelWSheet.run {
                // Total number of rows
                val rowCount = lastRowNum - firstRowNum + 1
                // Total number of columns
                val colCount = getRow(0).physicalNumberOfCells
                // The header row holds "name" followed by all language directories
                for (col in 0 until colCount) {
                    head.add(getCellData(excelWBook, sheetName, 0, col))
                }

                for (row in 1 until rowCount) {
                    // The first column is the string name
                    val name = getCellData(excelWBook, sheetName, row, 0)
                    Log.d(TAG, "Row $row: name = $name")
                    val v = LinkedHashMap<String, String>()
                    for (col in 0 until colCount) {
                        val content = getCellData(excelWBook, sheetName, row, col)
                        val text = WordHelper.escapeText(content)
                        v[head[col]] = text
                        Log.d(TAG, "lang = ${head[col]}, value = ${v[head[col]]}")
                    }
                    dataMap[name] = v
                }
            }
            map[it.sheetName] = dataMap
        }
        // Close the workbook and the stream only after every sheet has been read
        excelWBook.close()
        inputStream.close()
        return map
    }
  2. Traverse the Map and convert the data structure of each sheet into Map<language directory (e.g. values-zh-rCN), <name, word>>, which is easier to merge
    /**
     * Convert the data structure for merging into the XML files
     * [source] <name, <language directory, word>>
     * @return <language directory (e.g. values-zh-rCN), <name, word>>
     */
    fun revertResData(source: LinkedHashMap<String, LinkedHashMap<String, String>>): LinkedHashMap<String, LinkedHashMap<String, String>> {
        // <language directory (e.g. values-zh-rCN), <name, word>>
        val resData = LinkedHashMap<String, LinkedHashMap<String, String>>()
        source.forEach { (name, value) ->
            value.forEach { (langDir, word) ->
                val langRes = resData.computeIfAbsent(langDir) { LinkedHashMap() }
                langRes[name] = word
            }
        }
        return resData
    }
  3. Read the strings of all modules in the project again into a Map<language directory, <name, word>> and merge it with the Map read from the Excel file
    /**
     * [newData] strings read from Excel: <language directory, <name, word>>
     * [resData] strings read from the project: <language directory, <name, word>>
     */
    fun mergeLangNameString(
        newData: LinkedHashMap<String, LinkedHashMap<String, String>>,
        resData: LinkedHashMap<String, LinkedHashMap<String, String>>
    ) {
        if (Config.isBaseOnWord) {
            /**
             * 1. A string name in Excel matches a string name in the project
             * 2. Find the other strings in the project whose base-language content equals that string's
             * 3. Treat these strings as identical and add a copy of each to newData
             */
            val baseLang = if (resData.containsKey(Config.BASE_LANG)) Config.BASE_LANG else Config.DEFAULT_LANG
            val baseLangMap = newData[baseLang]
            if (baseLangMap != null) {
                // Find the strings with the same base-language value
                val sameWords = baseLangMap.map { (name, newWord) ->
                    val oldBaseWord = resData[baseLang]?.get(name)
                    return@map name to resData[baseLang]?.filter {
                        if (!oldBaseWord.isNullOrBlank()) {
                            return@filter it.value == oldBaseWord
                        }
                        false
                    }?.keys
                }
                sameWords.forEach { pair ->
                    if ((pair.second?.size ?: 0) > 1) {
                        Log.e(TAG, "newName:${pair.first} mapping old names:${pair.second}")
                    }
                    val newName = pair.first
                    pair.second?.forEach { oldName ->
                        newData.forEach { (lang, map) ->
                            map[oldName] = map[newName] ?: ""
                        }
                    }
                }
            }
        }
        resData.keys.reversed().forEach {
            if (!newData.keys.contains(it)) {
                // Language columns that do not exist in Excel are removed directly
                resData.remove(it)
                Log.e(TAG, "New data have no lang dir:$it, skip")
            }
        }
        // Iterate over the new data and update the project strings
        newData.forEach { (lang, map) ->
            // Exclude the first column
            if (lang == colHead)
                return@forEach
            var hasChanged = false
            // The <name, word> map of the current language in the project
            val nameWordMap = resData.computeIfAbsent(lang) { linkedMapOf() }
            map.forEach { (name, newWord) ->
                // Overwrite each project value with the new, non-blank value from Excel
                if (name.isNotEmpty() && newWord.isNotBlank()) {
                    val oldWord = nameWordMap[name]
                    if (oldWord != null && oldWord.isNotEmpty()) {
                        if (oldWord != newWord) {
                            hasChanged = true
                            Log.e(
                                TAG,
                                "Replace string: [name: $name, lang: $lang, old value: $oldWord, new value: $newWord]"
                            )
                        }
                    } else {
                        hasChanged = true
                        Log.e(TAG, "New string: [name: $name, lang: $lang, new value: $newWord]")
                    }
                    nameWordMap[name] = newWord
                }
            }
            if (!hasChanged) {
                // The strings of this language did not change, so skip it as well
                resData.remove(lang)
                Log.e(TAG, "lang dir $lang have no change, skip")
            }
        }
    }
  4. Iterate over the merged string Map and use dom4j to modify or create the corresponding strings.xml file in the original project
    /** Import the Excel strings into the project */
    fun importWords(newLangNameMap: LinkedHashMap<String, LinkedHashMap<String, String>>, parentDir: File) {
        newLangNameMap.forEach { (langDir, hashMap) ->
            Log.e(TAG, "import lang dir $langDir")
            if (langDir.startsWith("values")) {
                val stringFile = File(parentDir, "$langDir/strings.xml")
                if (stringFile.exists()) {
                    // Modify the original DOM
                    val saxReader = SAXReader()
                    val doc = saxReader.read(stringFile)
                    val root = doc.rootElement
                    val nodeMap = linkedMapOf<String, Element>()
                    if (root.name == ROOT_TAG) {
                        val iterator = root.elementIterator()
                        while (iterator.hasNext()) {
                            val element = iterator.next()
                            if (element.name == TAG_NAME) {
                                val name = element.attribute("name").text
                                nodeMap[name] = element
                            }
                        }
                    }
                    hashMap.forEach { (name, word) ->
                        val node = nodeMap[name]
                        if (node == null) {
                            root.addElement(TAG_NAME)
                                .addAttribute("name", name)
                                .addText(word)
                        } else {
                            if (node.text != word) {
                                node.text = word
                            }
                        }
                    }
                    outputStringFile(doc, stringFile)
                } else {
                    // Create a new DOM
                    val langFile = File(parentDir, langDir)
                    langFile.mkdirs()
                    stringFile.createNewFile()
                    val doc = DocumentHelper.createDocument()
                    val root = doc.addElement(ROOT_TAG)
                    hashMap.forEach { (name, word) ->
                        val element = root.addElement(TAG_NAME)
                        element.addAttribute("name", name)
                            .addText(word)
                    }
                    outputStringFile(doc, stringFile)
                }
            }
        }
    }
    /** Write the strings file */
    private fun outputStringFile(doc: Document, file: File) {
        // Remove the whitespace-only text nodes (the original line breaks); otherwise the output contains extra blank lines
        val root = doc.rootElement
        if (root.name == ROOT_TAG) {
            val iterator = root.nodeIterator()
            while (iterator.hasNext()) {
                val element = iterator.next()
                if (element.nodeType == org.dom4j.Node.TEXT_NODE) {
                    if (element.text.isBlank()) {
                        iterator.remove()
                    }
                }
            }
        }
        // Output
        val format = OutputFormat()
        format.encoding = "utf-8"
        format.setIndentSize(4)
        format.isNewLineAfterDeclaration = false
        format.isNewlines = true
        format.lineSeparator = System.getProperty("line.separator")
        file.outputStream().use { os ->
            val writer = XMLWriter(os, format)
            // Whether to escape characters
            writer.isEscapeText = false
            writer.write(doc)
            writer.flush()
            writer.close()
        }
    }

Problems encountered

How to handle strings with the same content but different names

For one reason or another, projects always contain many strings with the same content but different names, and it is usually best not to change them. However, every duplicate string in an export sent for translation can mean extra cost, so it is best to de-duplicate the export by content. Then, on import, once a row matches a string name in the project, the script looks for other strings in the project with the same content and overwrites them all with the new value from Excel, which solves the problem.
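
The lookup on import can be pictured with a simplified sketch. The helper name namesSharingBaseWord and the sample data are assumptions for illustration; the script's own merge function works on the full nested maps.

```kotlin
// Sketch: find every project string whose base-language text equals that of a translated string.
// namesSharingBaseWord is a hypothetical helper, not the script's actual function.
fun namesSharingBaseWord(translatedName: String, projectBase: Map<String, String>): Set<String> {
    val baseWord = projectBase[translatedName]
    // A blank or missing base word cannot identify duplicates
    if (baseWord.isNullOrBlank()) return emptySet()
    return projectBase.filterValues { it == baseWord }.keys
}

fun main() {
    // Base-language strings in the project: two names share the same content
    val projectBase = linkedMapOf("confirm" to "确定", "dialog_ok" to "确定", "cancel" to "取消")
    // Only "confirm" was exported and translated
    val translations = mutableMapOf("confirm" to "OK")
    // Copy the translation to every string with the same base-language content
    namesSharingBaseWord("confirm", projectBase).forEach { name ->
        translations[name] = translations.getValue("confirm")
    }
    println(translations)
}
```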

Problems collecting the project's strings

Strings were initially collected with single-line regex matching, but this was clearly flawed: a string tag whose opening and closing tags are not on the same line cannot be matched. The collection was therefore changed to parse the XML documents with dom4j.
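
The difference is easy to reproduce. The sketch below is not the script's code: it uses a made-up regex and, to stay dependency-free, the JDK's built-in DOM parser in place of dom4j. The line-by-line regex misses the string whose text spans two lines, while the parser finds both.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// Sample resource file: the second string spans two lines
val sampleXml = """
<resources>
    <string name="title">Hello</string>
    <string name="body">First line
        second line</string>
</resources>
""".trimIndent()

// Single-line regex: misses any <string> whose text spans several lines
fun namesByRegex(xml: String): List<String> {
    val pattern = Regex("""<string name="(.+?)">.*</string>""")
    return xml.lines().mapNotNull { pattern.find(it)?.groupValues?.get(1) }
}

// XML parser (JDK DOM here, dom4j in the script): element boundaries do not depend on line breaks
fun namesByParser(xml: String): List<String> {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(ByteArrayInputStream(xml.toByteArray()))
    val nodes = doc.getElementsByTagName("string")
    return (0 until nodes.length).map { nodes.item(it).attributes.getNamedItem("name").nodeValue }
}

fun main() {
    println("regex:  ${namesByRegex(sampleXml)}")
    println("parser: ${namesByParser(sampleXml)}")
}
```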

The output file format is incorrect

dom4j's OutputFormat is used to format the XML so that changes to the source file are kept to a minimum. However, if a document parsed from an existing file is written with isNewlines = true, the output contains many extra blank lines. This is because dom4j treats the line breaks in the original file as text nodes: each line already ended with a line break, and isNewlines adds another one, so the original whitespace-only nodes need to be removed first.
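
To see where those extra nodes come from, here is a small sketch that counts and removes them. It uses the JDK's built-in DOM API rather than dom4j so that it runs without dependencies, and removeBlankTextNodes is a hypothetical helper name; the idea is the same as in the script's output function.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node

// Sketch: drop whitespace-only text nodes directly under the given element
// and return how many were removed.
fun removeBlankTextNodes(root: Node): Int {
    var removed = 0
    var child = root.firstChild
    while (child != null) {
        val next = child.nextSibling // remember the sibling before detaching the node
        if (child.nodeType == Node.TEXT_NODE && child.nodeValue.isBlank()) {
            root.removeChild(child)
            removed++
        }
        child = next
    }
    return removed
}

fun main() {
    val xml = "<resources>\n    <string name=\"a\">A</string>\n    <string name=\"b\">B</string>\n</resources>"
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(ByteArrayInputStream(xml.toByteArray()))
    val root = doc.documentElement
    // The line breaks and indentation between elements are parsed as text nodes
    println("children before: ${root.childNodes.length}")
    println("removed: ${removeBlankTextNodes(root)}")
    println("children after: ${root.childNodes.length}")
}
```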

Character escape problem

Characters such as English single and double quotation marks are dropped from a string resource if they are not escaped, so they do not appear in the UI. These characters are therefore escaped in advance, when the text is read from Excel.
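
The implementation of WordHelper.escapeText is not shown in this post, so the following is only a guess at the core idea: put a backslash in front of every single or double quote that is not already escaped. The function name escapeQuotes is hypothetical, and the sketch deliberately ignores other cases the real helper may handle.

```kotlin
// Sketch: escape unescaped single and double quotes with a backslash.
// escapeQuotes is a hypothetical stand-in for the script's WordHelper.escapeText.
fun escapeQuotes(text: String): String {
    // Match a quote that is not directly preceded by a backslash
    val unescapedQuote = Regex("""(?<!\\)['"]""")
    return unescapedQuote.replace(text) { "\\" + it.value }
}

fun main() {
    println(escapeQuotes("Don't show again"))
    println(escapeQuotes("Tap \"Save\" to continue"))
    // Already-escaped quotes are left alone, so running the function twice is harmless
    println(escapeQuotes(escapeQuotes("Don't show again")))
}
```

Checking for an existing backslash keeps the function idempotent, which matters when a file goes through several export and import rounds.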