XML files (eXtensible Markup Language)

Use this dialog to control how memoQ imports XML (eXtensible Markup Language) files.

[Screenshot: memoQ XML filter configuration dialog]

Invoking

In the Translations pane of Project home, click Add document as… below the document list, and in the Open dialog, locate and select an XML file. You can also get to this dialog if you go to the Filter configurations pane in the Resource console, select an XML configuration, and click the Edit link below the list.

Options

This topic describes the XML filter configuration options using the following sample document. As with any XML document, it contains some “normal” text that needs to be translated, interspersed with tags like <doc> that primarily hold descriptive or structural information. Tags can have attributes, which have values (id="0527"). The following sections explain how memoQ can interpret these.

 

<?xml version="1.0"?>
<doc>
 <article id="0527">
   <updInfo>Aug-04-2006(NOT TO BE TRANSLATED)</updInfo>
   <title>XML formats</title>
   <par id="par1">
     This is a short tutorial for <ref target="#89">XML</ref> formats.
     This <img target="diagram1" alt="Diagram for illustration purposes" />
     diagram is provided for illustration only. All material is copyrighted
     (&copyright;).
   </par>
 </article>
</doc>

memoQ can use the document type definition (DTD) to determine what tags and attributes can be present in the XML document. Without the DTD file, memoQ can also parse one or more reference files to discover tags and attributes. If you get to the Document import settings dialog through the Add document as… command in Project home, the files you selected for import are automatically added as reference files.

The options to configure memoQ for an XML file come in four tabs: Encoding and reference files, General, Tags and attributes, and Entities.

 
Encoding and reference files tab (see the screenshot above):

· Add file: By clicking this button, you can browse to the location of your reference files and add them to the configuration. If you select an encoding from the list on the left, you can verify that the file displays correctly. The encoding you select here will also be used as a default encoding later on, whenever you import a document that does not contain an encoding declaration.
· Remove file: Click this button to remove a previously added reference file.
· DTD/Schema text box: if a document type definition (DTD) or an XML schema definition (XSD) file is available for the XML files you are importing, you can specify it in this text box. This allows you to quickly feed the complete list of possible tags into the format configuration. The DTD or the XSD can also be associated with the new format, allowing memoQ to automatically select the format when importing documents that contain a reference to the same DTD or XSD file. To locate and select the DTD or XSD file, click the Browse button next to the text box.

General tab:

[Screenshot: General tab of the XML filter configuration dialog]

· DTD or namespace URI: Specifies the name of the DTD file or the namespace in the XML schema that will be associated with the format. Associating a DTD or an XML namespace with the format allows memoQ to automatically select the format configuration when importing documents that contain a reference to the same DTD file or namespace.
· Import XML comments: If you enable this option, XML comments will be imported from documents as translatable text.
· Detect encoding if possible: If enabled, memoQ will attempt to detect the encoding based on information in the document to be imported.
· Input encoding if not specified: Here you can define what encoding memoQ will use when importing a document without an encoding declaration.
· Output encoding: Defines the encoding of the translated documents memoQ exports. You can choose a specific encoding or use the same encoding as the original.
· Normalize whitespace by default: If enabled, memoQ will convert sequences of tab, space or newline characters into a single space character. In addition, sequences of whitespace at the beginning and end of elements will be trimmed. Normalization is recommended when the XML document uses whitespace characters for readability only. This setting adjusts the default behavior for the XML format, but normalization can also be adjusted on the tag level.

Note: In our sample document, the text inside the <par> element contains newlines and spaces that are there for readability only. They hold no important information; they only make the document easier for a human to scan. However, these whitespace characters can be cumbersome to handle during translation, so in this case it is recommended to enable whitespace normalization either for the whole format or for the <par> element.
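The normalization rule described above can be illustrated with a short Python sketch. This is not memoQ's implementation. The function name normalize_whitespace is hypothetical, and the text is a simplified version of the <par> content with the inline tags left out.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends."""
    return _WS_RUN.sub(" ", text).strip(" ")


# Simplified <par> content: indentation and newlines are for readability only.
raw = """
     This is a short tutorial for XML formats.
     This diagram is provided for illustration only.
"""

print(normalize_whitespace(raw))
```

With normalization turned off, the leading newline, the indentation, and the line break between the two sentences would all reach the translation document unchanged.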

· Observe xml:space attribute in file: XML documents can contain attributes that prescribe whether or not whitespace should be normalized in a specific element. If this option is enabled, memoQ will follow such instructions in the document. If it is disabled, memoQ will treat whitespace according to its global and tag level settings.
· Break segments at newlines if whitespace is preserved: Check this check box if you want memoQ to treat newline characters as segment boundaries. Text in XML files can contain newline characters if you choose to preserve whitespace. memoQ preserves whitespace when the Normalize whitespace by default check box is turned off. In this case, newline characters presumably carry meaning in the text, and most of the time each line should be translated as a separate segment. As a general rule, turn on Break segments at newlines… if you turn off Normalize whitespace by default.
· Restore custom entities in export: Check this check box to export all characters included in custom entity rules in their custom entity format. (In our example, ‘©’ entered into the target side will be exported as ‘&copyright;’.) Uncheck this check box to export all characters included in custom entity rules as Unicode characters. (In our example, ‘©’ entered into the target side will be exported as ‘©’.)
· Log warnings during import: If this option is enabled, memoQ will create a list of technical irregularities encountered in the source document during import, and save that list into a text file.
· XSLT file text box: By default, memoQ creates a preview for the XML file using the default Internet Explorer style sheet. In this text box, you can specify another XML style sheet that will be used to create the preview. To locate and specify the XSLT file, click the … button next to the text box.

Important: The XML style sheet must create HTML output. If the style sheet emits a different format (plain text, RTF or another XML), you cannot use it here.

· Remove XSLT assignment: Click this link to delete the file name from the XSLT file text box and return to the default Internet Explorer style sheet.
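The interplay of Detect encoding if possible and Input encoding if not specified can be pictured with the Python sketch below. How memoQ actually detects encodings is not documented here, so the logic is an assumption: a byte-order mark is checked first, then the encoding declaration, and the configured default is used last. The function name pick_encoding is hypothetical, and only UTF-8 and UTF-16 byte-order marks are handled.

```python
import codecs
import re

# Byte-order marks, checked first (assumption: a BOM wins over the declaration).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches encoding="..." inside an ASCII-compatible XML declaration.
_DECL = re.compile(rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")


def pick_encoding(data: bytes, fallback: str = "utf-8") -> str:
    """Return the encoding name to use for decoding the document bytes."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: use the configured default encoding.
    return fallback


print(pick_encoding(b'<?xml version="1.0" encoding="ISO-8859-2"?><doc/>'))
print(pick_encoding(b'<?xml version="1.0"?><doc/>', fallback="windows-1250"))
```

The sample document in this topic has no encoding declaration, so in this sketch it would be decoded with the fallback value.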
 

Tags and attributes tab:

When you first open this tab for a new format, memoQ displays the screen below, but without any information filled in.

[Screenshot: Tags and attributes tab of the XML filter configuration dialog]

Tags area (top section):

· Handled tags: This list indicates all the tags you added to the XML format configuration with specified properties.

Note: The type and properties of handled tags are indicated by abbreviations in the Info column of the Handled tags list. Tag types: Str stands for structural, In for inline, NT for non-translated, and Req for required. Whitespace handling options: Inh stands for inherit, Pres for preserve, and Norm for normalize. Context handling and commenting options: Ctxt signifies that content is imported as context ID, and Com signifies that content is imported as comment. All of these types and options are explained below.

· Inline: Select this option to specify that the tag selected in the Handled tags list is inline. Inline tags represent markup that is imported inside segments, and is displayed as inline tags. (For more information on inline tags, refer to Formatting Tags. In other tools, inline tags are also referred to as internal.) If this option is not enabled, memoQ will handle the tag as structural. Structural tags mark elements that are blocks of content for translation. Being delimiters, structural tags never appear within text for translation after import. In other tools, structural tags are also referred to as external.

Note: In our example, it is recommended to specify the ref and img tags as inline (because they appear inside sentences), and all the others as structural.

· Not translated: Select this option to specify that the tag selected in the Handled tags list represents non-translatable text. These portions of text will not be imported for translation.

Note: If you specify that an element is non-translated, the contents of its child elements will not be imported either. Therefore, make sure you do not set elements like Body or Main to be non-translated.

Note: If an inline tag is defined as non-translated, all of its content and children will be imported into a single inline tag.

· Required: Check this option to specify that the tag selected in the Handled tags list or entered into the field under the Handled tags list is required. Required tags are special inline tags that must be kept in the translation if present in the source segment. memoQ enforces this condition and displays an error sign if a required inline tag is not copied to the target side.
· Whitespace handling: Use this option to specify how whitespace will be handled in the text content of the element. Inherit means that the element will receive the same whitespace handling setting as the parent element. The root element receives the default setting specified in the General tab. Preserve means that all whitespace will be retained and imported into the translation document. Normalize means that sequences of whitespace characters will be replaced by a single space character.
· Tag content is context ID for siblings: If this option is enabled, the content of the element will be used as the context identifier of the subsequent segment imported from the elements that are at the same level in the hierarchy.

Note: This context ID will not be applied to all segments imported from the same level of the hierarchy. Instead, memoQ will use the context ID only for the next suitable segment. Other siblings will remain without a context ID.

· Tag content is comment for siblings: If this option is enabled, the content of the element will be used as the comment of the segment(s) imported from the elements that are at the same level in the hierarchy.

Note: The comment or context ID is applied to only one element, which comes after the element used as context or comment in the original document.
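The "next sibling only" behavior described in the notes above can be sketched in Python as follows. This is an illustration and not memoQ's import code. The <entry>, <key>, <text> and <hint> elements and the function name segments_with_context are hypothetical, and <key> stands for a tag configured with Tag content is context ID for siblings.

```python
import xml.etree.ElementTree as ET


def segments_with_context(parent, context_tags):
    """Yield (text, context_id) for the children of parent, in document order."""
    pending = None
    for child in parent:
        text = (child.text or "").strip()
        if child.tag in context_tags:
            pending = text   # remember the content for the next segment
            continue
        yield text, pending
        pending = None       # used once; later siblings get no context ID


# Hypothetical document, not the sample from this topic.
xml_text = (
    "<entry><key>menu.open</key><text>Open</text>"
    "<hint>Opens a file</hint></entry>"
)
rows = list(segments_with_context(ET.fromstring(xml_text), {"key"}))
print(rows)
```

In this sketch only the <text> segment receives the context ID "menu.open". The <hint> segment, although it is also a sibling, is left without one.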

· Remove button: Click this button to remove the selected tag from the Handled tags list.
· Add button: Click this button to add the tag entered into the text box to the Handled tags list.
· Clear list: Removes every tag from the Handled tags list, along with all their settings and attributes.
· Populate: Click this button to extract all tags with all of their attributes that occur in any of the reference files or the specified DTD. After you click the Populate button, memoQ will fill the list with the tags it finds, and attempt to automatically determine the type of the tags (inline or structural).

Note: If a tag is not present in the format configuration, memoQ uses the following default settings when finding the tag during import: the tag will be imported as structural and translatable, will inherit its whitespace settings from the parent element, and its content will not be imported as a comment or a context identifier.
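To get a feel for what the Populate button collects from a reference file, consider the Python sketch below. It is not memoQ's algorithm. In particular, the rule used to guess inline tags (a tag is inline if its parent also contains text directly) is an assumption made for this example. The function name populate is hypothetical, and the sample is trimmed: the sentence with the custom entity is left out because Python's standard parser rejects undeclared entities.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SAMPLE = """<doc>
 <article id="0527">
   <updInfo>Aug-04-2006(NOT TO BE TRANSLATED)</updInfo>
   <title>XML formats</title>
   <par id="par1">
     This is a short tutorial for <ref target="#89">XML</ref> formats.
     This <img target="diagram1" alt="Diagram for illustration purposes" />
     diagram is provided for illustration only.
   </par>
 </article>
</doc>"""


def populate(xml_text):
    """Return ({tag: sorted attribute names}, set of tags guessed to be inline)."""
    root = ET.fromstring(xml_text)
    attributes = defaultdict(set)
    inline = set()
    for parent in root.iter():
        attributes[parent.tag].update(parent.attrib)
        # Text directly inside the parent, before or between its children.
        has_text = bool((parent.text or "").strip()) or any(
            (child.tail or "").strip() for child in parent
        )
        if has_text:
            # Assumed heuristic: children of mixed content are inline.
            inline.update(child.tag for child in parent)
    return {tag: sorted(names) for tag, names in attributes.items()}, inline


tags, inline_tags = populate(SAMPLE)
print(tags)
print(sorted(inline_tags))
```

For the sample, the guess agrees with the recommendation given earlier: ref and img come out as inline, and every other tag is left as structural.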

Attributes area (middle section):

When you click an item in the Handled tags list, the controls below are updated to show the attributes added for that tag. Attributes can be entered manually or filled in using the Populate button.

· Tag attributes: This list indicates all the attributes assigned to a tag you added to the XML format configuration with specified properties.

Note: The properties of handled attributes are indicated by abbreviations in the Info column of the Tag attributes list. Tr stands for translatable; Req for required; and F for filtered. NX and NY signify the conditional import options, CxC and CxS stand for the context identifier options, while CmC and CmS show the commenting options.

The options are:

· Translatable: Check this check box to specify the attribute selected in the Tag attributes list or entered into the field under the Tag attributes list as translatable.
· Required: Check this check box to specify the attribute selected in the Tag attributes list or entered into the field under the Tag attributes list as one that must be present in any tag inserted into the translation. A required attribute is not necessarily translatable: this property is used as a quality checking feature of memoQ to ensure the well-formedness of the translation.
· Filtered: Check this check box to specify the attribute selected in the Tag attributes list as one that should be hidden within the tags when switching to the Show filtered inline tags option.

Note: memoQ also uses filtered attributes for INX, MIF, Transit and TTX documents. Filtering makes it possible to display only those attributes that are useful for the translator and to avoid disturbingly long tags.

· Non-translation: By clicking this button you can specify conditions that make the selected tag non-translatable based on the values of the selected attribute. After clicking this button, the Non-translation settings for attribute dialog appears with a list of options and their explanation.
· Context: By clicking this button you can specify that the value of the selected attribute is imported as context information for the children or the siblings of the selected tag. After clicking this button, the Context settings for attribute dialog appears with a list of options and their explanation.

Note: In our example, the id attribute of the par element could be used as a context identifier.

· Comment: By clicking this button you can specify that the value of the selected attribute is imported as comments for the children or the siblings of the selected tag. After clicking this button, the Comment settings for attribute dialog appears with a list of options and their explanation.
· Remove button: Click this button to remove the selected attribute from the Tag attributes list.
· Add button: Click this button to add the attribute entered into the text box on the left to the Tag attributes list.

Note: If an attribute is not present in the format configuration, memoQ treats it as non-translatable, not required and not filtered. Such an attribute will not be used for non-translation conditions, or in context and comment processing.

Occurrences area (lower section):

This area in the lower part of the dialog helps you build the format by showing where the tag selected in the Handled tags list, and its attribute selected in the Tag attributes list, occur in the reference documents. The tags are highlighted in red, and their attributes in green.

· File: Use this drop-down list to select any of the reference files for displaying the occurrences therein.
· Instance: Use this list to select any of the occurrences in the selected reference file to be displayed, by clicking on the appropriate number.

 

Entities tab:

Here you can specify how memoQ should handle entities.

[Screenshot: Entities tab of the XML filter configuration dialog]

· Entity groups: In this list, you can select standard groups of entities which should be converted during import. XML predefined entities (‘&amp;’, ‘&lt;’, ‘&gt;’, ‘&quot;’ and ‘&apos;’) are always handled.
· Custom entities: In this list, you can specify any non-standard entities that are specific to your document type. Custom entities can be handled in memoQ translation documents as inline tags, memoQ formatting tags or “normal” Unicode characters. You can select the desired behavior by choosing one of the three radio buttons under Entity behavior. A new entity can be added to the list by entering it into the Entity box. The settings of an existing entity can be modified by selecting the entity in the Custom entities list. In the sample document, there is one custom entity, ‘&copyright;’, which should be converted to ‘©’ for translation.
· Add/change: Click this button to add a custom entity to the Custom entities list view, or to modify the custom entity selected in the Custom entities list view with the settings specified above.

Note: In the first field under the Custom entities list, you can enter the entity appearing in your document between & and ;. Using the radio buttons, you can select whether this entity should be treated as a character or as a memoQ tag. If the entity should appear on the translation grid as a character, enter its Unicode code into the second field or enter the character into the third field.

· Remove: Click this button to remove the selected custom entity from the Custom entities list.
· Populate from files: Click this button to extract all custom entities that occur in any of the reference files. After clicking the Populate from files button, all custom entities will appear in the Custom entities list.
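The round trip of a custom entity that is handled as a character, together with the Restore custom entities in export option from the General tab, can be sketched in Python. This is a simplified model and not memoQ's code. The names CUSTOM_ENTITIES, import_entities and export_entities are hypothetical, and predefined entities such as &amp; are deliberately left untouched.

```python
import re

# Custom entity rules: entity name -> character (taken from the sample document).
CUSTOM_ENTITIES = {"copyright": "\u00a9"}

# An entity reference: "&", a name, ";".
_ENTITY = re.compile(r"&([A-Za-z_][\w.-]*);")


def import_entities(text, rules=CUSTOM_ENTITIES):
    """Replace known custom entities with their characters; leave others alone."""
    return _ENTITY.sub(lambda m: rules.get(m.group(1), m.group(0)), text)


def export_entities(text, rules=CUSTOM_ENTITIES, restore=True):
    """With restore=True, write the characters back in their entity form."""
    if not restore:
        return text
    for name, char in rules.items():
        text = text.replace(char, f"&{name};")
    return text


imported = import_entities("All material is copyrighted (&copyright;).")
print(imported)
print(export_entities(imported))
print(export_entities(imported, restore=False))
```

With restore=True every ‘©’ in the target is written as ‘&copyright;’, including one the translator typed directly, which matches the description of the Restore custom entities in export option.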

The result with the sample document

When some parts have been translated, the translation document should look like this:

[Screenshot: translation of the sample document]

Things to note in this screenshot:

· The text Aug-04-2006(NOT TO BE TRANSLATED) is missing from the document because the updInfo tag was designated as non-translated.
· The text Diagram for illustration purposes appears as a separate segment, and alt="@2" in the img tag in segment three indicates that the translatable attribute’s value can be found two segments lower in the translation document.

Note: Translatable attributes are collected and stored during the import process of the document, and inserted in the translation document at the position where the current block of content, delimited by a structural tag, ends.

· The opening tag ref was inserted in the target cell of segment 2 without the required attribute target, therefore a warning is indicated.
· The placeholder tag img is missing from the target cell of segment 3, therefore a warning is indicated.
· The entity ‘&copyright;’ has been converted to ‘©’ in segment 4.
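The two warnings listed above can be modeled with a small check in Python. This is only a sketch: memoQ's quality checks and its internal tag representation are not described in this topic, so segments are treated as plain strings with XML-like tags. The function names are hypothetical, and the settings (img as a required tag, target as a required attribute of ref) are assumed for the example.

```python
import re

# An opening or empty tag: captures the tag name and the raw attribute text.
_TAG = re.compile(r'<([A-Za-z_][\w.-]*)((?:\s+[\w:.-]+\s*=\s*"[^"]*")*)\s*/?>')
# One attribute: captures the attribute name.
_ATTR = re.compile(r'([\w:.-]+)\s*=\s*"[^"]*"')


def tags_in(segment):
    """Return a list of (tag name, set of attribute names) found in segment."""
    return [(m.group(1), set(_ATTR.findall(m.group(2)))) for m in _TAG.finditer(segment)]


def check_segment(source, target, required_tags, required_attrs):
    """Return warnings for missing required tags and missing required attributes."""
    warnings = []
    target_tags = tags_in(target)
    target_names = {name for name, _ in target_tags}
    for name, _ in tags_in(source):
        if name in required_tags and name not in target_names:
            warnings.append(f"missing required tag <{name}>")
    for name, attrs in target_tags:
        for attr in sorted(required_attrs.get(name, set()) - attrs):
            warnings.append(f"<{name}> lacks required attribute '{attr}'")
    return warnings


REQUIRED_TAGS = {"img"}               # assumed setting
REQUIRED_ATTRS = {"ref": {"target"}}  # assumed setting

seg2 = check_segment(
    'This is a short tutorial for <ref target="#89">XML</ref> formats.',
    "Translated text with <ref>XML</ref>.",
    REQUIRED_TAGS, REQUIRED_ATTRS,
)
seg3 = check_segment(
    'This <img target="diagram1" alt="@2" /> diagram is provided for illustration only.',
    "Translated text without the placeholder.",
    REQUIRED_TAGS, REQUIRED_ATTRS,
)
print(seg2)
print(seg3)
```

The first call reports the ref tag that lost its target attribute, and the second reports the missing img placeholder, mirroring the warnings in segments 2 and 3.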
