File auto-categorization & language detection

A Smart Project helps the project manager (PM) by categorizing uploaded files automatically.

Files with the extensions .pdf, .png, .jpg, .jpeg, .gif, .tmp, .zip, .tar, and .gz are always categorized as "Source Document."

General rules for automatic categorization are as follows:

  1. Source Document - default fallback category
  2. Source to Be Prepared - not applicable
  3. CAT Package - recognized extensions: .sdlppx, .mqout
  4. CAT Package (Return) - recognized extensions: .sdlrpx, .mqback
  5. Translated Document - not applicable
  6. Reference File - not applicable
  7. Terminology - recognized extensions: .tbx, .sdtbx, .mqtbx
  8. Translation Memory - recognized extensions: .tmx, .sqtmx, .mqtmx, .sdtmx, .sdltm
  9. CAT Analysis - ignored extension: .doc; size must be below 1 MB
  10. Bilingual Document - recognized extensions: .rtf, .sdlxliff, .mqxlz, .mqxliff, .doc, .docx, .zip
  11. Formatted Document - not applicable
  12. Segmentation Rules - content must include the following attribute: resourcetype="SegRules"
  13. Filtering Rules - content must include the following attribute: resourcetype="FilterConfigs"
  14. QA Report - recognized extension: .xlsx; size must be below 500 KB
  15. memoQ Light Resource - content must include the following attribute: resourcetype="*"
  16. Other - not applicable

Note that .xlsx files larger than 10 MB will be categorized as Source Documents.
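The rules above can be read as a lookup with a fallback. The following is a minimal sketch of that reading, not the product's actual implementation. The function name `categorize`, the order in which the rules are checked, and the interpretation of 500 KB as 500 * 1024 bytes are all assumptions made for illustration. The CAT Analysis rule is left out because the list does not say which files it matches, and .zip is omitted from Bilingual Document because .zip files are always treated as Source Documents.

```python
import re
from pathlib import PurePath

# Extensions that are always categorized as Source Document.
ALWAYS_SOURCE = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".tmp", ".zip", ".tar", ".gz"}

# Extension-based categories, copied from the list above.
EXTENSION_RULES = {
    "CAT Package": {".sdlppx", ".mqout"},
    "CAT Package (Return)": {".sdlrpx", ".mqback"},
    "Terminology": {".tbx", ".sdtbx", ".mqtbx"},
    "Translation Memory": {".tmx", ".sqtmx", ".mqtmx", ".sdtmx", ".sdltm"},
    "Bilingual Document": {".rtf", ".sdlxliff", ".mqxlz", ".mqxliff", ".doc", ".docx"},
}

# Content-based categories keyed by the value of the resourcetype attribute.
RESOURCE_TYPE_RULES = {
    "SegRules": "Segmentation Rules",
    "FilterConfigs": "Filtering Rules",
}
RESOURCE_TYPE = re.compile(r'resourcetype="([^"]*)"')

QA_REPORT_MAX_BYTES = 500 * 1024  # assumption: 1 KB = 1024 bytes


def categorize(filename, size_bytes, text=None):
    """Return a category name for a file (hypothetical helper).

    `text` is the file content, if it is available as a string.
    The check order below is an assumption, not documented behavior.
    """
    ext = PurePath(filename).suffix.lower()
    if ext in ALWAYS_SOURCE:
        return "Source Document"
    for category, extensions in EXTENSION_RULES.items():
        if ext in extensions:
            return category
    if text is not None:
        match = RESOURCE_TYPE.search(text)
        if match:
            # Any other resourcetype value is treated as a memoQ Light Resource.
            return RESOURCE_TYPE_RULES.get(match.group(1), "memoQ Light Resource")
    if ext == ".xlsx" and size_bytes < QA_REPORT_MAX_BYTES:
        return "QA Report"
    return "Source Document"  # default fallback category
```

For example, `categorize("handoff.sdlppx", 2048)` yields "CAT Package", while a file with an unrecognized extension and no matching content falls back to "Source Document".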


Files in a Smart Project can also be assigned languages. When a file is uploaded in either the Home Portal or the Vendor Portal, the system suggests languages using the following methods:

  • use of project or quote source/target languages
  • use of job languages
  • content parsing of XLIFF, TM and TB files
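To illustrate the content-parsing method, here is a minimal sketch of how language codes could be read from XLIFF and TMX content. It is an assumption about the general approach, not a description of how the system actually parses files. It relies on the `source-language` and `target-language` attributes of the XLIFF 1.2 `file` element, and on the TMX `srclang` header attribute and `xml:lang` attribute of `tuv` elements. The function name `suggest_languages` is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _local(tag):
    """Strip an XML namespace prefix such as '{urn:...}file' -> 'file'."""
    return tag.rsplit("}", 1)[-1]


def suggest_languages(xml_text):
    """Return (source, target) language codes found in XLIFF or TMX content.

    Hypothetical helper; either value is None when it cannot be determined.
    """
    root = ET.fromstring(xml_text)
    kind = _local(root.tag)

    if kind == "xliff":
        # XLIFF 1.2 stores both languages on the first <file> element.
        for element in root.iter():
            if _local(element.tag) == "file":
                return element.get("source-language"), element.get("target-language")
        return None, None

    if kind == "tmx":
        source = None
        for element in root.iter():
            if _local(element.tag) == "header":
                source = element.get("srclang")
                break
        # Take the first translation unit variant whose language differs
        # from the source language as the suggested target.
        for element in root.iter():
            if _local(element.tag) == "tuv":
                lang = element.get(XML_LANG) or element.get("lang")
                if lang and (source is None or lang.lower() != source.lower()):
                    return source, lang
        return source, None

    return None, None
```

A suggestion produced this way would still need to be mapped to the languages configured in the project, which this sketch does not attempt.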

Remote files, i.e., files coming from an integrated CAT tool, reflect the language information as shown in the third-party software. This applies to the following file categories:

  1. Bilingual Document
  2. Translation Memory
  3. Terminology
  4. CAT Analysis
