Files auto-categorization & language detection
This is an old version of the article. Please check the current version in our new Knowledge Base.
Smart Project helps the project manager (PM) by categorizing uploaded files automatically.
Files with extensions: .pdf, .png, .jpg, .jpeg, .gif, .tmp, .zip, .tar, .gz are always categorized as "Source Documents."
General rules for automatic categorization are as follows:
- Source Document - default fallback category
- Source to Be Prepared - not applicable
- CAT Package - recognized extensions: .sdlppx, .mqout
- CAT Package (Return) - recognized extensions: .sdlrpx, .mqback
- Translated Document - not applicable
- Reference File - not applicable
- Terminology - recognized extensions: .tbx, .sdtbx, .mqtbx
- Translation Memory - recognized extensions: .tmx, .sqtmx, .mqtmx, .sdtmx, .sdltm
- CAT Analysis - ignored extension: .doc; size must be below 1 MB
- Bilingual Document - recognized extensions: .rtf, .sdlxliff, .mqxlz, .mqxliff, .doc, .docx, .zip
- Formatted Document - not applicable
- Segmentation Rules - content must include the following attribute: resourcetype="SegRules"
- Filtering Rules - content must include the following attribute: resourcetype="FilterConfigs"
- QA Report - recognized extension: .xlsx; size must be below 500 KB
- memoQ Light Resource - content must include the following attribute: resourcetype="*"
- Other - not applicable
Note that .xlsx files larger than 10 MB will be categorized as Source Documents.
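The rules above can be sketched as a small lookup function. This is only an illustration, not the system's actual implementation: the function name `categorize`, the order in which the rules are checked, and the use of 1 KB = 1024 bytes are all assumptions. The CAT Analysis rule is left out because the article does not state its positive criteria, and .zip is handled by the "always Source Document" rule even though it also appears under Bilingual Document.

```python
import os
import re

KB = 1024  # assumption: binary units; the article does not say which it uses

# Extensions that the article says are always Source Documents.
ALWAYS_SOURCE = {".pdf", ".png", ".jpg", ".jpeg", ".gif", ".tmp", ".zip", ".tar", ".gz"}

# Extension-based categories, copied from the list above.
BY_EXTENSION = {
    "CAT Package": {".sdlppx", ".mqout"},
    "CAT Package (Return)": {".sdlrpx", ".mqback"},
    "Terminology": {".tbx", ".sdtbx", ".mqtbx"},
    "Translation Memory": {".tmx", ".sqtmx", ".mqtmx", ".sdtmx", ".sdltm"},
    "Bilingual Document": {".rtf", ".sdlxliff", ".mqxlz", ".mqxliff", ".doc", ".docx"},
}

# Content-based categories keyed by the resourcetype attribute value.
BY_RESOURCE_TYPE = {
    "SegRules": "Segmentation Rules",
    "FilterConfigs": "Filtering Rules",
}
RESOURCE_TYPE = re.compile(r'resourcetype="([^"]*)"')


def categorize(filename, size_bytes, content=""):
    """Suggest a category for a file (hypothetical helper, assumed rule order)."""
    ext = os.path.splitext(filename)[1].lower()

    if ext in ALWAYS_SOURCE:
        return "Source Document"

    # Small .xlsx files are QA Reports; larger ones fall through to the default.
    if ext == ".xlsx" and size_bytes < 500 * KB:
        return "QA Report"

    for category, extensions in BY_EXTENSION.items():
        if ext in extensions:
            return category

    # Any other resourcetype value is treated as a memoQ Light Resource.
    match = RESOURCE_TYPE.search(content)
    if match:
        return BY_RESOURCE_TYPE.get(match.group(1), "memoQ Light Resource")

    return "Source Document"  # default fallback category
```

For example, `categorize("export.tmx", 2048)` returns "Translation Memory", while an unrecognized file such as `notes.txt` falls back to "Source Document".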
Files in a Smart Project can also be assigned languages. When a file is uploaded in either the Home Portal or the Vendor Portal, the system suggests languages using the following methods:
- use of project or quote source/target languages
- use of job languages
- content parsing of XLIFF, TM and TB files
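As an illustration of the content-parsing method, the sketch below reads language codes from XLIFF 1.2 and TMX text using Python's standard library. It is not the system's own parser: the function name `detect_languages` is hypothetical, TBX files are not covered, and it relies only on the standard attributes of those formats (`source-language` and `target-language` on the XLIFF `<file>` element, `srclang` on the TMX `<header>`, and `xml:lang` on each TMX `<tuv>`).

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def detect_languages(xml_text):
    """Return (source_language, sorted_target_languages) from XLIFF 1.2 or TMX text."""
    root = ET.fromstring(xml_text)
    source = None
    found = set()

    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "file":
            # XLIFF 1.2: languages are attributes of <file>.
            source = source or elem.get("source-language")
            if elem.get("target-language"):
                found.add(elem.get("target-language"))
        elif name == "header":
            # TMX: the source language is declared once in <header>.
            source = source or elem.get("srclang")
        elif name == "tuv":
            # TMX: each translation unit variant carries its own language.
            if elem.get(XML_LANG):
                found.add(elem.get(XML_LANG))

    # A language equal to the source is not reported as a target.
    targets = sorted(
        lang for lang in found
        if source is None or lang.lower() != source.lower()
    )
    return source, targets
```

A suggestion of this kind would presumably be combined with the project, quote, or job languages listed above; how the system prioritizes the three methods is not stated here.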
Remote files, i.e. files coming from an integrated CAT tool, reflect the language information as seen in the third-party software. This applies to the following file categories:
- Bilingual Document
- Translation Memory
- Terminology
- CAT Analysis