By: Kirti Vashee
The issue of standards continues to come up in the translation industry, and most people agree that standards are needed for the industry to evolve. It is increasingly clear that global businesses will require a variety of standards to cope with growing translation volumes, shrinking turnaround times and the increasing need to interact with the external software and processes involved in content creation that could facilitate international initiatives. Business translation increasingly requires efficient interaction with a large number of stakeholders, processes and tools, much of it happening at a speed never seen before.
Standards can facilitate the exchange, interoperability and integration of data between stakeholders and processes and also help to establish trusted definitions of the production process to create the final output that is clearly associated with value.
Standards are needed to scale up to the volume of translation that will likely be done and to enable greater inter-process automation as we head into a world where we continuously translate dynamic streams of content. Free online MT services have given the global enterprise a taste of what translation as a utility looks like. Now some want to see whether it can be done better, in a more focused way and at higher quality levels, to enhance global business initiatives and expand the dialog with the global customer. (This can be done much more effectively with customized, purpose-driven MT working with, and steered by, skilled language professionals.) Translation as a utility is a concept that describes an always-on, on-demand, streaming translation service that can translate high-value streams of content at defined quality levels for reasonable rates. Data will need to flow in and out of authoring, content management, social networks, translation workflow, MT and TM systems as needed.
The discussion on quality standards in particular is often difficult because of conflation, i.e. very different concepts being equated and assumed to be the same. In many discussions on “quality”, I think at least three different concepts are being referenced and confused with one another.
- End to End Process Standards: ISO 9001, EN15038, Microsoft QA and LISA QA 3.1. They have a strong focus on administrative, documentation, review and revision processes, not just on the linguistic quality assessment of the final translation. Many are skeptical about the value of these process standards, but there does seem to be a good case for using them to create a “continuous improvement” culture and a disciplined, efficient production environment. The key to success with any process improvement is measurement that is accurate, repeatable and objective, and can thus track change in a useful way.
- Automated SMT System Output Translation Quality Metrics (TQM): BLEU, METEOR, TERp, F-Measure, ROUGE and several others that focus only on rapidly scoring MT output by comparing it, through measures such as precision, recall or edit distance, against one or more human translations of the exact same source material. (Useful for MT system developers but not much else, and often grossly misunderstood and improperly measured by LSPs who are unaware of the pitfalls.)
- Human Evaluation of Translation Linguistic Quality: Error categorization and subjective human quality assessment, usually at a sentence level. SAE J2450, the LISA Quality Metric and perhaps the Butler Hill TQ Metric (that Microsoft uses extensively and TAUS advocates) are examples of this. The more actionable the error identification process is, the more useful it will be to create a continuous improvement culture. (Can vary greatly depending on the humans involved.)
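To make the precision/recall idea behind the automated metrics a little more concrete, here is a minimal Python sketch of a single-reference, sentence-level BLEU-style score next to a unigram F-measure. It is a simplified illustration under stated assumptions (whitespace tokenization, one reference, no smoothing), not an implementation of any official scoring tool, and the function names are hypothetical. It also shows one of the pitfalls mentioned above: scrambling the reference's words keeps a perfect unigram F-measure, while the unsmoothed BLEU-style score drops to zero.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hyp, ref, n):
    """Share of hypothesis n-grams found in the reference, clipped to reference counts."""
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def unigram_f_measure(hyp, ref):
    """Harmonic mean of unigram precision and recall; word order is ignored."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def sentence_bleu(hyp, ref, max_n=4):
    """Unsmoothed BLEU-style score for one sentence against one reference (a sketch)."""
    precisions = [clipped_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one empty n-gram order zeroes the score
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    for hypothesis in ("the cat sat on the mat", "on the mat the cat sat"):
        tokens = hypothesis.split()
        print(hypothesis, sentence_bleu(tokens, reference), unigram_f_measure(tokens, reference))
```

Real evaluations normally compute BLEU over a whole test corpus rather than per sentence, which is one reason single-sentence scores like these are easy to misread.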
In addition to this, there are also standards that affect data interchange between processes and workers during the translation transformation process.
Linguistic Data Interchange: These standards facilitate data exchange from content creation tools to other related tools. They thus enable the transformation of textual data within a broader organizational data flow than just translation, and good interchange standards can ensure that fast-flowing streams of content are transformed more rapidly and reach customers as quickly as possible. XLIFF and TMX are examples of “standards” here, but I think the future is likely to be more about interfacing with the “real” mission-critical systems that companies use (DBMS, collaboration, social media monitoring and content management systems) rather than just TMS and TM systems, which, in my opinion, are very likely to become less relevant to large-scale corporate translation initiatives. Continuous and rapid translation environments should not require separate TM and MT tools, and it is possible that these will become tightly embedded in content management systems.
To this mix you could also add the “container” standards discussions, which further obfuscate matters. These include TMX, TBX, SRX, GMX-V, xml:tm etc. Are any of these really standards, even by the much looser definition of “standard” in the software world? If you look at the discussions on quality and standards in translation around the web, you can see that a real dialog is difficult and clarity on this issue is virtually impossible. While some are making sincere efforts around XLIFF and open XML-based standards, we have yet to see widespread use of standards by the industry at large. Normal translation production practice in 2012 is still highly problematic because of the lack of common standards, and very often translators bear the brunt of these problems.
Why Do Standards Matter?
The value of standards is very clear in the physical world: electric power plugs, shipping containers, tires, CDs and DVDs, etc. Life would indeed be difficult if these things were not standardized. Even in communications we have standards that enable us to communicate easily: GSM, TCP/IP, HTTP, SMTP and the whole set in the OSI layers. Even regular people know and care about some of these. These standards make many things possible: exchange, interoperability, integration into larger business processes, evolving designs and architecture. In the software world it gets murkier: standards are often de facto (RTF, SQL?, PDF, DOC?, DOCX?) or just really hard to define. In software it is easier to stray, so MP3 becomes WMA and AIFF, and there is always a reason, usually involving words like “better” and “improved”, to move away from the original standard. The result: you cannot easily move your music collection from an iPod to a Zune or vice versa, or to a new, better technology, without some pain. You are stuck with data silos or a significant data conversion task.
Real standards make life easier for the whole ecosystem, i.e. the content creators, LSPs, translators in the professional translation community, the content consumers and everybody else who interacts with, transforms or modifies valuable content along the way. Standards matter if you are setting up translation production lines and pushing translation volumes up. At AGIS2010, Mahesh Kulkarni of C-DAC made a comment about standards in localization. He called them traffic rules that ease both the user and creator experience (and of course these rules matter much more when there is a lot of traffic), and he also said that standards evolve, have to be tested, and need frequent revision before they settle. It is interesting to me that the focus in the non-profit world is on studying successful standards development in other IT areas, in contrast to what we see at TAUS and GALA, where the modus operandi seems to be to create separate new groups with new missions and objectives, though both claim to act in the interest of “everyone”. Many translators ask how they will get a voice in this process, and their lack of representation remains a problem. This comes about partly because the standards development effort has to be funded, and it is very hard to get everybody to agree on a single approach, goal and focus. It seems quite possible, even likely, that outside forces will resolve this issue and establish standards that become de facto standards. We can already see this happening with some cloud-based initiatives where none of the tools that imprison linguistic data are used.
To understand what the impact of a good data interchange standard might be, it is useful to look at what happens in the CMS world. A user can edit a document downstream with an application that did not create the original data and send it on to others who can continue the editing in other preferred applications. I think this is the future, as data flows more freely in and out of organizations. This is a common use scenario with products like Adobe TCS2 in contrast to the pain experienced by Trados and other TM tool users.
Another major benefit would be to liberate translators from the complicated file and format conversions that are often necessary to do translation work. Good interchange standards would allow translators to work in the environment that makes most sense to them and to focus on translation and linguistic problem solving rather than data manipulation.
The closest we have to a standard in the translation world is TMX 1.4, whose origins go back to the late 1990s, and with all due respect to the good folks at LISA and GALA, it is not an effective “standard”, mostly because it is not implemented in a standard way: some vendors have chosen to break away from the original specification. It does sort of work, but it is far from transparent and robust. SDL has its own variant, and so do others, so data exchange is difficult without some kind of normalization and conversion effort, even among SDL products! Besides, data exchange between tools usually means at least some loss in data value. Translation tools often trap your data in a silo because the vendors WANT to lock you in and make it painful for you to leave or use other tools. To be fair, IBM, Microsoft and especially Apple follow this strategy too. Remember that a specification is not a standard: it has to actually be USED as a matter of course by MANY to really be a standard.
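To illustrate what a tool-neutral read of TMX data involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a small TMX 1.4-style file with no namespace on the TMX elements, and the `read_tmx` and `seg_text` helpers are hypothetical. It normalizes language codes to lowercase and accepts either `xml:lang` (as used in TMX 1.4) or a plain `lang` attribute, which is an assumption about older or non-conforming files. Inline codes such as `<bpt>`/`<ept>` are simply dropped, which is exactly the kind of quiet loss in data value described above.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<tmx version="1.4">
  <header creationtool="sketch" creationtoolversion="0.1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>
      <tuv lang="DE-DE"><seg>Klicken Sie auf Speichern.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def seg_text(seg):
    """Keep the translatable text of a <seg>, dropping inline codes (lossy)."""
    parts = [seg.text or ""]
    for child in seg:
        # child.text holds native code such as <b>; child.tail is the text after it.
        parts.append(child.tail or "")
    return "".join(parts)


def read_tmx(text):
    """Return one {language: segment} dict per <tu>, with lowercased language codes."""
    root = ET.fromstring(text)
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; accepting a plain "lang" attribute is an
            # assumption made here for older or non-conforming files.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                variants[lang.lower()] = seg_text(seg)
        units.append(variants)
    return units


if __name__ == "__main__":
    for unit in read_tmx(SAMPLE):
        print(unit)
```

Even this small example has to make normalization decisions (attribute names, code case, what to do with inline tags) that a well-enforced standard would settle once for everyone.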
In a world with ever increasing amounts of data, the enhanced and filtered data is more important than the application that created it.
For most people it is becoming more and more about leveraging data and using it wherever it is useful. That is where the long-term value is. As tools evolve, users want to be able to take their data to new and better applications easily. I want my data to be in a state where it does not matter if I change my application tool, and all related in-line applications can easily access my data and further process it as needed. I want to be able to link my data up, down, backwards and forward in the business process chain I live in, and I want to be able to do this without asking the vendor(s). I care about my data, not the vendor or the application I am using. If better tools appear, I want to be able to leave with my data, intact and portable.
Meanwhile, others are figuring out what XML based standards can do. XBRL is set to become the standard way of recording, storing and transmitting business financial information. It can be used throughout the world, whatever the language of the country concerned, for a wide variety of business purposes. It will deliver major cost savings and gains in efficiency, improving processes in companies, governments and other organizations.
In looking at the initial documentation produced by GALA, it appears that they are looking at such a huge scope that one wonders what will be possible with the scarce resources available to execute. Hopefully this initial focus will be narrowed down to something that matters to everybody and has the broadest support. There are at least two standards that, if well defined and used by many, I think would really help make translation as a utility happen:
- A linguistic quality rating that is at least somewhat objective, can be easily reproduced/replicated, and can be used to indicate the relative linguistic quality of BOTH human translation and the output of various MT systems. This would be especially useful to LSPs to understand post-editing cost structures and to help establish more effective pricing models for this kind of work that are fair to both customers and translators.
- A robust, flexible yet simple data interchange standard that protects linguistic assets (TM, terminology, glossary) but can also easily be exported to affiliated processes (CMS, DMS, Web Content). The Linport and XLIFF 2.0 initiatives are two approaches to this. My first impression is that Linport, though well intentioned, is a solution to yesterday’s problem, and XLIFF has more potential but needs all stakeholders to contribute minimum “must-have” requirements and enforcement rules. In the end somebody has to fund involvement. This is difficult in the language industry, as multiple viewpoints need to be funded. Who will fund the translator requirements and viewpoint?
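One simple, reproducible ingredient for the kind of linguistic quality rating described in the first item is a word-level edit rate between raw MT output and its post-edited version: roughly, the share of words a post-editor had to insert, delete or substitute. The Python sketch below is an illustration under stated assumptions, with hypothetical helper names and plain whitespace tokenization. Unlike TER, it ignores block shifts, so a moved word costs two edits rather than one.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over words: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                       # delete word_a
                cur[j - 1] + 1,                    # insert word_b
                prev[j - 1] + (word_a != word_b),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def post_edit_rate(mt_output, post_edited):
    """Edits needed to turn MT output into the post-edited text, per post-edited word."""
    mt, pe = mt_output.split(), post_edited.split()
    return word_edit_distance(mt, pe) / max(len(pe), 1)


def corpus_post_edit_rate(pairs):
    """Word-weighted edit rate over (mt_output, post_edited) pairs."""
    edits = sum(word_edit_distance(m.split(), p.split()) for m, p in pairs)
    words = sum(len(p.split()) for _, p in pairs)
    return edits / max(words, 1)


if __name__ == "__main__":
    print(post_edit_rate("the house is big", "the house is large"))
    print(post_edit_rate("press the button save", "press the save button"))
```

A number like this says how much editing happened, not why, so it would complement rather than replace the human error categorization discussed earlier.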
There are clearly some skeptics who see nothing of substance coming from these new standards initiatives. Ultan O’Broin points out how standards tend to stray and how expensive this is for users, and also raises some key questions about where compliance might best belong. However, I think it is worth at least trying to see if this new energy can be channeled into something useful for the industry. I, too, see some things that need to be addressed to get forward momentum on standards initiatives, which I suspect get stalled because the objectives are not that clear. There are at least three things that need to be addressed.
1) Involve Content Creators – Most of the discussion has focused only on translation industry players. Given the state of technology in the industry, I think we really do need to get CMS, DBMS and collaboration software/user perspectives on what really matters for textual data interchange if we are actually concerned with developing meaningful standards. We should find a way to get them involved, especially for data interchange standards, as they have much more experience in how to do this. I would expect that a standard that matters in the future will understand and/or facilitate all of the following:
- Translate once and distribute widely (Print, PC, Tablet, and Mobile) which will require translated data to be abstracted from its format characteristics and presentation
- HTML5, CSS3 and JavaScript to enable rapid data flow
- Rapid translation of huge volumes of small chunks of data
2) Produce Standards That Make Sense to Translators – The whole point of standards is to ease the data flow from creation to transformation to consumption. Translators spend a disproportionate amount of time on format-related issues rather than on translation and linguistic issue management. Standards should make it easier for translators to deal ONLY with translation-related problems and allow them to build linguistic assets that are independent of any single translation tool or product. A good standard should enhance translator productivity and let translators focus on linguistic work rather than on format conversions and on communications to clarify basic job requirements.
3) Having Multiple Organizations Focused on the Same Standards is Unlikely to Succeed – By definition, standards are most effective when there is only one. Most standards initiatives in the information technology arena involve a single body or entity that reflects the needs of many different kinds of users. It would probably be worth taking a close look at the history of some of these to understand how to do this better. The best standards initiatives have hard-core techies who understand how to translate clearly specified business requirements into a fairly robust technical specification that can evolve while leaving a stable core untouched.
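The first requirement in item 1, keeping translated data abstracted from its format and presentation, can be illustrated with a small Python sketch built on the standard library's `html.parser`: markup is set aside as a skeleton, only the text goes to translation, and the translated text is merged back. The class and function names are hypothetical, comments and doctypes are ignored, and a sentence broken by inline tags such as `<b>` ends up split into separate segments, a known weakness that interchange formats like XLIFF address with inline-code markup.

```python
from html import escape
from html.parser import HTMLParser


class Skeletonizer(HTMLParser):
    """Splits HTML into markup parts and text parts, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []  # list of ("markup" | "text", string)

    def handle_starttag(self, tag, attrs):
        self.parts.append(("markup", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(("markup", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.parts.append(("markup", f"</{tag}>"))

    def handle_data(self, data):
        self.parts.append(("text", data))
    # Comments, doctypes and processing instructions are dropped in this sketch.


def extract(markup):
    """Return (skeleton parts, translatable segments) for an HTML snippet."""
    parser = Skeletonizer()
    parser.feed(markup)
    parser.close()
    segments = [t.strip() for kind, t in parser.parts if kind == "text" and t.strip()]
    return parser.parts, segments


def merge(parts, translations):
    """Put translated segments back into the skeleton (one per segment, in order)."""
    pending = iter(translations)
    out = []
    for kind, t in parts:
        if kind == "text" and t.strip():
            lead = t[: len(t) - len(t.lstrip())]
            trail = t[len(t.rstrip()):]
            out.append(lead + escape(next(pending), quote=False) + trail)
        else:
            out.append(t)  # markup and whitespace-only text pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    parts, segments = extract("<p>One<br/>Two &amp; three</p>")
    print(segments)
    print(merge(parts, ["Eins", "Zwei & drei"]))
```

Because the segments carry no presentation, the same translated strings could in principle be merged into different skeletons for print, PC, tablet and mobile.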
The path to standards that will enable this is long and arduous, and it seems unlikely that this is possible to do without deep collaboration with real corporate sponsors, CMS/web developers and others outside the translation industry. This means that it is useful to include people who have no idea of what localization means, but understand the value of translation to build international market momentum. Hopefully the dialog continues to expand and helps to improve collaboration with managers, who are the primary drivers of international business initiatives.



