Could you please start by telling us what KantanMT is?
KantanMT is a cloud-based MT platform. It helps our clients translate more words per day, shortening their project turn-around times and helping to improve their margins. Many of our clients are mid-sized LSPs that wish to integrate MT into their localization workflows and that had previously found this to be an expensive, complex and time-consuming undertaking. Most of them can build their first MT engine within hours of registering on the KantanMT.com platform. This means they can experience the benefits of MT within their organizations in a very short time. This is important to our clients as they work in a fast-paced, highly competitive industry.
Can you tell us why a potential MT user should choose KantanMT?
The vast majority of our clients are mid-sized LSPs that understand how MT can benefit their business but cannot afford the traditional costs, or lack the technical know-how, to deploy and manage a Statistical MT system. Most MT solutions are sold by expensive, consultancy-led sales organizations. This leads to lengthy deployment times and costly implementations. KantanMT appeals to mid-sized LSPs that want to access MT without the cost, complexity and lengthy implementation times of traditional vendors. Our clients only need a browser to access the KantanMT.com platform. There is no special hardware or software needed, only a simple account name and password. There are no costly upfront fees, just a simple monthly subscription. It’s a simple pay-as-you-go model which appeals to most mid-sized LSPs. We’ve had enormous success so far with this approach: more than 1,000 members have registered and started experimenting with MT since the BETA release in September 2012.
And will KantanMT remain free?
No. We plan to launch a commercial version of the platform starting in Q2 2013. Our members will pay a small monthly fee to access their account.
KantanMT is advertised as high-quality MT. Is quality then a selling proposition for KantanMT, as it is for most LSPs?
KantanMT.com allows our members to build domain-specific MT engines for their clients. The key to building a domain-specific KantanMT engine is to ensure that the training data is of high quality within the client’s domain. An excellent starting point for this is Translation Memory files, which our members have in abundance. However, the real gain is that domain-specific engines generally produce higher quality translations, which results in less post-editing effort. This gain in productivity is the key advantage many of our members are looking for. They want to process more words per day and KantanMT.com can help them achieve this.
KantanMT users are admittedly localization service providers and companies that want to use machine translation to increase productivity, improve quality and earn more money. Do you think KantanMT could prove suitable and useful also for freelancers? If so, how?
KantanMT.com has been primarily designed with small to medium-sized LSPs in mind. However, that is not to say that freelance translators would not get value from using the product.
The raw materials for building high quality KantanMT engines are Translation Memories. To help freelance translators that may not have sufficient Translation Memory training data, KantanMT has recently introduced stock engines. These can be used as foundation training data for engines and then supplemented with client specific TMs. This will help freelance translators, who may feel they have insufficient training data, build their own custom KantanMT engines and integrate MT into their daily localization workflow.
The EU Commission is adopting SMT to speed up translations. The EU Commission has long used Systran, until the notorious licensing issues arose. Do you think that the switch to SMT is a result of this litigation or is it due to technological (and financial) choices?
The adoption of SMT by the EU Commission has been widely discussed over the last few weeks. It is my opinion that, like many other organizations, the EU chose to introduce SMT in a bid to reduce costs and increase productivity. As SMT is becoming more widely adopted, it seems reasonable that this trend should spread across many types of organizations looking to increase productivity in an increasingly globalized environment.
And now, a few technical questions. Why use TMX translation memories instead of corpora? How are these files validated and cleaned to be used as training data?
The KantanMT.com platform supports both TMX files and corpora as training data. All training data is cleansed using a 12-step data cleansing process. This ensures that only highly cleansed training data is used to build a KantanMT engine. By focusing on clean data rather than big data, a KantanMT engine will produce higher quality translations, faster than traditional methods.
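To make the idea of cleansing TMX training data more concrete, here is a minimal Python sketch. The 12 steps of the KantanMT process are not listed in this interview, so the filters below (whitespace normalization, dropping empty, untranslated, badly length-mismatched and duplicate segment pairs) are assumed, commonly used examples and not the actual KantanMT pipeline. The function names, the `max_ratio` threshold and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute name that ElementTree reports for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Yield (source, target) segment pairs from a TMX document string."""
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            yield segs[src_lang], segs[tgt_lang]

def clean_pairs(pairs, max_ratio=3.0):
    """Apply a few illustrative (assumed) cleaning filters to segment pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Collapse runs of whitespace and trim both sides.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # empty source or target
        if src == tgt:
            continue  # likely untranslated
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # implausible length mismatch
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # case-insensitive duplicate
        seen.add(key)
        cleaned.append((src, tgt))
    return cleaned

# Hypothetical sample data, for illustration only.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en"><seg>Click  Save.</seg></tuv><tuv xml:lang="de"><seg>Klicken Sie auf Speichern.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>click save.</seg></tuv><tuv xml:lang="de"><seg>Klicken Sie auf Speichern.</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Cancel</seg></tuv><tuv xml:lang="de"><seg></seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>OK</seg></tuv><tuv xml:lang="de"><seg>OK</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Yes</seg></tuv><tuv xml:lang="de"><seg>Dies ist ein sehr langer Satz</seg></tuv></tu>
<tu><tuv xml:lang="en"><seg>Close the file.</seg></tuv><tuv xml:lang="de"><seg>Datei jetzt speichern</seg></tuv></tu>
</body></tmx>"""

CLEANED = clean_pairs(read_tmx_pairs(SAMPLE_TMX, "en", "de"))
```

Note that filters of this kind only catch formal problems: the last sample pair is a mistranslation, yet it survives because nothing here checks meaning. A production pipeline would probably also handle inline tags, encoding problems and language detection.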
A statement on the KantanMT website puzzled us a bit: “BLEU score is a measure of the quality of your KantanMT engine.” As far as we know, the BLEU metric scores the proximity of machine translation output to what a human translator would supposedly produce. Human translation is not necessarily always of excellent quality.
BLEU (Bilingual Evaluation Understudy) is internationally recognised and is the most widely used automated measure of quality for machine generated translations. We use this in combination with F-Measure and TER (Translation Error Rate) to automatically score translations generated from a KantanMT engine.
The reference data used to generate a BLEU score is taken from the original training data supplied by our members. Therefore, the accuracy and quality of a KantanMT BLEU score is relative to the quality of the original training material.
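For readers unfamiliar with how BLEU compares machine output against a reference, the following Python sketch computes a simplified, unsmoothed, single-reference, sentence-level BLEU. It is an illustration of the general metric only, not KantanMT’s scoring code. Standard BLEU is normally computed over a whole test set, and tokenization here is plain whitespace splitting, which is an assumption made for brevity. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)

# Reference segment (for example, one held out from the training data).
reference = "the cat sat on a mat"
score = bleu("the cat sat on the mat", reference)
```

Because the score measures overlap with the reference, a poor reference translation lowers the score of a good machine translation and vice versa, which is why the quality of the reference material matters.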
You have long worked for several Irish localization companies or in the localization departments of international companies: what’s your view of translation and the translation industry?
The localization industry has a number of significant challenges that it must overcome in the next decade to ensure mid-sized LSPs can continue to thrive and be profitable.
There are only two certainties: price compression and margin erosion. This is coupled with a client base that is constantly striving for faster project turn-around times. LSPs can only address these challenges by streamlining their supply chains, reducing or eliminating production bottlenecks, providing new services and introducing translation automation.
What about confidentiality? Many users are reluctant to ‘donate’ their own linguistic data even to train and maintain their ‘own’ machine translation engine. What use will KantanMT be making of users’ data? We couldn’t find any statement that KantanMT will expressly use this data only to build and train the user’s engine.
How will KantanMT ensure that this data will not be used for any other of their clients?
KantanMT.com promises its members that none of their data will be re-published, re-tasked or re-purposed. All our members’ accounts are protected by username and password, and all their data is stored fully encrypted using Amazon’s cloud services.
What is, in your opinion, the pricing model that could win this customer?
Later this year KantanMT will be introducing a monthly subscription payment option for our members. We feel that this fresh approach to pricing will appeal to our members, and we hope that, as we simplify the MT buying process, more organizations will experiment with our innovative new platform and sign up as members.