The human evaluation of MT runs from the IWSLT 2016 evaluation campaign is analyzed in depth in the following paper:
Neural versus phrase-based MT quality: An in-depth analysis on English-German and English-French,
by L. Bentivogli, A. Bisazza, M. Cettolo, M. Federico. Computer Speech & Language (2017), http://dx.doi.org/10.1016/j.csl.2017.11.004
The data and tools used in the paper's experiments are available for download here.