xerces-c 3.0.1/2.7.0 cannot handle huge GBK files
There is a bug inside util/Transcoders/IconvGNU/IconvGNUTransService.cpp.
    for (size_t cnt = 0; cnt < maxChars && srcLen; cnt++) {
        size_t rc = iconvFrom(startSrc, &srcLen, &orgTarget, uChSize());
        if (rc == (size_t)-1) {
            if (errno != E2BIG || prevSrcLen == srcLen) {
                ThrowXMLwithMemMgr(TranscodingException, XMLExcepts::Trans_BadSrcSeq, getMemoryManager());
            }
        }
        charSizes[cnt] = prevSrcLen - srcLen;
        prevSrcLen = srcLen;
        bytesEaten += charSizes[cnt];
        startSrc = endSrc - srcLen;
        toReturn++;
    }
If a huge file is passed to xerces, the text is handed to IconvGNUTranscoder in pieces, so a piece can end with an incomplete multibyte sequence. iconv reports that case with errno EINVAL. The loop above does not check for EINVAL, so it is treated like any other failure and the exception is thrown.
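One possible way to handle this case is sketched below. It is not the Xerces code and not a proposed patch: `convertChunk` and `ChunkResult` are hypothetical names, and the sketch assumes a POSIX `iconv` whose `inbuf` parameter is `char**` (as in glibc). On EINVAL the helper stops and reports how many bytes were consumed, so the caller can prepend the unconsumed tail to the next piece of input. E2BIG only means the output buffer is full, and anything else (for example EILSEQ) is still an error.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper type, not part of Xerces-C.
struct ChunkResult {
    std::string output;    // converted bytes in the target encoding
    std::size_t consumed;  // number of input bytes that were fully converted
};

// Converts one piece of input. An incomplete multibyte sequence at the end
// of the piece (errno == EINVAL) is not an error: the tail is left unconsumed.
ChunkResult convertChunk(iconv_t cd, const std::string& chunk) {
    char* in = const_cast<char*>(chunk.data());  // iconv does not modify the input
    std::size_t inLeft = chunk.size();
    ChunkResult result{std::string(), 0};
    char buf[256];

    while (inLeft > 0) {
        char* out = buf;
        std::size_t outLeft = sizeof(buf);
        std::size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
        int err = errno;  // save errno before any other call can change it
        result.output.append(buf, sizeof(buf) - outLeft);
        if (rc != static_cast<std::size_t>(-1)) continue;  // all input converted
        if (err == E2BIG) continue;  // output buffer full: drain it and go on
        if (err == EINVAL) break;    // incomplete sequence at the end: keep the tail
        throw std::runtime_error("invalid multibyte sequence in input");
    }
    result.consumed = chunk.size() - inLeft;
    return result;
}
```

A caller would open the descriptor with something like `iconv_open("UCS-4LE", "GBK")`, call `convertChunk` for each piece, and carry `chunk.substr(result.consumed)` over to the front of the next piece.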
Regards, Kirby Zhou from SOHU-RD +86-10-6272-8261
Thanks for the bug report. We will see what we can do with it.
On Fri, Jul 23, 2010 at 01:41, Kirby Zhou kirbyzhou@sohu-rd.com wrote:
epel-devel-list mailing list epel-devel-list@redhat.com https://www.redhat.com/mailman/listinfo/epel-devel-list
:-)
And if you decide to use ICU instead of IconvGNU in xerces, there seems to be another bug:
ICUTranscoder::transcodeFrom
    UErrorCode err = U_ZERO_ERROR;
    ucnv_toUnicode
    (
        fConverter
        , &startTarget
        , startTarget + maxChars
        , (const char**)&startSrc
        , (const char*)endSrc
        , (fFixed ? 0 : (int32_t*)fSrcOffsets)
        , false
        , &err
    );
It seems a mutex is needed to protect fConverter. ICULCPTranscoder::calcRequiredSize uses 'XMLMutexLock lockConverter(&fMutex);' for this. I do not know why the xerces authors do not do the same thing here.
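As a rough illustration of the locking pattern meant here, the sketch below guards a shared, stateful converter with a scoped lock. It uses `std::mutex` and `std::lock_guard` in place of the Xerces `XMLMutexLock`. `SharedConverter` and its byte counter are hypothetical stand-ins for `fConverter` and its internal state, not ICU or Xerces APIs. The lock only matters if a single converter instance really is used from several threads.

```cpp
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical stand-in for a stateful converter shared between threads.
class SharedConverter {
public:
    // Processes one chunk and returns the total number of bytes seen so far.
    // The running total plays the role of the converter's internal state.
    std::size_t convert(const std::string& chunk) {
        // Scoped lock, released when `lock` goes out of scope; this has the
        // same role as `XMLMutexLock lockConverter(&fMutex);` in Xerces.
        std::lock_guard<std::mutex> lock(fMutex);
        fBytesSeen += chunk.size();
        return fBytesSeen;
    }

private:
    std::mutex fMutex;           // protects fBytesSeen
    std::size_t fBytesSeen = 0;  // state that must not be updated concurrently
};
```

The scoped form means the mutex is released on every exit path, including exceptions, so an error raised during conversion cannot leave the converter locked.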
Regards, Kirby Zhou from SOHU-RD +86-10-6272-8261
-----Original Message-----
From: epel-devel-list-bounces@redhat.com [mailto:epel-devel-list-bounces@redhat.com] On Behalf Of Stephen John Smoogen
Sent: Saturday, July 24, 2010 1:21 AM
To: EPEL development disccusion
Subject: Re: xerces-c can not deal with high GBK file
Thanks for the bug report. We will see what we can do with it.