Extracting Arabic text with iTextSharp in C#

I use the code below to get the text of a PDF. It works well for English PDFs, but when I try to extract Arabic text, it shows me something like this:

“)+ n 9 n <+,+)+ $#$ + $ F%9&。<$:;”

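```csharp
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// path: location of the PDF file
var text = new StringBuilder();
using (PdfReader reader = new PdfReader(path))
{
    for (int i = 1; i <= reader.NumberOfPages; i++)
    {
        // A fresh strategy per page: a reused SimpleTextExtractionStrategy keeps the text of earlier pages
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        // Append instead of overwriting, so the text of every page is kept
        text.Append(PdfTextExtractor.GetTextFromPage(reader, i, strategy));
    }
}
```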

I had to change the strategy like this:

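```csharp
// Inside the page loop: use LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy
var t = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());
// Fix the order of the Arabic characters (Convert is the function that follows)
var te = Convert(t);
```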

This function reverses the Arabic words and leaves the English text unchanged:

```csharp
using System;
using System.Text;

// Reverses each run of Arabic characters; everything else (English, spaces, punctuation) keeps its place.
private string Convert(string source)
{
    string arabicWord = string.Empty;
    StringBuilder sbDestination = new StringBuilder();

    foreach (var ch in source)
    {
        if (IsArabic(ch))
            arabicWord += ch;
        else
        {
            if (arabicWord != string.Empty)
                sbDestination.Append(Reverse(arabicWord));

            sbDestination.Append(ch);
            arabicWord = string.Empty;
        }
    }

    // if the last word was arabic
    if (arabicWord != string.Empty)
        sbDestination.Append(Reverse(arabicWord));

    return sbDestination.ToString();
}

private bool IsArabic(char character)
{
    if (character >= 0x600 && character <= 0x6ff) return true;   // Arabic
    if (character >= 0x750 && character <= 0x77f) return true;   // Arabic Supplement
    if (character >= 0xfb50 && character <= 0xfc3f) return true; // part of Arabic Presentation Forms-A
    if (character >= 0xfe70 && character <= 0xfefc) return true; // Arabic Presentation Forms-B
    return false;
}

// Reverse the characters of a string
private string Reverse(string source)
{
    char[] chars = source.ToCharArray();
    Array.Reverse(chars);
    return new string(chars);
}
```
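To check the reversal logic without a PDF or iTextSharp, here is a minimal, self-contained sketch of the same idea. The class name `ArabicRunFixer` and the method `FixArabicRuns` are hypothetical; the sketch uses the same Unicode ranges as `IsArabic` above and a `StringBuilder` for the current Arabic run instead of string concatenation. The sample input assumes the extractor returned the word سلام in visual (reversed) order.

```csharp
using System;
using System.Text;

// Hypothetical standalone version of the Convert/IsArabic/Reverse logic,
// so it can be tried without a PDF.
public static class ArabicRunFixer
{
    // Reverses every maximal run of Arabic characters; other characters keep their position.
    public static string FixArabicRuns(string source)
    {
        var run = new StringBuilder();
        var result = new StringBuilder(source.Length);
        foreach (char ch in source)
        {
            if (IsArabic(ch))
            {
                run.Append(ch);
                continue;
            }
            FlushReversed(run, result);
            result.Append(ch);
        }
        FlushReversed(run, result); // the input may end with an Arabic run
        return result.ToString();
    }

    // Appends the pending Arabic run in reverse order, then empties it.
    private static void FlushReversed(StringBuilder run, StringBuilder result)
    {
        for (int i = run.Length - 1; i >= 0; i--)
            result.Append(run[i]);
        run.Clear();
    }

    // Same Unicode ranges as IsArabic in the answer.
    public static bool IsArabic(char c) =>
        (c >= '\u0600' && c <= '\u06FF') ||
        (c >= '\u0750' && c <= '\u077F') ||
        (c >= '\uFB50' && c <= '\uFC3F') ||
        (c >= '\uFE70' && c <= '\uFEFC');

    public static void Main()
    {
        // "سلام" as it might come out of the extractor in visual order: م ا ل س
        string extracted = "Hello \u0645\u0627\u0644\u0633 123";
        string fixedText = FixArabicRuns(extracted);
        Console.WriteLine(fixedText == "Hello \u0633\u0644\u0627\u0645 123"); // prints True
    }
}
```

Like the original function, this only reverses the characters inside each Arabic run. Spaces are not Arabic, so the order of consecutive Arabic words on a line stays as extracted, and a multi-word Arabic phrase may still need its words reordered.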