Using iTextSharp (or any C# PDF library), how do I open a PDF, replace some text, and save it again?

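Using iTextSharp (or any C# PDF library), I need to open a PDF, replace some placeholder text with actual values, and return the result as a byte[].

Can anyone suggest how to do this? I have looked through the iText documentation and cannot work out where to start. So far I am stuck on how to get the source PDF from a PdfReader into a Document object, and I suspect I am approaching this the wrong way.

Many thanks.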

In the end, I used PDFescape to open my existing PDF file and place form fields where I needed the values to go, then saved it again to create my template PDF.

http://www.pdfescape.com

Then I found this blog post on how to fill in form fields:

http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/

It all works well! Here is the code:

    using System.IO;
    using System.Web;
    using iTextSharp.text.pdf;

    public static byte[] Generate()
    {
        var templatePath = HttpContext.Current.Server.MapPath("~/my_template.pdf");

        // Based on:
        // http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/
        var reader = new PdfReader(templatePath);
        var outStream = new MemoryStream();
        var stamper = new PdfStamper(reader, outStream);

        var form = stamper.AcroFields;
        var fieldKeys = form.Fields.Keys;

        // Fields are matched by their current (placeholder) value, not by field name
        foreach (string fieldKey in fieldKeys)
        {
            if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldA")
                form.SetField(fieldKey, "1234");
            if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldB")
                form.SetField(fieldKey, "5678");
        }

        // "Flatten" the form so it won't be editable/usable anymore
        stamper.FormFlattening = true;

        // Closing the stamper writes the finished PDF into outStream
        stamper.Close();
        reader.Close();

        return outStream.ToArray();
    }

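Unfortunately, I was looking for something similar and could not figure it out. Below is as far as I got; maybe you can use it as a starting point. The problem is that a PDF does not actually store text as such; it uses lookup tables and some other arcane wizardry. This method reads the byte values of each page and tries to convert them to a string, but as far as I can tell it only handles English and misses some special characters, so I abandoned my project and moved on.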

    string contents = string.Empty;
    Document doc = new Document();
    PdfReader reader = new PdfReader("pathToPdf.pdf");

    using (MemoryStream memoryStream = new MemoryStream())
    {
        PdfWriter writer = PdfWriter.GetInstance(doc, memoryStream);
        doc.Open();
        PdfContentByte cb = writer.DirectContent;

        for (int p = 1; p <= reader.NumberOfPages; p++)
        {
            // add page from reader
            doc.SetPageSize(reader.GetPageSize(p));
            doc.NewPage();

            // pickup here something like this:
            byte[] bt = reader.GetPageContent(p);
            contents = ExtractTextFromPDFBytes(bt);

            if (contents.IndexOf("something") != -1)
            {
                // make your own pdf page and add to cb (contentbyte)
            }
            else
            {
                // copy the original page unchanged, compensating for rotation
                PdfImportedPage page = writer.GetImportedPage(reader, p);
                int rot = reader.GetPageRotation(p);
                if (rot == 90 || rot == 270)
                    cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
                else
                    cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0);
            }
        }

        reader.Close();
        doc.Close();

        File.WriteAllBytes("pathToOutputOrSamePathToOverwrite.pdf", memoryStream.ToArray());
    }

This was taken from this site.

    // Number of trailing characters remembered for operator matching.
    // Not defined in the snippet as posted; 15 is an assumed value (CheckToken needs at least 4).
    private const int _numberOfCharsToKeep = 15;

    private string ExtractTextFromPDFBytes(byte[] input)
    {
        if (input == null || input.Length == 0) return "";

        try
        {
            string resultString = "";

            // Flag showing if we are currently inside a text object
            bool inTextObject = false;

            // Flag showing if the next character is literal,
            // e.g. '\\' to get a '\' character or '\(' to get '('
            bool nextLiteral = false;

            // () bracket nesting level. Text appears inside ()
            int bracketDepth = 0;

            // Keep previous chars to extract numbers etc.:
            char[] previousCharacters = new char[_numberOfCharsToKeep];
            for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';

            for (int i = 0; i < input.Length; i++)
            {
                char c = (char)input[i];

                if (inTextObject)
                {
                    // Position the text
                    if (bracketDepth == 0)
                    {
                        if (CheckToken(new string[] { "TD", "Td" }, previousCharacters))
                        {
                            resultString += "\n\r";
                        }
                        else if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters))
                        {
                            resultString += "\n";
                        }
                        else if (CheckToken(new string[] { "Tj" }, previousCharacters))
                        {
                            resultString += " ";
                        }
                    }

                    // End of a text object: emit a separating space
                    if (bracketDepth == 0 && CheckToken(new string[] { "ET" }, previousCharacters))
                    {
                        inTextObject = false;
                        resultString += " ";
                    }
                    else if ((c == '(') && (bracketDepth == 0) && (!nextLiteral))
                    {
                        // Start outputting text
                        bracketDepth = 1;
                    }
                    else if ((c == ')') && (bracketDepth == 1) && (!nextLiteral))
                    {
                        // Stop outputting text
                        bracketDepth = 0;
                    }
                    else if (bracketDepth == 1)
                    {
                        // Just a normal text character:
                        // only print out the next character no matter what. Do not interpret.
                        if (c == '\\' && !nextLiteral)
                        {
                            nextLiteral = true;
                        }
                        else
                        {
                            if (((c >= ' ') && (c <= '~')) || ((c >= 128) && (c < 255)))
                            {
                                resultString += c.ToString();
                            }
                            nextLiteral = false;
                        }
                    }
                }

                // Store the recent characters for when we have to go back for checking
                for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
                {
                    previousCharacters[j] = previousCharacters[j + 1];
                }
                previousCharacters[_numberOfCharsToKeep - 1] = c;

                // Start of a text object
                if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
                {
                    inTextObject = true;
                }
            }
            return resultString;
        }
        catch
        {
            return "";
        }
    }

    // Returns true if the most recent characters end with one of the given
    // two-character operators, delimited by whitespace on both sides.
    private bool CheckToken(string[] tokens, char[] recent)
    {
        foreach (string token in tokens)
        {
            // One-character tokens (' and ") cannot be matched by the two-character
            // comparison below; skip them instead of indexing past the end of the token.
            if (token.Length < 2) continue;

            if ((recent[_numberOfCharsToKeep - 3] == token[0]) &&
                (recent[_numberOfCharsToKeep - 2] == token[1]) &&
                ((recent[_numberOfCharsToKeep - 1] == ' ') ||
                 (recent[_numberOfCharsToKeep - 1] == 0x0d) ||
                 (recent[_numberOfCharsToKeep - 1] == 0x0a)) &&
                ((recent[_numberOfCharsToKeep - 4] == ' ') ||
                 (recent[_numberOfCharsToKeep - 4] == 0x0d) ||
                 (recent[_numberOfCharsToKeep - 4] == 0x0a)))
            {
                return true;
            }
        }
        return false;
    }
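To illustrate one reason a plain search over the page bytes can miss a placeholder: a PDF producer is free to split a single visible phrase into several string pieces inside one TJ operator, with kerning numbers in between. The sketch below uses only the .NET base class library, no iTextSharp. The content stream fragment, the `{NAME}` placeholder and the `ContentStreamDemo` class are made up for the example, and it assumes a simple one-byte font encoding and no nested parentheses, which real PDFs often do not honour.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ContentStreamDemo
{
    // Hypothetical, hand-written content stream fragment: a single TJ operator
    // that shows "Hello {NAME}" as three separately kerned pieces.
    public const string Sample =
        "BT /F1 12 Tf 72 720 Td [(Hel) -20 (lo {NA) 15 (ME})] TJ ET";

    // Joins the contents of every literal string "(...)" found in the stream,
    // ignoring operators and kerning numbers. Escape sequences such as \( are
    // copied through unchanged; nested parentheses are not handled.
    public static string JoinLiteralStrings(string contentStream)
    {
        var sb = new StringBuilder();
        foreach (Match m in Regex.Matches(contentStream, @"\(((?:\\.|[^\\()])*)\)"))
        {
            sb.Append(m.Groups[1].Value);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The placeholder is not present as one contiguous run in the raw stream...
        Console.WriteLine(Sample.Contains("{NAME}"));                     // False
        // ...but it reappears once the string pieces are joined.
        Console.WriteLine(JoinLiteralStrings(Sample).Contains("{NAME}")); // True
    }
}
```

Even when the joined text does contain the placeholder, writing a replacement back into the stream is a separate problem, which is why the form-field approach above tends to be the more practical route.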