Reading PDF file attachment annotations with iTextSharp

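I have the following problem. I have a PDF file that contains an XML file as an annotation, not as an embedded file. I tried to read it using the code from the following link: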

iTextSharp - how to open/read/extract a file attachment?

It works for embedded files, but not for file attachments stored as annotations.

I googled for extracting annotations from a PDF and found the following link: Reading PDF Annotations with iText

So the annotation type is "File Attachment Annotation".

Could someone show a working example?

Thanks in advance for any help.

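As so often with questions about iText and iTextSharp, one should first look at the keyword list on itextpdf.com. There, under "File attachment, extract attachments", you find references to two Java samples from iText in Action, 2nd Edition:

  • part4.chapter16.KubrickDvds
  • part4.chapter16.KubrickDocumentary

The corresponding Webified iTextSharp examples are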

  • KubrickDvds.cs
  • KubrickDocumentary.cs

KubrickDvds contains the following method extractAttachments / ExtractAttachments to extract file attachment annotations:

Java:

    /**
     * Extracts attachments from an existing PDF.
     * @param src the path to the existing PDF
     */
    public void extractAttachments(String src) throws IOException {
        PdfReader reader = new PdfReader(src);
        PdfArray array;
        PdfDictionary annot;
        PdfDictionary fs;
        PdfDictionary refs;
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            array = reader.getPageN(i).getAsArray(PdfName.ANNOTS);
            if (array == null) continue;
            for (int j = 0; j < array.size(); j++) {
                annot = array.getAsDict(j);
                if (PdfName.FILEATTACHMENT.equals(annot.getAsName(PdfName.SUBTYPE))) {
                    fs = annot.getAsDict(PdfName.FS);
                    refs = fs.getAsDict(PdfName.EF);
                    for (PdfName name : refs.getKeys()) {
                        FileOutputStream fos = new FileOutputStream(
                            String.format(PATH, fs.getAsString(name).toString()));
                        fos.write(PdfReader.getStreamBytes((PRStream) refs.getAsStream(name)));
                        fos.flush();
                        fos.close();
                    }
                }
            }
        }
        reader.close();
    }

C#:

    /**
     * Extracts attachments from an existing PDF.
     * @param src the path to the existing PDF
     * @param zip the ZipFile object to add the extracted images
     */
    public void ExtractAttachments(byte[] src, ZipFile zip) {
        PdfReader reader = new PdfReader(src);
        for (int i = 1; i <= reader.NumberOfPages; i++) {
            PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
            if (array == null) continue;
            for (int j = 0; j < array.Size; j++) {
                PdfDictionary annot = array.GetAsDict(j);
                if (PdfName.FILEATTACHMENT.Equals(
                        annot.GetAsName(PdfName.SUBTYPE)))
                {
                    PdfDictionary fs = annot.GetAsDict(PdfName.FS);
                    PdfDictionary refs = fs.GetAsDict(PdfName.EF);
                    foreach (PdfName name in refs.Keys) {
                        zip.AddEntry(
                            fs.GetAsString(name).ToString(),
                            PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
                        );
                    }
                }
            }
        }
    }
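One thing to keep in mind with either version: the name stored in the file specification is used directly as the output name (String.format(PATH, ...) in Java, zip.AddEntry(...) in C#). If the PDF comes from an untrusted source, that string may contain directory parts such as "../". A minimal sketch of a guard, using only the JDK; the class AttachmentNames and the method safeName are hypothetical helpers for illustration, not part of iText:

```java
final class AttachmentNames {
    private AttachmentNames() {}

    /**
     * Reduces a file name read from a PDF to its last path segment.
     * Returns the fallback when nothing usable remains.
     * (Hypothetical helper, not an iText API.)
     */
    static String safeName(String rawName, String fallback) {
        if (rawName == null) return fallback;
        // Treat Windows and Unix separators alike
        String normalised = rawName.replace('\\', '/');
        int slash = normalised.lastIndexOf('/');
        String last = slash >= 0 ? normalised.substring(slash + 1) : normalised;
        last = last.trim();
        // Reject empty names and the special directory entries
        if (last.isEmpty() || last.equals(".") || last.equals("..")) {
            return fallback;
        }
        return last;
    }
}
```

In the Java sample you could then pass AttachmentNames.safeName(fs.getAsString(name).toString(), "attachment.bin") to String.format instead of the raw string.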

KubrickDocumentary contains the following method extractDocLevelAttachments / ExtractDocLevelAttachments to extract document level attachments:

Java:

    /**
     * Extracts document level attachments
     * @param filename a file from which document level attachments will be extracted
     * @throws IOException
     */
    public void extractDocLevelAttachments(String filename) throws IOException {
        PdfReader reader = new PdfReader(filename);
        PdfDictionary root = reader.getCatalog();
        PdfDictionary documentnames = root.getAsDict(PdfName.NAMES);
        PdfDictionary embeddedfiles = documentnames.getAsDict(PdfName.EMBEDDEDFILES);
        PdfArray filespecs = embeddedfiles.getAsArray(PdfName.NAMES);
        PdfDictionary filespec;
        PdfDictionary refs;
        FileOutputStream fos;
        PRStream stream;
        for (int i = 0; i < filespecs.size(); ) {
            filespecs.getAsString(i++);
            filespec = filespecs.getAsDict(i++);
            refs = filespec.getAsDict(PdfName.EF);
            for (PdfName key : refs.getKeys()) {
                fos = new FileOutputStream(
                    String.format(PATH, filespec.getAsString(key).toString()));
                stream = (PRStream) PdfReader.getPdfObject(refs.getAsIndirectObject(key));
                fos.write(PdfReader.getStreamBytes(stream));
                fos.flush();
                fos.close();
            }
        }
        reader.close();
    }

C#:

    /**
     * Extracts document level attachments
     * @param PDF from which document level attachments will be extracted
     * @param zip the ZipFile object to add the extracted images
     */
    public void ExtractDocLevelAttachments(byte[] pdf, ZipFile zip) {
        PdfReader reader = new PdfReader(pdf);
        PdfDictionary root = reader.Catalog;
        PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
        PdfDictionary embeddedfiles =
            documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
        PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
        for (int i = 0; i < filespecs.Size; ) {
            filespecs.GetAsString(i++);
            PdfDictionary filespec = filespecs.GetAsDict(i++);
            PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
            foreach (PdfName key in refs.Keys) {
                PRStream stream = (PRStream) PdfReader.GetPdfObject(
                    refs.GetAsIndirectObject(key)
                );
                zip.AddEntry(
                    filespec.GetAsString(key).ToString(),
                    PdfReader.GetStreamBytes(stream)
                );
            }
        }
    }
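A caveat for the document level samples: they read only the /Names array directly in the root of the EmbeddedFiles name tree, where names and file specifications alternate (which is why the loop advances i twice per iteration). The PDF specification also allows a name tree node to carry a /Kids array of child nodes instead, and in that case the samples would find no /Names array at the root. The following is a minimal sketch of the traversal, modelled with plain Java maps and lists so it runs without iText; the class NameTreeSketch and the Map/List representation are assumptions for illustration only, and with iText the same walk would go over PdfDictionary and PdfArray objects:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NameTreeSketch {
    private NameTreeSketch() {}

    /** Collects all name/value pairs of a name tree, in document order. */
    static Map<String, Object> flatten(Map<String, Object> node) {
        Map<String, Object> out = new LinkedHashMap<>();
        collect(node, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(Map<String, Object> node, Map<String, Object> out) {
        // Leaf part: a flat array [name1, value1, name2, value2, ...]
        List<Object> names = (List<Object>) node.get("Names");
        if (names != null) {
            for (int i = 0; i + 1 < names.size(); i += 2) {
                out.put((String) names.get(i), names.get(i + 1));
            }
        }
        // Intermediate part: child nodes that are name tree nodes themselves
        List<Object> kids = (List<Object>) node.get("Kids");
        if (kids != null) {
            for (Object kid : kids) {
                collect((Map<String, Object>) kid, out);
            }
        }
    }
}
```

For a root that only has /Names this yields the same pairs the samples iterate over; for a root with /Kids it descends until it reaches the leaf arrays.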

(For some reason the C# examples put the extracted files into a ZIP file, while the Java versions write them to the file system... oh well...)
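If you want the Java side to behave like the C# side, the JDK's java.util.zip classes are enough. A minimal sketch, assuming the attachment bytes have already been obtained (for example via PdfReader.getStreamBytes as in the samples); the class AttachmentZip and its zip method are hypothetical names, not taken from the book samples:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class AttachmentZip {
    private AttachmentZip() {}

    /** Packs name -> content pairs into an in-memory ZIP archive. */
    static byte[] zip(Map<String, byte[]> attachments) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : attachments.entrySet()) {
                // One ZIP entry per attachment, named after the map key
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        }
        // The stream is closed here, so the archive is complete
        return buffer.toByteArray();
    }
}
```

Because the input is a Map, every entry name is unique, which matters since ZipOutputStream rejects duplicate entry names.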