How do I make the .NET XML parser not expand parameter entities in XML?

When I try to parse the XML below (using the code below), I keep getting &question;&signature;

expanded to

 Why couldn't I publish my books directly in standard SGML? — William Shakespeare. 

or to an empty sgml element.


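Since I am working on an XML 3-way merging algorithm, I want to retrieve the unexpanded &question;&signature;.

I have tried:

  • Parsing the XML normally (this results in the expanded sgml element)
  • Removing the DOCTYPE from the start of the XML (this results in an empty sgml element)
  • Various XmlReader DTD settings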

I have the following XML file:

    <!DOCTYPE sgml [      ]>
    <sgml>&question;&signature;</sgml>

This is the code I have tried (several attempts):

    using System;
    using System.IO;
    using System.Reflection;
    using System.Xml;
    using System.Xml.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            string xml = @"C:\src\Apps\Wit\MergingAlgorithmTest\MergingAlgorithmTest\Tests\XMLMerge-DocTypeExpansion\DocTypeExpansion.0.xml";
            var xmlSettingsIgnore = new XmlReaderSettings { CheckCharacters = false, DtdProcessing = DtdProcessing.Ignore };
            var xmlSettingsParse = new XmlReaderSettings { CheckCharacters = false, DtdProcessing = DtdProcessing.Parse };

            using (var fs = File.Open(xml, FileMode.Open, FileAccess.Read))
            {
                // Attempt 1: ignore the DTD entirely.
                using (var xmlReaderIgnore = XmlReader.Create(fs, xmlSettingsIgnore))
                {
                    // Prevents Exception "Reference to undeclared entity 'question'"
                    PropertyInfo propertyInfo = xmlReaderIgnore.GetType().GetProperty(
                        "DisableUndeclaredEntityCheck",
                        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                    propertyInfo.SetValue(xmlReaderIgnore, true, null);

                    var doc = XDocument.Load(xmlReaderIgnore);
                    Console.WriteLine(doc.Root.ToString()); // outputs an empty sgml element, not &question;&signature;
                } // using xml ignore

                // Attempt 2: parse the DTD.
                fs.Position = 0;
                using (var xmlReaderParse = XmlReader.Create(fs, xmlSettingsParse))
                {
                    var doc = XDocument.Load(xmlReaderParse);
                    // outputs "Why couldn't I publish my books directly in standard SGML? - William Shakespeare.",
                    // not &question;&signature;
                    Console.WriteLine(doc.Root.ToString());
                }

                // Attempt 3: skip the DOCTYPE lines and parse only the root element.
                fs.Position = 0;
                string parseXmlString = String.Empty;
                using (StreamReader sr = new StreamReader(fs))
                {
                    for (int i = 0; i < 7; ++i) // Skip DocType
                        sr.ReadLine();
                    parseXmlString = sr.ReadLine();
                }

                using (XmlReader xmlReaderSkip = XmlReader.Create(new StringReader(parseXmlString), xmlSettingsParse))
                {
                    // Prevents Exception "Reference to undeclared entity 'question'"
                    PropertyInfo propertyInfo = xmlReaderSkip.GetType().GetProperty(
                        "DisableUndeclaredEntityCheck",
                        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
                    propertyInfo.SetValue(xmlReaderSkip, true, null);

                    var doc2 = XDocument.Load(xmlReaderSkip); // Empty sgml element
                }
            } // using FileStream
        }
    }

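LINQ to XML does not support modeling entity references: they are automatically expanded to their values (source 1, source 2). There is simply no XObject subclass defined for general entity references.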

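However, assuming your XML is valid (i.e. the entities being referenced are declared in the DTD, which they are in your example), you can use the old XML Document Object Model to parse your XML. It inserts XmlEntityReference nodes into the XML DOM tree instead of expanding the entity references into plain text: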

    // Requires System.Diagnostics, System.IO and System.Xml.
    using (var sr = new StreamReader(xml))
    using (var xtr = new XmlTextReader(sr))
    {
        // Expands character entities and returns general entities as
        // System.Xml.XmlNodeType.EntityReference
        xtr.EntityHandling = EntityHandling.ExpandCharEntities;

        var oldDoc = new XmlDocument();
        oldDoc.Load(xtr);

        Debug.WriteLine(oldDoc.DocumentElement.OuterXml); // Outputs &question;&signature;

        // Verify that the entity references are still there - no assert
        Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&question;"));
        Debug.Assert(oldDoc.DocumentElement.OuterXml.Contains("&signature;"));
    }

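Each XmlEntityReference will have the text value of its general entity as content. If a general entity references other general entities, as is the case in yours, the corresponding inner XmlEntityReference nodes will be nested in the ChildNodes of the outer one. You can then compare the old and new XML using the old XmlDocument API.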

Note that you also need to use the old XmlTextReader and set EntityHandling = EntityHandling.ExpandCharEntities.
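
To make the nesting of XmlEntityReference nodes more concrete, here is a minimal sketch that loads a document the same way and then walks the DOM, listing each entity reference with indentation proportional to its nesting depth. The class name EntityReferenceDemo, the helper methods, and the sample DTD are all hypothetical and only for illustration; in particular, the entity declarations are not the ones from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class EntityReferenceDemo
{
    // Illustrative document only: these entity declarations are assumptions,
    // not the DTD from the question.
    public const string SampleXml =
        "<!DOCTYPE sgml [\n" +
        "  <!ENTITY author \"William Shakespeare\">\n" +
        "  <!ENTITY signature \" - &author;.\">\n" +
        "  <!ENTITY question \"Why not SGML?\">\n" +
        "]>\n" +
        "<sgml>&question;&signature;</sgml>";

    // Loads the XML with the old XmlTextReader so that general entities
    // are reported as EntityReference nodes instead of being expanded.
    public static XmlDocument LoadPreservingEntities(string xml)
    {
        using (var sr = new StringReader(xml))
        using (var xtr = new XmlTextReader(sr))
        {
            xtr.DtdProcessing = DtdProcessing.Parse;
            xtr.EntityHandling = EntityHandling.ExpandCharEntities;
            var doc = new XmlDocument();
            doc.Load(xtr);
            return doc;
        }
    }

    // Returns every entity reference below root, in document order,
    // indented by two spaces per level of entity nesting.
    public static List<string> CollectEntityReferences(XmlNode root)
    {
        var found = new List<string>();
        Walk(root, 0, found);
        return found;
    }

    private static void Walk(XmlNode node, int depth, List<string> found)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            int childDepth = depth;
            if (child.NodeType == XmlNodeType.EntityReference)
            {
                found.Add(new string(' ', depth * 2) + "&" + child.Name + ";");
                childDepth = depth + 1; // references inside this entity are nested
            }
            Walk(child, childDepth, found);
        }
    }

    public static void Main()
    {
        XmlDocument doc = LoadPreservingEntities(SampleXml);
        Console.WriteLine(doc.DocumentElement.OuterXml);
        foreach (string line in CollectEntityReferences(doc.DocumentElement))
            Console.WriteLine(line);
    }
}
```

For a 3-way merge, a walk like this could be run over each of the three documents so that the comparison works on entity names rather than on their expanded text; whether that fits your merge algorithm depends on how it treats the nested nodes.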