Friday, 27 February 2015

How to remove one indirectly referenced image from a PDF and keep all others?






I found some code at http://ift.tt/1vG5FOz, copied below, that shows how to remove all images from a PDF (strictly speaking, it blanks every XObject, not just images). The author admits it is a hack, but it worked for his purposes. The code looks like this:



// Open the PDF
PdfDocument doc =
    PdfReader.Open("inputfile.pdf", PdfDocumentOpenMode.Modify);

// Loop through every page
foreach (PdfPage page in doc.Pages)
{
    // Get the resource dictionary of the page
    PdfDictionary resource = page.Elements.GetDictionary("/Resources");
    if (resource != null)
    {
        // Get all the external objects
        PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
        if (objects != null)
        {
            // Loop through every item in the external objects
            foreach (PdfItem item in objects.Elements.Values)
            {
                // Get the reference
                PdfReference reference = item as PdfReference;
                if (reference != null)
                {
                    // Get the underlying object as a PDF dictionary
                    PdfDictionary xObject = reference.Value as PdfDictionary;
                    // Overwrite the object's stream data with an empty (0-byte) array
                    if (xObject != null && xObject.Stream != null)
                    {
                        xObject.Stream.Value = new byte[0];
                    }
                }
            }
        }
    }
}

// Save the PDF as another file
doc.Save("outputfile.pdf");


I modified it to attempt to only delete one object:



// Get the resource dictionary of the page
PdfDictionary resource = pageToImport.Elements.GetDictionary("/Resources");
if (resource != null)
{
    // Get all the external objects
    PdfDictionary objects = resource.Elements.GetDictionary("/XObject");
    if (objects != null)
    {
        logInfo.AppendLine(
            string.Format("found {0} XObjects in doc {1}, page {2}",
                objects.Elements.Values.Count, docNum, i + 1));
        // Loop through every item in the external objects
        ICollection<PdfItem> pdfitems = objects.Elements.Values;
        foreach (PdfItem pdfitem in pdfitems)
        {
            // Get the reference
            PdfReference reference = pdfitem as PdfReference;
            if (reference != null)
            {
                // Get the underlying object as a PDF dictionary
                PdfDictionary xObject = reference.Value as PdfDictionary;
                if (xObject == null || xObject.Stream == null)
                {
                    continue;
                }

                // Only image XObjects are of interest (dictionary keys start with '/')
                PdfItem subtype = xObject.Elements["/Subtype"];
                if (subtype != null && subtype.ToString() == "/Image")
                {
                    byte[] data = xObject.Stream.Value;
                    logInfo.AppendLine(
                        string.Format("Found image XObject with W: {0} H: {1} Len: {2}",
                            xObject.Elements["/Width"], xObject.Elements["/Height"],
                            xObject.Stream.Length));

                    // Hex-encode the first 200 bytes as a fingerprint of the image
                    var hexString = new StringBuilder(400);
                    for (int j = 0; j < Math.Min(200, data.Length); ++j)
                    {
                        hexString.AppendFormat("{0:x2}", data[j]);
                    }

                    logInfo.AppendLine("Image Data:" + hexString.ToString());

                    if (hexString.ToString() == myHexImageData
                        && data.Length == 190656)
                    {
                        // Overwrite the data of the object with an empty array
                        xObject.Stream.Value = new byte[0];
                    }
                }
            }
        }
    }
}


This did not work: every image still disappeared from the output, not just the one I was targeting. I assume that emptying the stream this way effectively corrupts the file, even though the reader can still open it.
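One part that can at least be checked in isolation is the matching test itself. Below is a minimal sketch of that test pulled out into a helper with no PDFsharp dependency. The names ImageFingerprint, HexPrefix and Matches are made up for illustration and are not part of any library. It assumes the fingerprint is the lowercase hex of the first 200 bytes plus the total length, as in the loop above. The only intended difference is that the hex comparison ignores case.

```csharp
using System;
using System.Text;

// Hypothetical helper (illustrative names only): isolates the
// "is this the image I want?" test so it can be verified separately.
public static class ImageFingerprint
{
    // Hex-encode at most maxBytes leading bytes, two lowercase digits per byte.
    public static string HexPrefix(byte[] data, int maxBytes)
    {
        if (data == null) throw new ArgumentNullException("data");
        int count = Math.Min(Math.Max(maxBytes, 0), data.Length);
        var sb = new StringBuilder(count * 2);
        for (int i = 0; i < count; ++i)
        {
            sb.AppendFormat("{0:x2}", data[i]);
        }
        return sb.ToString();
    }

    // True only if both the total length and the 200-byte hex prefix match.
    public static bool Matches(byte[] data, string expectedHexPrefix, int expectedLength)
    {
        return data != null
            && data.Length == expectedLength
            && string.Equals(HexPrefix(data, 200), expectedHexPrefix,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

In the loop, the condition would then read ImageFingerprint.Matches(xObject.Stream.Value, myHexImageData, 190656). Logging its result for every image should show whether more than one XObject actually matches.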


After searching extensively, I have not found a solution that parses a PDF, finds a specific image, and removes both the references to it and the image XObject itself.


Is there a way to find all references to an image? If so, how do you remove those references and then the image object itself, while keeping the PDF intact?


Thanks!









