3D grphique: Encoding of PDF text string

Monday, April 6, 2015

Encoding of PDF text string

I am working on a PDF parser (text extraction). When page contents are compressed with the FlateDecode filter (zlib compression), my code is able to decompress the content streams, and the output (the stream object) looks something like this:


BT
56.8 721.3 Td 
/F2 12 Tf
[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]TJ
ET
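For reference, the decompression step described above can be sketched with Python's standard zlib module. This is a minimal sketch, assuming the stream uses /FlateDecode with no /DecodeParms predictor and that `raw` holds the stream's data (the /Length bytes after the `stream` keyword line). `inflate_content_stream` is a hypothetical helper name.

```python
import zlib

def inflate_content_stream(raw: bytes) -> bytes:
    """Undo /FlateDecode on the data of a content stream."""
    # FlateDecode data is zlib-wrapped deflate (RFC 1950/1951), which zlib.decompress expects.
    # Assumption: no /DecodeParms predictor is set on this stream.
    return zlib.decompress(raw)

# Round trip with the operators from the question: compress them, then inflate them again.
sample = (b"BT\n56.8 721.3 Td\n/F2 12 Tf\n"
          b"[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]TJ\nET\n")
compressed = zlib.compress(sample)
print(inflate_content_stream(compressed).decode("latin-1"))
```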

I am interested in the string array (the operand of TJ). It seems to contain multiple hex-encoded strings, but the hex values do not make sense as text. Instead, they look like a running sequence such as 01 02 03..., as if some kind of LZ77 compression had been applied.
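The array structure described above (hex strings interleaved with numbers) can be split mechanically. Numbers in a TJ array are position adjustments in thousandths of a unit of text space, not characters, so they are kept apart from the byte strings. A minimal sketch, assuming the operand contains only hex strings and numbers (literal strings in parentheses are not handled); `parse_tj_array` is a hypothetical name.

```python
import re

# A TJ element is either a hex string <...> or a number (a position adjustment).
TOKEN = re.compile(r"<([0-9A-Fa-f\s]*)>|([+-]?(?:\d+\.?\d*|\.\d+))")

def parse_tj_array(array_src: str) -> list:
    """Split a TJ operand such as '[<01>2<0203>]' into bytes (codes) and floats."""
    items = []
    body = array_src.strip()[1:-1]  # drop the surrounding [ ]
    for m in TOKEN.finditer(body):
        if m.group(1) is not None:
            digits = re.sub(r"\s", "", m.group(1))  # whitespace inside <...> is ignored
            if len(digits) % 2:
                digits += "0"  # an odd final digit is treated as if followed by 0
            items.append(bytes.fromhex(digits))
        else:
            items.append(float(m.group(2)))
    return items

items = parse_tj_array("[<01>2<0203>2<04>-10<0503>2<04>-2<0506070809>2<0A>1<0B>]")
codes = b"".join(i for i in items if isinstance(i, bytes))
print(codes.hex(" "))  # 01 02 03 04 05 03 04 05 06 07 08 09 0a 0b
```

Joining the byte strings shows that codes 03, 04 and 05 repeat, which is what you would expect from characters that recur in a word rather than from a compressed stream.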

Do PDFs have multiple levels of compression? How can I get plain text from the string array above?
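On the second question: the hex strings are not compressed a second time. String operands of TJ are character codes interpreted through the current font (here /F2, selected by Tf). Some PDF producers subset fonts and assign codes 1, 2, 3, ... in order of first use, which yields exactly this kind of sequence. The text is then recoverable only through the font's /ToUnicode CMap (or its /Encoding), reached from the page's /Resources font dictionary. ToUnicode destination values are UTF-16BE. A minimal sketch, assuming single-byte codes and a CMap that uses only bfchar entries (bfrange and other CMap syntax are ignored). `parse_tounicode`, `decode_codes` and the sample CMap are hypothetical and invented for illustration. They are not the font from the question.

```python
import re

def parse_tounicode(cmap_src: str) -> dict:
    """Map character codes to Unicode text using the bfchar entries of a ToUnicode CMap."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_src, re.S):
        # Each entry is '<srcCode> <dstString>', where dstString is UTF-16BE hex.
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_codes(codes: bytes, mapping: dict) -> str:
    """Decode single-byte codes; unmapped codes become U+FFFD."""
    return "".join(mapping.get(b, "\ufffd") for b in codes)

# Invented CMap fragment, for illustration only (not the font from the question).
SAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
begincmap
1 begincodespacerange <00> <FF> endcodespacerange
2 beginbfchar
<01> <0048>
<02> <0069>
endbfchar
endcmap
"""

mapping = parse_tounicode(SAMPLE_CMAP)
print(decode_codes(b"\x01\x02", mapping))  # Hi
```

With the real /F2 ToUnicode stream (decompressed the same way as the content stream), the joined codes from the TJ array would be passed to `decode_codes` in place of the sample bytes.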
