[Mimedefang] MIME::Entity not handling Charset => 'utf-8' correctly?
Steffen Kaiser
skmimedefang at smail.inf.fh-bonn-rhein-sieg.de
Thu Feb 21 05:36:41 EST 2013
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Wed, 20 Feb 2013, Philip Prindeville wrote:
> Awesome, that worked!
>
> I'm wondering if in MIME::Body we should take:
>
> sub as_string {
> my $self = shift;
> my $str = '';
> my $fh = IO::File->new(\$str, '>:') or croak("Cannot open in-memory file: $!");
> $self->print($fh);
> close($fh);
> return $str;
> }
>
> and have:
>
> return Encode::decode($charset, $str);
I suppose that would violate the internals of the MIME:: and Mail::
namespaces. They are tied together very closely.
Actually, I looked into a UTF-8 MIME-tools a few years back to overcome
character set problems when storing header data in a Postgres database.
My idea was that everything the MIME:: functions return would be in
Perl-internal UTF-8, with any character set information already decoded,
and that anything passed into the functions would be Perl-internal UTF-8
as well. I think one would need to rewrite the whole framework from
scratch.
> instead, but I'm not sure how we'd retrieve $charset… It would need to be stored into MIME::Body which isn't currently the case.
Encode is a tricky module in its own right; from perldoc Encode:
"Handling Malformed Data
The optional CHECK argument tells Encode what to do when it
encounters malformed data. Without CHECK, Encode::FB_DEFAULT ( == 0 ) is
assumed.
As of version 2.12 Encode supports coderef values for CHECK. See
below.
NOTE: Not all encoding support this feature
Some encodings ignore CHECK argument. For example,
Encode::Unicode ignores CHECK and it always croaks on error.
"
Some encodings modify the $str argument so that it holds the characters
NOT decoded. So you'd call Encode::decode($charset, "".$str) to force a
copy, but that comes with a performance penalty.
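To make the CHECK behaviour and the in-place modification concrete, here is a minimal sketch using only the core Encode module. The sample octets and variable names are made up for illustration; the constants FB_CROAK, FB_QUIET and LEAVE_SRC are the ones documented in perldoc Encode.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: 0xFF can never appear in well-formed UTF-8.
my $octets = "abc\xFFdef";

# No CHECK (FB_DEFAULT): the malformed byte becomes U+FFFD and the
# source is left alone.
my $src_default = $octets;
my $lenient     = Encode::decode('UTF-8', $src_default);

# FB_CROAK: decode dies on the malformed byte; LEAVE_SRC asks Encode
# not to touch the source.
my $src_croak = $octets;
my $strict    = eval {
    Encode::decode('UTF-8', $src_croak,
                   Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;

# FB_QUIET: decode returns what it processed so far and overwrites the
# source with the unprocessed rest.
my $src_quiet = $octets;
my $head      = Encode::decode('UTF-8', $src_quiet, Encode::FB_QUIET());

# FB_QUIET | LEAVE_SRC: same return value, source untouched.
my $src_kept  = $octets;
my $head_kept = Encode::decode('UTF-8', $src_kept,
                               Encode::FB_QUIET() | Encode::LEAVE_SRC());

printf "strict decode %s\n", defined $strict ? 'succeeded' : 'died';
printf "after FB_QUIET the source has %d octets left\n", length($src_quiet);
```

Passing LEAVE_SRC is an alternative to the "".$str copy when a CHECK value is in use; whether it is cheaper in practice is not something this sketch measures.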
I also got weird results with decode('latin1', $str). I guess because of
"CAVEAT: When you run "$string = decode("utf8", $octets)", then $string
may not be equal to $octets. Though they both contain the same data, the
UTF8 flag for $string is on unless $octets entirely consists of ASCII data
(or EBCDIC on EBCDIC machines)."
When I pass results of decode('latin1', $str) to LDAP or Postgres, I
sometimes get errors.
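A small sketch of what that caveat means in practice, using only the core Encode module; the sample strings are invented for illustration. It compares octets and decoded characters for a non-ASCII and a pure-ASCII input.

```perl
use strict;
use warnings;
use Encode ();

# "cafe" with an e-acute, as UTF-8 octets: 5 bytes.
my $octets = "caf\xC3\xA9";
my $string = Encode::decode('UTF-8', $octets);   # 4 characters

# Pure ASCII input: octets and characters coincide.
my $ascii_octets = "cafe";
my $ascii_string = Encode::decode('UTF-8', $ascii_octets);

# Encoding again gives back the original octets.
my $round_trip = Encode::encode('UTF-8', $string);

printf "non-ASCII: %d octets, %d characters, equal: %s\n",
    length($octets), length($string),
    ($octets eq $string ? 'yes' : 'no');
printf "ASCII:     equal: %s\n",
    ($ascii_octets eq $ascii_string ? 'yes' : 'no');
```

The decoded string is a sequence of characters, so a library that expects octets needs it encoded again before it goes out.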
I now pass all strings through a function that looks terrible, but since
then Web, Postgres, LDAP and text files have played well together.
> On Feb 20, 2013, at 6:21 PM, David F. Skoll <dfs at roaringpenguin.com> wrote:
>> Try putting "use Encode;" near the top of your test file and replacing
>>
>> utf8::upgrade($string);
>>
>> with:
>>
>> $string = Encode::decode('utf-8', $string);
In fact, I found that utf8::upgrade() works for me as a replacement for
decode('latin1'). The latter seems to "do nothing", which causes other
modules, like Net::LDAP or DBD::Pg, to pass invalid UTF-8 to the services.
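For anyone who wants to compare the two approaches on their own system, here is a minimal sketch. It uses only core Perl (Encode and the built-in utf8:: functions); the sample string and variable names are invented, and it only inspects the strings rather than talking to LDAP or Postgres. The explicit encode step at the end is my assumption about what a UTF-8-only service wants, not something the modules above are known to require.

```perl
use strict;
use warnings;
use Encode ();

# "cafe" with an e-acute, as ISO-8859-1 octets: 4 bytes.
my $latin1 = "caf\xE9";

# Variant 1: decode the octets as Latin-1.
my $decoded = Encode::decode('iso-8859-1', $latin1);

# Variant 2: keep the same characters, but switch Perl's internal
# representation to UTF-8.
my $upgraded = $latin1;
utf8::upgrade($upgraded);

# Explicit UTF-8 octets, independent of the internal flag.
my $wire = Encode::encode('UTF-8', $decoded);

printf "decoded:  UTF8 flag %s\n", utf8::is_utf8($decoded)  ? 'on' : 'off';
printf "upgraded: UTF8 flag %s\n", utf8::is_utf8($upgraded) ? 'on' : 'off';
printf "wire:     %s\n",
    join ' ', map { sprintf '%02X', ord($_) } split //, $wire;
```

Both variants compare equal as Perl strings; what differs, if anything, is the internal flag that some XS modules look at, which is why printing it on the affected system is more useful than guessing.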
- --
Steffen Kaiser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEVAwUBUSX4uZ8mjdm1m0FfAQJLPAf9EPC0E+gm5cJ4PvwxQHT2MzGoTmfLz1/C
nd7kihJnCqmWHQeYLhRlETqX4D1vG/ZGS6WbaP8Fybn400Tfb4JZBs9kZafS7dri
z3r6wk70Vd0By7GM5zIPlTbovU7HqiIFBBoHrdLkaSvzGq95ZfyH5u8aZjj39D85
2nDracTpxp9VF1rsgDi9I3z2lJpRjtJsufVUTvIhynOghQoAhw0S8FEAp7CrLnOX
UHsTTW1+CPhJA3zxY7jgGKV65smNYjtB4MZ1D0cxq2Y6Op7R2NmbRZrlXfFsfMBs
ah7y6nOmlOOpJ1oG760qZY31GjAcvuHgzcliV6rBXueMb1qSM3yHyw==
=A/mV
-----END PGP SIGNATURE-----