Discussion:
validator: email-validation not accepting german "umlaute"
Sven Schliesing
2004-04-20 09:40:38 UTC
As of version 1.1.1, the Jakarta Commons-Validator does not accept
German "umlaute" as parts of valid domain names like müller.de or münchen.de.

Is this a known issue in validator or might this be a setting in Struts?

Thanks!

Sven Schliesing

PS: No problems with "umlaute" and Struts in other parts.
Michael Davey
2004-04-20 11:11:28 UTC
Post by Sven Schliesing
As of version 1.1.1, the Jakarta Commons-Validator does not accept
German "umlaute" as parts of valid domain names like müller.de or münchen.de.
Is this a known issue in validator or might this be a setting in Struts?
Valid domain names must contain only the characters a-z, A-Z, 0-9, "."
and "-". Each label must start with a letter and end with a letter or
digit. The "." symbol is used exclusively to separate subdomains (see
RFC 1035 section 2.3.1 <http://www.ietf.org/rfc/rfc1035.txt>).
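As a rough illustration of the RFC 1035 rule just described, here is a
minimal sketch of a strict ASCII-only check. The class and method names
are made up for this example and are not part of Commons-Validator, and
the sketch ignores the length limits the RFC also imposes.

```java
import java.util.regex.Pattern;

final class Rfc1035Names {
    // One label: starts with a letter, ends with a letter or digit,
    // with only letters, digits, or hyphens in between.
    private static final String LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";

    // A domain: one or more labels separated by single dots.
    private static final Pattern DOMAIN =
            Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    // Hypothetical helper: true only for names using the plain ASCII syntax.
    static boolean isPlainAsciiDomain(String domain) {
        return domain != null && DOMAIN.matcher(domain).matches();
    }
}
```

With this check a name containing "ü" is rejected, while its encoded
form xn--mller-kva.de is accepted.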

To support internationalised domain names (IDN), both the client and the
server must be Punycode-aware. Punycode is a fairly new standards
proposal (RFC 3492) that encodes non-ASCII characters into an ASCII
string; in a domain name the encoded label is prefixed with "xn--". For
instance, müller.de is encoded as xn--mller-kva.de.

<http://www.faqs.org/rfcs/rfc3492.html>
<http://www.afilias.info/cgi-bin/convert_punycode.cgi>

Commons-Validator would need to be made Punycode-aware to achieve what
you need. Alternatively, you could do the Punycode translation in
your own code before passing the string to the validator.
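The "translate it yourself" option could look roughly like the sketch
below. It assumes Java 6 or later, where java.net.IDN is part of the
standard library; the class name, the method name and the "user" local
part are invented for the example. The umlaut is written as a Unicode
escape so the result does not depend on the source file's charset.

```java
import java.net.IDN;

final class IdnEmail {
    // Hypothetical helper: converts only the domain part of an address
    // to its ASCII (Punycode) form and leaves the local part untouched.
    static String toAsciiAddress(String address) {
        int at = address.lastIndexOf('@');
        if (at < 0) {
            return address; // no domain part to convert
        }
        String local = address.substring(0, at);
        String domain = address.substring(at + 1);
        // IDN.toASCII throws IllegalArgumentException for names it cannot convert.
        return local + "@" + IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        System.out.println(toAsciiAddress("user@m\u00fcller.de"));
    }
}
```

The converted string can then be handed to the validator in place of
the original address.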
--
Michael
Sven Schliesing
2004-04-20 11:41:30 UTC
I wrote a test to find out where the problem is:

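import junit.framework.TestCase;
import org.apache.commons.validator.EmailValidator;

public class ValidatorTest extends TestCase {
    // Checks whether EmailValidator accepts a domain containing an umlaut.
    public void testEmail() {
        EmailValidator emailValidator = EmailValidator.getInstance();
        boolean result = emailValidator.isValid("***@müller.de");
        assertTrue("invalid email", result);
    }
}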

The test passes, so the EmailValidator accepts the address
"***@müller.de".

So the problem seems to be with Struts. I also explicitly set the charset
in the struts-config:

<controller contentType="text/html;charset=iso-8859-1"
processorClass="org.apache.struts.action.RequestProcessor" />

No change.

Any other ideas?
Sven Schliesing
2004-04-20 11:56:53 UTC
Post by Sven Schliesing
EmailValidator with success.
Never mind, wrong charset in Eclipse.
Thanks anyway.


Sven
Michael Davey
2004-04-20 13:33:35 UTC
Post by Sven Schliesing
public class ValidatorTest extends TestCase {
public void testEmail() {
EmailValidator emailValidator = EmailValidator.getInstance();
boolean result = emailValidator.isValid("***@müller.de");
assertTrue("invalid email", result);
}
}
Runs with success. So the address "***@müller.de" is validated by the
EmailValidator with success.
Seems that the problem is with struts. I also explicitly set the charset
in the struts-config:
<controller contentType="text/html;charset=iso-8859-1"
processorClass="org.apache.struts.action.RequestProcessor" />
No change.
Any other ideas?
So, there are at least three things going on here:

1. The testcase above should fail in validator, but it doesn't because
the validation check isn't good enough.
2. validator doesn't support punycode and doesn't support the
quoted-printable unicode encoding mechanism used in email addresses.
3. The problem you describe in your emails.

[1] could be fixed easily enough. [2] could be fixed by enhancing
validator. Your testcase shows that the problem isn't with validator,
so [1] and [2] are not really of consequence to you right now, but they
have got my interest. After re-reading your original mail and your
latest mail together, I don't understand exactly what it is you are
trying to achieve - could you demonstrate with some code or describe how
you would demonstrate the problem to me?

If you are now fairly sure that the problem lies within Struts, it may
be beneficial to post to the struts mailing list and copy me personally.

Cheers,
--
Michael
Robert Leland
2004-04-21 02:39:33 UTC
Post by Michael Davey
1. The testcase above should fail in validator, but it doesn't
because the validation check isn't good enough.
2. validator doesn't support punycode and doesn't support the
quoted-printable unicode encoding mechanism used in email addresses.
3. The problem you describe in your emails.
[1] could be fixed easily enough. [2] could be fixed by enhancing
validator. Your testcase shows that the problem isn't with validator,
so [1] and [2] are not really of consequence to you right now, but
they have got my interest. After re-reading your original mail and
your latest mail together, I don't understand exactly what it is you
are trying to achieve - could you demonstrate with some code or
describe how you would demonstrate the problem to me?
If you look at stripComments() in EmailValidator.java, there was an
attempt to strip email comments based on a very successful Perl
script (Mail::RFC822::Address).
Unfortunately, it's only a partial translation. If you are familiar with
Perl's =~ operator and can translate that to Java using ORO and to
JavaScript, then you'll have part of the solution to [2].
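To give a feel for what translating a Perl =~ substitution looks like,
here is a small sketch that removes parenthesized comments from an
address. It uses java.util.regex as a stand-in for ORO, it is not the
logic of Mail::RFC822::Address or of the validator's stripComments(),
and the class and method names are invented. It roughly corresponds to
the Perl loop "1 while $addr =~ s/\([^()]*\)//g;" and does not handle
quoted strings or escaped parentheses.

```java
import java.util.regex.Pattern;

final class CommentStripper {
    // Matches an innermost comment: parentheses with no parentheses inside.
    private static final Pattern INNERMOST = Pattern.compile("\\([^()]*\\)");

    // Hypothetical helper: repeatedly removes innermost comments until
    // nothing changes, which also takes care of nested comments.
    static String removeParenthesizedComments(String address) {
        String previous;
        String current = address;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }
}
```

The loop is needed because a single pass only removes the innermost
level of nesting.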
Post by Michael Davey
If you are now fairly sure that the problem lies within Struts, it may
be beneficial to post to the struts mailing list and copy me personally.
Cheers,
-Rob
