Ampersand encoding broken.
Kier
Tue 30th Jul '02, 3:22pm
Encoded ampersands are not being handled properly. I don't know when this happened, but I suspect it is something to do with htmlspecialchars_uni() vs htmlspecialchars().
In the template editor and the post text editor, you can enter an encoded ampersand (&amp;), but when you view the same text again it has been reduced to a single &.
This badly breaks XHTML compatibility.
Kier
Tue 30th Jul '02, 3:39pm
In fact, the same is true for &nbsp; and &lt; and &gt; and &laquo; and &copy;... all of which are very important for template layout etc.
Allowing all these exceptions (and there are a lot more) is going to make htmlspecialchars_uni() a very big function indeed. But it's absolutely VITAL that these exceptions are made, or the templates are soon going to be corrupted: each time I edit a template containing a do=something&amp;id=9 URI, htmlspecialchars_uni() mangles it.
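To make the failure concrete, here is a rough simulation of one edit cycle. It assumes the template is printed into a textarea after passing through the old pattern /&(?![a-z0-9#]+;)/si, and it uses PHP's html_entity_decode() as a stand-in for the browser decoding the textarea contents before they are submitted. old_escape() and edit_cycle() are hypothetical names for illustration, not vBulletin functions.

```php
<?php
// Hypothetical simulation of a template passing once through the editor.
// Assumption: the browser's decoding of <textarea> contents is modelled
// with html_entity_decode().
function old_escape(string $text): string
{
    // Any & that looks like the start of an entity (&amp;, &copy;, &#169;)
    // is left untouched; only "bare" ampersands are escaped.
    return preg_replace('/&(?![a-z0-9#]+;)/si', '&amp;', $text);
}

function edit_cycle(string $stored): string
{
    $shown = old_escape($stored);  // HTML sent to the browser
    // What the browser submits back, i.e. what gets saved.
    return html_entity_decode($shown, ENT_QUOTES | ENT_HTML401, 'UTF-8');
}

$template = 'index.php?do=something&amp;id=9';
echo edit_cycle($template), "\n";  // the &amp; has collapsed to a bare &
```

Because &amp; already matches the "looks like an entity" lookahead, it reaches the browser unescaped and comes back as a plain &; named entities such as &copy; likewise come back as the literal character.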
Kier
Tue 30th Jul '02, 3:52pm
Can we change this line in htmlspecialchars_uni() from this:
$text = preg_replace('/&(?![a-z0-9#]+;)/si', '&amp;', $text);
to this:
$text = preg_replace('/&(?![#0-9]+;)/s', '&amp;', $text);
I think that would solve the problem, especially as HTML character entities take the form &[a-z]+; whereas Unicode characters take the form &#[0-9]+;
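The difference between the two lookaheads is easiest to see side by side. In this small sketch, escape_old() and escape_new() are hypothetical wrappers around the current and proposed patterns; both escape a bare &, but only the current one treats named entities like &amp; and &copy; as "already encoded".

```php
<?php
// Compare the current and proposed lookaheads on a few sample strings.
function escape_old(string $text): string
{
    // Current pattern: letters, digits or # followed by ; count as an entity.
    return preg_replace('/&(?![a-z0-9#]+;)/si', '&amp;', $text);
}

function escape_new(string $text): string
{
    // Proposed pattern: only # and digits followed by ; are left alone,
    // so only numeric references like &#169; survive unescaped.
    return preg_replace('/&(?![#0-9]+;)/s', '&amp;', $text);
}

foreach (['a & b', '&amp;', '&copy;', '&#169;'] as $sample) {
    printf("%-8s old: %-10s new: %s\n", $sample, escape_old($sample), escape_new($sample));
}
```

With the proposed pattern, &amp; becomes &amp;amp;, which a browser displays as &amp;, so an edit no longer silently drops the encoding.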
Kier
Tue 30th Jul '02, 4:19pm
ARSE.
That change severely breaks the template editor and lots of other things that expect full htmlspecialchars().
So we need to have a serious discussion about the use of htmlspecialchars_uni() vs htmlspecialchars().
Dev meeting tomorrow? (Wednesday)
Mike Sullivan
Tue 30th Jul '02, 6:06pm
I believe I've fixed it now by replacing the function with:
// ######################## Start htmlspecialchars_uni ###################
function htmlspecialchars_uni($text) {
// this is a version of htmlspecialchars that still allows unicode to function correctly
// this is breaking more things than it fixes, so I'm disabling it for now - Kier
//return htmlspecialchars($text);
// escape every & except those starting a numeric reference such as &#8364;
// (done first, so the entities produced below are not escaped a second time)
$text = preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
$text = str_replace('<', '&lt;', $text);
$text = str_replace('>', '&gt;', $text);
$text = str_replace('"', '&quot;', $text);
return $text;
}
The regex just needed to be moved up. (Otherwise the entities were getting double replaced.) This should be committed by the time you read this; I just need to do a bit more testing on it first.
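Why the order matters can be checked directly. Below, escape_regex_first() follows the function posted above; escape_regex_last() is a hypothetical reordering with the regex at the end, which is roughly what the "moved up" remark implies went wrong. It is a reconstruction for illustration, not code quoted from the thread.

```php
<?php
// Same four replacements, two orderings.
function escape_regex_last(string $text): string
{
    $text = str_replace('<', '&lt;', $text);
    $text = str_replace('>', '&gt;', $text);
    $text = str_replace('"', '&quot;', $text);
    // The & in &lt; / &gt; / &quot; is not followed by #digits;,
    // so the regex escapes those freshly made entities again.
    return preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
}

function escape_regex_first(string $text): string
{
    // Escaping & first means the entities added below are never revisited.
    $text = preg_replace('/&(?!#[0-9]+;)/si', '&amp;', $text);
    $text = str_replace('<', '&lt;', $text);
    $text = str_replace('>', '&gt;', $text);
    return str_replace('"', '&quot;', $text);
}

$sample = '<a href="index.php?do=something&amp;id=9">&#169;</a>';
echo escape_regex_last($sample), "\n";
echo escape_regex_first($sample), "\n";
```

With the regex last, &lt; comes out as &amp;lt; and so on; with it first, a textarea showing the result displays the original markup unchanged, including the &amp; in the URI.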
Then, to fix the templates, this search/replace regex should work:
Find: &(?![#a-z0-9]+;)
Replace: &amp;
(Hopefully this gets displayed correctly.)
Mike Sullivan
Tue 30th Jul '02, 6:29pm
The updated version is committed. Various editors worked correctly with it, so I believe this bug has now been quashed.
I haven't run the regex yet, but it should work. Basically, it will replace any & with &amp; if it's not followed by one or more #, a-z or 0-9 characters and then a semicolon.
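As a sketch of that clean-up step, the same Find/Replace pair can be applied with preg_replace(). fix_template() and the sample template text are hypothetical; only the pattern comes from the post.

```php
<?php
// Escape bare ampersands left in a template, keeping real entities intact.
function fix_template(string $template): string
{
    // Pattern exactly as posted (no modifiers, so it is case-sensitive).
    return preg_replace('/&(?![#a-z0-9]+;)/', '&amp;', $template);
}

$damaged = '<a href="member.php?do=something&id=9">Profile</a> &copy; &#169;';
echo fix_template($damaged), "\n";
```

Because the pattern has no i modifier, an uppercase named entity such as &Eacute; would be escaped too; adding i to the pattern would leave it alone.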