XML elements 'escaped' after conversion from phpBB
colins
Thu 20th Oct '05, 2:18pm
Hi,
We've used Impex to convert from phpBB 2.0.17 to vBulletin 3.0.9. It has mostly gone smoothly, except that existing posts which had BBCode 'code' blocks containing XML now show the XML elements escaped, in the form
&lt;property name="transactionManager">
&lt;ref bean="transactionManager"/>
&lt;/property>
instead of
<property name="transactionManager">
<ref bean="transactionManager"/>
</property>
The same thing applies to XML placed directly in the body of the post (outside of a code block). In phpBB it displayed fine, but the converted message in vB shows it escaped.
We are going to try another conversion tonight from phpBB to vB 3.5, which is what we want to run anyway, but I'd like a heads-up on whether this should normally work OK, and any suggestions if this is a known issue.
Regards,
Colin
Steve Machol
Fri 21st Oct '05, 3:53am
You may have to run the Impex tools/cleaner.php script to fix these.
colins
Fri 21st Oct '05, 12:53pm
I have confirmed that even when importing to vB 3.5, the < chars come in as &lt;. So is this expected for this scenario (importing from phpBB), with cleaner.php needing to be run to convert the escaped form back to the non-escaped form?
colins
Fri 21st Oct '05, 1:53pm
My worry with blindly using cleaner.php to replace the following entities:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
is that, since these are XML code samples, I may accidentally replace characters that were already escaped in the original XML documents in phpBB.
The conversion process is the only clean place to handle this; I would think the importer for phpBB could rely on the fact that anything which needs to be escaped is already escaped. If the importer adds its own escaped characters, there is no way to tell those apart from any that were originally escaped...
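To make the worry concrete, here is a minimal Python sketch, assuming a hypothetical `blind_clean` step that decodes all five entities listed above wherever they appear. It is not cleaner.php's actual code; it only shows how an escape the poster wrote on purpose gets lost along with the importer's.

```python
# Hypothetical stand-in for a "decode everything" cleaner rule; not cleaner.php itself.
BLIND_RULES = [
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&amp;", "&"),  # decoded last so "&amp;lt;" becomes "&lt;", not "<"
]


def blind_clean(text: str) -> str:
    """Decode the five standard XML entities wherever they appear."""
    for entity, char in BLIND_RULES:
        text = text.replace(entity, char)
    return text


# What the poster originally wrote: &amp; is a legitimate escape that must stay.
original = '<property name="myProp" ref="&amp;myFactoryBean"/>'
# What the importer produced, assuming only "<" was escaped.
imported = '&lt;property name="myProp" ref="&amp;myFactoryBean"/>'

cleaned = blind_clean(imported)
print(cleaned)              # <property name="myProp" ref="&myFactoryBean"/>
print(cleaned == original)  # False: the poster's own &amp; was decoded too
```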
Steve Machol
Fri 21st Oct '05, 3:51pm
I'm moving this to the Import forum so Jerry will see it.
colins
Fri 21st Oct '05, 4:54pm
Some more info...
It would appear that only < gets escaped (to &lt;), not other characters such as & or quotes. So
<bean id="emailAdvice"
class="org.springframework.samples.jpetstore.domain.logic.SendOrderConfirmationEmailAdvice">
<property name="mailSender" ref="mailSender"/>
</bean>
<!-- try to do a bean factory ref, which needs an 'and' sign
also use an embedded escaped quote -->
<bean id="myBean" class="...">
<property name="myProp" ref="&amp;myFactoryBean"/>
<property name="myProp2" value="some text &quot;quoted text&quot; some more"/>
</bean>
becomes
&lt;bean id="emailAdvice"
class="org.springframework.samples.jpetstore.domain.logic.SendOrderConfirmationEmailAdvice">
&lt;property name="mailSender" ref="mailSender"/>
&lt;/bean>
&lt;!-- try to do a bean factory ref, which needs an 'and' sign
also use an embedded escaped quote -->
&lt;bean id="myBean" class="...">
&lt;property name="myProp" ref="&amp;myFactoryBean"/>
&lt;property name="myProp2" value="some text &quot;quoted text&quot; some more"/>
&lt;/bean>
and
if (x > 10 && y < 20)
doSomething();
a = b & c;
String str = "Hello";
char chr = 'c';
becomes
if (x > 10 && y &lt; 20)
doSomething();
a = b & c;
String str = "Hello";
char chr = 'c';
So it would seem safe to tell cleaner.php to just convert &lt; to <. That said, it would be better if Impex did a proper conversion without the escaping in the first place.
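A minimal Python sketch of that single rule, assuming (as the samples above suggest) that the importer only ever escapes `<` as `&lt;`. The name `clean_lt_only` is hypothetical, not part of Impex. It restores both samples because their `&`, `&&`, `>` and quotes were never touched by the import.

```python
# Hypothetical single-rule cleaner, assuming the importer escaped only "<"
# (as "&lt;") and left "&", ">" and quotes untouched.
def clean_lt_only(text: str) -> str:
    """Turn every "&lt;" back into "<" and leave all other entities alone."""
    return text.replace("&lt;", "<")


# The Java sample, before and after the import.
java_original = "if (x > 10 && y < 20)\ndoSomething();\na = b & c;"
java_imported = "if (x > 10 && y &lt; 20)\ndoSomething();\na = b & c;"

# One line of the XML sample; &amp; is the poster's own escape and must survive.
xml_original = '<property name="myProp" ref="&amp;myFactoryBean"/>'
xml_imported = '&lt;property name="myProp" ref="&amp;myFactoryBean"/>'

print(clean_lt_only(java_imported) == java_original)  # True
print(clean_lt_only(xml_imported) == xml_original)    # True
```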
colins
Fri 21st Oct '05, 6:20pm
One other thing I've noticed is that quotes in a signature come in escaped with a backslash.
So
My "Quote Containing" Signature
converts as
My \"Quote Containing\" Signature
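If these signatures have to be fixed after the import, a minimal Python sketch of the fix-up could look like this. The function name is hypothetical, and it assumes the backslashes were only ever added in front of quote characters; a signature that genuinely contained `\"` would be changed as well.

```python
# Hypothetical fix-up for signatures that arrived with backslash-escaped quotes.
# Assumption: the backslashes were only ever added in front of quote characters,
# so a signature that genuinely contained \" would be changed as well.
def unescape_signature_quotes(sig: str) -> str:
    """Remove the backslash in front of escaped double and single quotes."""
    return sig.replace('\\"', '"').replace("\\'", "'")


imported_sig = 'My \\"Quote Containing\\" Signature'
print(unescape_signature_quotes(imported_sig))  # My "Quote Containing" Signature
```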
Jerry
Sat 22nd Oct '05, 5:29pm
Originally Posted by colins
My worry with blindly using cleaner.php to replace the following entities:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
is that, since these are XML code samples, I may accidentally replace characters that were already escaped in the original XML documents in phpBB.
The conversion process is the only clean place to handle this; I would think the importer for phpBB could rely on the fact that anything which needs to be escaped is already escaped. If the importer adds its own escaped characters, there is no way to tell those apart from any that were originally escaped...
You have to narrow it down so the wrong things don't get changed, e.g.:
"&lt;property" => "<property"
as opposed to:
"&lt;" => "<"
colins
Sat 22nd Oct '05, 10:27pm
I understand what you're saying, but that's completely unrealistic...
People are posting large amounts of arbitrary XML, with many different kinds of elements, some of them unknown. This is in thousands of messages.
On top of that, there is Java code, as in the example above, where standalone < characters get converted to &lt;, with no element name after them to match on.
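Jerry's narrowed rule amounts to a whitelist of tag names. A minimal Python sketch, assuming a hypothetical `KNOWN_ELEMENTS` list that holds only the names from the samples in this thread: it fixes the known tags but leaves a bare `&lt;`, such as the one in the Java comparison, escaped. That is why this approach only helps when the set of element names is known in advance.

```python
import re

# Hypothetical whitelist in the spirit of the narrowed rule: only reverse
# "&lt;" when it is directly followed by a known element name.
KNOWN_ELEMENTS = ["property", "bean", "ref"]  # illustrative names only

# Matches "&lt;" plus an optional "/" and a whitelisted name as a whole word,
# e.g. "&lt;property" or "&lt;/bean", but not "&lt;beans" or a bare "&lt; ".
NARROW_RULE = re.compile(
    r"&lt;(/?(?:" + "|".join(map(re.escape, KNOWN_ELEMENTS)) + r"))\b"
)


def clean_known_tags(text: str) -> str:
    """Unescape "&lt;" only in front of whitelisted element names."""
    return NARROW_RULE.sub(r"<\1", text)


print(clean_known_tags('&lt;property name="a">&lt;/property>'))
# <property name="a"></property>
print(clean_known_tags("if (x > 10 && y &lt; 20)"))
# if (x > 10 && y &lt; 20)  (the bare comparison is left escaped)
```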
So realistically, the only thing I can put in cleaner.php is a conversion from a plain '&lt;' to '<'.
We're doing this tomorrow. I've already tried it in a test conversion, and it's going to work in the majority of cases, but it's also going to bite us in the *ss in a few, because a few XML snippets here and there already contain their own &lt; escapes, e.g. where a special character needed to go inside an attribute.
As I said, it seems to me the only place this can be realistically handled properly is inside Impex. And I think Impex has all the data it needs. If phpBB can display the post properly, Impex should be able to get that post data in the proper form and put it that way into vBulletin. So hopefully you'll improve this area in the future for other people.
Regards,
Colin
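The failure case Colin describes can be reproduced with two tiny Python functions, assuming the import only escapes `<` (as observed earlier in the thread). Both `import_escape_lt` and `clean_lt_only` are hypothetical models, not Impex code. Once the import has run, the poster's own `&lt;` and the importer's `&lt;` look identical, so any after-the-fact rule has to guess. Since `<` is not allowed inside an XML attribute value, the "restored" line below is no longer valid XML.

```python
# Hypothetical model of the observed import behaviour: only "<" is escaped.
def import_escape_lt(text: str) -> str:
    """Escape every "<" as "&lt;" and leave everything else as-is."""
    return text.replace("<", "&lt;")


# Hypothetical blanket cleaner rule: turn every "&lt;" back into "<".
def clean_lt_only(text: str) -> str:
    """Reverse "&lt;" everywhere, including escapes the poster wrote."""
    return text.replace("&lt;", "<")


# The poster's XML already contains a legitimate "&lt;" inside an attribute.
original = '<property name="expr" value="a &lt; b"/>'

imported = import_escape_lt(original)
print(imported)              # &lt;property name="expr" value="a &lt; b"/>
restored = clean_lt_only(imported)
print(restored)              # <property name="expr" value="a < b"/>
print(restored == original)  # False: both &lt; were unescaped
```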
Jerry
Mon 24th Oct '05, 3:30pm
Well, that's what cleaner.php is for, though I've added the str_replace to the phpBB2 system, so try downloading the latest Impex and running the import again.