
Cleaner vs. Impex 070 Question HTML



tommythejoat
Tue 29th Jul '08, 11:16am
I have been planning the cleanup pass over our message board following the conversion from bbv2 (the Perl script). The posts are coming across with HTML rather than BBCode. I had expected the post text to contain BBCode tags, not HTML tags.

Looking at the ImpEx output, I noticed that the quote tag was converted and, surprisingly, so was the bold tag. However, img and url references are not converted, and there are lots of break (<br>) tags in the text.

I was going to clean these up with Cleaner, but while experimenting with the board I noticed that if I turned on the Admin CP option to allow HTML in a forum, everything displayed properly: the HTML was rendered instead of being shown as text.

We don't want to leave HTML enabled because of the perceived danger from malicious or careless HTML. I experimented with this a bit and noticed that vBulletin converted HTML to BBCode tags. I was very happy to see that and thought maybe the answer was simply to turn HTML on.

Unfortunately, it appears that during posting and editing vBulletin only translates the HTML it recognizes. In particular, the iframe tag is passed unchanged into the post body, and I suspect other tags are as well. Since iframes can wreck the board's layout, and I was concerned that more malicious HTML could get through, I turned HTML back off, so I am back where I started.

I wish vBulletin deleted all HTML that it could not convert to either formatting or BBCode tags. In that case I would be happy to just turn HTML on.
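The behavior wished for above (convert the HTML you recognise, throw away whatever is left) can be sketched in a few lines of PHP. This is a hypothetical illustration, not how vBulletin or ImpEx actually work: sanitize_post() and its three rules are made up for the example, and only preg_replace() and strip_tags() are real PHP functions.

```php
<?php
// Hypothetical sketch of "convert what you recognise, delete the rest".
// sanitize_post() and its rules are illustrative, not vBulletin or ImpEx code.

function sanitize_post(string $html): string
{
    // A few known conversions (a small illustrative subset).
    $rules = [
        '#<br\s*/?>#i'               => "\n",
        '#<(b|strong)>(.*?)</\1>#is' => '[b]$2[/b]',
        '#<(i|em)>(.*?)</\1>#is'     => '[i]$2[/i]',
    ];
    $text = preg_replace(array_keys($rules), array_values($rules), $html);

    // Any tag still left (iframe, script, font, ...) is removed outright.
    // strip_tags() drops the tags themselves but keeps the text between them.
    return strip_tags($text);
}
```

A real rule set would need many more patterns (img, url, quote, lists and so on). Also note that strip_tags() keeps the text inside the tags it removes, so the body of a stray script block would survive as harmless plain text.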

If ImpEx processed the posts from my bbV2 board with the same logic vBulletin uses when posting, the HTML would not be coming through the import. Is there some way I can turn on that behavior?

If I cannot do this in the import itself, I will have to use Cleaner to process the posts. The Cleaner array is getting pretty complex, and it looks like most of the translations will need preg_replace with regular expressions rather than simple substitutions.
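To illustrate the point about simple substitutions versus preg_replace: tags with no attributes can be swapped with str_replace(), but img and url carry a src or href value that has to be captured and moved into the BBCode, which takes a regular expression. The two arrays and clean_post() below are hypothetical and do not reproduce Cleaner's actual configuration format.

```php
<?php
// Sketch of why most rules end up as regular expressions. The arrays below
// are hypothetical and do not reproduce Cleaner's own config format.

// Tags with no attributes can be handled by plain string substitution.
$simple = [
    '<br>'   => "\n",
    '<br />' => "\n",
    '<u>'    => '[u]',
    '</u>'   => '[/u]',
];

// Tags that carry attributes (img src, a href) need a pattern with captures.
$patterns = [
    '#<img\s+[^>]*src="([^"]*)"[^>]*>#i'     => '[img]$1[/img]',
    '#<a\s+href="([^"]*)"[^>]*>(.*?)</a>#is' => '[url=$1]$2[/url]',
];

function clean_post(string $text, array $simple, array $patterns): string
{
    // str_replace() handles the fixed strings in one call...
    $text = str_replace(array_keys($simple), array_values($simple), $text);
    // ...while preg_replace() moves the captured src/href values into BBCode.
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}
```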

tommythejoat
Fri 1st Aug '08, 11:02am
Since I had no response to the post above, I decided to investigate it myself.

I discovered that the bbBoardV2 ImpEx module uses its own HTML processing function, bbBoardv2_html() (in 000.php), rather than html_2_bbcode() as found in ImpexFunction.php.

As a result, the imported posts still have essentially all of their HTML in place. It appears that if I modify 007.php to call the standard function, everything will work properly.

I would guess that Jerry was working from the wrong specification for bbBoardV2. Since it is a Tier 2 system, that is not really surprising.

I would like to confirm that the long sequence of preg_replace operations in html_2_bbcode() will take a block of standard, working HTML and convert it to the equivalent BBCode.

Is this true?

I don't like to modify distributed code if I can avoid it.

Jerry
Fri 1st Aug '08, 2:38pm
You can always modify it to call the default HTML function as well as the system-specific one.

So change line 134 of 007.php from:

$this->vb_parse_url($this->bbBoardv2_html($post_details['content'])));

to

$this->vb_parse_url($this->bbBoardv2_html($this->html_2_bbcode($post_details['content']))));

tommythejoat
Fri 1st Aug '08, 4:06pm
Does the nesting order matter here? I suppose that since the outer call will not see the BBCoded strings, it does not really matter which one is inside and which is outside. Or does it?

Jerry
Fri 1st Aug '08, 4:17pm

The thinking behind it is that html_2_bbcode() handles all the generic, common HTML found in most of the import systems; the {$system}_html() function then sits on the outside as a catch-all for anything that was missed or that is specific to that system.
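A minimal sketch of that nesting, using made-up stand-ins for html_2_bbcode() and bbBoardv2_html() (the real ImpEx rule sets are not reproduced here). The assumed catch-all pass simply strips any leftover tags, which is enough to show why the generic converter has to run first.

```php
<?php
// Illustrative stand-ins only: the real html_2_bbcode() and bbBoardv2_html()
// are ImpEx methods whose rule sets are not reproduced here.

function generic_html_to_bbcode(string $text): string
{
    // Common HTML seen in most source boards (tiny illustrative subset).
    return preg_replace(
        ['#<br\s*/?>#i', '#<b>(.*?)</b>#is'],
        ["\n", '[b]$1[/b]'],
        $text
    );
}

function system_specific_cleanup(string $text): string
{
    // Hypothetical catch-all for whatever the generic pass missed:
    // here, any leftover tag is simply stripped.
    return strip_tags($text);
}

// Same nesting as the modified 007.php line: the generic converter runs on
// the raw content first, and the system-specific pass sees its output.
function convert_post(string $raw): string
{
    return system_specific_cleanup(generic_html_to_bbcode($raw));
}
```

Swapping the nesting, generic_html_to_bbcode(system_specific_cleanup($raw)), would strip <b> before it could become [b], so the bold formatting would be lost. Whether the real functions interact this way depends on their actual rules.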

tommythejoat
Fri 1st Aug '08, 4:47pm
Thank you. I think that makes me good to go.

Now one of my advisory committee members is giving me a hard time about having two domains resolve to the same IP address. I was (and still am) planning to do that, using the 404 handler to take care of the domain move.

If anyone has data I can cite showing that search engines will not ban a site for doing that, I would appreciate it. I cannot imagine they care. :rolleyes:
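For the domain move itself, a common approach is to answer requests for the old domain with a permanent (301) redirect to the same path on the new one, which search engines generally treat as a site move rather than as duplicate content. The sketch below assumes PHP is running the 404 handler; redirect_target() and the host names are hypothetical placeholders.

```php
<?php
// Hypothetical 404-handler helper: host names are placeholders.
// Returns the URL to redirect to, or null if the request is not for the old domain.
function redirect_target(string $host, string $uri, string $oldHost, string $newHost): ?string
{
    // Host names are case-insensitive, so compare them that way.
    if (strcasecmp($host, $oldHost) !== 0) {
        return null; // not the old domain: fall through to the normal 404 page
    }
    // Keep the original path and query string on the new domain.
    return 'http://' . $newHost . $uri;
}

// Inside the actual 404 handler (web request, not CLI):
// $target = redirect_target($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'],
//                           'old.example.com', 'www.example.com');
// if ($target !== null) {
//     header('Location: ' . $target, true, 301); // permanent redirect
//     exit;
// }
```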