tommythejoat
Tue 29th Jul '08, 11:16am
I have been contemplating the cleanup pass over our message board following the conversion from bbv2 (the Perl script). The forums are coming across with html rather than bbcode in the posts. I had thought the text of the posts would contain bbcode tags rather than the html tags.
In looking at Impex, I noticed that the quote tag got converted and surprisingly the bold tag. However, img and url references do not get converted and there are lots of break tags in the text.
I was thinking I would clean these up with Cleaner, but in experimenting with the board, I noticed that if I turned on the flag in the Admin CP to allow html code in a forum, everything displayed properly and the html got interpreted instead of displayed.
We don't want to leave html enabled because of perceived danger from malicious or foolish html code. I experimented with this idea a bit and noticed that vBulletin converted html to bbcode tags. I was very happy to see that and thought maybe my answer was to just turn on html.
Unfortunately, it appears that during posting and editing vBulletin only translates html that it recognizes. In particular, the iframe tag gets passed unchanged into the text body, and I suspect other tags do also. Since iframes can destroy the formatting of the board, and I was concerned that more malicious html could be passed, I turned the html back off and I am back to where I started.
I wish it were the case that vBulletin deleted all html that it could not convert to either formatting or bbcode tags. In that case I would be happy to just turn on the html.
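To make the behavior I am wishing for concrete, here is a minimal sketch in Python (used only for illustration; it is not vBulletin or Impex code). The names `ANY_TAG` and `strip_remaining_html` are hypothetical. It assumes the recognized tags have already been converted to bbcode, and simply deletes anything that still looks like an html tag.

```python
import re

# Matches anything that looks like an HTML tag: <iframe ...>, </iframe>, <font size=2>.
# A "<" that is not followed by a letter (as in "1 < 2") is left alone.
ANY_TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def strip_remaining_html(text):
    """Delete every HTML tag still present in an already-converted post body."""
    return ANY_TAG.sub("", text)

post = '[b]hi[/b] <iframe src="http://x.example/"></iframe>there'
print(strip_remaining_html(post))
```

Note that this only removes the tags themselves. Any text between an opening and closing tag is kept, so script or style bodies would need separate handling.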
If Impex used the same logic that vBulletin uses when posting to process the posts from my bbV2 board, the html would not be coming through the import process. Is there some way I can turn on that behavior?
If I cannot use the import process itself, I will have to use Cleaner to process the posts. The Cleaner array is getting pretty complex and it looks like the great bulk of the translations will need to use preg_replace with regular expressions rather than simple substitutions.
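As a rough idea of the kind of regular-expression translations I mean, here is a sketch in Python (for illustration only; the real Cleaner is PHP and would use preg_replace, and this is not its actual array format). The table name `HTML_TO_BBCODE` and the function `html_to_bbcode` are hypothetical, and the patterns assume quoted attribute values and cover only the break, img and url cases mentioned above.

```python
import re

# Hypothetical translation table: (pattern, replacement) pairs, applied in order.
HTML_TO_BBCODE = [
    # <br>, <br/> and <br /> become newlines.
    (re.compile(r"<br\s*/?>", re.IGNORECASE), "\n"),
    # <img src="..."> becomes [img]...[/img].
    (re.compile(r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\'][^>]*>',
                re.IGNORECASE),
     r"[img]\1[/img]"),
    # <a href="...">text</a> becomes [url=...]text[/url].
    (re.compile(r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']+)["\'][^>]*>(.*?)</a>',
                re.IGNORECASE | re.DOTALL),
     r"[url=\1]\2[/url]"),
]

def html_to_bbcode(text):
    """Apply each translation in turn to one post body."""
    for pattern, replacement in HTML_TO_BBCODE:
        text = pattern.sub(replacement, text)
    return text

sample = ('Hi<br>there <img src="http://example.com/a.gif"> '
          '<a href="http://example.com/">link</a>')
print(html_to_bbcode(sample))
```

The img rule runs before the url rule so that a linked image ends up as an [img] tag nested inside a [url] tag. Unquoted attribute values and malformed tags are not handled here.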