[osflash] Find/replace xml tags - best approach?

Glen Pike postmaster at glenpike.co.uk
Mon Oct 8 10:31:00 PDT 2007


Hi,

    This is not AS3, but since you mentioned PHP in a previous message, 
here is a PHP option that may help...

    I wrote a "scraper" in PHP based on an example in the Pear 
XML_HTMLSax package to rewrite bad HTML into XHTML.

    http://pear.php.net/package/XML_HTMLSax/

    It ran server-side: it read pages from the NYTimes site, pulled out 
segments of HTML between known points, and cleaned up unclosed 
paragraphs and bad <br /> tags.
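
    To give a rough idea of the event-driven approach, here is a minimal 
sketch in Python rather than PHP.  It uses the standard library's 
html.parser.HTMLParser as a stand-in for XML_HTMLSax; both call a handler 
for each tag and text run.  The names XhtmlCleaner and clean are made up 
for this example, and it is not the code from the scraper described above.  
It only assumes flat, top-level paragraphs and does not track nesting.

```python
from html import escape
from html.parser import HTMLParser


class XhtmlCleaner(HTMLParser):
    """Hypothetical cleaner: closes unclosed <p> and normalises <br>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []               # output fragments, joined at the end
        self.in_paragraph = False   # True while a <p> is still open

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # <br>, <BR> and <br/> all come out as <br />
            self.out.append("<br />")
            return
        if tag == "p":
            if self.in_paragraph:
                # a new <p> implicitly ends the previous one
                self.out.append("</p>")
            self.in_paragraph = True
        # valueless attributes are written as name="name"
        attr_text = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value, quote=True))
            for name, value in attrs
        )
        self.out.append("<%s%s>" % (tag, attr_text))

    def handle_endtag(self, tag):
        if tag == "br":
            return                  # already emitted as <br />
        if tag == "p":
            if not self.in_paragraph:
                return              # drop a stray </p>
            self.in_paragraph = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # text arrives unescaped, so escape it again for output
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        if self.in_paragraph:
            self.out.append("</p>")
            self.in_paragraph = False


def clean(fragment):
    parser = XhtmlCleaner()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(clean("<p>One<br>two<p>Three<BR/>four"))
```

    A real scraper would also need to handle block elements that close 
while a paragraph is still open, which this sketch leaves out.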

    I guess this may be a similar application.  You will need the PHP 
Pear libraries installed, which can be fiddly if you don't have 
shell access, but there is a Pear installer that runs from an HTML page 
that you can visit.

    Worth a try if you have used Pear before.

    Also, someone once recommended not using regexes to parse other 
people's XML / XHTML / HTML because they break so easily.  If you are 
using regexes on your own code, they may suffice, provided your XML is 
very strict and does not break the "rules".
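
    As a small illustration of how a regex can break on nested tags, here 
is a sketch in Python (not AS3 or PHP) that swaps <b> for <strong> in two 
ways.  The sample string and the helper names rename_tags and 
is_well_formed are invented for this example.  The tree-based version 
assumes the input is already well-formed XML.

```python
import re
import xml.etree.ElementTree as ET

SOURCE = "<p>Hello <b>bold <b>nested</b> text</b> world</p>"

# Regex attempt: the non-greedy match pairs the outer <b> with the
# inner </b>, so the tags in the result no longer balance.
regex_result = re.sub(r"<b>(.*?)</b>", r"<strong>\1</strong>", SOURCE)


def rename_tags(xml_text, mapping):
    """Parse the text, rename matching elements, and serialise it again."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


def is_well_formed(xml_text):
    """Return True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


tree_result = rename_tags(SOURCE, {"b": "strong"})

print(regex_result, is_well_formed(regex_result))
print(tree_result, is_well_formed(tree_result))
```

    The regex output leaves one <b> unrenamed and mismatched, while the 
parsed version renames both levels because it works on elements rather 
than on text.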

    I have uploaded a ZIP file containing the classes for you to look 
at.  It is well commented, so you should be able to see what's going on 
and whether it is useful.

    http://glenpike.co.uk/sd/HTML/HTMLHandler.zip

    HTH

    Glen  

Alias™ wrote:
> Yeah, funny, I've come to the same conclusion myself. I'm looking into
> the php libxslt right now, actually.
>
> Now, all I have to do is learn XSLT again... not touched it for years.
>
> Cheers!
> Alias
>
> On 08/10/2007, Peter Hall <peter.hall at memorphic.com> wrote:
>   
>> The first two choices are pretty much the same thing, just different
>> ways of selecting the nodes in the first place. Once you have them,
>> it's still a bit of a pain to replace nodes.
>>
>> The best solution is probably XSLT, depending on how complex the
>> transform that you actually want to do is. It could just be overkill. I
>> am planning to build a full XSLT implementation at some point, but it
>> won't be any time in the next few months unless there are
>> volunteers...
>>
>> Peter
>>
>>
>> On 10/8/07, Alias™ <alias at proalias.com> wrote:
>>     
>>> Hi guys,
>>>
>>> I'm wondering if anyone has any opinions on this. I'm faced with the
>>> need to search and replace a bunch of XML tags in an AS3 project. The
>>> tags are going to be nested, and will probably be basic HTML elements
>>> that get replaced with other HTML elements. For various reasons it
>>> seems that this is necessary because of the project's localisation
>>> goals.
>>>
>>> I'm currently considering the following options:
>>>
>>>  - native E4X
>>>      pros: built in, simple
>>>      cons: not really powerful enough without writing a lot of code
>>>  - the memorphic xpath library (http://www.memorphic.com/news/?page_id=16)
>>>      pros: xpath is nice and what I'm used to
>>>      cons: might be using a sledgehammer to crack a nut
>>>  - native regex
>>>      pros: built in, lots of prewritten magic regexes which could do the job
>>>      cons: lots of prewritten magic regexes which could do the job,
>>>            but might also mysteriously fail further down the line
>>>
>>> Has anyone had any experiences with this that they'd like to share?
>>>
>>> Thanks in advance,
>>> Alias
>>>
>>> _______________________________________________
>>> osflash mailing list
>>> osflash at osflash.org
>>> http://osflash.org/mailman/listinfo/osflash_osflash.org