There are two ways to strip HTML tags from a string
1) Try to extract everything between end html tag(>)and next starting html tag (<)
2) First approach becomes complicated when you have nested tags. So the solution is to replace everything that is between (<) and (>)
SQL below replaces all the tags with space.
Select Regexp_Replace('<td>source string <b>bold<\b> with html tags<\td>', '<[^>]+>','') from dual
This is good, but how do you replace special characters of html tags like  , " and many more.
ReplyDeletesince  ," are constants, you can either use replace or regexp_replace
ReplyDeleteregexp_replace(source_string,'[ ]|["]','')
Hi There ,
ReplyDeleteThe information you shared is pretty helpful however how should i modify my statement
'SELECT
REGEXP_REPLACE(PPRO.DESCRIPTION, '<[^>]+>', '') "Dup_DESCRIPTION"
FROM WMSPT_PROJECTS_PROJECTS ppro'
to remove html tags Plus constants also in the same statement i.e what should be the pattern '<[^>]+>' ??
Ohh I got it now,It should be :
ReplyDeleteSELECT PPRO.PROJECT_ID,
REGEXP_REPLACE(PPRO.DESCRIPTION, '<[^>]+>|\&(nbsp;)|(amp;)', ' ') "Dup_DESCRIPTION"
FROM WMSPT_PROJECTS_PROJECTS ppro