Remove HTML tags from string with RegEx

June 06, 2011 - 09:02

If you have scraped data from a web page, the resulting string may contain many HTML tags, such as anchor tags with HREF attributes or font formatting. This is usually undesirable when the information is displayed on your own site.

A regular expression can be used to remove these tags.

<%@ Import Namespace="System.Text.RegularExpressions" %>

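' Sample input: link text wrapped in anchor tags
Dim mystring As String = "<A Href='test.aspx'>the link</a>"

' Match a "<", any characters that are not ">", then the closing ">"
Dim RegExStr As String = "<[^>]*>"
Dim R As New Regex(RegExStr)

' Replace every match (every tag) with an empty string
mystring = R.Replace(mystring, "")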

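After this runs, mystring contains only the text "the link". Everything between each "<" and the next ">" has been removed, including the brackets themselves. The pattern <[^>]*> matches a "<", then any run of characters other than ">", then the closing ">".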
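
As a rough sketch, the same pattern can be wrapped in a reusable function and tried on a string with several nested tags. The names StripTagsDemo and StripTags are hypothetical, chosen only for illustration, and the example is written as a standalone console module rather than an ASPX page so that it can be run on its own.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo

    ' Hypothetical helper: removes anything that looks like an HTML tag.
    ' Assumes the input is simple markup; it does not parse HTML.
    Function StripTags(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then
            Return String.Empty
        End If
        ' Same pattern as above: "<", any characters except ">", then ">"
        Return Regex.Replace(html, "<[^>]*>", String.Empty)
    End Function

    Sub Main()
        Dim scraped As String = "<p><font color='red'>Price:</font> <a href='test.aspx'>the link</a></p>"
        ' Text between the tags, including the space, is left untouched
        Console.WriteLine(StripTags(scraped))   ' prints: Price: the link
    End Sub

End Module
```

Note that this approach only strips the tags. It does not decode entities such as &amp;, it leaves the contents of script and style elements in place, and a literal ">" inside an attribute value will end the match early.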
