'CH HTMLConvert' is a software component that enables you to parse HTML strings and extract the plain text content.
'CH HTMLConvert' is freeware. You may freely use and distribute it with any commercial software. The source code of this component is also available. See our website for details: www.ch-software.de/htmlconvert
There are two versions of this component:
chHtmlConvert32.dll: a Win32 dynamic link library written in C++
chHtmlConvert.dll: a .NET assembly written in C#
public static bool ExtractPlainText(string stHtml, out string stText);
public static bool ExtractPlainText(string stHtml, out string stText, bool bPreferASCII);
public static bool ExtractPlainText(string stHtml, out string stText, bool bPreferASCII, out string stCharset, out int nCodePage);
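As a rough illustration, a call to the third overload might look like the sketch below. The class name ChHtmlConvert is a placeholder and an assumption: this document does not state which class in chHtmlConvert.dll exposes ExtractPlainText, so substitute the real class name and add a reference to the assembly before compiling. The sample HTML is also made up for the example.

```csharp
using System;

class ExtractExample
{
    static void Main()
    {
        // Made-up sample input that declares its charset in a meta tag.
        string html =
            "<html><head><meta http-equiv=\"Content-Type\" " +
            "content=\"text/html; charset=windows-1252\"></head>" +
            "<body><p>Hello, world</p></body></html>";

        string text;
        string charset;
        int codePage;

        // ASSUMPTION: "ChHtmlConvert" is a hypothetical class name; replace it
        // with the class that actually declares ExtractPlainText.
        bool ok = ChHtmlConvert.ExtractPlainText(
            html, out text, true, out charset, out codePage);

        if (ok)
        {
            Console.WriteLine("Charset:   " + charset);
            Console.WriteLine("Code page: " + codePage);
            Console.WriteLine(text);
        }
        else
        {
            // The documented failure case is a code page that is not installed.
            Console.Error.WriteLine(
                "Conversion failed; code page " + codePage + " may be missing.");
        }
    }
}
```

The two shorter overloads take the same first arguments and simply omit bPreferASCII, stCharset and nCodePage.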
stHtml
The character string to be converted.
stText
The character string that receives the output text.
bPreferASCII
Specifies whether some frequently used Unicode characters are translated to similar ASCII characters. The affected characters and their character codes are listed in the table below:
Unicode character    ASCII character
‑ (8209)             - (45)
– (8211)             - (45)
— (8212)             - (45)
’ (8217)             ' (39)
“ (8220)             " (34)
” (8221)             " (34)
• (8226)             * (42)
stCharset
The character string receiving the charset that has been specified in the HTML input.
nCodePage
The integer variable that receives the codepage number of the HTML input. For example, if the HTML input specifies the charset "Windows-1252", the codepage number will be 1252.
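The substitution controlled by bPreferASCII can be pictured with the following minimal sketch. It only re-creates the mapping from the table in the bPreferASCII description as a standalone function; it is not the component's source code, and the names PreferAsciiSketch and ReplaceCommonUnicode are invented for this example.

```csharp
using System;
using System.Text;

static class PreferAsciiSketch
{
    // Illustrative only: applies the Unicode-to-ASCII mapping from the table.
    public static string ReplaceCommonUnicode(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch ((int)c)
            {
                case 8209: // non-breaking hyphen
                case 8211: // en dash
                case 8212: // em dash
                    sb.Append('-');  // ASCII 45
                    break;
                case 8217: // right single quotation mark
                    sb.Append('\''); // ASCII 39
                    break;
                case 8220: // left double quotation mark
                case 8221: // right double quotation mark
                    sb.Append('"');  // ASCII 34
                    break;
                case 8226: // bullet
                    sb.Append('*');  // ASCII 42
                    break;
                default:
                    sb.Append(c);    // every other character is left unchanged
                    break;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        string input = "\u201CHello\u201D \u2013 it\u2019s \u2022 done";
        Console.WriteLine(ReplaceCommonUnicode(input));
        // prints: "Hello" - it's * done
    }
}
```

Characters outside the table pass through untouched, so the output may still contain non-ASCII text.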
If the function succeeds, the return value is true.
The return value is false if the required codepage is not installed on the system. In that case the input text cannot be converted to the correct Unicode representation, and the output text is unlikely to contain any readable characters.
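To see in advance whether a given codepage is available, a caller could probe it with the standard System.Text.Encoding class, as in the sketch below. This is an independent check written for illustration; how the component itself detects a missing codepage is not documented here, and the helper name IsCodePageAvailable is made up. Note that on .NET Core and later, Windows codepages such as 1252 are typically available only after registering CodePagesEncodingProvider, so the second result depends on the runtime.

```csharp
using System;
using System.Text;

static class CodePageCheck
{
    // Returns true if this runtime can supply an Encoding for the code page.
    public static bool IsCodePageAvailable(int codePage)
    {
        try
        {
            Encoding.GetEncoding(codePage);
            return true;
        }
        catch (ArgumentException)     // also covers ArgumentOutOfRangeException
        {
            return false;
        }
        catch (NotSupportedException) // code page unknown to this platform
        {
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(IsCodePageAvailable(65001)); // UTF-8, prints: True
        Console.WriteLine(IsCodePageAvailable(1252));  // result depends on the system
    }
}
```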
THIS SOFTWARE AND THE ACCOMPANYING FILES ARE DISTRIBUTED 'AS IS' AND WITHOUT WARRANTIES AS TO PERFORMANCE OF MERCHANTABILITY OR ANY OTHER WARRANTIES WHETHER EXPRESSED OR IMPLIED. NO WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE IS OFFERED. TEST THE PROGRAM THOROUGHLY WITH NON-CRITICAL DATA BEFORE RELYING ON IT. THE USER MUST ASSUME THE ENTIRE RISK OF USING THE PROGRAM.