2025-11-27 16:46:48 +09:00

49 lines
5.5 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

{\rtf1\ansi\ansicpg1252\deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}}
{\colortbl ;\red0\green0\blue255;\red0\green128\blue0;}
{\stylesheet{ Normal;}{\s1 heading 1;}{\s2 heading 2;}{\s3 heading 3;}{\s4 heading 4;}{\s5 heading 5;}{\s6 heading 6;}}
{\*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\s1\sb100\sa100\cf1\kerning36\b\f0\fs48 COXParser and COXHTMLParser Overview\cf0\par
\pard\s6\sb100\sa100\kerning0\fs20 Copyright \'a9The Code Project 1996-2005, All Rights Reserved\fs15\par
\pard\sb100\sa100\fs24 COXParser \b0 is a simple class that allow the parsing of XML-style formatted files that contain nested tags, comments, markup and text. Tags are delimited using the usual <...>, </...>format, or <.../> for an empty tag, and each tag can contains one or more attributes (eg <TAG number="1" value="a">. Comments delimited by <!-- and -->delimiters are suppored, as is literal text. An example would be: \par
This would be parsed into the following structure: \par
Each element can contain a collection of further Elements, and a collection of attribute. Once a file or text buffer has been parsed, the Root Element ("[Root]") of the \b COXParser \b0 object will contain the collection of all elements (in \b COXParserElement \b0 objects) each with their set of attributes (in \b COXAttribute \b0 objects). Each element in the tree will contain further subElements (\b COXParserElement \b0 objects), Comments (\b COXParserObject\b0 ) and text (\b COXParserObject\b0 ). All objects contained within the tree are derived from \b COXParserObject\b0 . To determine the type of object, use \i COXParserObject::GetType()\i0 . This will return UNKNOWN, ELEMENT, PLAINTEXT, COMMENT, MARKUP or ATTRIBUTE. Using \b COXParser\b0 : To use this class simply declare a variable of type \b COXParser\b0 , then call \i COXParser::Parse() \i0 or \i COXParser::ParseFile()\i0 . The parser object will then contain the tree structure of the text buffer or file. The easiest way to traverse the tree is to use a recursive function:\par
\cf2 void DisplayElement(COXParserElement* pElement) \{\cf0\par
\cf2\~if (pElement->GetType() != CParseObject::ELEMENT) \cf0\par
\cf2\~ return; \cf0\par
\cf2 printf("%s\\n", pElement->GetName()); \cf0\par
\cf2 for (int i = 0; i < pElement->NumAttributes(); i++) \{ \cf0\par
\cf2 printf("Name = %s ", pElement->Attribute(i)->GetName());\cf0\par
\cf2\~if (pElement->Attribute(i)->GetAttributeType() == COXAttribute::ATTR_STRING) \cf0\par
\cf2\~ printf("Value = %s\\n", pElement->Attribute(i)->GetStringValue()); \cf0\par
\cf2 else if (pElement->Attribute(i)->GetAttributeType() == COXAttribute::ATTR_INTEGER) \cf0\par
\cf2 sprintf("Value = %d\\n", pElement->Attribute(i)->GetIntValue()); \cf0\par
\cf2\} \cf0\par
\cf2 for (i = 0; i < pElement->NumObjects(); i++)\{\cf0\par
\cf2\~CParseObject* pObject = pElement->Object(i); \cf0\par
\cf2 switch (pObject->GetType()) \{\cf0\par
\cf2\~case COXParserObject::CDATA: \cf0\par
\cf2 case COXParserObject::PLAINTEXT: \cf0\par
\cf2\~ printf("Text = %s\\n", pObject->GetText()); \cf0\par
\cf2\~ break; \cf0\par
\cf2 case COXParserObject::COMMENT: \cf0\par
\cf2\~ printf("Comment = %s\\n", pObject->GetText()); \cf0\par
\cf2\~ break; \cf0\par
\cf2 case COXParserObject::MARKUP: \cf0\par
\cf2\~ printf("Markup = %s\\n", pObject->GetText()); \cf0\par
\cf2\~break; \cf0\par
\cf2 case COXParserObject::PROCINSTR: \cf0\par
\cf2\~ printf("Processing Instruction = %s\\n", pObject->GetText()); \cf0\par
\cf2\~ break; \cf0\par
\cf2 case COXParserObject::ELEMENT: \cf0\par
\cf2\~ DisplayElement((COXParserElement*)pObject); break; \cf0\par
\cf2\~ \}\cf0\par
\cf2\~\} \cf0\par
\cf2\} \cf0\par
Then: \par
\cf2 COXParser parser;\cf0\par
\cf2\~parser.ParseFile("file.txt"); \cf0\par
\cf2 DisplayElement(parser.Root()); \cf0\par
COXParser also allows you to hook in your own error reporting callback. Simply call \i COXParser::SetErrorRptFunction(ReportErrorFn, dwData); \i0 where ReportErrorFn is a function of type \i BOOL ReportErrorFn(UINT nErrorCode, LPCTSTR szError, int nLine, int nColumn, DWORD dwData), \i0 nErrorCode contains an error code. szError contains an error message for line nLine, col nColumn, and dwData is specifed in SetErrorRptFunction and passed to the callback. The return value for this function is not used as yet (In future it will be used to provide control over whether or not processing should be halted). You can use \i LPCTSTR COXParser::TranslateErrorCode() \i0 to get a literal representation of the errorcode. TODO: (These are supported as Markup elements - but further processing could be added) Parsing of Element Type Declarations ie Parsing of Attribute List Declarations ie Entity Declarations ie , or Parameter Entities ie ( then is the same as Notation Declaration ie (This is already supported as a PI - but further processing could be added) Add in support for the prolog ie ' Name validation (eg only certain chars allowed in names and tokens) \par
\pard\s3\sb100\sa100\b\fs27 COXHTMLParser\par
\pard\sb100\sa100\b0\fs24 COXHTMLParser class allows us to parse HTML formatted files, that contain nested tags, comments, markup and text. It can be useful in HTML editors, redactors, viewers and so on. This class\~is derived from COXParser class. All useful functions are declared as virtual so it will be easy to\~derive your own class.\~To parse HTML file simple call method ParseFile() of COXParser, if you want to parse HTML text already allocated in the memory call COXParser::Parse() To use this class refer to COXParser overview.\~ \par
}