| Tested | ✗ |
| Language | CC |
| License | Custom |
| Last Release | Dec 2014 |
Maintained by Unclaimed.
ObjectiveGumbo is a set of classes that make it easier to interact with Gumbo, Google's HTML5 parsing library (written in C), from Objective-C.
Examples were originally packaged with ObjectiveGumbo; however, to keep the library from getting bloated, I've moved them into a separate repository. For iOS and OS X examples, please see my OG-Demos repository.
To add ObjectiveGumbo to your Mac/iOS project do the following:
Fetch all of the links from the Hacker News homepage and log them (see the Hacker News example for a more advanced method):
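#import "ObjectiveGumbo.h"

// Download and parse the Hacker News front page
OGNode * data = [ObjectiveGumbo parseDocumentWithUrl:[NSURL URLWithString:@"http://news.ycombinator.com"]];
// Every element with class="title"
NSArray * tableRows = [data elementsWithClass:@"title"];
for (OGElement * tableRow in tableRows)
{
    if (tableRow.children.count > 1)
    {
        // The first child is expected to be the story's <a> element;
        // skip it if it turns out to be a text node instead
        OGNode * firstChild = tableRow.children[0];
        if ([firstChild isKindOfClass:[OGElement class]])
        {
            OGElement * link = (OGElement *)firstChild;
            NSLog(@"%@", link.attributes[@"href"]);
        }
    }
}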
Get the body text of BBC News:
OGNode * data = [ObjectiveGumbo parseDocumentWithUrl:[NSURL URLWithString:@"http://bbc.co.uk/news"]];
// elementsWithTag: returns an array, so take the first (and only) <body> element
OGElement * body = [data elementsWithTag:GUMBO_TAG_BODY].firstObject;
NSLog(@"%@", body.text);
ObjectiveGumbo has been written with object orientation and Cocoa in mind, which makes it much easier to use from Objective-C and means you don't have to worry about C-style pointers or freeing memory. It also uses 'native' Objective-C types such as NSDictionary and NSArray rather than the GumboVector implementation provided by Gumbo. Finally, it reduces the amount of code you have to write by letting you quickly fetch elements by tag, ID or class (like jQuery).
The ObjectiveGumbo class should be used for parsing HTML from NSString, NSURL or NSData objects. Please note that, like Gumbo, ObjectiveGumbo only supports UTF-8 web pages.
Every tag and every piece of text is converted into an OGNode (or an instance of one of its subclasses). For example, the following HTML would be parsed as shown below:
<p>This is a paragraph. <b>This is bold</b></p>
Paragraph (OGElement)
    'This is a paragraph. ' (OGText)
    Bold (OGElement)
        'This is bold' (OGText)
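To make the shape of that tree concrete, here is a minimal sketch in plain C (which is also valid Objective-C) that models the same paragraph and prints it as an indented outline. `Node`, `NodeKind` and `outline` are hypothetical names for illustration only; they are not part of ObjectiveGumbo or Gumbo, and the real OGElement and OGText classes carry more information than this model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the node hierarchy (not ObjectiveGumbo's
   API): an element node stands in for OGElement, a text node for OGText. */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *value;           /* tag name for elements, content for text */
    const struct Node *children; /* child array (NULL for text nodes) */
    size_t child_count;
} Node;

/* <p>This is a paragraph. <b>This is bold</b></p> */
static const Node bold_children[] = {
    { NODE_TEXT, "This is bold", NULL, 0 },
};
static const Node p_children[] = {
    { NODE_TEXT, "This is a paragraph. ", NULL, 0 },
    { NODE_ELEMENT, "b", bold_children, 1 },
};
static const Node paragraph = { NODE_ELEMENT, "p", p_children, 2 };

/* Appends an outline of `n` (one node per line, four spaces per level) to the
   NUL-terminated string in `out`; output is truncated if `cap` is too small. */
static void outline(const Node *n, int depth, char *out, size_t cap) {
    assert(n != NULL && out != NULL && cap > 0);
    size_t len = strlen(out);
    if (n->kind == NODE_TEXT)
        snprintf(out + len, cap - len, "%*s'%s' (text)\n", depth * 4, "", n->value);
    else
        snprintf(out + len, cap - len, "%*s<%s> (element)\n", depth * 4, "", n->value);
    /* Children are visited in document order, one level deeper. */
    for (size_t i = 0; i < n->child_count; i++)
        outline(&n->children[i], depth + 1, out, cap);
}
```

Calling `outline(&paragraph, 0, buf, sizeof buf)` on an empty buffer yields a four-line outline with the same nesting as the listing above: the `<p>` element at the top, its two children indented one level, and the bold text indented two levels.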
OGNode provides a variety of utility methods, such as fetching the plain-text content of a node and quickly finding its child nodes.
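As an illustration of the child-finding idea, here is a plain-C sketch of a depth-first search that collects every element with a given tag. The `Node` model, `collect` and `find_by_tag` are hypothetical and not ObjectiveGumbo's implementation. For simplicity the sketch compares tag names as strings, whereas ObjectiveGumbo's elementsWithTag: takes a Gumbo tag constant such as GUMBO_TAG_BODY; this sketch returns matches in document order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node model (not ObjectiveGumbo's API). */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *value;           /* tag name for elements, content for text */
    const struct Node *children; /* child array (NULL for text nodes) */
    size_t child_count;
} Node;

/* Pre-order walk: checks `n` itself, then each child subtree in order. */
static void collect(const Node *n, const char *tag,
                    const Node **out, size_t max, size_t *found) {
    if (n->kind == NODE_ELEMENT && strcmp(n->value, tag) == 0) {
        if (*found < max)
            out[*found] = n;     /* store only while there is room */
        (*found)++;              /* but always count the match */
    }
    for (size_t i = 0; i < n->child_count; i++)
        collect(&n->children[i], tag, out, max, found);
}

/* Stores up to `max` matching elements (including `root` itself) in `out`
   and returns the total number of matches. */
static size_t find_by_tag(const Node *root, const char *tag,
                          const Node **out, size_t max) {
    assert(root != NULL && tag != NULL);
    size_t found = 0;
    collect(root, tag, out, max, &found);
    return found;
}

/* <div><p>One</p><p>Two <b>bold</b></p></div> */
static const Node bold_kids[] = { { NODE_TEXT, "bold", NULL, 0 } };
static const Node p1_kids[] = { { NODE_TEXT, "One", NULL, 0 } };
static const Node p2_kids[] = {
    { NODE_TEXT, "Two ", NULL, 0 },
    { NODE_ELEMENT, "b", bold_kids, 1 },
};
static const Node div_kids[] = {
    { NODE_ELEMENT, "p", p1_kids, 1 },
    { NODE_ELEMENT, "p", p2_kids, 2 },
};
static const Node doc = { NODE_ELEMENT, "div", div_kids, 2 };
```

Returning the total count even when `out` is full lets a caller size a second, larger buffer if the first one was too small.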
OGElement is a subclass of OGNode that provides an array of child nodes, a dictionary of attributes and an array of classes.
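Because an element can carry several classes, a class query has to match whole class names rather than substrings of the attribute value. The plain-C sketch below shows one way to split a class attribute on HTML whitespace; `count_classes` and `has_class` are hypothetical helpers, not ObjectiveGumbo functions, and ObjectiveGumbo's own splitting may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace characters that separate names in an HTML class attribute. */
static const char CLASS_SEPARATORS[] = " \t\n\f\r";

/* Hypothetical helper: counts the class names in a class attribute value,
   e.g. "title  story" contains 2. */
static size_t count_classes(const char *class_attr) {
    assert(class_attr != NULL);
    size_t count = 0;
    const char *p = class_attr;
    for (;;) {
        p += strspn(p, CLASS_SEPARATORS);          /* skip separators */
        size_t len = strcspn(p, CLASS_SEPARATORS); /* length of next name */
        if (len == 0)
            break;                                 /* reached the end */
        count++;
        p += len;
    }
    return count;
}

/* Hypothetical helper: true if `name` is exactly one of the class names,
   a sketch of the kind of check a query like elementsWithClass: implies. */
static bool has_class(const char *class_attr, const char *name) {
    assert(class_attr != NULL && name != NULL);
    size_t name_len = strlen(name);
    const char *p = class_attr;
    for (;;) {
        p += strspn(p, CLASS_SEPARATORS);
        size_t len = strcspn(p, CLASS_SEPARATORS);
        if (len == 0)
            return false;                          /* no more names */
        if (len == name_len && strncmp(p, name, len) == 0)
            return true;                           /* whole-name match */
        p += len;
    }
}
```

With this approach `has_class("titles", "title")` is false, which is why a plain substring search on the attribute would not be a safe substitute.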
OGDocument is a subclass of OGElement that includes DOCTYPE information. If you use the parseDocument* methods of ObjectiveGumbo, you will receive one of these as the root element.
OGText is a subclass of OGNode that represents a run of plain text. Note that every OGNode has a -(NSString*)text method; an OGText returns its own text, whereas an OGElement builds its text by recursively collecting the text of its children.
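The recursive behaviour of -text on an element can be sketched in plain C as follows. `Node` and `append_text` are hypothetical, and the sketch simply concatenates child text with no separators added, which is an assumption about how OGElement joins its children's text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node model (not ObjectiveGumbo's API). */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *value;           /* tag name for elements, content for text */
    const struct Node *children; /* child array (NULL for text nodes) */
    size_t child_count;
} Node;

/* Appends the text content of `n` to the NUL-terminated string in `out`
   (capacity `cap`). A text node contributes its own string; an element
   contributes the text of its children, visited recursively in document
   order. Output is truncated if it does not fit. */
static void append_text(const Node *n, char *out, size_t cap) {
    assert(n != NULL && out != NULL && cap > 0);
    if (n->kind == NODE_TEXT) {
        /* strncat copies at most the remaining room and adds the NUL. */
        strncat(out, n->value, cap - 1 - strlen(out));
        return;
    }
    for (size_t i = 0; i < n->child_count; i++)
        append_text(&n->children[i], out, cap);
}

/* <p>This is a paragraph. <b>This is bold</b></p> */
static const Node bold_children[] = {
    { NODE_TEXT, "This is bold", NULL, 0 },
};
static const Node p_children[] = {
    { NODE_TEXT, "This is a paragraph. ", NULL, 0 },
    { NODE_ELEMENT, "b", bold_children, 1 },
};
static const Node paragraph = { NODE_ELEMENT, "p", p_children, 2 };
```

For the paragraph from earlier, this gathers "This is a paragraph. " and then descends into the bold element for "This is bold", so the element's text is the two runs joined together.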
If you are using ObjectiveGumbo in your projects or want to ask specific support questions, feel free to email me at programmingthomas [at] gmail [dot] com. Submitting an issue is an equally good way to reach me.