HCMessageParser 0.1.0

Tests: Tested
Language: Swift
License: MIT
Last Release: Dec 2015
SPM: Supports SPM

Maintained by Leo.



HCMessageParser

HCMessageParser parses a message and returns a dictionary with the mentions, emoticons, and links found in it. If blockDone is passed, asynchronous calls are made to fetch the titles of all the web links; once all the links have been processed, the titles are filled into the result dictionary, which is then passed to blockDone. If there are any errors, a dictionary with URL strings as keys and NSErrors as values is passed to the block as well.

Usage

  • To run the example project, clone the repo and run pod install from the Example directory first.
  • Press Command+R to run the example and see the results in the console.
  • Press Command+U to run the tests, which cover length limits, multi-language support, etc.

Requirements

Swift 2.0 and iOS 8.0 are required to use this pod.

Installation

HCMessageParser is available through CocoaPods. To install it, simply add the following line to your Podfile:

pod "HCMessageParser"

Example

Test message:

let message = "@bob @john (success) such a cool feature; https://twitter.com/jdorfman/status/430511497475670016 but http://i.amnothing.com is down (frown)"
// The trailing closure is blockDone: it is called after the link titles have been fetched.
let result1 = HCMessageParser.parse(message) {
    (result2, errors) in
    print("Second result of '\(message)':\n\(result2)\nError(s): \(errors)\n")
}
// result1 is returned immediately and contains no link titles yet.
print("First result of '\(message)':\n\(result1)\n")

Returns:

["links": (
        {
        url = "https://twitter.com/jdorfman/status/430511497475670016";
    },
        {
        url = "http://i.amnothing.com";
    }
), "mentions": (
    bob,
    john
), "emoticons": (
    success,
    frown
)]

Callback with result dictionary:

["links": (
        {
        title = "Justin Dorfman on Twitter: \"nice @littlebigdetail from @HipChat (shows hex colors when pasted in chat). http://t.co/7cI6Gjy5pq\"";
        url = "https://twitter.com/jdorfman/status/430511497475670016";
    },
        {
        url = "http://i.amnothing.com";
    }
), "mentions": (
    bob,
    john
), "emoticons": (
    success,
    frown
)]

And error dictionary:

["http://i.amnothing.com": Error Domain=NSURLErrorDomain Code=-1003 "A server with the specified hostname could not be found." UserInfo={NSUnderlyingError=0x7ff5b95324a0 {Error Domain=kCFErrorDomainCFNetwork Code=-1003 "(null)" UserInfo={_kCFStreamErrorCodeKey=8, _kCFStreamErrorDomainKey=12}}, NSErrorFailingURLStringKey=http://i.amnothing.com/, NSErrorFailingURLKey=http://i.amnothing.com/, _kCFStreamErrorDomainKey=12, _kCFStreamErrorCodeKey=8, NSLocalizedDescription=A server with the specified hostname could not be found.}]

Discussion

I'm using a HipChat account (e.g. superarts.hipchat.com/account) to test the rules about usernames, and the HipChat web app to test links etc. There are a lot of details that need to be discussed, and I've highlighted some of them below.

Mention

  • A format like @user1@user2 is allowed.
  • Besides Latin and numeric characters, Chinese / Japanese / Korean characters are allowed.
  • Emoji characters are not allowed.

Potential problems I found with HipChat mentions/usernames:

  • About length: although it's stated that "The mention name must be between 0 and 50 characters", the valid length is actually 1 to 49.
  • About punctuation in CJK (Chinese, Japanese, Korean):
    • Characters like ,。! are wrongly allowed, which causes problems for messages like @老板,你来吗, since 老板 should be the mentioned string rather than the whole rest of the sentence. Please check Tests.swift for details.
    • My point is that since Hi @Leo, are you OK? is supported, and CJK characters are supported in usernames, it would be good to take this into account as well.
    • In HCMessageParser, however, the regex class \p{L} makes the right call, so I would suggest that HipChat disallow such characters in usernames. I'm not sure whether 0-9 should be replaced by \p{N}, though.
    • Besides all this, HipChat has problems with some CJK IMEs (input method engines), for example Baidu Pinyin: additional characters are entered into the message box while typing Chinese characters. Again, if CJK characters are supported, it would be good if they were supported properly.
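The \p{L} behaviour discussed above can be illustrated with a minimal sketch. This is not the library's actual implementation: the function name extractMentions and the exact pattern are assumptions made for illustration, the code is written in current Swift rather than Swift 2.0, and the 1 to 49 character length rule is not enforced.

```swift
import Foundation

// Hypothetical helper for illustration only; not part of HCMessageParser's API.
func extractMentions(from message: String) -> [String] {
    // \p{L} matches any Unicode letter (Latin, CJK, ...), but not punctuation
    // or emoji, so a mention stops at the first comma, full stop, etc.
    let pattern = "@([\\p{L}0-9]+)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(message.startIndex..., in: message)
    return regex.matches(in: message, range: range).compactMap { match in
        // Capture group 1 is the name without the leading "@".
        Range(match.range(at: 1), in: message).map { String(message[$0]) }
    }
}
```

With this pattern, @老板,你来吗 yields 老板, and @user1@user2 yields both user1 and user2.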

Emoticons

  • In HipChat, emoticons are case insensitive, so (lol) and (LOL) are treated the same way. The requirement doesn't say whether emoticons should be converted to lowercase strings, but I've implemented this in HCMessageParser since I think it's the better design. This behaviour can be overridden via the flag HCMessageParser.emoticonForceLowercase, which is set to true by default. Check Tests.swift for details.
  • Since emoticons are described as "alphanumeric strings", I assume extended Latin and CJK characters don't count, and I would like to discuss with the team whether that is acceptable.
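The two rules above (ASCII alphanumerics only, optional lowercasing) could be sketched as follows. The function extractEmoticons and its forceLowercase parameter are hypothetical and only mirror the idea behind HCMessageParser.emoticonForceLowercase; the library's real implementation may differ. The code is written in current Swift.

```swift
import Foundation

// Hypothetical helper for illustration only; not part of HCMessageParser's API.
func extractEmoticons(from message: String, forceLowercase: Bool = true) -> [String] {
    // Only ASCII letters and digits between parentheses count as an emoticon,
    // so extended Latin and CJK characters are deliberately excluded.
    let pattern = "\\(([A-Za-z0-9]+)\\)"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(message.startIndex..., in: message)
    return regex.matches(in: message, range: range).compactMap { match -> String? in
        guard let nameRange = Range(match.range(at: 1), in: message) else { return nil }
        let name = String(message[nameRange])
        // Lowercasing makes (lol) and (LOL) compare equal.
        return forceLowercase ? name.lowercased() : name
    }
}
```

Passing forceLowercase: false keeps the original casing, which corresponds to turning the flag off.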

URLs / links

  • The requirement says "URLs... along with the page's title", which makes it sound like only the http(s) protocols are of interest; otherwise terms like "page" and "title" would not be relevant. However, HipChat supports many URL schemes (though schemes like mailto are not supported), so this is the default behaviour of HCMessageParser.
    • This behaviour can be overridden.
    • Use HCMessageParser.urlSupportedProtocols.append("ftp") to add a supported protocol.
    • Use HCMessageParser.urlSupportedProtocols = HCMessageParser.urlDefaultProtocols to reset supported protocols.
    • A better approach might be allowing all protocols by default, including the ones that are not currently supported. More discussion is expected.
  • URLs like http://www.test.com/测试 are supported by HCMessageParser but not by HipChat.
  • URLs like http://موقع.وزارة-الاتصالات.مصر/ are supported by HCMessageParser but not by HipChat.
  • When getting the title, HipChat removes quotes, and in the requirement the title is truncated as well. I'm not sure exactly what the rules are here, so I assume they are:
    • All quotation marks should be removed. The private array urlTitleFilter contains the blacklisted characters.
    • Titles longer than 50 characters should be truncated. The private value urlTitleMax is set to 50 and the private string urlTitleEllipsis is set to ...
    • Even after these steps, the title of the Twitter link is still different, and I don't think I should invent any more rules.
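The assumed title rules can be sketched like this. The function cleanTitle and its parameters are hypothetical stand-ins for the private urlTitleFilter, urlTitleMax and urlTitleEllipsis; the default set of filtered characters is a guess, and the sketch assumes the first 50 characters are kept and the ellipsis is appended after them. The code is written in current Swift.

```swift
// Hypothetical helper for illustration only; not part of HCMessageParser's API.
func cleanTitle(_ title: String,
                blacklist: Set<Character> = ["\"", "'", "“", "”"],
                maxLength: Int = 50,
                ellipsis: String = "...") -> String {
    // Step 1: drop every blacklisted character (quotation marks by default).
    let stripped = String(title.filter { !blacklist.contains($0) })
    // Step 2: leave short titles untouched.
    guard stripped.count > maxLength else { return stripped }
    // Step 3: keep the first maxLength characters and append the ellipsis.
    return String(stripped.prefix(maxLength)) + ellipsis
}
```

Counting by Character rather than by UTF-16 unit means a CJK character or an emoji counts as one toward the limit.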

Optimisation

Normally this kind of job would be executed on the server side, so I assume the library runs on the user's phone when a message is sent, doing some of the parsing work for the server. In that case the app does not process a lot of data in a short period of time, so I'm using Swift to do the work. If that turns out not to be the case, the library should be rewritten in C to achieve maximum performance.

About

Author

Leo, [email protected]

License

HCMessageParser is available under the MIT license. See the LICENSE file for more info.