Free Swift Library to Parse and Manipulate HTML Data

Open Source HTML/XML Parsing Swift Library That Provides a Clean Swift API for Parsing and Traversing HTML and XML Documents With Ease.

What is Kanna?

If you have ever built a Swift application that needs to extract data from web pages, parse API responses in XML, or scrape structured content, you have likely run into the challenge of working with raw HTML and XML strings. Swift's standard library does not include a native, ergonomic parser for these formats, which is where Kanna steps in to save the day. At its core, Kanna provides a clean Swift API for parsing and traversing HTML and XML documents. Under the hood, it delegates the heavy lifting to libxml2, which means you get rock-solid parsing performance without re-implementing complex grammar rules yourself.

Kanna stands out in the Swift ecosystem for its simplicity and power. Its API lets you navigate and manipulate HTML/XML documents with two industry-standard query languages, XPath 1.0 and CSS3 selectors, which makes it flexible enough to handle everything from simple HTML extraction to complex, namespaced XML processing. Cross-platform support is one of Kanna's most attractive qualities: it runs on macOS, iOS, tvOS, watchOS, and even Linux. Whether you are building an iOS app that consumes an RSS feed, a Vapor server that scrapes competitor pricing, or a command-line tool that transforms XML data, Kanna gives you the tools to get the job done reliably.

Getting Started with Kanna

The recommended and easiest way to install Kanna is with CocoaPods. Use the following commands for a smooth installation.

Install Kanna via CocoaPods

# Add the following lines to your Podfile:
use_frameworks!
pod 'Kanna', '~> 5.2.2'

# Then run:
$ pod install

Install Kanna via Carthage

# If you prefer Carthage, add this line to your Cartfile:
github "tid-kijyun/Kanna" ~> 5.2.2

You can also install it manually by downloading the latest release files directly from the GitHub repository.
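For Linux or server-side targets, where CocoaPods and Carthage are not available, Swift Package Manager is the usual route. The manifest below is a minimal sketch rather than an official template: the package and target name "MyScraper" and the tools version are assumptions, and the version requirement simply mirrors the 5.2.2 used above. On Linux, the libxml2 development headers typically need to be installed on the system first.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyScraper", // hypothetical package name
    dependencies: [
        // Same repository that the Cartfile line above points to
        .package(url: "https://github.com/tid-kijyun/Kanna.git", from: "5.2.2")
    ],
    targets: [
        .executableTarget(
            name: "MyScraper", // hypothetical target name
            dependencies: [
                .product(name: "Kanna", package: "Kanna")
            ]
        )
    ]
)
```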

Parsing HTML Documents via Swift

The open source Kanna library supports loading and parsing HTML documents inside Swift applications. Its most common use case is parsing HTML, whether downloaded from the web or constructed programmatically. The following example shows how developers can use the HTML function, which accepts a raw HTML string and an encoding and returns a document object. Properties such as .title and .body give quick access to common elements without needing a selector.

How to Parse HTML using UTF-8 Encoding via Swift Library?

import Kanna

// A sample HTML string representing a webpage
let htmlString = """
<html>
  <head>
    <title>My Tech Blog</title>
  </head>
  <body>
    <h1>Welcome to Swift Parsing</h1>
    <p>Kanna makes HTML parsing easy and elegant.</p>
    <a href="https://github.com/tid-kijyun/Kanna">View on GitHub</a>
  </body>
</html>
"""

// Parse the HTML string using UTF-8 encoding
if let doc = try? HTML(html: htmlString, encoding: .utf8) {
    // Access the document title directly via a convenience property
    print(doc.title ?? "No title found") // Output: My Tech Blog

    // Access the full body text
    print(doc.body?.text ?? "No body")
}

CSS3 Selector Querying Support

The Kanna library supports CSS3 selectors, which lets developers use familiar CSS syntax to find elements. You can use the same selector syntax you might write in a stylesheet or pass to JavaScript's querySelectorAll, which makes it intuitive for anyone coming from a web development background. The following example shows how to select HTML elements and iterate over multiple matches inside Swift applications.

How to Select Multiple HTML Elements & Extract Text from Them via Swift Library?

import Kanna

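// Sample markup; the data-id values are illustrative placeholders
let htmlString = """
<ul class="product-list">
  <li class="product" data-id="1">
    <span class="name">Swift Book</span>
    <span class="price">$39.99</span>
  </li>
  <li class="product" data-id="2">
    <span class="name">Xcode Guide</span>
    <span class="price">$24.99</span>
  </li>
  <li class="product" data-id="3">
    <span class="name">iOS Fundamentals</span>
    <span class="price">$49.99</span>
  </li>
</ul>
"""

if let doc = try? HTML(html: htmlString, encoding: .utf8) {
    // Select every "li.product" element inside "ul.product-list"
    for product in doc.css("ul.product-list li.product") {
        // Extract text from the nested span elements
        let name = product.css("span.name").first?.text ?? "Unknown"
        let price = product.css("span.price").first?.text ?? "N/A"

        // Access HTML attributes using subscript notation
        let productId = product["data-id"] ?? "No ID"

        print("[\(productId)] \(name) - \(price)")
    }
}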

XPath 1.0 Querying Support

The Kanna library includes support for XPath 1.0 querying inside Swift apps. XPath provides a more powerful and flexible way to navigate the elements and attributes of an XML/HTML document, which is particularly useful for complex document structures. XPath expressions can traverse documents in ways CSS selectors cannot, such as selecting an element by the content of its children or by matching on its text. That makes XPath indispensable for XML parsing and complex HTML scraping.
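As an illustration of the kinds of queries described here, the sketch below uses a positional predicate, an attribute test, and a text match. The sample markup and the helper name queryPosts are hypothetical; only HTML(html:encoding:), xpath, at_xpath, .text, and the attribute subscript come from Kanna.

```swift
import Kanna

// Hypothetical sample markup used only for this sketch
let postsHTML = """
<html>
  <body>
    <ul id="posts">
      <li><a href="/posts/1">First post</a></li>
      <li><a href="/posts/2">Second post</a></li>
      <li><a href="/posts/3">Third post</a></li>
    </ul>
  </body>
</html>
"""

// Hypothetical helper that runs three XPath queries against the markup
func queryPosts(in html: String) -> (second: String?, hrefs: [String], thirdHref: String?) {
    guard let doc = try? HTML(html: html, encoding: .utf8) else {
        return (nil, [], nil)
    }

    // Positional predicate: the link inside the second <li> of the list
    let second = doc.at_xpath("//ul[@id='posts']/li[2]/a")?.text

    // Attribute test: every <a> whose href contains "/posts/"
    let hrefs = doc.xpath("//a[contains(@href, '/posts/')]").compactMap { $0["href"] }

    // Text match: the <a> whose text is exactly "Third post"
    let thirdHref = doc.at_xpath("//a[text()='Third post']")?["href"]

    return (second, hrefs, thirdHref)
}

let result = queryPosts(in: postsHTML)
print(result.second ?? "No match")     // Second post
print(result.hrefs)                    // ["/posts/1", "/posts/2", "/posts/3"]
print(result.thirdHref ?? "No match")  // /posts/3
```

The positional and text-based queries are the ones that are awkward or impossible to express with CSS selectors alone.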

XML Parsing and Validation via Swift

Beyond basic HTML parsing, the open source Kanna library offers robust XML handling capabilities. This includes proper XML structure validation and attribute management. When working with structured data, you need more than just element extraction. Kanna provides full XML document access, including comments, processing instructions, and document metadata.

How to Perform XML Parsing inside Swift Apps?

import Kanna

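let xmlString = """
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Swift Developer News</title>
    <item>
      <title>Kanna 6.0 Released with Swift 6 Support</title>
      <dc:creator>Alex Johnson</dc:creator>
      <pubDate>Wed, 26 Jun 2024 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>New Features in Swift 5.10</title>
      <dc:creator>Maria Garcia</dc:creator>
      <pubDate>Mon, 15 Apr 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""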

if let doc = try? Kanna.XML(xml: xmlString, encoding: .utf8) {
    
    // Define a namespace mapping: prefix -> URI
    // The prefix you use here ("dc") can be anything; it maps to the namespace URI
    let namespaces = [
        "dc": "http://purl.org/dc/elements/1.1/"
    ]
    
    // Iterate over every <item>; the "dc" prefix resolves namespaced children such as dc:creator
    for item in doc.xpath("//item", namespaces: namespaces) {
        let title = item.at_xpath("title")?.text ?? "No title"
        let creator = item.at_xpath("dc:creator", namespaces: namespaces)?.text ?? "Unknown"
        let pubDate = item.at_xpath("pubDate")?.text ?? "No date"
        
        print("Title: \(title)")
        print("Author: \(creator)")
        print("Published: \(pubDate)\n")
    }
}
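
The examples in this article use try?, which silently discards the reason a parse failed. When that reason matters, the same parsing call can be wrapped in do/catch instead. The following is a minimal sketch under stated assumptions: the FeedItem type, the parseFeed helper, and the shortened sample feed are hypothetical names introduced for illustration, not part of Kanna.

```swift
import Kanna

// Hypothetical value type for one feed entry (not part of Kanna)
struct FeedItem: Equatable {
    let title: String
    let creator: String?
}

// Hypothetical helper: parses RSS text and propagates the parse error
// to the caller instead of discarding it with try?
func parseFeed(_ xml: String) throws -> [FeedItem] {
    let doc = try Kanna.XML(xml: xml, encoding: .utf8)
    let namespaces = ["dc": "http://purl.org/dc/elements/1.1/"]

    return doc.xpath("//item").map { item in
        FeedItem(
            title: item.at_xpath("title")?.text ?? "No title",
            creator: item.at_xpath("dc:creator", namespaces: namespaces)?.text
        )
    }
}

// Shortened sample feed used only for this sketch
let sampleFeed = """
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>New Features in Swift 5.10</title>
      <dc:creator>Maria Garcia</dc:creator>
    </item>
  </channel>
</rss>
"""

do {
    let items = try parseFeed(sampleFeed)
    for item in items {
        print("\(item.title) by \(item.creator ?? "Unknown")")
    }
} catch {
    // The error value describes why parsing failed
    print("Could not parse feed: \(error)")
}
```

Returning a typed array keeps the XPath strings in one place, so the rest of the app works with plain Swift values rather than document nodes.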