GDOM
Good day. GNOME has many beautiful APIs. For manipulating XML, many projects use the powerful libxml2. However, its API is not consistent with GNOME conventions. It uses different data types, and does not benefit from good integration with GLib, missing out on libraries like GIO.
GDOM is an effort to implement the DOM Level 1 Core API for GLib.
- GObject API and GLib integration
  - All objects, such as Nodes, Documents, and Attributes, will be GObjects. They'll use GErrors for exception handling, handle GIO's GInputStreams, etc.
- Functionality via libxml2
  - We are not implementing the API from scratch. In fact, almost all of its functionality comes from libxml2, which is already robust and performs well.
- Vala
  - GDOM will be implemented in Vala, which should help minimise the work necessary to complete the API.
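For readers who have not used DOM Level 1 Core before, here is a rough sketch of the kind of calls it defines. It uses Python's standard `xml.dom.minidom` purely as a stand-in: GDOM's own class and method names are not shown here, and the `library`/`book` document is a made-up example. The last function shows a DOM exception, which is the sort of failure GDOM would presumably report through a GError instead.

```python
from xml.dom import minidom, HierarchyRequestErr


def build_library():
    """Build a tiny document using only DOM Level 1 Core style calls."""
    doc = minidom.Document()
    root = doc.createElement("library")  # hypothetical element names
    doc.appendChild(root)
    for title, year in (("Dune", "1965"), ("Emma", "1815")):
        book = doc.createElement("book")
        book.setAttribute("year", year)
        book.appendChild(doc.createTextNode(title))
        root.appendChild(book)
    return doc


def titles_by_year(doc):
    """Walk the tree and map each book's year attribute to its text."""
    return {
        book.getAttribute("year"): book.firstChild.data
        for book in doc.getElementsByTagName("book")
    }


def second_root_is_rejected(doc):
    """A document may hold only one root element; DOM signals this as an exception."""
    try:
        doc.appendChild(doc.createElement("extra"))
    except HierarchyRequestErr:
        return True
    return False


doc = build_library()
print(doc.documentElement.toxml())
print(titles_by_year(doc))
```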
I really want to hear your thoughts on XML and GNOME. I am at the disadvantage of never having had to write anything large or intensive that used XML; I have only parsed and built simple files for my own projects. If you're involved with a project that already uses libxml2 (or another XML API), let me know. I'd like to see how you use it and how GDOM might make life better for you.
GSoC and Me
This is a Google Summer of Code 2011 project, and my mentor is Alberto Ruiz. Me? I'm a Master's student at the University of Guelph. I fell in love with GNOME about 7 years ago, but never got around to writing much code until 2009. My Master's has kept me busy recently, but thanks to Google, I can justify focusing on GNOME for a while :)
There is a single-purpose XML reader/writer here: http://code.google.com/p/xnoise/source/browse/#hg%2Flibxnoise-utils
It is written in Vala.
Maybe the API for the nodes is interesting for you. There are convenient iterators, with Vala syntax support, for the XML nodes.
That sounds like a lot of GObjects... be careful with that. Premature optimization is the root of many evils but it is useful to understand the cost of using GObject in massive amounts. See e.g. http://zee-nix.blogspot.com/2008/09/think-before-you-create-gobjects.html
Now, Zeenix wrote that years ago, so the numbers are not relevant to the GObject of today, but the idea is. To understand why properties especially are (or have been) such a big performance hit, see ebassi's GProperty proposal: http://blogs.gnome.org/ebassi/2011/05/18/this-is-why-we-fight/
Hi Richard,
I'm the maintainer of 2 libraries that could use GDOM.
Lasem, a MathML and SVG rendering library, which already implements a DOM API ( http://git.gnome.org/browse/lasem/ ).
Aravis, a vision library for GigE Vision cameras, which has to parse an XML file (Genicam standard) for the definition of the control registers of the cameras ( http://git.gnome.org/browse/aravis/ ).
The ways these 2 libraries could benefit from a standard GObject-based DOM API are different.
Aravis could use an implementation of a DOM tree, and extend the XML element class in order to implement the specific behaviour of the Genicam elements. Memory use is not an issue in this case, as the Genicam files are relatively small.
Lasem, trying to be memory efficient, would use only the definition of a DOM interface and implement the data storage and the node relationships itself. We don't want to have libxml2 as data storage.
In my opinion, an interesting gdom library would:
- Provide a gdom interface conforming to the DOM Level 1 Core specification.
- Provide an implementation of the gdom interface for xml documents in a set of gxml classes.
- Implement a SAX parser that builds the DOM tree.
Lasem will implement the gdom interface.
Aravis will use gxml_element as the base class for the Genicam elements.
Both libraries would benefit from the XML parser, be based on the same DOM interface, and be accessible from other languages using the gobject-introspection data generated in gdom.
Lasem could also use the unit tests in gdom for testing its gdom implementation.
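To make the "SAX parser that builds the DOM tree" point concrete, here is a minimal sketch in Python, using the standard `xml.sax` and `xml.dom.minidom` modules as stand-ins. Nothing here is gdom or gxml API; `DomBuilder`, `parse_to_dom` and the camera/register sample are hypothetical names for illustration. The handler keeps a stack of open elements and attaches each new node to the element on top of it.

```python
import xml.sax
from xml.dom import minidom


class DomBuilder(xml.sax.ContentHandler):
    """SAX handler that assembles a DOM tree from parse events."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        # Stack of open nodes; the document itself is the bottom entry.
        self._stack = [self.document]

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._stack.pop()

    def characters(self, content):
        # Text is only valid inside an element, never directly under the document.
        if len(self._stack) > 1:
            text = self.document.createTextNode(content)
            self._stack[-1].appendChild(text)


def parse_to_dom(xml_text):
    """Run the SAX parser over xml_text and return the DOM document it built."""
    builder = DomBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    # SAX may deliver one text run in several pieces; merge adjacent text nodes.
    builder.document.documentElement.normalize()
    return builder.document


sample = '<camera model="X1"><register addr="0x10">Gain</register></camera>'
print(parse_to_dom(sample).documentElement.toxml())
```

A library that wants its own storage, as described for Lasem, would swap the `minidom` calls in the handler for its own node classes while keeping the same event-driven structure.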
Jussi,
GObject use may not be an issue. In Lasem, where XML elements are stored in GObjects, memory usage is quite reasonable.
For example, rendering the file in this bug report:
https://bugzilla.gnome.org/show_bug.cgi?id=625838
takes 80MB with lasem (librsvg uses 500MB, batik crashes).
I would not use GObjects for the storage of each attribute, though, nor would I use GObject properties on elements.
Interesting project indeed and once ready (mature enough) I would definitely move my projects (GUPnP/Rygel) to use this instead of libxml2 directly. Having said that, libxml2's DOM API has a significant enough memory cost and if GDOM relies on that, it will not only inherit that issue but extend it even further.
I have had a very similar idea in mind for a year now, the main difference being that it would not use libxml2 and would provide neither a DOM nor a low-level parsing API. Instead, the idea is to provide an API that's somewhere in between: asynchronous and on-demand loading of nodes/trees. If you happen to be interested, please do contact me and we can discuss the details. I hang out on IRC under the nick 'zeenix'.
Jussi, things have changed significantly since that blog entry AFAIK. :) While that principle might still apply, it's not so significant any more.
For incremental parsing and for significant memory savings the Expand() function on XML (text) reader instances is very useful. In the C API the function is called xmlTextReaderExpand (see http://xmlsoft.org/html/libxml-xmlreader.html#xmlTextReaderExpand for the docs), but it's available from bindings as well.
The idea is to mix DOM and SAX parsing by using event-based (SAX-like) parsing and expanding a subtree into a DOM tree on demand. The DOM tree is freed after use. This makes a convenient DOM API available for large documents with many repeated elements (e.g. a database file with millions of records).
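The same pattern can be sketched without libxml2. The example below uses Python's standard `xml.dom.pulldom`, whose `expandNode()` plays a role similar to the expand call described above; it is an analogue, not the libxml2 API. The `db`/`record`/`amount` layout and the `sum_amounts` helper are assumptions made up for this illustration.

```python
from xml.dom import pulldom


def sum_amounts(xml_text):
    """Stream through the records, expanding one <record> subtree at a time."""
    total = 0
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "record":
            # Pull the rest of this element into a small DOM subtree.
            events.expandNode(node)
            amount = node.getElementsByTagName("amount")[0]
            total += int(amount.firstChild.data)
            # Nothing else refers to the subtree after this iteration,
            # so it can be garbage collected before the next record.
    return total


sample = (
    "<db>"
    "<record><amount>3</amount></record>"
    "<record><amount>4</amount></record>"
    "</db>"
)
print(sum_amounts(sample))
```

Only one record's subtree needs to exist at a time, which is what makes this approach attractive for very large files with many repeated elements.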
Wouter, interesting! I didn't know of that API. However, libxml2 has another disadvantage *AFAIK*: it doesn't really take advantage of GIO/GLib, and hence you can't implement truly asynchronous reading/loading of documents/nodes (from possibly remote locations) on demand.
Perhaps async reading/loading can be built on top of xmlReaderForIO or xmlReaderNewFd?
(FWIW, I don't check back here, so please mail me at wbolster gnome org if you want me to read replies.)
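As a rough illustration of what asynchronous loading could look like, here is a sketch in Python rather than libxml2. It uses a push-style parser (`xml.etree.ElementTree.XMLPullParser`) that is fed chunks as they arrive from an `asyncio` source. Note that this is a different mechanism from xmlReaderForIO, where the reader pulls data through a read callback; whether the same can be layered on that call is left open here. `fake_remote_chunks` is a hypothetical stand-in for a remote stream.

```python
import asyncio
import xml.etree.ElementTree as ET


async def fake_remote_chunks(data, size):
    """Hypothetical data source: yields the bytes in small pieces."""
    for start in range(0, len(data), size):
        await asyncio.sleep(0)  # give other tasks a chance to run
        yield data[start:start + size]


async def parse_incrementally(chunks):
    """Feed chunks to the parser as they arrive; collect completed element tags."""
    parser = ET.XMLPullParser(events=("end",))
    finished = []
    async for chunk in chunks:
        parser.feed(chunk)
        # Only elements completed so far are reported; the parser never blocks.
        for _event, element in parser.read_events():
            finished.append(element.tag)
    parser.close()
    return finished


tags = asyncio.run(parse_incrementally(fake_remote_chunks(b"<a><b/><c>hi</c></a>", 4)))
print(tags)
```

Because the parser only consumes what it is given, the event loop stays free between chunks, and nodes become available as soon as their closing tags have arrived.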