- the autotools build system has improved
- documentation is more complete and more accurate
- many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)
- it has a mailing list (firstname.lastname@example.org)
- new stuff
  - document child management, node cloning
  - new memory tests
  - new error handling model
  - new memory handling model (fixing leaks and improving performance!)
- improved API compliance
- bug-fix release (0.3.2) without API breaks
- imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc)
Look forward to 0.4.0 imminently, and happy hacking.
GXml's performance versus pure libxml2
One question people have had is the difference in performance between libxml2 and GXml, since GXml currently wraps it. Things should be worse, as there's typically more code for each operation, but how large will the penalty be, and will it matter for you?
Tests
I created a simple test suite with the following four tasks:
- loading a file from disk
- loading a file from memory
- stringifying a document
- saving a document to disk
I ran it on a Lenovo ThinkPad Twist S230u with the following configuration:
- Intel® Core™ i5-3317U CPU @ 1.70GHz × 4
- 4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0,8 ns)
- 500GB HD @ 5400 RPM (HGST HTS725050A7)
  - /home, including test files
- 24GB SSD (Samsung MZMPA024)
  - everything outside of /home, including libraries
- Fedora 19, x86_64
- GXml from git HEAD
Test Data
The test data was based on my updateinfo.xml files from yum, in particular the one found at /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml. It contained 98743 different nodes over 11,136 kB. I created smaller and larger versions of it, resulting in:
| File | Nodes | Size (kB) |
|---|---|---|
| test3.xml | 22 276 | 2 784 |
| test4.xml | 47 707 | 5 568 |
| test5.xml | 98 743 | 11 136 |
| test6.xml | 197 484 | 22 268 |
| test7.xml | 394 966 | 44 536 |
Measurements
Three values were measured. The first was the time taken to complete a task (like loading a file), using g_get_monotonic_time, which reports in microseconds. The second was the memory used by the task after it completed, using mallinfo, in particular the uordblks field (total allocated space). The third was the memory leaked (also using mallinfo, after we freed memory).
Procedure
I ran the tests once averaged over 10 trials for each combination of test and file, and then again over 25 trials. Ways the procedure could be improved include better isolation from other processes on the system, or providing more detail than the averaged scores, so we can detect any anomalies (e.g. some other process delaying a file load by hogging I/O).
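To make the "more detail than the averaged scores" idea concrete, here is a small C sketch that keeps the minimum and maximum next to the mean, so a single slow trial stands out. `TrialStats`, `summarise_trials` and `has_outlier` are hypothetical names, and the outlier rule (slowest trial above some multiple of the mean) is an arbitrary assumption, not something the test suite does.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of one test/file combination across its trials. */
typedef struct {
    double  mean_us;
    int64_t min_us;
    int64_t max_us;
} TrialStats;

/* Summarise n trial timings (in microseconds); n must be at least 1. */
TrialStats summarise_trials(const int64_t *samples, size_t n) {
    TrialStats s = { 0.0, samples[0], samples[0] };
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        total += (double)samples[i];
        if (samples[i] < s.min_us) s.min_us = samples[i];
        if (samples[i] > s.max_us) s.max_us = samples[i];
    }
    s.mean_us = total / (double)n;
    return s;
}

/* Flag a run whose slowest trial is far above the mean, for example
 * because another process was hogging I/O. The factor is a free choice. */
int has_outlier(const TrialStats *s, double factor) {
    return (double)s->max_us > s->mean_us * factor;
}
```

With trial timings of 100, 110, 90, 100 and 600 microseconds, the mean alone (200) hides the fact that four of the five trials were about half that; the maximum of 600 makes the anomaly visible.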
Results
Keep in mind that GXml wraps libxml2 for most functionality, so we don't expect it to be faster than libxml2; rather, we want to see what penalty a GObject wrapper (written in Vala) causes.
Memory Leaks
GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory was left behind after these tasks by both libxml2 and GXml. Luckily, neither leaked any in the cases tested. (That does not mean there aren't any! Kudos to those who find them, and more to those who patch them.)
Discussion
loading documents from disk
When it comes to loading a file from the disk, we compared xmlReadFile versus gxml_document_new_from_path (which uses xmlParseFile).
Memory usage is consistently ~14% higher with GXml.
Time-wise, on smaller files, GXml takes up to 50% longer than using libxml2. I'm not sure why test4.xml is miraculously lower in this run. You can see that the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.
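The percentages quoted in this discussion are relative overheads against the libxml2 baseline. The C sketch below shows that calculation, plus a toy model of why the relative penalty would shrink as files grow. The model is an assumption for illustration only: it supposes the wrapper adds a fixed cost per document on top of libxml2's own time. The function names are hypothetical, and any numbers fed into it are invented, not measurements from these tests.

```c
/* Relative overhead of the wrapper, as a percentage of the baseline. */
double overhead_pct(double baseline, double wrapped) {
    return 100.0 * (wrapped - baseline) / baseline;
}

/* Toy model (an assumption): the wrapper adds a fixed cost per document
 * on top of the time libxml2 itself needs for the same task. */
double modelled_overhead_pct(double libxml2_us, double fixed_wrapper_us) {
    return overhead_pct(libxml2_us, libxml2_us + fixed_wrapper_us);
}
```

Under that model, an invented fixed cost of 5000 microseconds is a 50% penalty on a load that takes libxml2 10000 microseconds, but only a 0.5% penalty on one that takes 1000000 microseconds.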
loading documents from memory
Memory-wise, again, we see a consistent increase of ~14-16%.
Time-wise, again GXml oddly performs better on test4.xml. Elsewise, we see the same trend: there is little difference with larger files.
saving to disk
We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it exits, so we can't (easily) see how much we used. Neither leaks, so there is nothing to see there.
Time-wise, it seems to take about the same length of time, but GXml may be trending to more. This could be due to tasks like synchronising data that is initially stored just in GXmlNodes and needs to be copied into the xmlDoc of libxml2 to make it to disk.
stringifying a document
Memory-wise, we typically see an increase of ~10-15%. Note that both failed to stringify the largest file, test7.xml, which requires further investigation. Stringification was done with xmlDocDumpFormatMemory.
Time-wise, the increase was ~16-20%.
Conclusion
Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase in memory usage. That makes sense, as GObjects are used instead of the light C structures libxml2 typically uses. One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only for the ones we use, so a pure GObject implementation might use more memory.
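The "only the ones we use" point can be sketched in plain C as lazy wrapper creation. This is an illustration of the general technique, not GXml's actual code: `RawNode` stands in for a libxml2 xmlNode, `Wrapper` for a GObject-based GXmlNode, and caching the wrapper in a field on the raw node is an assumption about how such a lookup could work.

```c
#include <stdlib.h>

/* Stand-in for a lightweight libxml2 xmlNode. */
typedef struct RawNode {
    const char     *name;
    struct RawNode *next;
    void           *wrapper;  /* cached wrapper, NULL until requested */
} RawNode;

/* Stand-in for a heavier GObject-based GXmlNode. */
typedef struct {
    RawNode *raw;
    /* signal, property and reference-counting state would live here */
} Wrapper;

/* Counts how many wrappers were actually allocated. */
static size_t wrappers_created = 0;

/* Return the wrapper for a raw node, creating it on first use only. */
Wrapper *wrapper_for(RawNode *raw) {
    if (raw->wrapper == NULL) {
        Wrapper *w = malloc(sizeof *w);
        if (w == NULL)
            return NULL;
        w->raw = raw;
        raw->wrapper = w;
        wrappers_created++;
    }
    return raw->wrapper;
}
```

In a document of three nodes where the program only ever asks for one of them, a single `Wrapper` is allocated and the other two nodes stay as bare `RawNode` structures; asking for the same node twice returns the same wrapper.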
Regarding time usage, the difference for some operations is small, a couple percent, and for others, the difference is larger with smaller files, as big as 50% when loading a smaller file. Larger files in those cases (such as loading documents) see less and less of a penalty.
I feel as though for many common applications, these don't represent a significant penalty (time taken in loading large documents is still a few dozen milliseconds), and can be worth the benefits in using a GObject API.
Going forward
If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/. Feel free to submit new tests and test files.
Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.