[GNOME] Final Report for GXml in the 2013 Google Summer of Code

The Google Summer of Code has ended, and GXml is spoiled with the fruits of labour:
  • the autotools build system has improved
  • documentation is more complete and more accurate
  • many new examples across most classes, especially for C and JavaScript
  • many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)
  • it has a mailing list (gxml-list@gnome.org)
  • new features:
    • document child management, node cloning
  • new memory tests
  • new error handling model
  • new memory handling model (fixing leaks and improving performance!)
  • improved API compliance
  • bug-fix release (0.3.2) without API breaks
  • imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc.)
I've talked about those before (near the start and while at GUADEC), so for my report I'm going to focus on the outcome in terms of performance.

Look forward to 0.4.0 imminently, and happy hacking.

GXml's performance versus pure libxml2

One question people have asked is how GXml's performance compares with libxml2's, since GXml currently wraps it.  GXml should be slower, as each operation typically runs more code, but how large will the penalty be, and will it matter for you?


I created a simple test suite with the four following tasks:
  1. loading a file from disk
  2. loading a file from memory
  3. stringifying a document
  4. saving a document to disk
The test suite is highly modular, and it's easy to add new tests.  For each test, you define a setup function, a test function (the measured test), and a cleanup function.  So if you'd like to see anything else in particular tested, let me know.
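The setup/test/cleanup structure can be sketched as follows.  This is a hypothetical illustration in C, not the code in gxml/tests/performance/: PerfTest, run_perf_test() and the example_* callbacks are invented names, and clock_gettime(CLOCK_MONOTONIC) stands in for GLib's g_get_monotonic_time() so that the sketch depends only on libc.  Only the test callback falls inside the timed region.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical test descriptor; the real suite may be organised differently. */
typedef struct {
    const char *name;
    void *(*setup)(const char *path); /* prepare input; not timed */
    void (*test)(void *state);        /* the measured operation */
    void (*cleanup)(void *state);     /* free what setup/test created; not timed */
} PerfTest;

/* Monotonic time in microseconds, standing in for g_get_monotonic_time(). */
int64_t monotonic_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Run one test against one file and return the elapsed time of test() only. */
int64_t run_perf_test(const PerfTest *t, const char *path) {
    assert(t != NULL && t->test != NULL);
    void *state = t->setup ? t->setup(path) : NULL;
    int64_t start = monotonic_us();
    t->test(state);
    int64_t elapsed = monotonic_us() - start;
    if (t->cleanup)
        t->cleanup(state);
    return elapsed;
}

/* Placeholder test: upper-cases a copy of the path instead of touching XML. */
void *example_setup(const char *path) {
    size_t n = strlen(path) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, path, n);
    return copy;
}

void example_test(void *state) {
    for (char *s = state; s != NULL && *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

void example_cleanup(void *state) { free(state); }
```

For the stringify and save tests, setup would load the document and cleanup would free it, so that only the operation of interest is timed.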


I've run it on a Lenovo ThinkPad Twist S230u with the following configuration
  • Intel® Core™ i5-3317U CPU @ 1.70GHz × 4 
  • 4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
  • 500GB HD @ 5400 RPM (HGST HTS725050A7)
    • /home, including test files
  • 24GB SSD (Samsung MZMPA024)
    • everything outside of /home, including libraries 
  • Fedora 19, x86_64
  • libxml2-2.9.1-1.fc19
  • GXml from git HEAD

Test Data

The test data was based on my updateinfo.xml files from yum, in particular the one found at /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml.  It contained 98,743 nodes in 11,136 kB.  I created smaller and larger versions of it, resulting in:

name         nodes     size (kB)
test3.xml     22,276       2,784
test4.xml     47,707       5,568
test5.xml     98,743      11,136
test6.xml    197,484      22,268
test7.xml    394,966      44,536
test5.xml is the original updateinfo.xml.  The other sizes were produced either by duplicating the content within the root element or by deleting the second half of the nodes inside the root element.  This testing could be improved by using different types of files with different kinds of data: flatter documents versus deeper ones, for instance.


Three values were measured: the time taken to complete a task (like loading a file), using g_get_monotonic_time, which reports in microseconds; the memory used by the task after it completed, using mallinfo, in particular the uordblks field (total allocated space); and memory leaks, also using mallinfo, after the memory had been freed.
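A minimal sketch of the memory measurement, assuming glibc: heap_in_use() reads the uordblks field, and measure_memory() takes one reading before the task, one after it (memory used) and one after cleanup (leaked).  The helper names and the MemResult struct are hypothetical.  glibc 2.33 deprecated mallinfo(), whose int fields can overflow, in favour of mallinfo2(), so the sketch uses whichever is available.

```c
#include <assert.h>
#include <malloc.h> /* glibc-specific: mallinfo(), mallinfo2() */
#include <stddef.h>
#include <stdlib.h>

/* Bytes in in-use heap allocations (uordblks). Large blocks that glibc
 * serves with mmap() are reported in hblkhd instead, not here. */
size_t heap_in_use(void) {
#if defined(__GLIBC__) && (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 33))
    return mallinfo2().uordblks;
#else
    return (size_t)mallinfo().uordblks;
#endif
}

typedef struct {
    long long used;   /* growth while the task's result is still alive */
    long long leaked; /* growth left over after cleanup */
} MemResult;

/* Hypothetical helper: measure a task, then check what cleanup left behind. */
MemResult measure_memory(void *(*task)(void), void (*cleanup)(void *)) {
    assert(task != NULL && cleanup != NULL);
    long long before = (long long)heap_in_use();
    void *result = task();
    long long after_task = (long long)heap_in_use();
    cleanup(result);
    long long after_cleanup = (long long)heap_in_use();
    MemResult r = { after_task - before, after_cleanup - before };
    return r;
}

/* Placeholder task: 4 KiB is served from the main heap (below the default
 * mmap threshold) and is too large for glibc's per-thread cache, so it is
 * counted in uordblks while allocated and drops out once freed. */
void *example_task(void) { return malloc(4096); }
void example_cleanup(void *p) { free(p); }
```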


I ran the tests averaged over 10 trials for each combination of test and file, and then again averaged over 25 trials.  The procedure could be improved by better isolating the tests from other processes on the system, or by reporting more detail than the averaged scores, so that we can detect exceptional anomalies (e.g. another process delaying a file load by hogging I/O).
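One way to report more than an average is to keep every trial's timing and summarise it.  The sketch below is an illustration, not part of the test suite: summarise_trials(), TrialSummary and the 1.5× median threshold are assumptions chosen for the example.  The median is robust to a single slow trial, so comparing each sample against it flags runs that another process may have disturbed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold: trials slower than this multiple of the median are flagged. */
#define OUTLIER_FACTOR 1.5

typedef struct {
    double mean;
    int64_t min, max, median;
    size_t outliers;
} TrialSummary;

static int compare_int64(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Summarise n timings (e.g. microseconds from 10 or 25 trials). */
TrialSummary summarise_trials(const int64_t *samples, size_t n) {
    assert(samples != NULL && n > 0);
    int64_t *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        abort();
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_int64);

    TrialSummary s = {0};
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += (double)sorted[i];
    s.mean = total / (double)n;
    s.min = sorted[0];
    s.max = sorted[n - 1];
    s.median = sorted[(n - 1) / 2]; /* lower median for even n */
    for (size_t i = 0; i < n; i++)
        if ((double)sorted[i] > OUTLIER_FACTOR * (double)s.median)
            s.outliers++;
    free(sorted);
    return s;
}
```

With timings of 100, 102, 98, 101 and 400 µs, the mean is 160.2 µs while the median is 101 µs, and the 400 µs trial is flagged as the single outlier; the average alone would hide that one run was disturbed.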


Keep in mind that GXml wraps libxml2 for most functionality, so we don't expect it to be faster than libxml2; rather, we want to see what penalty a GObject wrapper (written in Vala) causes.

Memory Leaks

GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory libxml2 and GXml each left behind after these tasks.  Luckily, neither leaked in the cases tested.  (That does not mean there aren't any leaks!  Kudos to those who find them, and more to those who patch them.)


load disk: memory used (bytes)
file         libxml2      GXml         GXml/libxml2
test3.xml    20814019     23667584     1.1371
test4.xml    42604277     48477152     1.1378
test5.xml    86151738     98065217     1.1383
test6.xml    172261657    196126066    1.1385
test7.xml    344483559    392241280    1.1386

load disk: time (µs)
file         libxml2      GXml         GXml/libxml2
test3.xml    37547        56513        1.5051
test4.xml    66747        63797        0.9558
test5.xml    144234       161024       1.1164
test6.xml    284488       287911       1.0120
test7.xml    561406       564904       1.0062

load mem: memory used (bytes)
file         libxml2      GXml         GXml/libxml2
test3.xml    24988568     28866015     1.1552
test4.xml    51434229     59523841     1.1573
test5.xml    104192043    120665588    1.1581
test6.xml    208356730    241330737    1.1583
test7.xml    343791009    391564027    1.1390

load mem: time (µs)
file         libxml2      GXml         GXml/libxml2
test3.xml    44199        53860        1.2186
test4.xml    84215        71695        0.8513
test5.xml    172920       184735       1.0683
test6.xml    347157       359909       1.0367
test7.xml    572627       555519       0.9701

save disk: time (µs)
file         libxml2      GXml         GXml/libxml2
test3.xml    25610        24513        0.9572
test4.xml    52908        49175        0.9294
test5.xml    96449        98308        1.0193
test6.xml    192197       196295       1.0213
test7.xml    384343       395194       1.0282

stringify: memory used (bytes)
file         libxml2      GXml         GXml/libxml2
test3.xml    2735339      3136192      1.1465
test4.xml    5696496      6287776      1.1038
test5.xml    11394656     12592800     1.1051
test6.xml    22789264     25185552     1.1051

stringify: time (µs)
file         libxml2      GXml         GXml/libxml2
test3.xml    22873        26749        1.1695
test4.xml    46166        54537        1.1813
test5.xml    93205        111312       1.1943
test6.xml    198988       235645       1.1842


loading documents from disk
When it comes to loading a file from disk, we compared libxml2's xmlReadFile with GXml's gxml_document_new_from_path (which uses xmlParseFile).

GXml's memory usage is consistently ~14% higher.

Time-wise, on smaller files, GXml takes up to 50% longer than libxml2.  I'm not sure why GXml is miraculously faster on test4.xml in this run.  You can see that the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.
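The shrinking penalty is easier to see when the relative overhead is set next to the absolute one.  This sketch uses the load-from-disk timings from the results, reading each row as libxml2 first and GXml second (an assumption consistent with the reported ratios); relative_overhead(), absolute_overhead() and print_overheads() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Average load-from-disk times in microseconds, test3.xml .. test7.xml. */
static const char *const files[] = { "test3.xml", "test4.xml", "test5.xml",
                                     "test6.xml", "test7.xml" };
static const double libxml2_us[] = { 37547, 66747, 144234, 284488, 561406 };
static const double gxml_us[]    = { 56513, 63797, 161024, 287911, 564904 };
#define N_FILES (sizeof libxml2_us / sizeof libxml2_us[0])

/* Fractional overhead: 0.50 means the wrapper took 50% longer. */
double relative_overhead(double base_us, double wrapped_us) {
    assert(base_us > 0.0);
    return wrapped_us / base_us - 1.0;
}

/* Extra time spent by the wrapper in microseconds (negative if faster). */
double absolute_overhead(double base_us, double wrapped_us) {
    return wrapped_us - base_us;
}

void print_overheads(void) {
    for (size_t i = 0; i < N_FILES; i++)
        printf("%s  %+7.2f%%  %+7.0f us\n", files[i],
               100.0 * relative_overhead(libxml2_us[i], gxml_us[i]),
               absolute_overhead(libxml2_us[i], gxml_us[i]));
}
```

On these numbers the relative overhead falls from about +50.5% on test3.xml to about +0.6% on test7.xml, while the absolute difference never exceeds about 19 ms (18,966 µs, on test3.xml).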

loading documents from memory

For memory, we again see a consistent increase of ~14-16%.

Time-wise, GXml again oddly performs better on test4.xml.  Otherwise, we see the same trend: there is little difference with larger files.

saving to disk

We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it returns, so we can't (easily) see how much memory it used.  Neither leaks, so there is nothing to see there.

Time-wise, both take about the same length of time, but GXml may be trending towards taking longer as files grow.  This could be due to tasks like synchronising data that is initially stored only in GXmlNodes and needs to be copied into libxml2's xmlDoc to make it to disk.


stringifying documents

Memory-wise, we typically see an increase of ~10-15%.  Note that the largest file, test7.xml, could not be stringified, which requires further investigation.  Stringification was done with xmlDocDumpFormatMemory.

Time-wise, the increase was ~16-20%. 


Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase.  That makes sense, as GXml uses GObjects instead of the lightweight C structures libxml2 typically uses.  One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only for the ones we use, so a pure GObject implementation might use more memory.
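The lazy wrapping described above can be sketched as a cache of wrapper objects keyed by the underlying node and created on first request.  This is a hypothetical illustration in plain C, not GXml's implementation: RawNode stands in for libxml2's xmlNode, Wrapper for a GObject such as GXmlNode, and the cache is a simple back-pointer (a real wrapper could instead use a hash table, or the _private field that libxml2 provides on xmlNode for application data).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for libxml2's xmlNode: lightweight and present for every node. */
typedef struct RawNode {
    const char *name;
    struct RawNode *next; /* next sibling */
    void *wrapper;        /* cached wrapper; NULL until first requested */
} RawNode;

/* Stand-in for a GObject wrapper such as GXmlNode: heavier, created lazily. */
typedef struct {
    RawNode *raw;
    unsigned refcount; /* placeholder for GObject bookkeeping */
} Wrapper;

static size_t wrappers_created = 0;

/* Return the wrapper for raw, allocating it only on first use. */
Wrapper *wrap_node(RawNode *raw) {
    assert(raw != NULL);
    if (raw->wrapper == NULL) {
        Wrapper *w = malloc(sizeof *w);
        if (w == NULL)
            abort();
        w->raw = raw;
        w->refcount = 1;
        raw->wrapper = w;
        wrappers_created++;
    }
    return raw->wrapper;
}

/* Free any wrappers created along a sibling chain (flat sketch only). */
void free_wrappers(RawNode *first) {
    for (RawNode *n = first; n != NULL; n = n->next) {
        free(n->wrapper);
        n->wrapper = NULL;
    }
}
```

Wrapping the same node twice returns the same object, and untouched nodes never get a wrapper, which is where the saving over a tree of GObjects for every node would come from.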

Regarding time usage, the difference for some operations is small, a couple of percent, while for others it is larger with smaller files: as much as 50% when loading a small file.  For those operations (such as loading documents), larger files see less and less of a penalty.

I feel as though for many common applications, these don't represent a significant penalty (time taken in loading large documents is still a few dozen milliseconds), and can be worth the benefits in using a GObject API.

Going forward

If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/.  Feel free to submit new tests and test files.

Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.

