2013-09-25

[GNOME] Final Report for GXml in the 2013 Google Summer of Code

The Google Summer of Code has ended, and GXml is enjoying the fruits of the summer's labour:
  • the autotools build system has improved
  • documentation is more complete and more accurate
  • many new examples across most classes, especially for C and JavaScript
  • many bugs were flushed out and fixed (e.g. attribute syncing between underlying libxml2 xmlNodes and GXmlElements)
  • it has a mailing list (gxml-list@gnome.org)
  • new features
    • document child management, node cloning
  • new memory tests
  • new error handling model
  • new memory handling model (fixing leaks and improving performance!)
  • improved API compliance
  • bug-fix release (0.3.2) without API breaks
  • imminent 0.4.0 with API breaks (pending some updated patches for XPath, Serialization, etc)
I've talked about most of these before (near the start of the summer and while at GUADEC), so for this report I'm going to focus on the outcome in terms of performance.

Look out for 0.4.0 soon, and happy hacking.

GXml's performance versus pure libxml2

One question people have asked is how GXml's performance compares with libxml2's, since GXml currently wraps libxml2.  We'd expect GXml to be slower, as more code typically runs for each operation, but how large is the penalty, and will it matter for you?

Tests

I created a simple test suite with the four following tasks:
  1. loading a file from disk
  2. loading a file from memory
  3. stringifying a document
  4. saving a document to disk
The test suite is highly modular, and it's easy to add new tests.  For each test, you define a setup function, a test function (the measured test), and a cleanup function.  So if you'd like to see anything else in particular tested, let me know.
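As a rough illustration of that setup/test/cleanup structure, here is a minimal C sketch. PerfTest, now_usec and perf_test_run are hypothetical names, not the suite's actual code, and the timer uses POSIX clock_gettime with CLOCK_MONOTONIC instead of GLib's g_get_monotonic_time so the example builds without GLib.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical description of one performance test: only `test` is timed. */
typedef struct {
    const char *name;
    void *(*setup)(const char *path); /* prepare input, e.g. read a file; may be NULL */
    void (*test)(void *data);         /* the measured operation */
    void (*cleanup)(void *data);      /* release what setup/test created; may be NULL */
} PerfTest;

/* Monotonic clock in microseconds (the same unit g_get_monotonic_time uses). */
static int64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Run one test against one file and return the time spent in `test` only. */
static int64_t perf_test_run(const PerfTest *t, const char *path)
{
    void *data = t->setup ? t->setup(path) : NULL;

    int64_t start = now_usec();
    t->test(data);
    int64_t elapsed = now_usec() - start;

    if (t->cleanup)
        t->cleanup(data);
    return elapsed;
}
```

Because setup and cleanup run outside the timed region, work such as reading a file into memory before a "load from memory" test is not counted against that test.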

Environment


I ran it on a Lenovo ThinkPad Twist S230u with the following configuration:
  • Intel® Core™ i5-3317U CPU @ 1.70GHz × 4 
  • 4GB RAM, SODIMM DDR3 Synchronous 1333 MHz (0.8 ns)
  • 500GB HD @ 5400 RPM (HGST HTS725050A7)
    • /home, including test files
  • 24GB SSD (Samsung MZMPA024)
    • everything outside of /home, including libraries 
  • Fedora 19, x86_64
  • libxml2-2.9.1-1.fc19
  • GXml from git HEAD

Test Data

The test data was based on my updateinfo.xml files from yum, in particular the one at /var/cache/yum/x86_64/19/updates/gen/updateinfo.xml.  It contains 98,743 nodes totalling 11,136 kB.  I created smaller and larger versions of it, resulting in:


name         nodes      size (kB)
test3.xml    22,276      2,784
test4.xml    47,707      5,568
test5.xml    98,743     11,136
test6.xml   197,484     22,268
test7.xml   394,966     44,536
This testing could be improved by using different types of files with different types of data: flatter ones versus deeper ones, for instance.  The different sizes were produced either by duplicating the content within the root element or by deleting the second half of the nodes inside the root element.  test5.xml is the original updateinfo.xml.

Measurements

Three values were measured: the time taken to complete a task (like loading a file), using g_get_monotonic_time, which reports microseconds; the memory in use after the task completed, using the uordblks field (total allocated space) from mallinfo; and memory leaks, also using mallinfo, after the task's memory had been freed.
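A minimal sketch of those three measurements for a single task, assuming glibc (mallinfo is declared in malloc.h and is not portable). Measurement, heap_in_use, measure and the example_* task functions are hypothetical, and time again comes from clock_gettime rather than g_get_monotonic_time.

```c
#define _POSIX_C_SOURCE 200809L
#include <malloc.h>   /* glibc-specific: mallinfo() */
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef void (*TaskFn)(void *data);

/* The three values for one run of one task. */
typedef struct {
    int64_t usec;   /* elapsed time, microseconds */
    long    used;   /* growth of uordblks right after the task */
    long    leaked; /* growth of uordblks still present after cleanup */
} Measurement;

static int64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Bytes in allocated chunks of malloc's arenas. Large blocks that glibc
 * serves with mmap() are reported in hblkhd instead, not here. */
static long heap_in_use(void)
{
    struct mallinfo mi = mallinfo();
    return mi.uordblks;
}

static Measurement measure(TaskFn task, TaskFn cleanup, void *data)
{
    Measurement m;
    long before = heap_in_use();

    int64_t start = now_usec();
    task(data);
    m.usec = now_usec() - start;
    m.used = heap_in_use() - before;

    cleanup(data);
    m.leaked = heap_in_use() - before; /* should be 0 if nothing leaked */
    return m;
}

/* Example task pair for trying it out: allocate a small buffer, then free it. */
static void example_alloc(void *data) { *(void **) data = malloc(4096); }
static void example_free(void *data)  { free(*(void **) data); }
```

On glibc 2.33 and later, mallinfo is deprecated in favour of mallinfo2, which reports the same fields as size_t; mallinfo's int fields can wrap around once the heap passes 2 GiB.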

Procedure

For each combination of test and file, I ran the tests averaged over 10 trials, and then again averaged over 25 trials.  The procedure could be improved by better isolating the tests from other processes on the system, or by reporting more detail than the averaged scores, so that anomalies can be detected (e.g. another process delaying a file load by hogging I/O).
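One way to report more detail than the averaged score is to keep every trial's timing and report its spread alongside the mean. A small sketch, where TrialStats and summarise_trials are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    double  mean;
    int64_t min;
    int64_t max;
} TrialStats;

/* Summarise per-trial timings (microseconds). Keeping min and max next to
 * the mean makes a single outlier, such as an I/O stall caused by another
 * process, visible instead of hiding it in the average. */
static TrialStats summarise_trials(const int64_t *usec, size_t n)
{
    TrialStats s = { 0.0, 0, 0 };
    if (n == 0)
        return s;

    int64_t sum = 0;
    s.min = s.max = usec[0];
    for (size_t i = 0; i < n; i++) {
        sum += usec[i];
        if (usec[i] < s.min)
            s.min = usec[i];
        if (usec[i] > s.max)
            s.max = usec[i];
    }
    s.mean = (double) sum / (double) n;
    return s;
}
```

For trials of 100, 110, 90 and 500 µs, the one slow run pulls the mean up to 200 µs; reporting the 500 µs maximum next to it shows that the mean is not typical.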

Results

Keep in mind that GXml wraps libxml2 for most functionality, so we don't expect it to be faster than libxml2; rather, we want to see what penalty a GObject wrapper (written in Vala) imposes.

Memory Leaks

GXml was leaking memory like a sieve before the summer (0.3.2 includes memory leak fixes without the API breaks!), so I wanted to know what memory both libxml2 and GXml left behind after these tasks.  Luckily, neither leaked in the cases tested.  (That does not mean there aren't any leaks!  Kudos to those who find them, and more kudos to those who patch them.)

Results


Memory is in bytes (mallinfo's uordblks) and time is in microseconds.  The last column is the ratio gxml/libxml2, so values above 1 mean GXml used more.

data          libxml2        gxml           gxml/libxml2

load disk, memory (bytes)
test3.xml      20,814,019     23,667,584    1.1371
test4.xml      42,604,277     48,477,152    1.1378
test5.xml      86,151,738     98,065,217    1.1383
test6.xml     172,261,657    196,126,066    1.1385
test7.xml     344,483,559    392,241,280    1.1386

load disk, time (µs)
test3.xml          37,547         56,513    1.5051
test4.xml          66,747         63,797    0.9558
test5.xml         144,234        161,024    1.1164
test6.xml         284,488        287,911    1.0120
test7.xml         561,406        564,904    1.0062

load mem, memory (bytes)
test3.xml      24,988,568     28,866,015    1.1552
test4.xml      51,434,229     59,523,841    1.1573
test5.xml     104,192,043    120,665,588    1.1581
test6.xml     208,356,730    241,330,737    1.1583
test7.xml     343,791,009    391,564,027    1.1390

load mem, time (µs)
test3.xml          44,199         53,860    1.2186
test4.xml          84,215         71,695    0.8513
test5.xml         172,920        184,735    1.0683
test6.xml         347,157        359,909    1.0367
test7.xml         572,627        555,519    0.9701

save, time (µs)
test3.xml          25,610         24,513    0.9572
test4.xml          52,908         49,175    0.9294
test5.xml          96,449         98,308    1.0193
test6.xml         192,197        196,295    1.0213
test7.xml         384,343        395,194    1.0282

stringify, memory (bytes)
test3.xml       2,735,339      3,136,192    1.1465
test4.xml       5,696,496      6,287,776    1.1038
test5.xml      11,394,656     12,592,800    1.1051
test6.xml      22,789,264     25,185,552    1.1051

stringify, time (µs)
test3.xml          22,873         26,749    1.1695
test4.xml          46,166         54,537    1.1813
test5.xml          93,205        111,312    1.1943
test6.xml         198,988        235,645    1.1842
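The last column in each results table is GXml's figure divided by libxml2's, so 1.1371 means GXml used about 14% more.  A small sketch of that arithmetic, plus the absolute difference, which is often the more useful number when deciding whether a penalty matters (the function names are only illustrative):

```c
#include <stdint.h>

/* Ratio reported in the tables: above 1.0 GXml used more, below 1.0 less. */
static double gxml_ratio(int64_t libxml2, int64_t gxml)
{
    return (double) gxml / (double) libxml2;
}

/* Absolute difference, in the table's own unit (bytes or microseconds). */
static int64_t gxml_penalty(int64_t libxml2, int64_t gxml)
{
    return gxml - libxml2;
}
```

For loading test3.xml from disk, gxml_ratio(37547, 56513) is about 1.5051 and gxml_penalty gives 18,966 µs; for test7.xml the ratio falls to about 1.0062 and the penalty to 3,498 µs.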




Discussion

loading documents from disk
 
For loading a file from disk, we compared xmlReadFile with gxml_document_new_from_path (which uses xmlParseFile).

Memory usage is consistently ~14% higher with GXml.

Time-wise, on smaller files, GXml takes up to 50% longer than libxml2.  I'm not sure why GXml came out faster than libxml2 on test4.xml in this run.  Otherwise, the larger the file, the smaller the difference, which makes sense, since most of the hard work is done by libxml2 anyway.

loading documents from memory

Memory usage again shows a consistent increase, of ~14-16%.

Time-wise, GXml again oddly performs better on test4.xml (and, in this case, slightly better on test7.xml too).  Otherwise, we see the same trend: there is little difference with larger files.

saving to disk

We don't report memory differences because GXml's save functionality cleans up its use of xmlSaveCtxt before it returns, so we can't (easily) see how much it used.  Neither leaks, so there is nothing to see there.

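Time-wise, both take about the same length of time, though GXml may be trending towards taking longer on larger files (a ratio of 0.93-0.96 for test3.xml and test4.xml, rising to 1.02-1.03 for test5.xml through test7.xml).  This could be due to tasks like synchronising data that is initially stored only in GXmlNodes and needs to be copied into libxml2's xmlDoc before it can be written to disk.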

stringification

Memory-wise, we typically see an increase of ~10-15%.  Note that both libxml2 and GXml failed to stringify the largest file, test7.xml, which requires further investigation.  Stringification was done with xmlDocDumpFormatMemory.

Time-wise, the increase was ~16-20%. 

Conclusion

Regarding memory usage, if you use GXml for cases such as these, you can expect around a 15% increase in memory usage.  That makes sense, as GXml adds GObjects on top of the lightweight C structures libxml2 uses.  One benefit of wrapping libxml2 is that we don't actually create a GXmlNode for every xmlNode in a document, only for the ones we use, so a pure GObject implementation might use more memory.
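To illustrate the point about only wrapping the nodes that get used, here is a simplified C sketch of on-demand wrapper creation.  RawNode stands in for libxml2's xmlNode (whose _private field is reserved for application data) and WrapperNode for a GXmlNode; caching the wrapper in that field is an assumption for illustration, not necessarily how GXml maps xmlNodes to GXmlNodes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for libxml2's xmlNode, reduced to what this sketch needs. */
typedef struct {
    const char *name;
    void *_private;   /* cached wrapper, or NULL if never wrapped */
} RawNode;

/* Stand-in for a GObject wrapper such as GXmlNode. */
typedef struct {
    RawNode *raw;     /* data stays in the underlying node; no copy is made */
} WrapperNode;

static size_t wrappers_created = 0;

/* Return the wrapper for `raw`, allocating one only on first access. */
static WrapperNode *wrap_node(RawNode *raw)
{
    if (raw->_private == NULL) {
        WrapperNode *w = malloc(sizeof *w);
        if (w == NULL)
            return NULL;
        w->raw = raw;
        raw->_private = w;
        wrappers_created++;
    }
    return raw->_private;
}
```

Asking for the same node twice returns the same wrapper, and nodes that are never visited never get one, so the wrapper overhead scales with the nodes an application touches rather than with the size of the document.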

Regarding time usage, the difference for some operations is small, a couple of percent; for others, the difference is larger with smaller files, as much as 50% when loading a small file from disk.  For those operations (such as loading documents), larger files see less and less of a penalty.

I feel that for many common applications, these don't represent a significant penalty (in these tests, the extra time GXml took to load a document was under 20 ms in every case), and they can be worth the benefits of using a GObject API.

Going forward

If you're interested in more about GXml's performance, the test suite will be in gxml/tests/performance/.  Feel free to submit new tests and test files.

Regarding GXml, HEAD will be pushed out in a new feature release including the API changes, fancy new features, and contributions from others, including Daniel Espinosa, Adam Ples, Simon Reimer, and others.

Cheerio!
