programming | Robotic Librarian

Z39.50

August 7, 2007

Now that the standard search interfaces of the Internet are so common, there is a lot more awareness of raw information. Sure, there’s always been some lip service intimating that knowledge is power, but people by and large still respond much more quickly to money and guns. Yet there is an obscene amount of cash money in information, and now that many of the practices of librarianship, minus any complicated overarching values, are recognizable cash cows, the terms of information dissemination are becoming quite different.

We librarians should have realized what was happening much earlier, and acted upon it. Take the NAICS, for example. It began as the Standard Industrial Classification (SIC) code for compartmentalizing business practice; by March 31, 1993, the U.S. Office of Management and Budget had decided to work in tandem with Canada and Mexico to update the structure of industry classification.

Moving from a 4-digit identification number to a 5- or 6-digit number, the new scheme became the North American Industry Classification System, or NAICS. Numerous industries were redefined in order to allow for more flexibility and specificity when compiling statistical data of related industries; for example, the SIC Transportation, Communications, and Utilities division is now split among several NAICS sectors, such as Utilities and Transportation and Warehousing.

Amazingly, in SIC speak there was only one sector for Service Industries. For one, that hints at the major sea change society has undergone in its transition from a manufacturing, industrial and agricultural firmament. Perhaps you hadn’t set your clock to signal the date, but on Wednesday May 23rd, 2007 the human population officially became more urban than rural. Assuming that the Maya Calendar’s end date in 2012 does not mean that society as we know it will end, then according to UN estimates there will be 5 billion city dwellers by 2030.

Of course, both China and Warren Buffett have been warning us about overpopulation for years, and even more alarming to some, Buffett has been short selling against the American dollar, George Soros style, for a few years now. Nobody’s been warning us about over-city dwelling, though… Hopefully birth rates don’t force a Sprogopolis, Baby-Powered City of the Future upon us, where “The only good baby is a working baby.”

So the NAICS code, while recognizing 79 new manufacturing industries, and reorganizing the Retail and Wholesale trade sectors, simultaneously adopted an entirely new sector which should have raised an unholy din, or at least convened a Committee of Concern, in the library world. The Information Sector, or area 51 (sorry, I couldn’t resist). Sector 51 is described on the NAICS website as “perhaps the most important change” in recognition, wherein it encompasses 34 industries of which 20 are wholly new. This change was descriptive and after-the-fact, and it should have acted as a final wake up call, spurring innovation and collaboration on a large scale in library-land. Instead, at least from the crow’s nest of library school, we are still six catalogs in search of an identity.

Five categories of recognition are outlined, differentiating 51 from other traditional industrial designations. They are (to paraphrase):

1. Unlike traditional goods, an “information or cultural product” does not necessarily have tangible qualities.
2. Unlike traditional services, the delivery of these products does not require direct contact between the supplier and the consumer.
3. The value of these products to the consumer lies in their informational, educational, cultural, or entertainment content, not in the format in which they are distributed. Most of these products are protected from unlawful reproduction by copyright laws.
4. The intangible property aspect necessitates that only those possessing the rights to these works are authorized to reproduce, alter, improve, and distribute them.
5. Distributors of information and cultural products can easily add value to the products they distribute.

I’m new to library science, but it makes me wonder about the chicken and the egg. One of our most popular catch phrases (which, I must admit, makes me vaguely nauseous at the sound of it, but that’s another post entirely) is to add value, or just value-added, as in value-added services where we go “above and beyond” the normal service interaction. I challenge you to read a single library science text from the past ten years where that phrase isn’t used ad infinitum. But there it is, in point 5…is the phrase that generic, or is it another way that we are playing catch-up with business?

The more I learn about library administrative organization, and about the lack of collective communication in the past, the more embarrassed I become. Especially when thinking about things like the review features on Amazon.com, which is something that should’ve already existed in library-land long before those cheeky upstarts. We comprise some of the most engaged, learned, passionate book lovers on the planet, and readers’ advisory is our very bread and butter on the public level, but there was no national communicative network before the age of the Internet? No library sponsored über-magazine or BBS or pamphlet or bi-annual that gave voice to our patrons through their patronage of the library?

When I bring this up with my professors, they often cite two factors, cost and resources. The concern about cost I feel I can pretty much deflate from the outset. Is it not costlier to have to fight for public monies, to have to devote higher-paid administrative time to lobbying and glad-handing rather than fostering public goodwill by offering a deeper level of involvement? Did not most libraries already take part in OCLC, or use MARC records or some other collectively managed and federated search & organization mechanism that could easily have been adapted for use by our patrons? Something that could be outfitted with images and reviews as well as cataloging records? Why are we still discussing the need to keep content separate from design? We know that reprogramming one CSS or php or Perl document is far easier than updating thousands of HTML, XHTML and XML pages. We should know about regular expressions as well, right? Why should any of that cost us anything?

Resources are a different matter entirely, and especially with current CIPA laws which potentially disenfranchise poorer library districts that try to maintain unfiltered web access, and impending filtration legislation overall, access can sometimes be compromised. That, and we are only now learning how to market ourselves, and are suffering for waiting too long to do so. It’s hard to convince the public to allocate their monies to us, harder still if we have not maintained a good relationship. Forget about ROI — do not neglect a return on emotional investment, okay? (A tip of the pen to Bill Crowley) Not that we should adopt business models, for there is no way that libraries can reasonably compete with billion dollar business interests, and I reject the idea of talking about the users of a public library as customers. Why does everything have to be reduced to Capitalism? Do we live in Sprogopolis already?

I for one am very excited about efforts to foster the collective wisdom of librarians, and any collaboration between libraries, museums, archives and other public guardians of knowledge is where it’s at. I am also interested in efforts to intelligently collocate “information packets” as demonstrated by Z39.50. What Z39.50 describes (and it’s not the only one, but is the first true contender) is a client-server protocol for searching and retrieving records from remote databases, meaning a method for gathering information from disparate bundles that may not have the same method of organization.

Think about your normal web search, and how a basic search without Boolean operators for, say, Siouxsie and the Banshees will turn up 1,480,000 web pages in Google, 838,312 in Gigablast and 8,354 blog posts in Technorati. By using an internationally recognized protocol that accounts for the placement of semantic and syntactic strings in documents, a particular search can be broadened in scope and sharpened in precision at the same time. When you search for a record using a proprietary interface as you might find at your public library, each page is generated on the fly, using your search designations and delimiters to collocate an appropriate response. The thing is, library catalogs already have fields specified for title, author, publisher, serial etc., and are able to produce a very accurate pool of hits, where large-scale web search engines do not. It isn’t that Google is inefficient; their search mechanism is possibly the most sophisticated in the world right now, using chains of association to generate the most likely arena of hits in a fraction of a second. What they lack is a system that recognizes fields for title, author, authority etc. across a disparate semantic and syntactic base. It doesn’t exist yet.
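To make the difference between fielded and unfielded searching concrete, here is a minimal Python sketch. It is not Z39.50 itself, and the `Record` class, the tiny sample catalog, and both function names are invented for illustration; it only shows why a search that knows which field it is looking in returns a tighter pool of hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A toy catalog record with three named fields (hypothetical structure)."""
    title: str
    author: str
    publisher: str


# Sample records made up for illustration; not drawn from any real catalog.
CATALOG = [
    Record("The Scream", "Siouxsie and the Banshees", "Polydor"),
    Record("Banshees of Irish Folklore", "A. Example", "Example Press"),
    Record("Punk Diary", "B. Sample", "Siouxsie Books"),
]


def keyword_search(records, term):
    """Match the term anywhere in the record, the way an unfielded search would."""
    needle = term.lower()
    return [
        r for r in records
        if needle in " ".join((r.title, r.author, r.publisher)).lower()
    ]


def fielded_search(records, field, term):
    """Match the term only inside one named field (title, author or publisher)."""
    needle = term.lower()
    return [r for r in records if needle in getattr(r, field).lower()]


everywhere = keyword_search(CATALOG, "siouxsie")
authors_only = fielded_search(CATALOG, "author", "siouxsie")
print(len(everywhere), len(authors_only))
```

Searching for “siouxsie” everywhere picks up both the album and the unrelated publisher, while restricting the search to the author field returns only the album.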

photo by Shirin Neshat

This is what Z39.50 is attempting to do, taking baby steps today but with an eye toward systems of tomorrow, and drawing on parameters programmed into MARC 21 records, Dublin Core, SGML, XML, etc. The new lay of the land will arise in metadata schemas and collective searching techniques, rather than a system of classification. It isn’t that I think the new AACR2 rules will be irrelevant once updated, nor that the Functional Requirements for Bibliographic Records (FRBR) model or the imminent Resource Description and Access (RDA) standard will be useless. I don’t think we should abandon main entries and added entries, either.

Rather, I am wondering whether or not the most effective method for collocating information will come from the design of collaborative search functions, rather than through a rigid semantic and syntactic language. A quick glance at RefWorks, and all of the output styles just for bibliographic records, shows hundreds of record making systems. It seems naive, to me, to try to impose a lone standard system of cataloging as well.
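One way to picture searching across differently organized records without forcing everyone into a single scheme is a crosswalk, a small table that maps one format’s fields onto another’s. The sketch below is a heavily simplified assumption, not an official mapping: the four MARC 21 tags are common ones (245 title, 100 main entry personal name, 260 publication information, 650 subject), subfields are ignored, and the sample record and function name are invented.

```python
# Simplified, hypothetical crosswalk from a few MARC 21 tags to Dublin Core
# element names. A real crosswalk covers many more tags and handles subfields.
MARC_TO_DC = {
    "245": "title",
    "100": "creator",
    "260": "publisher",
    "650": "subject",
}


def marc_to_dublin_core(marc_fields):
    """Convert (tag, value) pairs into a dict of Dublin Core element lists.

    Tags missing from the crosswalk are skipped rather than guessed at.
    """
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue
        dc.setdefault(element, []).append(value)
    return dc


# Invented sample record, for illustration only.
sample = [
    ("245", "A Sample Title"),
    ("100", "Sample, Author"),
    ("650", "Libraries"),
    ("650", "Metadata"),
    ("999", "local note with no Dublin Core equivalent here"),
]

print(marc_to_dublin_core(sample))
```

The design choice worth noticing is that each collection keeps its own record format; only the thin mapping layer has to be agreed upon.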

Filed under: information, library, programming
Posted by Vaucanson's Duck

Film Preservation

July 11, 2007

For several nights now I have been fast at work programming a web resource for film archivists. This is an assignment for a class I am taking on Internet fundamentals, and once the work is done it will go live on the web. I am exhausting myself in an attempt to streamline my CSS and XHTML, and provide solid content. Keeping this blog has not lessened my appreciation of what it means to go live, as I thought the ease of it might do, but instead I feel a greater sense of responsibility, to myself and to the library profession.

I think back to my days as a bookseller, and I can feel the weight of the published word, nearly stifling in volume and density. The number of books which would pass through my hands as the head receiver seemed large, if I let myself forget about all the other book stores in the world. My bosses kindly let me take home all of the catalogs of upcoming publications and backlist titles, ostensibly for my collage making, but really I had a strong desire to know just what the larger environment of book publishing was like. Reading thousands of book catalogs each year can be inspiring, anesthetizing, deeply informative, and even heartbreaking. There are just so many books published every year, so many left over from last year, and years before, just so many voices…an atomic accumulation of paper-bound mycelial axons, slowly communicating with the global mind.

As I research more and more about the state of film preservation today I am beginning to feel a familiar sensation. Reel after reel of film is degrading and hard decisions are being made in the triage of our cinematic heritage. There are significant concerns about access and viability which need to be decided first. A majority of the films which are in need of restoration are so-called orphan films, meaning films for which no clear copyright or provenance is established.

“Orphan films make up the overwhelming majority of our cinematic heritage, and are a vital part of the culture and cultural record of the twentieth century. Indeed, the Library of Congress declared that it is in the task of restoring these orphan films that ‘the urgency may be greatest.’ They include a vast treasure trove of newsreels, documentaries, anthropological films, portraits of minority life in the U.S., instructional films, and even some Hollywood studio productions. While it is both a tragic shame and an unnecessary loss to our culture that scholars and citizens are hampered in making use, for example, of orphan books and musical scores, the difficulty of access to orphan films is a matter of crisis because these works are literally disintegrating. At a time when digital technologies allow for more sophisticated and cheaper restoration and distribution of old films, uncertainty about copyright status has impeded restoration efforts. Worse still, in most cases the films are completely unavailable to the public even for simple viewing.”

Quoted from the Center for the Study of the Public Domain paper Access to Orphan Films

The situation is similar in other nations as well. Even Hollywood, with its awe-inspiring reserve of liquid assets, has been unable to preserve much of its history. “Of the tens or hundreds of thousands of movies made before 1950, fully 50% are already irretrievably lost. For films made before 1929, the loss rate is far worse: over ten years ago, the Library of Congress estimated that 80% of films from the 1920s, and 90% of films from the 1910s had already decayed beyond any hope of restoration.” (See The Silent Era: Lost Films for a small sample)

Digital restoration or archiving is an increasingly standard solution. Digital preservation is believed to be a cure for many photographic and archival ills, but already there are unpredictable digital dustclouds lying dormant before the breeze. David S. Cohen, in his Digital Proves Problematic article from April 20th of this year, echoes a sentiment familiar to agencies already invested in digital preservation. Apparently “the Academy of Motion Picture Arts & Sciences’ Science and Technology Council warned in 2005 that within just a few years films shot with digital cameras could be lost.” Citing Andy Maltz of the Academy’s Sci-Tech Council, he reports that “they had found archival tapes unreadable just 18 months after they were made.”

Fortunately there is a significant amount of work already being done by remarkably dedicated associations and individuals. The George Eastman House, the Library of Congress’s National Film Preservation Board, and the Association of Moving Image Archivists are all wonderful agencies in the United States working toward reversing daily losses. Internationally, UNESCO has gone a long way toward establishing standards of practice, and much of the work being accomplished today can be attributed to several reports it has issued since the late seventies. Just as incredible, and perhaps the most significant foreign association, is the International Federation of Film Archives (FIAF). Not only is their list of member organizations a formidable networking tool, but they have developed a beautiful Code of Ethics. Just look at these two principles:

“1.4. When copying material for preservation purposes, archives will not edit or distort the nature of the work being copied. Within the technical possibilities available, new preservation copies shall be an accurate replica of the source material. The processes involved in generating the copies, and the technical and aesthetic choices which have been taken, will be faithfully and fully documented.
1.5. When restoring material, archives will endeavour only to complete what is incomplete and to remove the accretions of time, wear and misinformation. They will not seek to change or distort the nature of the original material or the intentions of its creators.”

I admire FIAF very much and I am thrilled to see so many countries represented on their membership roll. Who could imagine that organizations from the US, China, North & South Korea, and Iran would all subscribe to a Code of Ethics like that? They do say film is a universal language. (But then, they also say that math, English, cookies, glossolalia, Google Translator and love are universal languages, so maybe they’re not to be trusted)

I will post a link to my site once it goes live, and hopefully I’ve done enough researching and vetting of information to be of use to someone, somewhere. There is such an accumulation of noise in our informational substrate, and I hope to sustain my focus while navigating through it. I don’t want to become another monkey plugging away at reauthoring the Library of Babel.

Filed under: archives, library, media, mirror, open source, programming
Posted by Vaucanson's Duck

Diving for Perls

June 16, 2007

In order to keep things honest here, I would like to post something with substance behind the observations and gripes. After reading a fair number of blogs about tech services, programming and libraries, I’ve noticed several common threads that could be addressed with a focus on basics. I would like to offer readers of Robotic Librarian quality for their time. Lifelong learning for us as well.

Ideas are valuable when they can be put into practice. A continual theme of these posts calls for an active role in encoding library services, and to that end we need to know the basics. For example, a common complaint about the ALA’s digital face is that the website is unappealing, and even archaic. As Norma says in a response to a post on Free Range Librarian, “I was never a member, but look in from time to time. I loved the smaller, deeper professional library organizations, but ALA seemed so out of touch. Still I was surprised by the ugly website comment. It seems to be a library afflication [sic]. The poorest websites with clunky, chunky links seem to be run by libraries. Doesn’t speak well for the profession.”

Some suggest outsourcing, or bringing in professional web designers. Perhaps that is the answer, but I would prefer to see us accelerate our own learning on the matter. A good beginning would be to start with some basic, accepted frameworks and develop from there. Or perhaps start with designer-supplied open source code that is already dressed up a bit. Best of all would be to learn the CSS, Java, XML, Perl and Ruby on Rails to do it ourselves. Google, one of the most sophisticated of all net aggregators, often uses remarkably simple XML programming to achieve its aims. That very simplicity is a hallmark of new collaborative web environments. Just look at the O’Reilly Web 2.0 article from the last post for an in-depth look at it. For samples of working code that anyone can use, check out The Code Project, a site of free online source code boasting 4,222,461 members and growing.

To get more specific, there are wonderful resources for libraries as well. Check out the code4lib blog to hear about advances in the tech side of library service. By becoming a member you are able to get advice, code development help, and emotional support every time your code seems to creatively reimagine your data. If you want to connect with other library systems, and see exactly what they are up to today, you can go to the Library Weblogs index, which hosts feeds from Antigua, Egypt, Belarus, Kuwait, Singapore, and…well, just check it out if you’re interested. Amazing, isn’t it? If you’re interested, send in an email and you can get your library related blog feed posted to the list.

On a more personal note, you can subscribe to Blisspix, Fiona Bradley’s Sydney based blog about “Open access, technology and social futures.” Sometimes a bit technical, but a fine review of ongoing questions and problems with a practical focus.

As I come across problems while learning coding I will try to post suggestions, difficulties and pleas for help along the way. Hope this little guide helps even one of you on your way as well.

“And yet relation appears,

A small relation expanding like the shade

Of a cloud on sand, a shape on the side of a hill.”

(Wallace Stevens, “Connoisseur of Chaos”)

Filed under: design, library, programming, Web 2.0
Posted by Vaucanson's Duck
