
2012-07-16

http://l4w.us free legal URL shortener now shortening links to the CFR

Last month, with little (rather, no) fanfare, ontolawgy™ LLC launched L4w.us ("law for us"), a free URL shortener* to access non-commercial sources of law.

Currently, it works for Bills in Congress (back to the 104th Congress), U.S. Public Laws (ditto), the U.S. Code, the Federal Register (to 1995 for text, 1994 for tables of contents), and as of today, the Code of Federal Regulations.

These documents are available through the Library of Congress's legislative information site, THOMAS; the U.S. Government Printing Office's "FDsys"; the Legal Information Institute's U.S. Code and Code of Federal Regulations; and the online (though unfortunately, still unofficial) Federal Register.

If they're free sources, and the sites are easily searchable, why is an URL shortener necessary? Well, these sites, thankfully, are loaded with other features that make it difficult for them to offer clean, simple, and short URLs to access legal text.

For example, to get to Public Law 111-148, you need to go to:

http://www.gpo.gov/fdsys/pkg/PLAW-111publ148

That's not too complicated, and if you can remember the scheme, not such a big deal. What if you want to send someone a link to the text? You have to send them this:

http://www.gpo.gov/fdsys/pkg/PLAW-111publ148/html/PLAW-111publ148.htm

Not so easy. But why go through trying to remember that (or making a custom shortcut in your browser, or setting up a raft of unnecessary bit.ly links) when you can just type this: 

http://l4w.us/PublicLaw/111-148 for the landing page or

http://l4w.us/PublicLaw/text/111-148 for the full text or

http://l4w.us/PublicLaw/pdf/111-148 for the PDF.

Alternatively, if you're used to legal citations or you can handle spaces in your URLs:

http://L4w.us/Pub. L. 111-148
http://L4w.us/Pub. L. 111-148 text, and
http://L4w.us/Pub. L. 111-148 pdf

will work too, with or without spaces. Hopefully, those are self-explanatory enough.
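
To illustrate the kind of translation involved, here is a minimal sketch in Python (not the actual L4w.us code) that expands a Public Law citation into the GPO addresses shown above. The function name and the regular expression are made up for this example, and the PDF address is an assumption built by analogy with the HTML address; only the landing-page and full-text patterns come from the examples in this post.

```python
import re

FDSYS = "http://www.gpo.gov/fdsys/pkg"

# Accepts "Pub. L. 111-148", "Pub.L.111-148text", "pub. l. 111-148 pdf", etc.
CITATION = re.compile(
    r"^Pub\.?\s*L\.?\s*(\d+)-(\d+)\s*(text|pdf)?$", re.IGNORECASE
)


def expand_public_law(citation):
    """Return the long GPO URL for a Public Law citation, or None if unparseable."""
    match = CITATION.match(citation.strip())
    if match is None:
        return None
    congress, number = match.group(1), match.group(2)
    fmt = (match.group(3) or "").lower()
    package = f"PLAW-{congress}publ{number}"
    if fmt == "text":
        return f"{FDSYS}/{package}/html/{package}.htm"
    if fmt == "pdf":
        # Assumption: the PDF path mirrors the HTML path shown in the post.
        return f"{FDSYS}/{package}/pdf/{package}.pdf"
    return f"{FDSYS}/{package}"  # landing page


if __name__ == "__main__":
    for example in ("Pub. L. 111-148", "Pub. L. 111-148 text", "Pub.L.111-148pdf"):
        print(example, "->", expand_public_law(example))
```

The optional whitespace in the pattern is what lets the citation work "with or without spaces"; anything the pattern does not recognize returns None rather than a guessed address.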

There are more examples at http://L4w.us for other legal materials.

So why has no one done this before? Honestly, I have no idea. Is it really necessary? Unless you're Rain Man, I think it can't hurt, especially if you want to send or show people links to primary (or close-to-primary) legal source material.

My main idea was to let people use L4w.us links as easy-to-remember shortcuts to public versions of our laws. Instead of going to the destination sites and searching, or trying to remember multiple URL schemes, you can just type in what you're looking for and go, provided you already know what that is (sorry, no search yet).

I also put it together to help simplify ontolawgy™ LLC's bulk data downloading programs. The overhead on the server is minimal, so I thought I'd share. Feel free to use it, and tell people how it's going; use L4w.us links in blog posts, articles, theses, student projects, non-profit citizen-engagement applications/campaigns, etc. Just don't bake it into a commercial software or hardware (e.g., a book) application without permission.

Currently, the site doesn't offer any fancy features, and we don't plan to. But it also doesn't track you. We maintain basic access logs with IP addresses to make sure people aren't abusing the service (e.g., DDoS-ing it for some unknown reason or incorporating the service into commercial activity without a license), but other than that, whatever laws you look at are between you, the site providing the content, and whoever is monitoring your internet traffic.

Suggestions for how to improve the site or which resources to add are most welcome here in the comments, on Twitter, or by email.

© 2012 Alex M. Hendler. All Rights Reserved.

*If you haven't picked up on it by now, technically, L4w.us is not really an URL shortener; it's more like an API for people to access other (simple) people-facing APIs, but without scaring them away by calling it an API.

2012-05-15

How to eat legislative sausage


Here's a two-part post, after a very long absence from the blog. I have been busy with several other projects, but I am gearing up to participate in the "legal hacks" event (see http://legalhacks.org) very soon, and as a result, am revisiting some issues related to organizing code-like legal materials. 

The first part of this post discusses organizing materials once they have been obtained, and the second part discusses some of the challenges in obtaining them. A bit backwards, perhaps, but the first part is likely more relevant to the upcoming event.

Part I: Organizing code-like legal materials, or, eating legislative sausage

To rehash the old adage that making laws is like making sausage, organizing them after the fact is very much like eating it, and below is my general method for doing that, for whatever it's worth. As always, I welcome any feedback, questions, or suggestions.

I read with great interest two recent blog posts by Thomas Bruce of the Legal Information Institute at Cornell and Grant Vergottini of legix.info about challenges in organizing legislative/legal data using different types of identifying information. Generally, Mr. Bruce's post describes the functions that identifying information can serve. These are summarized below:

a) “Unique naming”, i.e., assigning a specific name to a legal provision within a system

b) “Navigational reference”, which is similar to navigating a filesystem

c) “Retrieval hook/container label”, i.e., to use a citation as a placeholder to aggregate lower-level content that is stored in other locations/records

d) “Thread tag/associative marker”, i.e., grouping of related documents in “threads”; one example he uses is a “captive search” URI, but in my view, this is mainly another way to get at a retrieval hook

e) “Process milestone”, i.e., inferring some meaning from the official status of a document, e.g., if a bill has been assigned a Public Law number, it has presumably been enacted into law.

f) “Proxy for provenance”, e.g., the existence of a bill number means that legislation has been officially noticed in some way.

g) “Popular names, professional terms of art, and other vernacular uses”, e.g., the Social Security Act, the Stark Law, the Anti-Kickback Statute (to use some of the examples with which I am most familiar).

Mr. Vergottini goes into the issues surrounding selecting frameworks to be used to actually implement those kinds of identifiers, e.g., via a URN or URL-based system, and discusses some of the difficulties inherent in selecting and implementing a system to capture relevant data in a machine-readable way. He also identifies problems with viewing different portions of text, as well as tracking text that gets amended or redesignated.

Common problems Messrs. Bruce and Vergottini both discuss include documents/provisions with identical names/identifiers in an official classification system (e.g., the two subparagraphs 42 U.S.C. § 1320a-7b(b)(3)(H) that coexisted for seven years until fixed by Pub. L. 111-148 § 3301), or how to store temporally different versions of text.

I started building the ontolawgy™ platform (a web-based legal analysis system) about 6 years ago for my regulatory practice, and I ran into the problems discussed above quite early. Here are some of the approaches I have taken to address them:
  1. Treat every textual division as a unique document, and allow it to be accessed via a unique URL based on its location in the government taxonomy (a - c in Mr. Bruce's overview).
  2. Store each descriptive element about that document in a tag/field. This includes official and unofficial “popular names” (e.g., the Social Security Act), section numbers within those popular names, section numbers of the U.S. Code, Public Law enacting provisions, etc. (c - g)
  3. Allow users to query on any of those elements. (a - g)
  4. Track duplicates and give them distinct records that are still retrieved in an appropriate way using their descriptive tags/fields. (a - d, g)
  5. Track each provision using its current designation, but maintain a full locative, temporal, and ontological history within the record and the system. (a - e, g) For example, 42 U.S.C. § 1320a-7b(b)(3)(I) used to be the second 42 U.S.C. § 1320a-7b(b)(3)(H) that was enacted by Pub. L. 108-173 § 431 (the first subparagraph (H) was enacted by § 237 of the same Public Law); the system tracks all that information and allows users to query it and, e.g., gather together all historical versions of subparagraph (H) to track how it has changed over time.

As for the mechanics, when I started building the system, my main goal was to get up and running quickly with a free, open-source, off-the-shelf system. The system is extremely flexible, has a very active development community, and still works quite well. While it does not currently use any sort of (proposed) standard like URN:lex or Akoma Ntoso, it does use inline markup, and thus should be easily convertible to a legal markup standard once one is in place.
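
For readers who think in code, approaches 2 through 5 in the list above can be roughly sketched as follows. This is only an illustration in Python, not how the ontolawgy™ platform is actually built: the `Provision` class, the `record_id` values, and the tag names are all hypothetical. The citations come from the subparagraph (H) example above; treating Pub. L. 111-148 § 3301 as the redesignating provision is my reading of that example.

```python
from dataclasses import dataclass, field


@dataclass
class Provision:
    record_id: str                 # hypothetical internal key; unique even for duplicates
    current_citation: str          # the designation the provision carries today
    former_citations: list = field(default_factory=list)   # designation history
    tags: dict = field(default_factory=dict)                # descriptive fields

    def ever_cited_as(self, citation):
        """True if the provision carries, or ever carried, this designation."""
        return citation == self.current_citation or citation in self.former_citations


def find_by_citation(records, citation):
    """All records that have ever been known by the given citation."""
    return [r for r in records if r.ever_cited_as(citation)]


def find_by_tag(records, key, value):
    """All records whose tag `key` includes `value`."""
    return [r for r in records if value in r.tags.get(key, [])]


SUBPARA_H = "42 U.S.C. § 1320a-7b(b)(3)(H)"
SUBPARA_I = "42 U.S.C. § 1320a-7b(b)(3)(I)"

RECORDS = [
    # The first subparagraph (H).
    Provision("rec-1", SUBPARA_H,
              tags={"enacted_by": ["Pub. L. 108-173 § 237"]}),
    # The second subparagraph (H), now designated (I).
    Provision("rec-2", SUBPARA_I,
              former_citations=[SUBPARA_H],
              tags={"enacted_by": ["Pub. L. 108-173 § 431"],
                    "redesignated_by": ["Pub. L. 111-148 § 3301"]}),
]

if __name__ == "__main__":
    for record in find_by_citation(RECORDS, SUBPARA_H):
        print(record.record_id, record.current_citation)
```

Querying on the old subparagraph (H) citation returns both records, while each keeps its own key and its own enacting provision, which is the point of giving duplicates distinct records.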

I can't go into much more detail here, but please contact me to get access to my demo system if you would like to see it in action.

Part II: Obtaining legal source materials, or, how the government makes sausage even messier

All that said, one significant challenge I still face is getting rational raw data from official sources. Indentation can be highly relevant semantically, depending on the subject matter, but official sources either just do away with indentation altogether (I'm looking at you, Code of Federal Regulations) or present it in such an inconsistent format that it might as well not be there (U.S. Code).

Back to the sausage. Essentially, we pay the government to make legal sausage, cook the sausage, and serve it to us, but just before they serve it, they mash it up, smear it around the plate, then take away our silverware and tie our hands behind our backs. I spend much more time than should be necessary simply ensuring that the materials I work with are properly indented to accurately reflect their meaning. I've written several small programs to do about 95% of the work, but that remaining 5% can be almost maddening, particularly when dealing with multiple levels of unenumerated flush text. The materials are certainly drafted with visible indentation (take a look at Public Laws: all the indentation is there and correct), but all this useful information gets stripped out at some point in the publication process, and it is not at all clear to me why this happens. The U.S. Code uses “bell codes” for typesetting print documents, but this doesn't excuse the lack of indentation in electronic publications.
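
To give a flavor of what such a program might look like, here is a simplified sketch in Python (not one of my actual programs) that guesses indentation levels from the enumerators at the start of each line. It assumes the usual federal drafting order of (a), (1), (A), (i), (I); the function names are invented for this example. It deliberately leaves unenumerated flush text unresolved and can misread a subsection (i) that follows deeply nested text, which is exactly the kind of case that ends up in the remaining 5%.

```python
import re

# One or more enumerators at the start of a line, e.g. "(a)", "(b)(1)", "(ii)".
LEADING = re.compile(r"^(?:\((?:\d+|[a-z]+|[A-Z]+)\))+")
ROMAN = re.compile(r"^[ivx]+$", re.IGNORECASE)


def candidate_levels(label):
    """Return every nesting level a label could belong to (0 = outermost)."""
    if label.isdigit():
        return [1]                                   # (1) paragraph
    levels = []
    if len(label) == 1:
        levels.append(0 if label.islower() else 2)   # (a) subsection, (A) subparagraph
    if ROMAN.match(label):
        levels.append(3 if label.islower() else 4)   # (i) clause, (I) subclause
    return levels


def infer_levels(lines):
    """Pair each line with an inferred indent level, or None for flush text."""
    depth = -1          # level of the most recent enumerator seen
    result = []
    for line in lines:
        text = line.strip()
        match = LEADING.match(text)
        if not match:
            result.append((None, text))   # unenumerated: left for human review
            continue
        first_level = None
        for label in re.findall(r"\((\w+)\)", match.group(0)):
            options = candidate_levels(label)
            if not options:
                continue                  # e.g. "(aa)": not handled in this sketch
            # Prefer the deepest reading that descends at most one level.
            reachable = [lvl for lvl in options if lvl <= depth + 1]
            depth = max(reachable) if reachable else min(options)
            if first_level is None:
                first_level = depth
        result.append((first_level, text))
    return result


if __name__ == "__main__":
    sample = ["(a) In general", "(1) A paragraph", "(A) A subparagraph",
              "(i) a clause", "(ii) another clause", "(b)(1) Next subsection"]
    for level, text in infer_levels(sample):
        print("    " * (level or 0) + text)
```

The ambiguity between a letter (i) and a roman numeral (i) is resolved here by context alone; a more careful version would also check whether the label is the expected successor of the last label seen at each level.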

The C.F.R. is even more maddening: This document claims that the XML format of the C.F.R. “is a complete and faithful representation of the Code of Federal Regulations, which matches most closely to the author's original intent... [and] fully describes the structure of the Code of Federal Regulations, including the large structure (chapters, parts, sections, etc.), the document structure (paragraphs, etc.), and semantic structure” then goes on to explain that the SGML indentation for subsections, paragraphs, subparagraphs, clauses, etc. has all been collapsed to the same single tag. This means that every last bit of indentation/separation (except for line breaks) within each section has been completely stripped from all publicly available electronic materials, even though “sections” can be very long and complex, with multiple nested levels of semantically relevant indentation. How is this supposed to help the public?

The LII has generally addressed indentation issues in its publication of the U.S. Code (see http://www.law.cornell.edu/uscode/text/42/1395ww for an example), and content is freely available for viewing and non-commercial re-publication.

LII's new Code of Federal Regulations (CFR) system, the result of a close collaboration with the government, also does an excellent job of organizing and indenting CFR data the way it was meant to be read: it is the only freely-available resource of which I am aware that does this.

While the LII's sites offer a valuable public service, they do not solve the underlying problem: Properly indented content is not freely available to the public from the government for commercial re-use, even though these government works are in the public domain. Why is this a problem? Because official platitudes notwithstanding, government publications significantly obscure or corrupt the intended meaning and scope of the laws that govern us.

If anyone has some insight about how to get the government to bring useful and accurate indentation to its official publications, please get in touch; I would be thrilled to work with you to help make this happen.


© 2012 Alex M. Hendler. All Rights Reserved.