For this activity, we will use the Python command line to import the default MARCIngester class from the bibcat/ingesters/marc.py module.
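Start a Python shell from your activated virtual environment:
(py3-env)>python
>>>
Alternatively, launch IDLE:
(py3-env)>python -m idlelib
Import the pymarc module:
>>> import pymarc
Create a MARCReader instance using your own MARC21 file, or download the 150 MARC Record sample from Colorado College: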
>>> reader = pymarc.MARCReader(
        open("/tmp/rdf-app/cc-marc-sample.mrc", "rb"),
        to_unicode=True)
>>> first_record = next(reader)
>>> print(first_record)
=LDR 00947cam a2200313 a 4500
=001 40163506
=003 OCoLC
=005 19990428161357.0
=008 981009s1999\\\\mau\\\\\\b\\\\001\0\eng\\
=010 \\$a98047634
=020 \\$a0395691303
=040 \\$aDLC$cDLC$dC#P
=049 \\$aCOCA
=050 00$aQP38$b.A54 1999
=090 \\$aQP38$b.A54 1999
=100 1\$aAngier, Natalie.
=245 10$aWoman :$ban intimate geography /$cNatalie Angier.
=260 \\$aBoston :$bHoughton Mifflin,$c1999.
=300 \\$axvi, 398 p. ;$c24 cm.
=500 \\$a"A Peter Davison book."
=504 \\$aIncludes bibliographical references (p. 369-382) and index.
=650 \0$aWomen$xPhysiology.
=650 \0$aWomen$xPsychology.
=650 \0$aSex differences.
=902 \\$a150104
=907 \\$a.b13627557
=945 \\$aQP38$b.A54 1999$g1$i33027003963844$j0$ltbp $h0$oc$p$0.00$q $r-$s-$t1$u7$v0$w0$x0$y.i14279873$z990428
=994 \\$atbp
=999 \\$b1$c990428$dm$ea$fc$g0
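The printed record above uses a line-per-field layout: a tag, two indicator characters (with a backslash standing in for a blank), and subfields introduced by `$` plus a one-character code. As a rough illustration of how to read that layout, here is a minimal sketch using only the standard library. The function name `parse_mnemonic_line` is hypothetical, it is not part of pymarc, and it deliberately ignores edge cases such as a literal `$` inside a subfield value (as in the 945 field's price subfield).

```python
def parse_mnemonic_line(line):
    """Split one '=TAG II$a...' line into its tag, indicators and subfields.

    Illustrative only: assumes no literal '$' appears inside subfield values.
    """
    tag, rest = line[1:4], line[4:].lstrip(" ")
    if tag == "LDR" or tag < "010":
        # The leader and control fields (001-009) hold raw data only
        return {"tag": tag, "data": rest}
    # A backslash in the printed form stands for a blank indicator
    indicators = rest[:2].replace("\\", " ")
    # Each remaining chunk starts with its one-character subfield code
    subfields = [(chunk[0], chunk[1:]) for chunk in rest[2:].split("$") if chunk]
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


title = parse_mnemonic_line(
    "=245 10$aWoman :$ban intimate geography /$cNatalie Angier.")
print(title["tag"], title["indicators"], title["subfields"])
```

Reading the 245 field this way makes it easier to see where values such as bf:mainTitle and bf:subtitle in the generated graph come from.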
Import the MARCIngester class, which works with the RDF Framework's BIBCAT MARC Ingestion Rules in Turtle RDF format, located at kds-bibcat-marc-ingestion.ttl.
Create an instance of the MARCIngester class using the default RDF rules in kds-bibcat-marc-ingestion.ttl.
Run the marc_ingester.transform method on the first_record.
Print the resulting marc_ingester.graph in Turtle:
>>> print(marc_ingester.graph.serialize(format='turtle').decode())
@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbp: <http://dbpedia.org/property/> .
@prefix dbr: <http://dbpedia.org/resource/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterm: <http://purl.org/dc/terms/> .
@prefix dpla: <http://dp.la/about/map/> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix es: <http://knowledgelinks.io/ns/elasticsearch/> .
@prefix kdr: <http://knowledgelinks.io/ns/data-resources/> .
@prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
@prefix loc: <http://id.loc.gov/authorities/> .
@prefix m21: <http://knowledgelinks.io/ns/marc21/> .
@prefix mods: <http://www.loc.gov/mods/v3> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://bibcat.org/9e054b36-0097-11e7-b2b0-a8667f19014b> a bf:Item ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-04T05:01:24.971684" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:itemOf <http://dpla.coloradovirtuallibrary.org/9d817a06-0097-11e7-897a-a8667f19014b> .
<http://bibcat.org/9d817a06-0097-11e7-897a-a8667f19014b> a bf:Instance ;
bf:classification [ a bf:ClassificationLcc ;
rdf:value "QP38 .A54 1999" ] ;
bf:copyrightDate "1999." ;
bf:dimensions "24 cm." ;
bf:extent [ a bf:Extent ;
rdf:value "xvi, 398 p. ;" ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-04T05:01:24.917334" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:identifiedBy [ a bf:Isbn ;
rdf:value "0395691303" ] ;
bf:instanceOf [ a bf:Work ;
bf:originDate "1999" ] ;
bf:provisionActivity [ a bf:Publication ;
relators:pbl "Houghton Mifflin," ] ;
bf:subject [ a bf:Topic ;
rdf:value "Women" ],
[ a bf:Topic ;
rdf:value "Sex differences." ],
[ a bf:Topic ;
rdf:value "Women" ] ;
bf:supplementaryContent [ a bf:SupplementaryContent ;
rdf:value "Includes bibliographical references (p. 369-382) and index." ] ;
bf:title [ a bf:InstanceTitle ;
bf:mainTitle "Woman :" ;
bf:subtitle "an intimate geography /" ] ;
relators:aut [ a bf:Person ;
schema:name "Angier, Natalie." ] .
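Next, we will add custom rules for Colorado College in a new Turtle file that begins with these prefixes:
@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .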
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
@prefix kdr: <http://knowledgelinks.io/ns/data-resources/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix m21: <http://knowledgelinks.io/ns/marc21/> .
@prefix schema: <http://schema.org/> .
@prefix loc: <http://id.loc.gov/authorities/> .
Add a rule that associates each bf:Item with Colorado College's Tutt Library, using the IRI https://www.coloradocollege.edu/library/ through the bf:heldBy predicate (substitute your own institutional IRI defined in the Knowledge Graph activity):
bc:bf-Organization a kds:PropertyLinker ;
    kds:destPropUri [ bf:heldBy <https://www.coloradocollege.edu/library/> ] ;
    kds:destClassUri bf:Item .
Add a rule that creates a bf:Barcode from MARC field 945, subfield i, with a linked range of bf:barcode to the bf:Item:
bc:mrc-barcode a kds:PropertyLinker ;
    kds:srcPropUri m21:M945__i ;
    kds:destClassUri bf:Barcode ;
    kds:destPropUri rdf:value ;
    kds:linkedRange bf:barcode ;
    kds:linkedClass bf:Item .
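The source property in the barcode rule, m21:M945__i, appears to pack a MARC tag, two indicators, and a subfield code into one local name, with an underscore for each blank indicator. That reading is an assumption inferred from this single example and from the 945 field in the sample records, not a documented specification. Under that assumption, a hypothetical helper (`decode_marc_local_name` is an illustrative name, not a BIBCAT function) could unpack such a name like this:

```python
import re

# Assumed layout: "M" + 3-digit tag + indicator 1 + indicator 2 + subfield code
MARC_LOCAL_NAME = re.compile(
    r"^M(?P<tag>\d{3})(?P<ind1>.)(?P<ind2>.)(?P<subfield>[a-z0-9])$")


def decode_marc_local_name(local_name):
    """Return the tag, indicators and subfield encoded in a name like 'M945__i'."""
    match = MARC_LOCAL_NAME.match(local_name)
    if match is None:
        raise ValueError("Unrecognized MARC local name: {}".format(local_name))
    return match.groupdict()


print(decode_marc_local_name("M945__i"))
```

If the assumption holds, the rule reads as "take subfield i of any 945 field with blank indicators and store it as the rdf:value of a new bf:Barcode".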
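Save these rules in your custom directory as custom/cc-marc.ttl.
Read the second record from the reader and print it:
>>> second_record = next(reader)
>>> print(second_record)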
=LDR 00921pam a2200277 a 4500
=001 38144340
=003 OCoLC
=005 19991207162048.0
=008 971205s1999\\\\njua\\\\\b\\\\001\0\eng\\
=010 \\$a97049002
=020 \\$a0134905172
=040 \\$aDLC$cDLC$dUKM
=049 \\$aCOCA
=050 00$aQC806$b.L48 1999
=090 \\$aQC806$b.L48 1999
=100 1\$aLillie, Robert J.,$d1952-
=245 10$aWhole earth geophysics :$ban introductory textbook for geologists and geophysicists /$cRobert J. Lillie.
=260 \\$aUpper Saddle River, N.J. :$bPrentice Hall,$cc1999.
=300 \\$ax, 361 p. :$bill. (some col.) ;$c26 cm.
=504 \\$aIncludes bibliographical references and index.
=650 \0$aGeophysics.
=902 \\$a160511
=907 \\$a.b13756497
=945 \\$aQC806$b.L48 1999$g1$i33027004066753$j0$ltbp $h0$oc$p$0.00$q $r-$s-$t1$u12$v0$w2$x3$y.i14450355$z991207
=994 \\$atbp
=999 \\$b1$c991207$dm$ea$fc$g0
Create a new MARCIngester instance with our cc-marc.ttl RDF Turtle rule file.
Run the marc_ingester.transform method with the second_record MARC21 record.
Print marc_ingester.graph in Turtle:
>>> print(marc_ingester.graph.serialize(format='turtle').decode())
@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
@prefix loc: <http://id.loc.gov/authorities/> .
@prefix m21: <http://knowledgelinks.io/ns/marc21/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://bibcat.org/bb1ac782-0123-11e7-9986-a8667f19014b> a bf:Item ;
bf:barcode [ a bf:Barcode ;
rdf:value "33027004066753" ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-04T21:44:23.368616" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:heldBy <https://www.coloradocollege.edu/library/> ;
bf:itemOf <http://bibcat.org/ba977364-0123-11e7-871c-a8667f19014b> .
<http://bibcat.org/ba977364-0123-11e7-871c-a8667f19014b> a bf:Instance ;
bf:classification [ a bf:ClassificationLcc ;
rdf:value "QC806 .L48 1999" ] ;
bf:copyrightDate "c1999." ;
bf:dimensions "26 cm." ;
bf:extent [ a bf:Extent ;
rdf:value "x, 361 p. :" ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-04T21:44:23.250965" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:identifiedBy [ a bf:Isbn ;
rdf:value "0134905172" ] ;
bf:instanceOf [ a bf:Work ;
bf:originDate "1999" ] ;
bf:provisionActivity [ a bf:Publication ;
relators:pbl "Prentice Hall," ] ;
bf:subject [ a bf:Topic ;
rdf:value "Geophysics." ] ;
bf:supplementaryContent [ a bf:SupplementaryContent ;
rdf:value "Includes bibliographical references and index." ] ;
bf:title [ a bf:InstanceTitle ;
bf:mainTitle "Whole earth geophysics :" ;
bf:subtitle "an introductory textbook for geologists and geophysicists /" ] ;
relators:aut [ a bf:Person ;
schema:name "Lillie, Robert J.," ] .
From this transformation, we see that our custom rules have populated the objects of the bf:barcode and bf:heldBy predicates.
To give each bf:Item its own IRI, create a function, generate_item_iri, that takes a MARC 21 record and returns an rdflib.URIRef that links directly to the library's catalog:
>>> import rdflib
>>> def generate_item_iri(record):
        # Records without a local 907 field get no custom IRI
        if '907' not in record:
            return
        # Drop the leading "." and the final character of the 907 $a value
        bib_number = record['907']['a'][1:-1]
        return rdflib.URIRef("http://tiger.coloradocollege.edu/record={}".format(bib_number))
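The slice `[1:-1]` in generate_item_iri is easy to misread, so here is a small standalone sketch of just the string handling, with no pymarc or rdflib involved. The helper names are hypothetical, and the note that the final character may be a check digit is an assumption, since the source does not say why it is dropped.

```python
CATALOG_PATTERN = "http://tiger.coloradocollege.edu/record={}"


def bib_number_from_907a(value):
    """Strip the leading '.' and the last character from a 907 $a value.

    Assumption: the dropped final character is a check digit of some kind.
    """
    return value[1:-1]


def catalog_url(value):
    """Build the catalog URL string for a raw 907 $a value."""
    return CATALOG_PATTERN.format(bib_number_from_907a(value))


# 907 $a value taken from the first sample record
print(catalog_url(".b13627557"))
```

Keeping the string logic separate from the rdflib.URIRef wrapper also makes it simple to adapt the pattern to another catalog's record URLs.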
Read the next record from the reader as third_record and print it:
>>> third_record = next(reader)
>>> print(third_record)
=LDR 01469cam a22003614a 4500
=001 61109349
=003 OCoLC
=005 20070130035705.0
=008 050714s2006\\\\caua\\\\\b\\\\001\0\eng\\
=010 \\$a2005019975
=020 \\$a1412916186 (cloth)
=020 \\$a9781412916189 (cloth)
=020 \\$a1412916194 (pbk.)
=020 \\$a9781412916196 (pbk.)
=040 \\$aDLC$cDLC$dYDXCP$dBAKER$dUKM$dYBM$dIG#$dOCLCQ$dBTCTA
=042 \\$apcc
=043 \\$an-us---
=049 \\$aCOCA
=050 00$aQA13$b.P67 2006
=050 00$aQA13$b.P67 2006
=100 1\$aPosamentier, Alfred S.
=245 10$aWhat successful math teachers do, grades 6-12 :$b79 research-based strategies for the standards-based classroom /$cAlfred S. Posamentier, Daniel Jaye.
=260 \\$aThousand Oaks, Calif. :$bCorwin Press,$cc2006.
=300 \\$axix, 197 p. :$bill. ;$c26 cm.
=504 \\$aIncludes bibliographical references (p. 183-191) and index.
=505 0\$aManaging your classroom -- Enhancing teaching techniques -- Facilitating student learning -- Assessing student progress -- Teaching problem solving -- Considering social aspects in teaching mathematics.
=650 \0$aMathematics$xStudy and teaching (Secondary)$xStandards$zUnited States.
=700 1\$aJaye, Daniel.
=902 \\$a160104
=907 \\$a.b16842455
=945 \\$aQA13$b.P67 2006$g1$i33027005249309$j0$ltbp $h0$oc$p$0.00$q $r-$s-$t1$u3$v26$w1$x0$y.i17378928$z070130
=994 \\$atbp
=999 \\$b1$c070130$dm$ea$fc$g0
Running the generate_item_iri function on third_record results in an item IRI of http://tiger.coloradocollege.edu/record=b1684245:
>>> item_iri = generate_item_iri(third_record)
>>> print(item_iri)
http://tiger.coloradocollege.edu/record=b1684245
With our custom bf:Item function created, we will run marc_ingester.transform on the third_record and pass in the item_iri through the item_uri keyword parameter:
>>> marc_ingester.transform(record=third_record, item_uri=item_iri)
Print marc_ingester.graph in Turtle:
>>> print(marc_ingester.graph.serialize(format='turtle').decode())
@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix kds: <http://knowledgelinks.io/ns/data-structures/> .
@prefix loc: <http://id.loc.gov/authorities/> .
@prefix m21: <http://knowledgelinks.io/ns/marc21/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://tiger.coloradocollege.edu/record=b1684245> a bf:Item ;
bf:barcode [ a bf:Barcode ;
rdf:value "33027005249309" ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-05T15:10:34.526978" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:heldBy <https://www.coloradocollege.edu/library/> ;
bf:itemOf <http://bibcat.org/e133d028-01b5-11e7-94fe-ac87a3129ce6> .
<http://bibcat.org/e133d028-01b5-11e7-94fe-ac87a3129ce6> a bf:Instance ;
bf:classification [ a bf:ClassificationLcc ;
rdf:value "QA13 .P67 2006" ] ;
bf:copyrightDate "c2006." ;
bf:dimensions "26 cm." ;
bf:extent [ a bf:Extent ;
rdf:value "xix, 197 p. :" ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-03-05T15:10:34.453103" ;
rdf:value "Generated by BIBCAT version 1.7.5 from KnowledgeLinks.io"@en ] ;
bf:identifiedBy [ a bf:Isbn ;
rdf:value "1412916186 (cloth)",
"1412916194 (pbk.)",
"9781412916189 (cloth)",
"9781412916196 (pbk.)" ] ;
bf:instanceOf [ a bf:Work ;
bf:originDate "2006" ] ;
bf:provisionActivity [ a bf:Publication ;
relators:pbl "Corwin Press," ] ;
bf:subject [ a bf:Topic ;
rdf:value "Mathematics United States." ] ;
bf:supplementaryContent [ a bf:SupplementaryContent ;
rdf:value "Includes bibliographical references (p. 183-191) and index." ] ;
bf:tableOfContents [ a bf:TableOfContents ;
rdf:value "Managing your classroom -- Enhancing teaching techniques -- Facilitating student learning -- Assessing student progress -- Teaching problem solving -- Considering social aspects in teaching mathematics." ] ;
bf:title [ a bf:InstanceTitle ;
bf:mainTitle "What successful math teachers do, grades 6-12 :" ;
bf:subtitle "79 research-based strategies for the standards-based classroom /" ] ;
relators:aut [ a bf:Person ;
schema:name "Posamentier, Alfred S." ] .
In the final exercise, we will process the remaining MARC records, adding each output graph to a master graph that we will then save.
Create an empty master graph, then use a for loop to iterate through the remaining MARC records in the reader:
>>> master_graph = rdflib.Graph()
>>> for record in reader:
        item_iri = generate_item_iri(record)
        marc_ingester.transform(record=record, item_uri=item_iri)
        master_graph += marc_ingester.graph
        print(".", end="")
..................................................................................
..................................................................
Save master_graph in a new output directory in your RDF application directory:
>>> import os
>>> os.makedirs("/tmp/rdf-app/output", exist_ok=True)
>>> with open("/tmp/rdf-app/output/cc-150-sample.ttl", 'wb+') as fo:
        fo.write(master_graph.serialize(format='turtle'))

266918
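The number echoed after the write is the return value of fo.write, which for a file opened in binary mode is the count of bytes written. The sketch below shows the same save step as a reusable function using only the standard library. The name `save_serialized` is hypothetical, and a short bytes literal stands in for the serialized graph so the example runs without rdflib.

```python
import tempfile
from pathlib import Path


def save_serialized(payload, directory, filename):
    """Write bytes to directory/filename, creating the directory if needed.

    Returns the number of bytes written, as file.write() does in binary mode.
    """
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / filename, "wb") as fo:
        return fo.write(payload)


# Stand-in for master_graph.serialize(format='turtle'); not real BIBCAT output
sample = b"@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .\n"
with tempfile.TemporaryDirectory() as tmp:
    written = save_serialized(sample, Path(tmp) / "output", "sample.ttl")
    print(written == len(sample))
```

Comparing the returned count with the size of the file on disk is a quick check that the whole graph was saved.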
Using selected MARC records from Colorado College and the University of Colorado Boulder that were generated from the Alliance's Gold Rush comparison service, this project uses BIBCAT to transform MARC records into BIBFRAME Linked Data. The RDF data is published to the web as Schema.org JSON-LD for indexing by Google, Bing, and other search engines. BIBCAT uses RDF rules that map MARC fields and subfields to BIBFRAME 2.0 entities and properties.