-
Notifications
You must be signed in to change notification settings - Fork 20
Add OAI-PMH Ingestors #1180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add OAI-PMH Ingestors #1180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds a new OAI-PMH ingestion pathway to TeSS to import Events and Materials from remote OAI-PMH endpoints, with support for both Dublin Core and Bioschemas RDF metadata (MR11 / #1179).
Changes:
- Introduce
Ingestors::OaiPmhIngestorwith RDF-first (Bioschemas) parsing and Dublin Core fallback. - Register the new ingestor in
IngestorFactory. - Add unit tests covering empty endpoints, Dublin Core materials/events, mixed records, and Bioschemas RDF parsing.
- Fix HTML
bodytagclassattribute quoting in the application layout.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
lib/ingestors/oai_pmh_ingestor.rb |
New OAI-PMH ingestor implementation for RDF (Bioschemas) and Dublin Core ingestion. |
lib/ingestors/ingestor_factory.rb |
Registers Ingestors::OaiPmhIngestor as an available ingestor. |
test/unit/ingestors/oai_pmh_test.rb |
Adds unit tests for OAI-PMH ingestion behavior and parsing outcomes. |
app/views/layouts/application.html.erb |
Quotes body class attribute to produce valid HTML when multiple classes are emitted. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def read_oai_dublin_core(client) | ||
| count = 0 | ||
| client.list_records.full.each do |record| |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_oai_dublin_core calls client.list_records without a metadata_prefix. In OAI-PMH, ListRecords requests require metadataPrefix, and many servers/gems will return a badArgument / raise when it’s missing. Consider explicitly requesting Dublin Core (typically oai_dc) here so this works against spec-compliant endpoints.
| client.list_records.full.each do |record| | |
| client.list_records(metadata_prefix: 'oai_dc').full.each do |record| |
| end | ||
|
|
||
| def read(source_url) | ||
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail] } |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config[:user_agent] is defined but not used when instantiating OAI::Client. Consider passing it via headers (alongside From) so requests identify TeSS consistently, matching how other ingestors set User-Agent on outbound HTTP requests.
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail] } | |
| client = OAI::Client.new source_url, headers: { 'From' => config[:mail], 'User-Agent' => config[:user_agent] } |
| @@ -0,0 +1,208 @@ | |||
| require 'open-uri' | |||
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
require 'open-uri' doesn’t appear to be used in this ingestor (it doesn’t call open_url or URI.open). Consider removing the unused require to keep dependencies minimal.
| require 'open-uri' |
| def list_records(metadata_prefix: nil) | ||
| if metadata_prefix == 'rdf' | ||
| @rdf_response | ||
| else | ||
| @dc_response | ||
| end |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FakeClient#list_records treats a missing metadata_prefix (nil) as Dublin Core. Since OAI-PMH ListRecords requests normally require an explicit metadataPrefix, consider making the mock require/handle the explicit DC prefix (e.g., oai_dc) so the tests exercise the same call pattern as production code.
| assert_equal @ingestor.materials, [] | ||
| assert_equal @ingestor.events, [] |
Copilot
AI
Feb 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These assertions have the expected/actual arguments reversed. Swapping them to assert_equal [], @ingestor.materials (and similarly for events) will produce clearer failure messages and matches the assertion style used elsewhere in the test suite.
| assert_equal @ingestor.materials, [] | |
| assert_equal @ingestor.events, [] | |
| assert_equal [], @ingestor.materials | |
| assert_equal [], @ingestor.events |
Summary of changes
Motivation and context
This is a relevant step in the mTeSS-X project. In particular, it is MR11. See #1179.
Checklist
to license it to the TeSS codebase under the
BSD license.