Skip to content

Add weekly data update GitHub Action#101

Merged
LukasWallrich merged 1 commit intomainfrom
feature/weekly-data-update
Feb 1, 2026
Merged

Add weekly data update GitHub Action#101
LukasWallrich merged 1 commit intomainfrom
feature/weekly-data-update

Conversation

@LukasWallrich
Copy link
Contributor

Summary

  • Adds a scheduled GitHub Action that runs weekly (Mondays 6:00 UTC) to update package data
  • Efficiently checks SHA256 hash of data file first - skips R setup entirely if data unchanged
  • When data changes: updates offline snapshots, commits, and deploys Shiny app

How it works

  1. Downloads data from OSF and computes SHA256 hash
  2. Compares with stored hash in inst/extdata/snapshot/data_hash.txt
  3. If unchanged → exits early (fast, no R setup needed)
  4. If changed → sets up R, runs update_offline_data(), saves new hash, commits, deploys to ShinyApps

Test plan

  • Manually trigger workflow via workflow_dispatch to verify it runs
  • Verify hash comparison logic works correctly
  • Confirm Shiny deployment uses correct credentials

🤖 Generated with Claude Code

Runs on Mondays at 6:00 UTC and can be triggered manually.
Checks SHA256 hash first - skips R setup if data unchanged.
Updates offline data and deploys Shiny app when data changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Feb 1, 2026

Claude finished @LukasWallrich's task —— View job


Pull Request Review

I've completed a thorough review of the weekly data update GitHub Action implementation. Here's my comprehensive feedback:

Todo List

  • Read repository's CLAUDE.md file for repo-specific guidelines
  • Read and analyze .github/workflows/update-data.yml workflow file
  • Read inst/extdata/snapshot/data_hash.txt file
  • Analyze code quality and best practices
  • Check for potential bugs and issues
  • Review performance considerations
  • Evaluate security concerns
  • Assess test coverage needs
  • Provide comprehensive review feedback via GitHub comment

Review Findings

Code Quality & Best Practices

Excellent overall structure:

  • Smart hash-based change detection avoids unnecessary work
  • Proper conditional execution using GitHub Actions if statements
  • Clean separation of concerns with distinct steps
  • Good use of GitHub Actions best practices (actions/checkout@v4, r-lib/actions)

Well-structured workflow logic:

  • Early exit strategy when no changes detected (.github/workflows/update-data.yml:45-47)
  • Proper R environment setup only when needed (.github/workflows/update-data.yml:49-60)

⚠️ Potential Issues

1. Error Handling & Robustness

  • Missing curl error handling (line 21): curl -L -o fred_data.xlsx "https://osf.io/2tbvd/download" should check exit status
  • No validation that downloaded file is valid Excel format before hash calculation
  • Git commit could fail silently (line 87): Uses || echo "No changes to commit" but doesn't handle git push failures

2. Security Concerns

  • Secrets exposed in command (line 94): ${{secrets.SHINYAPPS_TOKEN}} and ${{secrets.SHINYAPPS_SECRET}} are passed directly to R command, potentially visible in logs
  • No integrity verification of downloaded data beyond hash comparison

3. Workflow Dependencies

  • Hardcoded file paths: inst/extdata/snapshot/data_hash.txt assumes directory structure
  • Missing dependency validation: No check that update_offline_data() function exists before calling it

🔧 Suggested Improvements

1. Enhanced Error Handling:

- name: Download data and check hash
  run: |
    if ! curl -L -o fred_data.xlsx "https://osf.io/2tbvd/download"; then
      echo "Failed to download data file"
      exit 1
    fi
    
    # Validate file is not empty and is Excel format
    if [ ! -s fred_data.xlsx ]; then
      echo "Downloaded file is empty"
      exit 1
    fi

2. Improved Security:

- name: Deploy to Shiny
  env:
    SHINYAPPS_TOKEN: ${{ secrets.SHINYAPPS_TOKEN }}
    SHINYAPPS_SECRET: ${{ secrets.SHINYAPPS_SECRET }}
  run: |
    R -e "
      rsconnect::setAccountInfo(name='forrt-replications', 
                               token=Sys.getenv('SHINYAPPS_TOKEN'), 
                               secret=Sys.getenv('SHINYAPPS_SECRET'))
      rsconnect::deployApp(appName = 'fred_explorer', appDir = './inst/fred_explorer', forceUpdate = TRUE)
    "

📊 Performance Considerations

Excellent performance optimization:

  • Hash-based change detection prevents unnecessary R setup (saves ~2-3 minutes per run)
  • Only downloads data once for hash checking
  • Efficient use of GitHub Actions conditional execution

Minor optimizations:

  • Consider using sha256sum with --check for more robust hash comparison
  • Could cache R dependencies between runs if updates are frequent

🧪 Test Coverage

Current gaps:

  • No automated testing of the workflow logic
  • Missing validation that update_offline_data() succeeds
  • No verification that Shiny deployment actually works

Recommendations:

  • Add a test mode that uses a small test dataset
  • Consider adding workflow status checks/notifications
  • Add validation step to ensure generated files are correct format

📈 Additional Suggestions

1. Workflow Monitoring:

- name: Validate update success
  if: steps.check-hash.outputs.data_changed == 'true'
  run: |
    # Verify snapshot files were created/updated
    if [ ! -f "inst/extdata/snapshot/data.RDS" ]; then
      echo "Error: Expected snapshot files not found"
      exit 1
    fi

2. Documentation:

  • Consider adding comments explaining the OSF URL source
  • Document expected data format and hash file structure

🎯 Summary

This is a well-designed workflow that demonstrates good understanding of GitHub Actions and efficient data management. The hash-based change detection is particularly clever for avoiding unnecessary work.

Priority fixes:

  1. Add curl error handling
  2. Secure secrets handling in Shiny deployment
  3. Validate downloaded file before processing

Nice to have:

  • Enhanced error handling throughout
  • Workflow success validation
  • Test coverage for the automation logic

The core concept and implementation are solid - just needs some defensive programming improvements for production reliability.

@LukasWallrich LukasWallrich merged commit a260ecb into main Feb 1, 2026
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant