Data Quality & Known Issues

We believe in radical transparency — about the data and our process.

This page explains what's reliable, what's uncertain, and where you should be careful. Budget data is complex. Our extraction isn't perfect. You deserve to know both.

What's Reliable

Coverage: 13 complete financial years (2013-14 to 2025-26)
Volume: ~200,000 line items extracted
Major totals: We aim for 100% accuracy on department totals
Structure: Budget classification hierarchy preserved correctly
Traceability: Every line item links back to its source PDF
The Big Picture is Accurate: Total spending by year, major department allocations, broad trends — these are reliable. We verify major totals against source documents.

Known Issues

We aim for 100% accuracy but haven't reached it yet. Here's what we know:

Issue 1: Label Variations Across Years

Severity: Medium
Problem: Same department, different names across years.

Example

  • 2014-15: "Director Tourism Kashmir"
  • 2017-18: "DIRECTOR TOURISM KMR."
  • 2020-21: "Director Tourism Kashmir"
Impact: Time-series analysis requires matching these variants.
Workaround: Use budget codes as primary identifiers, not text labels.
Status: We will build standardised entity names, in an order set by user priorities.
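
The workaround above can be sketched in a few lines of Python. This is a minimal illustration, not our pipeline: the row layout, the budget code "2452-01", and the amounts are all hypothetical, so adapt the field names to the file you downloaded.

```python
from collections import defaultdict

# Hypothetical rows: (financial_year, budget_code, label, amount).
# The code "2452-01" and the amounts are invented for illustration only.
rows = [
    ("2014-15", "2452-01", "Director Tourism Kashmir", 120.0),
    ("2017-18", "2452-01", "DIRECTOR TOURISM KMR.", 150.0),
    ("2020-21", "2452-01", "Director Tourism Kashmir", 175.0),
]


def series_by_code(rows):
    """Group amounts by budget code, then by year, ignoring the text label."""
    series = defaultdict(dict)
    for year, code, _label, amount in rows:
        series[code][year] = series[code].get(year, 0.0) + amount
    # Sort each code's years so the result reads as a time series.
    return {code: dict(sorted(years.items())) for code, years in series.items()}


def label_variants(rows):
    """Collect every distinct label seen for each budget code."""
    variants = defaultdict(set)
    for _year, code, label, _amount in rows:
        variants[code].add(label)
    return dict(variants)


tourism = series_by_code(rows)["2452-01"]
variants = label_variants(rows)["2452-01"]
```

Here `tourism` holds one amount per year even though the label changed in 2017-18, and `variants` lists the spellings that would otherwise have to be matched by hand.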

Issue 2: OCR Errors in Some Documents

Severity: High
Problem: Machine extraction makes mistakes, especially with poor-quality scans (mainly 2013-15 documents), Urdu/Hindi script mixed with English, handwritten annotations, and complex table structures.
Impact: Individual line items may have errors in amounts or descriptions.
Workaround: For critical figures, verify against the source PDF (every line item links to it).
Status: Continuous improvement. We re-extract documents as our algorithms improve.
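
Before checking PDFs by hand, it can help to narrow down where to look. The sketch below sums extracted line items per department and flags any department whose sum differs from the total stated in the source document. It assumes that line items should add up to that total, which may not hold for every budget head, and the function name, department codes, and amounts are hypothetical.

```python
def flag_mismatches(line_items, stated_totals, tolerance=0.01):
    """Return departments whose line items do not sum to the stated total.

    line_items: iterable of (department_code, amount) pairs
    stated_totals: dict of department_code -> total read from the source PDF
    tolerance: allowed absolute difference, in the same unit as the amounts
    """
    sums = {}
    for code, amount in line_items:
        sums[code] = sums.get(code, 0.0) + amount

    flagged = {}
    for code, stated in stated_totals.items():
        extracted = sums.get(code, 0.0)
        if abs(extracted - stated) > tolerance:
            flagged[code] = (extracted, stated)
    return flagged


# Invented example: department "D02" has a line item that lost digits in OCR.
line_items = [("D01", 100.0), ("D01", 50.0), ("D02", 80.0), ("D02", 2.0)]
stated_totals = {"D01": 150.0, "D02": 280.0}

flagged = flag_mismatches(line_items, stated_totals)
```

A flagged department is only a hint about where an extraction error may sit. The source PDF remains the reference for the correct figure.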

Issue 3: Missing Context

Severity: Low
Problem: We show budgeted amounts and actuals, but not detailed project outcomes, beneficiary information, geographic distribution within J&K, or quarterly spending patterns.
Impact: You can see allocations but not results or implementation details.
Workaround: Combine our data with other sources (audit reports, news, RTI).
Status: Future scope. We plan to add actuals and project-level data.

Found an Error? Tell Us.

We want to know about errors. Your reports help us improve.

What to include

  • Which file/line item (year, department, budget code)
  • What's wrong (amount incorrect, label wrong, missing data)
  • Evidence if you have it (screenshot, calculation)

What happens next

  1. We verify against source documents (3–5 business days)
  2. If it's our extraction error, we correct it
  3. We update the dataset and document the correction
  4. We credit you in our changelog (optional)

How We're Improving

Based on user feedback and our own quality checks:

NOW (Ongoing)

  • Manual verification of high-value allocations
  • Cross-checking with CAG audit reports where available
  • Documenting all known issues
  • Re-running extraction on problem documents

NEXT 6 MONTHS

  • Adding more years as new budgets are released
  • Expanding to other datasets

For Researchers & Journalists

Using this in published work?

Proper Attribution

JK Open Data (2025). Jammu & Kashmir Budget Data 2013-2026. Retrieved from https://jkopendata.com

Best Practices

  • Always note relevant data limitations in your work
  • Verify critical figures against source documents
  • Report the extraction date/version you used
  • Share your findings with us — we'd love to feature your work

Questions about using this data? research@jkopendata.com