ETL – Uploads and Reporting

Problem:
Data arrived from dozens of different sources as flat files in differing formats over a period of six months.
Why it mattered:
Information had to be gathered and evaluated for an intellectual property case.
Characters:

  • Lawyer
  • Media Consultant
  • IT Consultant

Attempts at resolution:
Tried to have sources submit files in one of three specific formats (nice try).
Solution:
Identified the different formats and wrote ETL routines that automated the importing and cleanup of the data, up to a point. Created a user interface so that users who were familiar with the subject material could finish the data cleanup and categorization. Created queries and reports to meet the legal requirements of the case, along with summary information and data integrity checks.
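
As a rough illustration of the import step described above, the following Python sketch guesses a file's format, applies simple automated cleanup, and sets aside records that still need a person to finish them. It is not the code used on the project: the three delimiter-based formats, the function names (detect_format, clean_row, import_file), and the cleanup rules are all assumptions made for the example, since the actual formats are not described here.

```python
import csv
import io

# Hypothetical formats; the real source formats are not described in this write-up.
FORMATS = {
    "comma": {"delimiter": ","},
    "tab": {"delimiter": "\t"},
    "pipe": {"delimiter": "|"},
}


def detect_format(text):
    """Guess the format by counting candidate delimiters in the header line."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty file")
    header = lines[0]
    counts = {name: header.count(cfg["delimiter"]) for name, cfg in FORMATS.items()}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("unrecognized format")
    return best


def clean_row(row):
    """Trim whitespace, normalize column names, and list remaining problems."""
    cleaned = {}
    problems = []
    for key, value in row.items():
        if key is None:  # the row had more fields than the header
            problems.append("extra fields")
            continue
        cleaned[key.strip().lower()] = (value or "").strip()
    problems += [k for k, v in cleaned.items() if v == ""]
    return cleaned, problems


def import_file(text):
    """Return the detected format, the clean rows, and the rows needing review."""
    fmt = detect_format(text)
    reader = csv.DictReader(io.StringIO(text), delimiter=FORMATS[fmt]["delimiter"])
    loaded, needs_review = [], []
    for row in reader:
        cleaned, problems = clean_row(row)
        (needs_review if problems else loaded).append(cleaned)
    return fmt, loaded, needs_review
```

In a setup like this, the rows returned in needs_review are the ones that would be handed to the subject-matter users through the cleanup interface, while the loaded rows go straight into the reporting tables.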
Summary:
The import routines found and corrected between 60% and 90% of the data problems. The process improved over time, since the format of the information from a particular source was usually the same every month. Exception reports identified problem records, and data-entry forms allowed users to fix incorrect information.
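
An exception report of the kind mentioned above could be sketched as follows in Python. This is only an illustration under assumed rules: the field names (title, date), the date format, and the function names check_record and exception_report are hypothetical and were not taken from the project.

```python
from datetime import datetime


def check_record(record):
    """Return the reasons this record needs manual attention (hypothetical rules)."""
    reasons = []
    if not record.get("title"):
        reasons.append("missing title")
    try:
        # Assumed target format for dates after import: YYYY-MM-DD.
        datetime.strptime(record.get("date") or "", "%Y-%m-%d")
    except ValueError:
        reasons.append("bad date")
    return reasons


def exception_report(records):
    """List problem records with reasons and return the share of clean records."""
    exceptions = []
    for index, record in enumerate(records):
        reasons = check_record(record)
        if reasons:
            exceptions.append({"row": index + 1, "reasons": reasons, "record": record})
    clean_share = (len(records) - len(exceptions)) / len(records) if records else 0.0
    return exceptions, clean_share
```

Each entry in the returned list carries the row number and the reasons it failed, which is the information a data-entry form would need to show a user who is correcting the record.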