Data Architecture Question – Storing Daily Snapshot of Employee Master Data

Question

I'm trying to store daily snapshot of Employee data for FTE analysis project - analytics on how many FTE in various positions any given day. I can call a REST API, which will give me data for all the active and terminated employees as of API call time. Is it prudent to call this API every single day and store

Accepted Answer

I ended up using Delta tables on Databricks.Sample code to create the table and load data#read data into a dataframedf = spark.read.format("json").option("multiline", "true").load("/mnt/datalake/path/to/file/employee.json")#load data into a managed delta tabledf.write.format("Delta").mode("overwrite").option("mergeSchema", "true").saveAsTable("raw.DimEmployee")Now you can query data of any given day using below queryselect count(*) from raw.DimEmployee timestamp as of '2022-09-14'

Advertisement

Answer