“I have seen the face of time, and I can tell you: it is an ocean in a storm”
~The Sands of Time
Smartbridge recently had the opportunity to augment an ETL to perform effective dating and, since I wrote the original system, I was the logical choice to add the new functionality. It was an interesting task especially since the target system was simply unable to handle any effective dating. A parallel staging area that could address the effective dating had to be built from the ground up.
Slowly changing dimensions have long since addressed the staging area design, and have a large range of solutions for every situation. On a table-by-table basis, effective dating is just another field, slightly trickier, but not any more so than a foreign key.
The troubles start when you diagram exactly what is effective and when. In all but the simplest of examples, when stored in a database it is distributed amongst many tables. Its description is in one place, its price is in a different one and its applicable tax by state in yet another. Your suppliers might consist of a single table or, in some cases, have a 3NF subsection of its own. Individually, each table could have an effective day of its own, and therein lays the issue: what do you do when those dates disagree?
It was an interesting challenge, which came in two parts. In Part 2 of my blog post, I’ll describe how the data itself was sent, and the other ways it could have been, (and why our choice was the right one for this situation, but might not work elsewhere). But here I want to address the interplay of conflicting dates amongst linked rows in a database representing a unified item.
“You’re just not thinking fourth dimensionally!”
~Back to the Future
Imagine a complex, real world example you might store in a database, such as a purchased item. You must store the details of the orders that have purchased it, the details of the item itself, and the details of the suppliers. All three concepts are subject to change (new orders made, existing orders cancelled; item versions being introduced; suppliers changing addresses or being added or retired). If this information needs to be sent to a second database, and the information is sent at the item level, effective dating comes into play, because the item is different depending on the moment you send it.
An easy solution is to send the item just as it changes, but that is often too late for the downstream system to process the information. Information that is known in advance, such as a change in the number of items received per order (“egg cartons will now contain 10 eggs each”) need to communicated at the time it is known, not the morning when the warehouse receives the order, or you are bound to have to field a large number of confused calls.
Effective dating, then, allows you to send information with a timestamp, so that the downstream system can consume it in advance. But changes known and sent in advance face the real issue: contradiction of timing.
Continuing my example, on a Sunday, a supplier informs you that, from Thursday onwards, orders have to be sent to a new address. This information is packaged, and sent. The next day, Monday, standards are modified and the number of items per carton will change starting Wednesday. This information must also be packaged and sent, but is no longer an easy task.
On Sunday, the item sent was dated for Thursday, with the new address, and the old (then only) number of items per carton. On Monday, the new change cannot be as easily sent: the effective date is prior to the change sent Sunday, so it would include the old address, and the new number of items, dated Wednesday. So far so good, except that if the change is put in place, the Thursday file will overwrite the number of items per carton data, since it is dated later –or, if a rule is established that files sent later override files sent earlier, Sunday’s address change will never occur.
This example showcases the problem: effective dating means that data must be considered as points in time, and that information sent earlier can be contradicted by information sent later, but effective sooner. No matter how the solution is implemented, there will be a conflict, and the engineering challenge is to spot these situations in advance, so that the protocol of information transmission is as unambiguous as possible when dealing with it.
The key to the problem lies in the example above. In a reductionist approach, consider two changes, happening at different times and becoming effective at different times. Each change can be accompanied with data that is functionally unchanged from the previous version. Add whether one or the other information must override or integrate the other one and how to distinguish changed data from unchanged data, and the result is a host of special circumstances, all of which need to be coded for.
Therein lays the challenge of effective dating, and the opportunity. In Part 2, I’ll take a look at how to transmit effectively dated data, and the algorithm choices to deal with it.