Effective Dating: The Many Pathways, Part 2

How did it get so late so soon? ~ Dr. Seuss

In Part 1, I examined what the main problem with transmitting effective dated information has been in my recent experience. In a nutshell, any time you send effective dating information, you are in fact implicitly adding a second effective date: the time when the information was sent. The information is sent, as correct as possible at the time. But if after this is done, a new change comes through that happens before the previous one, you are faced with a paradox, where a general rule simply cannot be applied.

paradox

paradox

There are many solutions to this situation. I considered three, and briefly considered a further four, which I restricted first by project-based circumstances, and second by localized efficiency best practices – both of which could be completely different next time effective dating is required.

Eliminating the Paradox

One solution for effective dating is to eliminate the possibility of paradox by removing all implicit ambiguity:

  • Check and Resend: Anytime a change happens, not only is it sent, but also any files already sent that no longer are correct (such as a previously sent file that had the now incorrect value) are resent. For complex systems, the risk of cascading makes this difficult to implement.
  • Limit the scope: Effective date is applied individually to every single field individually, either with an actual date or a flag indicating if the field changed. For a large number of fields, this approach becomes costly, with information feeds easily doubling in size.

Either approach eliminates the danger of a paradox, but it comes at a great cost, making them inappropriate for large systems. Beware the second option, since code will be required for each individual flag/date, which will add significant overhead to the processing!

Living with the Paradox

A balanced approach, groups connected fields under the same effective date, which can then be adjusted to find the proper equilibrium between cost of transmission and processing, and the risk of paradox. This approach works best when the source and destination of changes are closely related, so that the groups in both systems are a rough match.

  • Next step: the information sent is always the next value it will take, and when that will happen. For groups that won’t change, a default ‘end of time’ date is sent. Its greatest advantage is that this structure closely matches how people think of changes, and thus is intuitive to build.
  • Last step: opposite to the previous, the information sent is a ‘snapshot’ at a moment in time, and the effective date informs when the data last changed, and what its value is at the time the snapshot is taken. For unchanged fields, a ‘time immemorial’ date is used instead.

These two approaches, and hybrids on the same theme, can produce a paradox, but it reduces the scenarios of when it can occur. The effective date is not just a fixed point in time, but a process, extending from the past or into the future, and thus can show that the information will be changed or will have been changed.

In the above example, a single file can be crafted to contain both changes, and would be the file sent when the second change occurred. Instead, the paradox now occurs only when the change happens on the fields controlled by the same effective date. This solution does not have the same cost as Eliminating the Paradox solution.

Correcting the Paradox

If you choose a solution where paradoxes can still occur, a process must be put in place to restrict changes, detect paradoxes, or behave appropriately when they occur:

  • Restrict Changes: The system could restrict changes happening before already created changes. This condition is easily checked, and if used, should apply only to important fields – otherwise, you are effectively limiting your effective date system to sequential changes.
  • Detect paradoxes: If you can define programmatically the paradox (for example, as I defined it, a field whose value is changed at a time prior to a change already in the system), this allows for code to be executed in response to it. This option is best for procedural or Object Oriented environments, though, rather than ETLs in RDBMSs, whose efficiency will be greatly impacted by the logic required.
  • Propagate changes: I was able to use this particular approach with great success in my own project. Certain fields, determined in design to be important and subject to frequent change, were specifically singled out to propagate. Unless the incoming information indicated otherwise, values of these fields, once changed, would also change all future staged changes, so that the value would not be lost.

In conclusion, effective dating in ETLs is not for the faint of heart. It is comparable in many ways to foreign keys and triggers, in that it establishes causal relations amongst rows of data, with the important difference that foreign keys and triggers have much more built-in software support, while effective dating systems are still in their infancy.

Maybe in a few years this will change, and just like GIS, we will talk about what RDBMS has the “best” EDIS. But until that day comes, if you are in need of effective dating in your database, you will need to keep all these considerations in mind while you design and develop your solution.

There’s more to explore at Smartbridge.com!

Sign up to be notified when we publish articles, news, videos and more!