Thursday, September 15, 2016

On the keeping of time

I told this story to a colleague this week, and it seemed worthy of a blog post.

Several years ago I worked for a large healthcare system. We supported more than 40 hospitals in seven states. I worked for the integration team and we developed and managed interfaces between all of their different IT systems.

They used an older mainframe-based Patient Management System and a newer EHR. We picked up patient administration (ADT) messages and delivered them to the EHR and various other ancillary systems. We had an audit report that told us when a message took more than one minute to flow from the Patient Management System to the EHR, and we would have to explain why that happened. This was considered an SLA failure. One month we had a particularly high number of SLA failures on the report. As a junior developer I drew the short straw and had to investigate these failures.

I could see that we were receiving the ADTs and that they were going out to the EHR in a few seconds, well under the sixty seconds that we were permitted. I scratched my head and at the next status meeting I asked the innocent question, "How are these times established?" It was explained to me that these were the "system times" in the Patient Management System, the Interface Engine and the EHR. So I then asked the follow-up question, "How do we know that the times are in sync?" What followed would be referred to in literature as a "pregnant pause." Since I asked the question, I was tasked with finding the answer.

The Interface Engine and the EHR ran on Unix boxes. These machines synced their time with a time server at the US Naval Observatory every weekend. So, when I sat down with the EHR tech, we verified that the times were in sync. When I sat down with the Patient Management System mainframe tech and we looked at the time in his system, it was a full 40 seconds off from the other two systems. Since that time "started the clock" for SLA purposes, this was an issue.
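To make the arithmetic concrete, here is a minimal sketch of how a clock offset on the sending system changes the latency that an audit report measures. The function names and the sample latencies are made up for illustration, and it assumes the mainframe clock was 40 seconds behind the other systems; the direction of the offset is my assumption, not something recorded above.

```javascript
// Illustrative sketch only: names and sample values are hypothetical.
const SLA_LIMIT_SECONDS = 60;

// Each system stamps a message with its own clock.
// An offset is how far that clock is ahead (+) or behind (-) the reference time.
function measuredLatencySeconds(trueLatencySeconds, senderOffsetSeconds, receiverOffsetSeconds) {
  // sender stamp   = t0 + senderOffset
  // receiver stamp = t0 + trueLatency + receiverOffset
  return trueLatencySeconds + receiverOffsetSeconds - senderOffsetSeconds;
}

function isSlaFailure(latencySeconds) {
  return latencySeconds > SLA_LIMIT_SECONDS;
}

// Assumption: the mainframe (sender) is 40 seconds behind, the EHR is in sync.
const MAINFRAME_OFFSET_SECONDS = -40;
const EHR_OFFSET_SECONDS = 0;

for (const trueLatency of [5, 25]) {
  const measured = measuredLatencySeconds(trueLatency, MAINFRAME_OFFSET_SECONDS, EHR_OFFSET_SECONDS);
  console.log(
    `true=${trueLatency}s measured=${measured}s ` +
    `trueFailure=${isSlaFailure(trueLatency)} reportedFailure=${isSlaFailure(measured)}`
  );
}
```

Under that assumption, every measurement carries an extra 40 seconds, so a message that really took 25 seconds would be reported as 65 seconds and counted as a failure.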

So I asked him, "How is the time set on your system?" The answer that I got was shocking.

"Well, when we do the quarterly Initial Program Load, whatever tech is performing the reboot looks at his watch and sets the system time to that."

This was a multi-billion dollar organization, and that is how they managed time. I was dumbfounded.

So, I asked, "How should we fix this?"

"Fix what?"

"How do we get the time in the mainframe to be set correctly?"

"The mainframe time is correct."

"Not according to the time servers at the US Naval Observatory."

"Yeah, but..."

The mainframe was the center of the universe. The mainframe folks considered that the time in the mainframe was more correct than the US Naval Observatory.

Eventually we agreed upon a procedural fix: the tech who set the time during the quarterly reboot would bring up the web page of the US Naval Observatory and set the mainframe's time from that.

Sunday, February 21, 2016

My Experience at MHacks

MHacks is a hackathon at the University of Michigan. My company was a sponsor of the event this year, along with larger, better-known companies like Intel, Target and Disney.

I got to participate as a mentor, which basically means that project teams that are looking for technical guidance seek out mentors for assistance.

My specialty is Healthcare IT. There were two project teams that I had the pleasure of working with.

The first was a simple series of web-based forms that asked some lifestyle questions and asked for the person's height and weight to calculate and display their body mass index (BMI). The actual calculation of BMI would be done in JavaScript on the web page. It has been over a decade since I did any web-based programming, so I sent this team off to find some web developers to get the syntax right.
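For reference, the BMI calculation itself is small. This is a minimal sketch, not the team's code: the function names are made up, and it uses the standard formulas (kilograms divided by meters squared, or 703 times pounds divided by inches squared).

```javascript
// Illustrative sketch only: function names are hypothetical.

// BMI from metric units: weight in kilograms, height in meters.
function bmiMetric(weightKg, heightM) {
  if (!(weightKg > 0) || !(heightM > 0)) {
    throw new RangeError("weight and height must be positive numbers");
  }
  return weightKg / (heightM * heightM);
}

// BMI from US units: weight in pounds, height in inches.
function bmiImperial(weightLb, heightIn) {
  if (!(weightLb > 0) || !(heightIn > 0)) {
    throw new RangeError("weight and height must be positive numbers");
  }
  return (703 * weightLb) / (heightIn * heightIn);
}

// Round to one decimal place for display on the page.
function formatBmi(bmi) {
  return bmi.toFixed(1);
}

console.log(formatBmi(bmiMetric(70, 1.75)));
```

On a real form, the inputs arrive as strings, so they would need to be converted with something like `Number()` and validated before calling these functions.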

The second team was more ambitious. They wanted to have web forms/apps that allowed a patient to update their demographic information and to schedule a visit with a doctor. I outlined the basic standards that support these data exchanges and then showed them the FHIR resources that were available to support this exchange. There are a couple of test servers out there that you can exchange with and they were going to use those as the back end for their app. I told them to emphasize that their solution was "standards based."
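As a rough idea of what a demographics update looks like in FHIR, here is a minimal sketch that builds a Patient resource and the REST request that would update it. This is not the team's code. The base URL, the patient id and the sample data are placeholders I made up, and the sketch only builds the request rather than sending it. The parts that come from the FHIR specification are the resource shape, the `PUT [base]/Patient/[id]` update interaction and the `application/fhir+json` content type.

```javascript
// Illustrative sketch only: base URL, id and patient details are hypothetical.
const FHIR_BASE_URL = "https://fhir.example.org/baseR4"; // placeholder, not a real test server

// A small FHIR Patient resource holding updated demographic information.
function buildPatient(id, familyName, givenName, phone, city, state) {
  return {
    resourceType: "Patient",
    id: id,
    name: [{ family: familyName, given: [givenName] }],
    telecom: [{ system: "phone", value: phone, use: "home" }],
    address: [{ city: city, state: state }],
  };
}

// A FHIR update is a PUT of the whole resource to [base]/Patient/[id].
function buildUpdateRequest(baseUrl, patient) {
  return {
    method: "PUT",
    url: `${baseUrl}/Patient/${patient.id}`,
    headers: { "Content-Type": "application/fhir+json" },
    body: JSON.stringify(patient),
  };
}

const patient = buildPatient("example-123", "Doe", "Jane", "555-0100", "Ann Arbor", "MI");
const request = buildUpdateRequest(FHIR_BASE_URL, patient);
console.log(request.method, request.url);
```

The request object has the same shape that `fetch(request.url, request)` expects, so pointing it at a real test server would mostly be a matter of swapping in that server's base URL.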

All in all, it was a very cool experience to see so many young people who are enthusiastic about technology.

Saturday, February 13, 2016

It's all about how you recover

There is a lyric from Marillion that goes like this:

Failure isn't about falling down,
Failure is staying down.

I am in a leadership role in my present job. I work with a team of developers in the role of EDI Architect. We primarily move data between our internal systems and external trading partners.

One member of the team is a Senior Developer with lots of mainframe experience who is very new to the technology that we are using as our interface engine. This engine has a visually based development environment that can be challenging for a traditional "looking at code" developer to get comfortable with.

This week we promoted the first of his projects to production. The next day we realized that we had configured things wrong. He got very flustered. I told him to take a few deep breaths and then sit down with me while we figured it out. I told him that making sure our change was correct was more important than making it quickly.

We sat down and analyzed the problem and came up with a solution that we both agreed with. When we made this change, I brought each component up one at a time to make sure that they functioned as expected. We looked at the data flows and verified that they functioned correctly.

When we got done, I told him "We all make mistakes. It's all about how you recover."

I have now added a couple more things to the list of things that I look at when promoting to production. Hopefully, we both learned something.