Data Engineering

Hot and Cold data

Any sufficiently big system will invariably need both hot and cold data. What is the difference? Hot data is the part of the active data set used for decision making, while cold data sits in storage for historical analysis and the handling of exceptional situations.

What do you not want in your system? You do not want a situation where you must scan all of your data for decision making. As the data set grows, each query has to go through more data to get an answer, so answers take longer and longer. Hardware improvements are roughly linear, while data tends to grow much faster, so sooner or later you will hit the limit.
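One minimal way to keep decision-making queries off the full history is to partition by age. A sketch in Python (the 30-day hot window and the event fields are illustrative assumptions, not from the text):

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=30)  # assumption: the last 30 days count as "hot"

def split_hot_cold(events, now):
    """Partition events into a hot set (decision making) and a cold archive."""
    cutoff = now - HOT_WINDOW
    hot = [e for e in events if e["ts"] >= cutoff]
    cold = [e for e in events if e["ts"] < cutoff]
    return hot, cold

now = datetime(2024, 6, 1)
events = [
    {"ts": datetime(2024, 5, 25), "amount": 10},  # recent -> stays hot
    {"ts": datetime(2023, 1, 1), "amount": 99},   # old -> cold archive
]
hot, cold = split_hot_cold(events, now)
# decision-making queries now only ever touch `hot`, whose size is bounded
```

In a real system the cold set would go to cheap object storage or an archive table, but the principle is the same: the hot set stays bounded, so query time stops growing with total history.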

This problem is sneaky, because at the start of development you are not going to see it. Everything goes great, until suddenly timeouts start happening and the system stops feeling as responsive as in the first days. If you hit this problem in the first year…. start talking to the developers…. the whole application architecture is probably not designed for scaling. You are risking a massive failure during the scale-out phase, with a complete application redesign looming over the horizon. Do not go into the redesign project with the same guys who designed the initial system; they will fail again. Do not go into the redesign with new hires either; they will fail with the same results as the initial development.

What to do? Hire consultants or experts who have done a similar project in the past to help out. And no, you will not find such experts in the sub-35-year-old range. You will find talents who can implement the system, but not experts who can design it to last and scale. It is simply a matter of having enough failures in your pocket, and enough maturity to learn from them.


CRP under-skin sensor

A few years from now, driven by increasing antibiotic resistance and advances in sensors and biotech, we will have an under-skin sensor that detects CRP (C-reactive protein) levels.


Arthur C. Clarke was wrong

This is the future you get when you advance a single technology in isolation from everything else: in this case, space technology.

To advance space technology far enough to create worlds like those in the video, you need to advance computing technology. That advanced computing brings robotics, VR and AR to a level that enables new technologies which make a lot of the stuff in this video obsolete. For example, there is no need for an EVA if we can have a remotely controlled robot. There is no need for the laboratory and working space to be real at all; they can run in virtual reality or be operated remotely. Building things can be done remotely by robots. Lesson learned: no technology works in isolation from the big picture.

But this future has the longing and romanticism of the Wild West, which makes it so attractive.


This is where we are going with Augmented Reality tech

Product Management

Long lead time items

If you are working on an IoT project that includes any kind of HW component, you are doomed to deal with the dreaded LLT items. These are the items that take 1-3 months to procure and are critical for the continuation of the project. As a product manager, the only way to deal with this is proper planning and experience paid for in sweat, blood and failures.

Renting a private plane and flying to Shenzhen to rescue the schedule only happens in the movies.
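The "proper planning" part can start as simple arithmetic: for each LLT item, work backwards from the date you need it to the latest date you can still place the order. A back-of-the-envelope sketch (the item names, lead times and buffer are invented for illustration):

```python
from datetime import date, timedelta

# Illustrative lead times in days; real values come from your suppliers.
LEAD_TIMES = {"cellular_module": 90, "custom_pcb": 45, "enclosure": 30}

def latest_order_date(need_by, item, buffer_days=14):
    """Latest date to place the order and still hit need_by, with a safety buffer."""
    return need_by - timedelta(days=LEAD_TIMES[item] + buffer_days)

pilot_build = date(2024, 9, 1)
for item in sorted(LEAD_TIMES):
    print(item, "order by", latest_order_date(pilot_build, item))
```

Anything whose order-by date is already in the past is your critical path, and that is the conversation to have with the team today, not at the build review.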

Product Management

There is no perfection in decision making process

A few days ago we needed to make a long-term decision about the direction we were going to take with parts of the system. After a heated discussion we reached a point where we found we had two available directions, let's say direction A and direction B. The majority supported direction A; I supported direction B.

Later that day I started venting my disdain to a friend about how the rest of the team "does not get it" and how we were heading for disaster with direction A, the one the majority supported.

And then the friend struck me with a comment: "How can you be so sure that direction B would be better or more successful, even if direction A proves itself totally wrong?"

In that profound Zen moment I understood that there is no best, or perfect, direction to take in this situation. But it is critical to make a decision and follow through with it.

For a perfect decision we would need perfect knowledge of the world, and, fortunately for all the casinos around the world, that is not possible. Since we do not know all the factors affecting the current situation, every decision about it includes a lot of assumptions and is most probably wrong. But it is critical to follow through with the decision until we reach a different situation where we need to make another decision, and so on ad infinitum.

If anyone knows how to get perfect knowledge about any situation ….. I am listening.


Software Architecture and Coding

What is the difference between bad code, good code, excellent code, perfect code and artistic code in relation to software architecture?

Bad code is one big function doing everything and a bit more. It's pain.

Good code is one class that does one thing. It's good.

Excellent code is code written by a master of the programming language, one who knows features other programmers can only dream about and uses them all the time. Whoever sees the code says it's great, but there are not a lot of everyday programmers who would like to work with it.

Perfect code is excellent code dumbed down enough that every programmer can understand it. It does not use the hottest language features just for the sake of using them. It's heaven from the software architecture standpoint.

Artistic code is excellent code made by an artist who is also protective of that code, and who considers everybody who does not understand it a lesser form of life. It's pure hell from the software architecture standpoint.

The programmers held by general consensus to be the best and most productive can often create more problems with the "artistic approach" if they do not pay attention to the overall architecture of the software they are working on.
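The bad-vs-good contrast can be made concrete with a toy example (the domain and names are invented for illustration):

```python
# Bad: one function doing everything and a bit more.
def handle(raw):
    parts = raw.split(",")
    if len(parts) != 2:
        raise ValueError("bad record")
    # parsing, validation, tax and formatting all live in one place
    name, amount = parts[0].strip(), float(parts[1])
    return f"{name}: {amount * 1.25:.2f}"

# Good: each piece does one thing and can be read and tested on its own.
def parse_record(raw):
    name, amount = raw.split(",")
    return name.strip(), float(amount)

def with_tax(amount, rate=0.25):
    return amount * (1 + rate)

def format_line(name, amount):
    return f"{name}: {amount:.2f}"

name, amount = parse_record("coffee, 4.00")
line = format_line(name, with_tax(amount))
```

Both versions produce the same output; the difference shows up six months later, when only one of them can be changed without fear.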

Data Engineering

Evolution of the information system and data flow

This short novel is inspired by actual events that happened between May 2005 and May 2012…. all of it is true as far as I remember.

The first step was manual report generation…. the data analyst way. Unfortunately this was not feasible given the number and frequency of reports; management wanted to track sales in real time.

The next step was generating reports against live data. At the time it seemed like a great idea, but very soon we found out that you cannot have both fast inserts into a database and fast searches on it. Every report requested by anyone in the building slowed down the data server.

This was the lesson in why you need to split OLTP and OLAP databases.
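The split can be sketched with two stores: writes hit the OLTP side untouched by reporting, and a batch job loads an aggregated OLAP table that the reports query. A minimal sketch, with sqlite standing in for both engines and the schema invented for illustration:

```python
import sqlite3

oltp = sqlite3.connect(":memory:")  # write-optimized side: raw transactions
olap = sqlite3.connect(":memory:")  # read-optimized side: pre-aggregated data

oltp.execute("CREATE TABLE tickets (game TEXT, sold_on TEXT, amount REAL)")
olap.execute("CREATE TABLE daily_sales (game TEXT, day TEXT, total REAL)")

oltp.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    ("lotto", "2012-05-01", 5.0),
    ("lotto", "2012-05-01", 5.0),
    ("keno",  "2012-05-01", 2.0),
])

def nightly_etl():
    # aggregate raw transactions once, off the write path
    rows = oltp.execute(
        "SELECT game, sold_on, SUM(amount) FROM tickets GROUP BY game, sold_on"
    ).fetchall()
    olap.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)

nightly_etl()
report = olap.execute("SELECT game, total FROM daily_sales ORDER BY game").fetchall()
# reports now hit daily_sales, leaving the tickets table free for fast inserts
```

The point is not the toy SQL but the topology: the reporting workload can no longer slow down the transactional server, no matter who in the building runs what.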

The step after that was nightly automatic data parsing and a nightly database load, preparing the data used by the reports. However, management still wanted some data coming in real time, so we implemented probably the first version of a map-reduce style algorithm that I know of: real-time data was parsed and handled as it arrived, and all reports requiring real-time data read from very small tables that were updated multiple times per second.
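That real-time path amounts to incremental map-reduce: each incoming event is mapped to a key and folded into a tiny running-totals table, so reads never touch the raw stream. A sketch under assumed event fields (the names are invented for illustration):

```python
from collections import defaultdict

realtime_totals = defaultdict(float)  # the "very small table" reports read from

def map_event(event):
    # map step: derive the aggregation key and value from a raw event
    return event["game"], event["amount"]

def reduce_event(event):
    # reduce step: fold the event into the running total as it arrives
    key, amount = map_event(event)
    realtime_totals[key] += amount

stream = [
    {"game": "lotto", "amount": 5.0},
    {"game": "keno", "amount": 2.0},
    {"game": "lotto", "amount": 5.0},
]
for e in stream:
    reduce_event(e)
# realtime_totals stays a handful of rows no matter how long the stream runs
```

Because the table holds one row per key rather than one per event, it can be updated many times per second and queried instantly, which is exactly what the real-time reports needed.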

With time, these reports and the web pages supporting them spread across the company, and everyone was able to see everything. Obviously, top management did not like the idea that the coffee lady could check the company's earnings. The solution was to integrate all these separate pages into a single application with log-in, permissions, security etc.

After integrating all the web pages into a single application, we faced never-ending requests for additional reports from all over the company. In addition to supporting the nightly data ETL and report preparation, we also developed new lottery games and integrated our system with external customers. All these efforts just caused a further proliferation of reports.

Down the line, top management got saturated with different reports signifying different KPIs. The business decision was to focus on providing a single set of KPIs aligned with business goals, instead of ad hoc KPIs defined by each department.

A few years later, due to falling sales, Marketing and Sales started coming up with requests for reports that can be summed up as: "compare everything to everything else". The only way to mitigate the report explosion was self-service, using Microsoft's pivot technology and the SSAS data warehouse of that time. Using these technologies the dev team got ahead of all the requests. The data warehouse had its data prepared on a nightly basis; the overnight processing took around 6 hours. We had multiple redundancies and consistency checks in the system.

The downside of this technology was that one Excel file became the new information system. Because everyone had a feeling of control and could do whatever they wanted, some entrepreneurial colleagues started making their own reports that used some of the data from the DW.

Suddenly we were scrambling across the company, keeping track of Excel file versions.

After the introduction of Excel, we found out that no one was logging into or using the IS anymore.

And, hilariously, we got back to the same situation we had at the start: the coffee lady complained about her salary when she saw how much money the company earned (while ignoring the expenses).

Now we had to introduce permissions on the DW tables to limit who could see what. The whole situation finally blew up when the CEO sent an Excel with CEO privileges to payments, and payments accidentally forwarded the CEO's Excel to sales…… and suddenly everyone was using CEO-level access.

All this time our chief IT guy was a Linux proponent, while the company was running Windows clients and a mix of Windows and Linux servers.

This Excel debacle was the final straw that gave the development team enough ammunition to push IT into supporting a move to Active Directory. The introduction of Active Directory removed all the problems we had with permissions and privileges. Suddenly there was no need to log in to the IS separately; everyone could use any Excel and it worked out of the box.

It was a wild time. We were solving all these problems on the go, while the systems were running and the games were being played by an audience of 7 million. There was no management support for stopping everything for a few days while we put in a new system; the only thing that could stop anything for even a few hours was a major crisis or breakdown in the system.

Today the majority of that system could be implemented by one person, and all the reports could be done using Power BI, Tableau, QlikView etc.