UnCooking

There's no such thing as "raw" data.

About 25 years ago, I sought a benchmark project for our machine learning methods…and luckily we were asked by the quality assurance department of a steel maker's continuous casting unit to analyze how surface cracks on the slabs depend on a vast variety of process parameters (recipes, temperatures, speeds, cooling…). Fortunately, they had collected a large sample set (across about 50 parameters) for supervised learning. To avoid "knowledge bias", they named the parameters just p_i (without telling us their meaning).

Data are cooked

Using various techniques, including decision tree methods, we found important dependencies, most of which became understandable once we learned what the parameters meant…carbon, casting speed…some contributed only as pink noise.

One, p_28, showed a dominant influence…drumroll…the date?! Discussing this, we found that they periodically changed the whole casting bow (secondary cooling zone)…and clearly this influences geometrical properties, like the alignment between the mold (primary cooling) and the bow, and between the bow and the straightening section…

Dates are perfectly cooked data (they contain, but hide, all the interesting information)…
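To make the date effect concrete, here is a minimal sketch on synthetic data…not the plant's data and not the method we used back then. Everything in it is an assumption for illustration: the bow change days, the crack probabilities per bow, and the helper names (`bow_for_day`, `best_split_gain`, `make_slabs`, `crack_rate_per_bow`). It compares the best single-threshold Gini gain (the criterion a decision tree stump would use) of the date against a pure noise parameter, and then "uncooks" the date into the bow that was installed on that day.

```python
import random
from bisect import bisect_right

# Hypothetical setup: the casting bow is swapped on these days, and each bow
# has its own hidden crack probability caused by its alignment.
BOW_CHANGE_DAYS = [100, 200, 300]
CRACK_PROB_PER_BOW = [0.05, 0.35, 0.10, 0.40]


def bow_for_day(day):
    """'Uncook' the date: map a day index to the bow installed on that day."""
    return bisect_right(BOW_CHANGE_DAYS, day)


def gini(pos, n):
    """Gini impurity of n binary labels, pos of which are 1."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(values, labels):
    """Largest Gini impurity reduction from a single threshold split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    parent = gini(total_pos, n)
    best, left_pos = 0.0, 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal values
        children = (i * gini(left_pos, i)
                    + (n - i) * gini(total_pos - left_pos, n - i)) / n
        best = max(best, parent - children)
    return best


def make_slabs(seed=0, slabs_per_day=5, n_days=400):
    """Generate synthetic slabs: day, a noise parameter, and a crack label."""
    rng = random.Random(seed)
    days, noise, cracks = [], [], []
    for day in range(n_days):
        p = CRACK_PROB_PER_BOW[bow_for_day(day)]
        for _ in range(slabs_per_day):
            days.append(day)
            noise.append(rng.gauss(0.0, 1.0))
            cracks.append(1 if rng.random() < p else 0)
    return days, noise, cracks


def crack_rate_per_bow(days, cracks):
    """Crack rate grouped by the uncooked variable (the bow id)."""
    counts = [[0, 0] for _ in CRACK_PROB_PER_BOW]
    for day, crack in zip(days, cracks):
        bow = bow_for_day(day)
        counts[bow][0] += crack
        counts[bow][1] += 1
    return [pos / n for pos, n in counts]


days, noise, cracks = make_slabs()
gain_day = best_split_gain(days, cracks)
gain_noise = best_split_gain(noise, cracks)
rates = crack_rate_per_bow(days, cracks)

print(f"best split gain, date:  {gain_day:.4f}")
print(f"best split gain, noise: {gain_noise:.4f}")
print("crack rate per bow:", [round(r, 3) for r in rates])
```

In this toy setting the date wins the split only because it stands in for the bow…once the date is mapped back to the bow, the crack rates per bow tell the actual story. In the real project that mapping was exactly the knowledge we did not have up front.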

It's like in the kitchen: chefs do not only use "raw" ingredients, but also semi-finished things like stocks, sauces…But even the raw ingredients are the result of natural processes that influence the final product…the dish.

Again…working in the data salt mines is difficult

To understand "raw" data you need to look deeper, and you need the very insight that, on the other hand, you want to extract from the data.

I've worked with machine learning for 25 years now…and this has made me quite modest about my objects of desire. Extracting knowledge from data needs a lot of data uncooking…and this means modeling.

This is one reason why I'm advocating intelligent mixes of modeling and machine learning. Remember, machine learning hardly generalizes…in combination with modeling you get deep learning…by adaptive re-calibration.

Related to this, I wrote "Are you a consequentialist?"