Why data will be so valuable in the future

‘Data is the new oil’ is an attempt to convey how valuable data is. The analogy works insofar as data, like oil, isn’t valuable in its raw form: it has to be processed to become something more valuable, and it is valuable because so much can be done with it. But there the analogy breaks down. Oil is a finite resource, and once refined it cannot be turned back into crude. Data is generated constantly and continuously, and using it does not use it up. The same data can be reused repeatedly in different ways.

In many ways data is unique and non-analogous. Data is both what computers process and what they use to run the processing; nothing else in the world uses fundamentally the same thing to do the processing and to be processed. Data is also an abstraction of the real world, which means every piece of data is interrelated with every other. Nothing else in the world is as connectable.

Conceptually, all data is connected. Someone buys a desk. The measurements of the desk are on a spec sheet used by the manufacturer, along with the safety standards the desk meets and the materials it’s made of. The manufacturer holds data on the employees who built the desk, including how much they were paid and when. Each employee’s bank also has data on how much and when they were paid, and on what they spent their money on. The retailer they spent their money with knows some of what they did with it, including that they bought a desk. But because all these datasets are independent, no one person or system sees how they connect.

As more data becomes interconnected, its usefulness increases exponentially. But to achieve that interconnectedness and make datasets useful, collection, storage and processing have to be decoupled from each other. When competitive advantage comes from collecting, storing and processing smaller, specific datasets that organisations use to draw insights relevant only to themselves, interconnection is prevented. If data collected from numerous sources is instead stored in a way that is equally available to everyone, then competitive advantage can only come from processing. Organisations with the capability to draw insights from the analysis of huge aggregated datasets win out, but an intermediary is required to store the data and prevent monopolisation.

Data Trusts would work like a bank, but for data rather than money. Just as no organisation keeps the money it makes, nor would it keep the data it collects. Industry standards would standardise data collection, and laws would make it illegal for organisations to store data themselves. Data would be held by these Data Trusts and made available only to those that contribute their own data. Anonymised data would be accessible in real time for processing, so organisations could draw insights and make decisions that take account of an unimaginably large number of data points.

Data Trusts would specialise in particular types of data (retail, health, manufacturing, etc.), creating a further layer of anonymisation and aggregation for organisations wishing to correlate datasets. Interesting new commercial models would develop around the cost of accessing data, taking account of increasing-returns mechanisms and the decay of relevance.

Data has a half-life 

Every piece of data that exists about a person, their behaviour, and any prediction has a half-life. Relevance decays over time.

A name, for example, might have a half-life of about fifty years. In a hundred years I’ll be dead, and my name will be only half as relevant as it was fifty years earlier, when I was alive. My search history could have a half-life of between two weeks and two minutes. If I’m trying to find my nearest petrol station, the results are probably most relevant between now and when I put some fuel in my car; ten minutes later their relevance has halved and they are only useful for aggregating with other behavioural data. A year’s worth of transactional data about what I’ve bought in each of the past fifty-two weeks might have a half-life of five years if my purchase habits stay the same, but as those habits are likely to change over time, the dataset would also change, with certain points decaying faster than others if I stopped buying certain items.

Understanding how each piece of data has its own half-life, and how relevance decays over time based on that half-life, can help companies provide better personalisation, and could be a means of deciding when data should be deleted to conform with evolving data protection regulations.
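The decay described above can be modelled as simple exponential decay, where relevance halves with every elapsed half-life. The sketch below is a minimal illustration of that idea; the function names, example half-lives and deletion threshold are assumptions chosen for the example, not fixed values from the text.

```python
def relevance(age, half_life, initial=1.0):
    """Relevance of a data point after `age` time units have passed,
    given a half-life in the same units (exponential decay)."""
    return initial * 0.5 ** (age / half_life)

def should_delete(age, half_life, threshold=0.01):
    """Flag data for deletion once its relevance falls below a
    (hypothetical) business or regulatory threshold."""
    return relevance(age, half_life) < threshold

# Illustrative half-lives from the text: a search query (~10 minutes),
# a year of purchase data (~5 years), a name (~50 years).
print(relevance(10, 10))       # search query, 10 minutes old -> 0.5
print(should_delete(100, 10))  # 10 half-lives elapsed -> True
print(should_delete(25, 50))   # a name, still highly relevant -> False
```

The only requirement is that `age` and `half_life` use the same time unit, so the same function covers data that decays in minutes and data that decays over decades.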