ASF 019: Simon Whiteley interview


As Chief Cloud Architect with Adatis Consulting, Simon Whiteley has been working with the Microsoft BI Stack for a decade, starting off building traditional warehouses with SSIS and SSAS but bringing in alternative technologies where needed. Most recently, he has turned his head to the clouds, advocating new methods of getting insight to the right people. Whether it’s automating file ingestion into Azure Data Lakes, applying massive compute via SQLDW & Data Lake Analytics, or simply shifting your current SSIS packages into the cloud, he can help.
A big believer in the wider data community, he regularly presents on cloud analytics architectures, runs the SQL Surrey PASS chapter and co-organises the London PASS Chapter. Simon is a strong advocate of the use of modern development practices in the SQL world.
Data Geek, Cloud Herder, Speaker, Cyclist, Microsoft Data Platform MVP and general Tech-Nerd.

This talk took place after the London SQL Server User Group meeting at CodeNode, London (UK), on Tuesday, 29th January 2019.
Interviewer: Kamil Nowinski.

How many bicycles does he have at home, and what was the destination of last year’s 400 km trip?
Which Azure tool is the closest to his soul?
What exactly is hiding behind the term “Modern Data Warehouse”?
When will Azure Data Factory (Mapping) Data Flow be publicly available?
Also, we will try to answer the fundamental question: is SSIS dead or alive?
Find out the answers to these questions and much, much more.

Audio version


Don’t have time to read? You can listen to this as a podcast, wherever you are and whatever you use. Use the player on this site, find it on Spreaker, Apple Podcasts or Spotify, or simply download the MP3. Enjoy!


Kamil Nowinski: Hi, Simon.

Simon Whiteley: Hello.

KN: Thank you for accepting my invitation to this podcast, SQL Player.

SW: More than welcome, thank you for doing my user group.

KN: I was happy to help, as always. To start, could you tell me your name and where you live?

SW: Okay, so I’m Simon Whiteley. I live in a place called Gravesend recently, in Kent, previously London. So I’m kind of London, but not quite London anymore. London in my heart.

KN: It depends who you ask, they tell you that it’s not London or it’s a part of London?

SW: No, I live really far away from London, I live in Kent.

KN: What are you doing for a living?

SW: So I am the Chief Cloud Architect for a company called Adatis, a consultancy. We do advanced analytics consulting. I used to be a BI developer, used to write lots of SSIS, lots of SSAS, SSRS. These days it’s all entirely Data Lakes, Databricks, SQL Data Warehouse, using that modern Azure stack, so everything I do is making sure we’re doing things in the most modern way, that we’re learning and using the most recent approaches. I personally tend to move between a lot of different clients, making sure we’re sharing learnings. Making sure that if we build something really cool in one place, we’re actually sharing that with the other people we’re working with, and we don’t have two entirely different ways of doing the same thing. So I spend a lot of time across a lot of different clients just talking tech.

KN: You’re synchronizing the work of other people.

SW: Yeah, dipping my oar in, going ‘Oh, no, there’s a better way’, generally annoying our consultants, that’s kind of my role.

KN: When you said ‘the modern architecture’, there’s a modern version of basically everything these days. What do you mean by that?

SW: So that is the big shift of taking all the learnings from Big Data. It was very much the idea that you’re either a Big Data developer or you’re a SQL developer, and there are a lot of different tools and techniques used between the two. So the Modern Data Warehouse is all about taking those approaches, taking the Big Data ways of doing things and just making it normal, making it part of how you do BI. So every BI solution we do these days has a Data Lake, it has elastic compute. It uses things like Parquet and Avro. It’s all these things that used to be just Big Data, and just making them part of how we do things.

KN: So there’s basically no more solution without Azure? Without a cloud solution?

SW: That’s not true anymore. It was that way. It was very much: if you’re in Azure, we go ‘Great, you’re going Data Lake, you’re going Databricks, you’re going all this modern new tech’. And if you came to me and said ‘I’m going on-prem’, I would wince and go ‘Okay, you’re getting SSIS, you’re doing things the traditional way, we’re not elastically scaling, it’s a very different way of working’. SQL Server 2019 has Big Data Clusters, which gives you an in-house Spark engine based on Docker, which means you have a lake. You have inbuilt Spark, you have all of the things we’ve been doing and calling the Modern Data Warehouse in Azure. You can now say ‘I’m gonna do this on-prem’. I mean, you still have to have a bit of tin underneath, you still have to have a big enough VM host to support the maximum capacity of the scaling and stuff. But all the tools and techniques, landing things as flat files so you can acquire data really quickly, all of that stuff you can now do on-prem when 2019 comes out, because it’s not out yet. But when it is, there’s a big change. The way we talk about on-prem is going to change a lot this year. I sound like a Microsoft salesperson. I’m just scared, because there’s a lot of stuff I’m gonna have to reapply, lots of ways of working that we had kind of written off, never having to think about them on-prem, that suddenly are going to come into every single on-prem conversation. So I need to get a lot of solutions, a lot of patterns, and a lot of it is going to be taking what we do already and re-applying it. But there’s a lot of rework and a lot of rethinking about how we do stuff.

KN: That’s good, we’re still advocates of Microsoft stuff.

SW: Absolutely. And it means I still have a job this year because everything has changed again.

KN: Everything’s changing. So, talking about technology: I know that you are doing a lot of stuff with Microsoft tools. Which of the services is closest to you? So the question is, which is closer to your soul: Azure SQL Data Warehouse or Azure Data Factory?

SW: Either way, one of them is gonna get really angry with me. It’s very weird that I’m going to say this, but it’s definitely Data Factory. Data Warehouse is awesome, it does a good job and you can do very, very heavy crunching. It’s kind of cool. But it’s not solving a new problem. You could always just have a really, really big SQL Server and put stuff on it. You can have Cassandra, you can have Snowflake, there are different kinds of MPP engine you can have. Whereas Data Factory now, definitely V2, definitely the current iteration, solves so many problems for me. All the things like having to use BIML to make SSIS packages, and having to write PowerShell to crank out so many iterations of ADF V1, all of the pain of creating any kind of orchestration and scheduling in Azure, it’s gone with V2.

KN: Even in generic pipelines or so?

SW: Yeah, you can just build a generic pipeline and then say ‘Now run this same pipeline with a few variables for each of my thousand tables’. It used to be that you could get around the problems with code automation. But at the end of the day you still had a thousand separate pipelines to support, and now I can just say ‘Well, there’s a generic thing and that will do whatever I want, no matter what the data source is’, and that’s key to the whole ethos of Data Lakes. I don’t want to build a separate thing for every single file, you want to remove the barrier to entry. I want to build frameworks, so that if I’ve got a client who’s using ten different tables in their SQL database and they go ‘Actually, I need another one’, then all they need to do is add a line to a database and magically that table comes in next time. The whole reason you use Data Lakes is to remove the barrier to entry of getting data into a system. I want to make it as easy as possible to start collecting that data. I don’t necessarily want to process it. I don’t know what logic I’m gonna apply, I don’t know how I’m going to use that data, but I want to start keeping it. And if I can just either add a JSON file or change a database record and suddenly that works, that means my business users can have a Power App over the top of it and go ‘Here are the tables I would like to keep’. And I can just push that, that’s no longer my problem. My business users define what data gets pulled into a lake, that makes my life so easy and it really gets behind the idea of data lakes. So I think for the business, Data Factory’s made more of a change. Data Warehouse is neat and does a job, but the business users don’t really see that much of it.
It tends to be abstracted away for business users, whereas Data Factory actually has some use for them, it has some purpose, they get a tangible return based on Data Factory. As for me, it makes a difference.
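
[Editor’s sketch: the metadata-driven pattern Simon describes, one generic routine driven by a control table, could look roughly like the Python below. This is not Data Factory code and not Adatis’s framework. The names `TableConfig`, `lake_path` and `plan_ingestion`, and the folder layout, are hypothetical and chosen only for illustration. In ADF v2 the same idea would typically be a Lookup activity feeding a ForEach loop around one parameterized copy step.]

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableConfig:
    """One row of a (hypothetical) control table describing a source table."""
    source_system: str
    schema: str
    table: str
    enabled: bool = True


def lake_path(cfg: TableConfig, run_date: str) -> str:
    """Build the raw-zone landing path for one table (assumed folder layout)."""
    return (
        f"raw/{cfg.source_system}/{cfg.schema}/{cfg.table}/"
        f"{run_date}/{cfg.table}.parquet"
    )


def plan_ingestion(control_table, run_date):
    """Return one copy job per enabled row; the logic is identical for every table."""
    return [
        {
            "query": f"SELECT * FROM {cfg.schema}.{cfg.table}",
            "sink": lake_path(cfg, run_date),
        }
        for cfg in control_table
        if cfg.enabled
    ]


# The "database" the business users maintain: here just an in-memory list.
control_table = [
    TableConfig("crm", "dbo", "Customer"),
    TableConfig("crm", "dbo", "Orders"),
]
jobs = plan_ingestion(control_table, "2019-01-29")

# Bringing in another table is one more row, not a new pipeline.
control_table.append(TableConfig("crm", "dbo", "Invoice"))
jobs_after = plan_ingestion(control_table, "2019-01-29")

for job in jobs_after:
    print(job["query"], "->", job["sink"])
```

The point of the design is that the per-table differences live in data rather than in code, so supporting a thousand tables means supporting one routine and one table of settings.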

KN: Yeah, true.

SW: That was a long answer.

KN: Another one is an extension of Azure Data Factory, the private preview feature Data Flow. What do you think about that one?

SW: I mean, you know it’s close to my heart. Again, that’s a business user thing. So we’ve got the whole big data processing element. We’re using a lake, we’ve got raw data landing in a lake. It’s maybe too big or it’s the wrong shape. It’s coming in a bad format or it’s coming in as streaming data, and that means I can’t put it into SQL Data Warehouse. So I’m using the lake to fix those problems, the four V’s of Big Data. I’m trying to get that data and make it accessible. And Databricks is now the de facto answer. It was Data Lake Analytics for a while, and someone would have to go away and learn U-SQL, they’d have to learn this new language which is used in just one single tool. And now Data Lake Analytics has kind of fallen out of favor, everyone’s pushing Databricks, and again, you go into Databricks and you can write SQL in there, but you need to use either Scala or Python at some point. Maybe a bit of R, maybe a bit of SQL, but… Now that’s been great and we started using it, we’re using it in lots of places, we’ve built lots of wrapper, kind of framework patterns. And that’s great. So I can just pass some parameters in and have it automatically process some data. But again, someone has to understand that to support it. Someone needs to understand Python or Scala and go and manage that. That’s another language I need. Whereas Data Flow means I can pass it to someone like an SSIS developer, who can pick that up and go ‘Okay, I know what this is about. This is just a Data Flow’.

KN: I will put this component here.

SW: Okay, it’s going horizontally, it’s not going vertically, so it’s a bit different. But otherwise it’s designed to be really familiar to that kind of traditional BI user who’s finding the whole thing intimidating, going ‘Do I need to reskill, do I need to retrain, do I throw away all my knowledge of the past 10-15 years of development?’ And he’s going ‘Okay, no, this is something I get, but it’s gonna write it in Scala behind the scenes, so it’s actually going to run natively and efficiently on Databricks’. And it gives you a lot of control. I mean, there are critics of it, there are people who don’t like it, because it is seen as an abstraction layer, it’s convincing people ‘Don’t write code, code’s too hard. Use this graphical user interface instead’. And there are valid reasons for that. But you can drop down. For me it’s for the simple stuff, for the business user stuff, for the data analyst level, where they don’t need to retrain, they don’t want to reskill. You can have the people who understand the transformation rules designing your transformation pipelines, because it’s nice and easy. And then if there’s something really hard that is complex, or is going to need complex performance optimization, you can drop down to the Scala, you can have someone writing the hard specialist bits and not doing the ‘I need to knock out a thousand of these kinds of jobs’. So it’s separating the problem. You’ve got straightforward business transformation, anyone can do that, and then you’ve got your specialists, and it means you can balance that team out. It’s cool, it works for me.

KN: It’s cool because you can create the business logic with Data Flow without code. And it’s a very efficient process.

SW: The thing I’ve not tried to do yet is to make the data flow fully parameterized, so I can apply the same logic to different data sets.

KN: I tried. You can’t.

SW: That for me is the problem, because I’ve gone so far with normal Data Factory in terms of getting it code-driven, sort of parameter-driven, getting it so I don’t need a thousand different pipelines to process my files. And then if I drop down to Data Flow, I’m suddenly ‘Oh no, I now need a thousand different pipelines to manage my files’. So it does have some limitations still.

KN: But I think this is a small difference between Azure Data Factory pipelines and Data Flows. Data Flows, I guess, are basically designed for specific reasons, for specific scenarios. A framework probably will not help here.

SW: But then it’s the same in code. We can have things automatically checking data types, checking files are in the right shape, we can do all of that automatically.

KN: Yeah, from the source perspective for example you should be able to apply the parameter, but currently you can’t.

SW: But to do business transformation, to apply logic, nothing’s gonna write that automatically for you, no matter what the vendor promises you. It’s not gonna do it for you. So you either have a separate SQL script for each of your data sources, or a separate Data Flow for each of your sources, or a separate Python script. You need to have your transformations documented somewhere. So Data Flow is somewhere you can keep the transformations. Some people won’t want to. Some people, maybe they know Spark SQL, want to have the SQL written inside the Spark engine because that’s more natural to them, and they can. So it’s options, it’s all about having different tools available to you to do different things. There’s certainly not an ‘everything is going to happen in Data Flows from now on’. I’d say it’s much more likely that the generic things will be one generic Python script that runs for every single thing, just asking ‘What structure should it be, is it in the right shape, are all of the data types right?’ You don’t need Data Flow to do that, because you can’t parameterize it. So pass it to a single script, tell it what shape it should be in and that’ll run. And you can use that same notebook for every single file type. So that kind of stuff I wouldn’t use Data Flow for. Only when it is file-specific, drop down and use it there.
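
[Editor’s sketch: a minimal illustration of the “one generic script, tell it what shape the data should be in” idea from the answer above. It is plain Python rather than a Databricks notebook, and the function name `validate_rows`, the `CASTERS` mapping and the schema format are assumptions made for this example, not anything described in the interview. Rows are assumed to arrive as dictionaries of strings, as a CSV reader would produce them.]

```python
from datetime import date

# Hypothetical mapping from a type name in the expected schema to a parser
# that raises ValueError or TypeError when the value does not fit.
CASTERS = {
    "int": int,
    "float": float,
    "str": str,
    "date": date.fromisoformat,
}


def validate_rows(rows, expected_schema):
    """Check every row against expected_schema ({column: type name}).

    Returns a list of human-readable error strings; an empty list means
    the data is in the expected shape.
    """
    errors = []
    for i, row in enumerate(rows):
        missing = expected_schema.keys() - row.keys()
        extra = row.keys() - expected_schema.keys()
        if missing or extra:
            errors.append(
                f"row {i}: missing={sorted(missing)} extra={sorted(extra)}"
            )
            continue  # no point type-checking a row with the wrong columns
        for col, type_name in expected_schema.items():
            try:
                CASTERS[type_name](row[col])
            except (ValueError, TypeError):
                errors.append(
                    f"row {i}: column {col!r} is not a valid {type_name}"
                )
    return errors


# The same function serves any file type: only the schema argument changes.
sales_schema = {"id": "int", "amount": "float", "sold_on": "date"}
good_rows = [{"id": "1", "amount": "9.99", "sold_on": "2019-01-29"}]
bad_rows = [
    {"id": "x", "amount": "9.99", "sold_on": "2019-01-29"},
    {"id": "2", "amount": "1.0"},
]

good_errors = validate_rows(good_rows, sales_schema)
bad_errors = validate_rows(bad_rows, sales_schema)
for message in bad_errors:
    print(message)
```

Because the expected shape is passed in as data, the check can be parameterized per file, which is what Simon says he cannot yet do with Data Flow; the file-specific business logic would still live elsewhere.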

KN: That might be a very strange question. What’s changed in ADF between version 1 and version 2?

SW: All of it! The variable-driven stuff, so suddenly you don’t need to generate a ton of script, you can just have reusable components. But also the control flow stuff. Whenever anyone picked up ADF, they wanted it to be SSIS, they wanted it to work the same way, and so many people were frustrated with ADF v1 because it wasn’t SSIS. And now you can have failure paths, you can have ForEach loops, you can have waits and web lookups, and you can get reference data from a database and use it as a parameter. All these things, it’s so sad that we’re excited about it, because it’s just the most obvious thing you should be able to do. And that’s the big shift between ADF v1 and v2. It became a product that you can actually use seriously in an enterprise situation. With v1 you barely could: you had a load of workarounds, a load of code, and it kind of worked. Honestly, for the entire lifecycle of v1 I didn’t use it. I used it once or twice on projects where we thought we’d try and force it. And it was OK. Most of the time we used SSIS, we had SSIS sitting on a VM and we were sad about it. So v2 is just so much more acceptable. It is a good product. I was such a hater of Data Factory for so long. It was one of those things, you’d mention Data Factory and I’d go ‘Ha! No-one’s going to use that’. Now I love it! As you said, I love it more than I love SQL Data Warehouse, and you know that’s dear to my heart.

KN: One of the questions during my session was when the Data Flow private preview will become public, but I didn’t hear your answer.

SW: Soon. All I can say is soon. Yes, 2019.

KN: In Microsoft nomenclature?

SW: I can neither confirm nor deny any of these comments. But it is coming soon. It’s coming soon to the point that, for people who are kicking off projects now, I’d probably recommend they look at it and consider using it, because it is getting near to that point in its life cycle. That’s about as detailed as I can be in that answer.

KN: Last question about technology. Is SSIS dead or alive?

SW: That was another question from tonight that I was very good about and didn’t say anything on. Do I think SSIS is dead? It’s becoming a niche use case, there are very few scenarios where I would recommend SSIS now. And it tends to be: if you have a massive team with a ton of existing SSIS, you’ve got a lot of in-house SSIS skills, all of your application exists as SSIS packages and you wanna lift and shift, absolutely use SSIS, there’s nothing wrong with it. Data Factory has the SSIS integration runtime behind it, you put all your SSIS packages on it and it’ll work the same as an on-prem SSIS box. What it won’t do is take advantage of most of the whole point of having Azure. It won’t elastically scale. You can have multiple nodes, but all that will do is help with concurrency of package execution. So if you try to do one massive package, you need to pay for a really big VM to actually get that to work properly. And it takes 20 minutes to turn on. Absolutely everything that I do these days, we’re trying to tie the amount of compute you’re using as closely to your workload as possible. So things that have a twenty-minute startup are not tied at all to your workload. It just means that SSIS for me doesn’t scale properly and is too slow in terms of startup to actually fit in a modern data warehouse. But if you want to get there quickly and you’ve already got a lot of SSIS stuff, by all means use it as a lift and shift. Get your whole platform onto Azure, get it working using the existing stuff, and then it’s basically technical debt to go and strip out and slowly replace with things that naturally scale more. But that’s all I see it for these days, it’s a stepping stone to moving to things that are more cloud native. That’s all I use it for these days.

KN: The new solutions will be adopted year by year by all existing companies, consequently. And if the project or, let’s say, the performance point of view needs a more robust and faster process, you’ll probably adopt ADF. If you have the budget and time for that.

SW: It’s not so much ADF, it’s Databricks. So it’s using an in-memory Spark engine. And whether you do that by accessing Databricks directly, or whether you do it using ADF as an abstraction layer, either way what you want is a Spark layer, because that’s the thing that can naturally scale out over multiple nodes. No matter how you get to that Spark layer, that’s the way you should be going.

KN: Okay, let’s talk a little bit about private things. Not really deep private, but you know. I know that you have, let’s say, a lot of bicycles at home. Seven?

SW: 7, technically 6, one’s my wife’s. 6 for me, 1 for the wife.

KN: You have 7 at home! What do you do with all this stuff? Apart from cycling, of course.

SW: They mainly live in the garage, that’s why I had to move to Kent, so I can have a garage that can fit enough bikes in. Honestly, I probably use one bike more than every other bike. I have my cargo bike, which is the big long one with a plank on the back, so I can carry several bags, several PCs, a crate of beer.

KN: You’re doing this for shopping?

SW: Yeah, you know, I don’t drive. You can see the London ethos, I’ve never needed a car, because, you know, London. So if I’m going to the supermarket, I ride my big old bike and I fill up the back. It’s basically a pickup truck in bike form. That’s the way forward and it’s not too slow. I rode from Lyon to Marseille this year, it’s 400 kilometers on a cargo bike, it’s a long way to go. It was last year. It wasn’t last week.

KN: Big challenge.

SW: Yeah and I’m not fit enough for that these days. I’m generally unfit, but I’ve got good cycling legs.

KN: Tell me about that journey.

SW: The journey was a charity ride. So with Adatis we do sort of every year, every other year we do some big challenge thing, so we did London to Paris originally. It was good, rode down, got the ferry across, rode through. Last year was Lyon to Marseille, down the river, through Avignon, through Valence, through some beautiful places. It was good, it was warm. It was like middle of summer, so going up to Marseille, we didn’t quite realize there’s a massive hill. Like a huge hill going into Marseille and it was like maybe 30-32 degrees on a cargo bike. If there was a bus, I could have just hitched a ride with my bike, I would have been very happy. But we made it, it was good, didn’t break down, no major injuries.

KN: Remind me how long the distance was?

SW: 400 kilometers. And we raised a lot of money. So we raised over 10k for different charities.

KN: So every year you’re doing this charity?

SW: So far it’s every other year, but we want to start doing every year. So we’re talking about what we do this summer.

KN: So every other year there’s a different activity?

SW: Yeah. So we’re getting some pushback now and everyone’s like ‘Oh, does it have to be cycling again?’ Obviously, I’m biased, I wanna cycle. We’ll see.

KN: We are sitting in the building, what is it called, Code Node?

SW: Code Node, yeah.

KN: Tell me something about that user group. It’s London SQL or SQL London?

SW: It’s the London PASS Chapter, technically. So it gets called the London SQL user group, the SQL London user group, many different user group things. I have a secret campaign, because just like every other SQL group in the entirety of the world, it shouldn’t be called SQL anymore, because we’re doing Data Factory, we’re doing all this other stuff, we’re doing Azure.

KN: A lot of conferences changed their names.

SW: Yeah, so the other group that I look after, SQL Surrey, is now the Surrey Data Platform group and something similar is going to happen to London at some point, but don’t know when yet. We’ll have a rebranding event at some point.

KN: When will you reveal that?

SW: When I convince the other organizers that we should do that. It’s by no means agreed, this is not an announcement, but it should come soon.

KN: So another question, related a little bit to technology, but we’ll go back to this. What is your hobby, basically?

SW: My hobby, oh. Okay, there’s a couple… Cycling, if you didn’t get that. I do a lot of board gaming, so I’m a massive tabletop board gamer. The wife is a massive board game fan. We have a bookcase just full of different games. So anytime I can get a crowd of people around the table to play a game, I’m a big fan. Don’t talk to me about Monopoly, Cluedo, it’s not that kind of thing. And cooking. I am an avid chef, I spend a lot of my time cooking. If I’ve got spare time I’m probably in the kitchen. And my wife is French and she’s a big baking fan, so it’s good. So I make food, she makes cakes. She’s French, so patisserie and all that kind of stuff. So together we make people fat, essentially. We make people fat, and then we get to play games, and then I go for a ride to burn it off. And it’s all good. Kind of works.

KN: Everything balances out. When did you start your adventure with the Microsoft Data Platform, or whatever it was at the beginning?

SW: 2005-2006, around then. I actually started off in 2004, I worked for IBM for a year and I had the horror of building reporting systems in Lotus Notes and all kinds of stuff like that. That was a bad experience. A bit of Hyperion Brio, a few other things around that time. First job: a load of Access databases, good old Access, building reports. We had this process on a Monday morning where this Access database would churn through and create all of the company’s reports. And it would take like six hours using pure Access reports, crunching and churning and doing all that stuff. And that was my first realization of how bad BI can be.

KN: You learned that at university, probably?

SW: No, I didn’t do a tech degree. I did a business degree, I did nothing but management and accounting and marketing and economics and all of that kind of fluffy business-y stuff. I always knew I was always a techie. I spent my childhood building PCs with my dad in the garage, doing that kind of thing.

KN: Yeah, I believe the passion is even more important than the official education.

SW: I’ve always thought ‘Yeah, you can pick up tech’, but I didn’t know much about how business runs, how people deal with data, how people deal with marketing. And in BI it’s good having that backing. Being able to go to a marketing department and say ‘What are your challenges?’ and actually understand how they work, the theory and the fundamentals behind it. It helps when working with these different people, going ‘I know a bit of HR, a bit of marketing, a bit of accounting’. Obviously we’re talking 15 years ago, so I’m probably a bit rusty. I couldn’t walk into a marketing department and work as a marketeer, but it just helps having that familiarity. It gets you started in that conversation, knowing what kind of challenges they face. And then yeah, we got SQL Server, we replaced all the Access databases.

KN: In the same company?

SW: Oh yeah, we eventually, you know, we brought SSIS in, and I helped build the very first cube there. And a lot of it was kind of internal… I didn’t really see a lot of the world, didn’t go to other conferences, kind of had that little bubble, we only knew what we knew and everything was an incremental improvement. I stayed at the company for seven years. And then when I left, I joined Adatis, where I am now. Three main jobs. And it was like going out into the world and going ‘Right, all this stuff that I’ve learned and taught myself and worked with over 7 years, is that actually the way people do things, does that actually work?’ And the answer is: some. There were definitely things I didn’t know that well. Definitely things I’d done that no one else had done. So a weird mix… a balance of skills, and you get a lot of that. When people join the company we go ‘What do you know?’ And it’s like ‘I know a little bit of that, a little bit of that, and tons about that!’ You get that imbalance, and that’s what’s there. And it’s about smoothing that out. Filling in any gaps, making sure people have a decent bit of knowledge about everything and then, yeah, specialisms where they know more than everyone else. That makes for a really good problem-solving team. You get a lot of people around the problem and they all understand the same language that they’re talking. But people have different depths and different viewpoints and different perspectives. If all the problems are fixed by one person, you get a very skewed solution at the end of the day. This is why it’s really funny at the moment, because of this whole shift. We’ve got a load of the old guys. You’ve got the people who’ve been doing BI for 15-20 years, and they have very firm ideas about how things work, and they’ve got a massive depth of experience with a load of tools that people don’t use anymore.
I’m really depressed, my MDX knowledge is just useless these days. And then you’ve got this whole group of youngsters coming in, you’ve got grad students coming in, who… they’ve learned Python, they’ve learned R, they’ve learned some of the languages about which we haven’t historically said ‘That’s a BI language’. It’s not C#, it’s not SQL, that’s not Microsoft BI. And now it’s core to everything that we do. The old guard are heavily just trying to learn a language about which all our grads are like ‘Yeah, we know that’. So it’s good, and it’s a nice challenge. The old guard can teach them ‘This is the way of working, the methodology, this is Kimball, and this is the problem that you get if you build it in this way’. And then there are the other people coming in with a wave of new learnings and new ideas. You need that mix of experiences.

KN: You have those experienced people in Adatis. Tell me about the team, the company.

SW: So we’re now, I never know, 60-70 people, something like that. It’s different every time you ask. I joined nearly 6 years ago now and we were 10 people. I was like person number 10, and now to be 60-70 people, to go to the office and have people I’ve barely actually spoken to, it’s just a weird experience. But again, it’s really good, because we’ve now got such a rich mix of backgrounds and where people are coming from. So the culture is generally good, the culture is core to what we do. And it’s hard, you know, consultancy is hard. You have to travel to clients, you spend a lot of time on the road. And people expect you to be an expert on everything. They can be hammering you with questions, saying ‘How does that work, what do you do to that, what’s that scenario?’ So it is a fairly pressurized job, but it’s also fun. It’s not for everyone, but I love it. That is what I do. I love consulting around the Data Platform, especially given how much it’s changing. If I was still answering the same old SSIS questions I was dealing with 10 years ago, then it wouldn’t be for me. But almost every 6 months the answers to the questions are different these days, because everything is evolving so quickly and everything is changing. And it’s been a difficult period for a lot of people, because it’s in so much flux. If you started a project a year ago, you’d be using Data Lake Analytics for everything. And now if you start a project, you’d be using Databricks for everything, and that’s caused a lot of uncertainty in people. People are saying ‘That’s what you’re telling me now, am I gonna believe you when in 6 months’ time you’re gonna tell me something else?’ So it’s difficult, but it’s fun.

KN: What do you like the most in your role in the consultancy company?

SW: For me it’s getting to work with a big range of people. Because of my role I don’t tend to be… It’s both good and frustrating. You’ve got the really good feeling when you finish a project, you deliver it and you’re like ‘I built that’. But my role’s different. I don’t get that feeling of start to finish, because I get involved in lots of different projects, and usually quite near the start, usually when there’s lots of technical uncertainty. I’m there to try and de-risk projects, to say ‘Oh, we don’t know how to do that? Cool, this is how you would do that, these are the patterns, this is where Microsoft’s going’. Being an MVP, I’ve obviously got that kind of view into the product teams, into how things are working, where the road map is. Being a Gold Partner, I’ve got a different avenue into the same, so I’ve got quite a good view of where things are going. So I can go in and give that recommendation and make sure people are designing it right. Setting up for success, to use a horrible business term. But I get involved in lots of those early-phase projects and then dragged in, you know, to do any technical spikes and technical troubleshooting. Now I never get that day of just going in and thinking ‘Ah, today’s gonna be easy. I’ve just got stuff I need to crack through’. My day on-site with the client tends to be ‘Ah, Simon, we’ve saved up all these horrible problems where we don’t know how that works’. I tend to have fairly stressful days on-site, but it’s good. It keeps it interesting, it’s always new problems that haven’t been solved yet. Because if it’s already been solved, then we’ve shared the answer and people can go and get the answer. So it’s an interesting role, it’s tough, but it’s exciting. Again, I get bored easily. So you know, having a role that changes…

KN: Problems that are easy to resolve are not very satisfying.

SW: If we’ve fixed it before, I’m not interested.

KN: So with all these changes every month, every week, and all this technology that you’re working with, how are you trying to keep yourself up to date?

SW: It’s not one man. So we’re very keen on R&D, so we have a whole research and development process where all of our consultants are expected to be researching something new. So it’s not just ‘Ah, you’re on a boring project, it’s not gonna be interesting’. Part of our progression, part of the skills path, the career path is about doing R&D, about blogging, about finding a subject that’s your own. So I might be sitting there going: ‘Okay guys, there’s a new thing coming out. Can someone go and look into it and write it up and write a blog and compare performance between these things? Or write a pattern for how to do that’. That’s not me doing all of that stuff, that’s impossible for one man to be on top of it all. I take the interesting ones for myself. I’m gonna make sure that people are doing it. I spend a lot of time reading the blogs and reading the output that our guys have done so that I can give good advice, but…

KN: At least you need to read a lot. But it consumes time.

SW: I spend a lot of time on trains.

KN: So you have this time. This might be a very hard question, depends on you. Which achievement are you satisfied with the most?

SW: I don’t know. So the MVP means a lot. That kind of thing of working in the community for a long time and kind of doing talks and running user groups, that kind of recognition means a lot, it is good. I don’t know… Lots of achievements. I don’t really keep track of that. I come across as fairly egotistical, but actually there’s not many things that I do just for me. Most of the time it tends to be that a team’s done things, that kind of stuff. There’s plenty of solutions that we’ve built, that we designed, and I’m like ‘Yeah, that’s really cool’. Yeah, I really like some of the stuff that we did. The data science and stuff. And that doesn’t come naturally to me. I’m not a trained data scientist. I used to do stats a lot, but not a lot these days.

KN: So the very last question at the end of our conversation. Tell us where we can find you.

SW: All over the place. If there’s a conference happening and it has to do with Azure and data, I’m probably there. So big things: I’ve got a pre-con coming up at SQLBits. So myself and Terry McCann are doing a Databricks deep dive. It’ll be fun, cause we’re doing data engineering versus data science. So he’s repping the data science side of things. I’m doing the real work. Also doing the Intelligent Cloud Conference in Copenhagen. Doing another pre-con around modern data warehousing, so covering all the different facets of that. I’m on Twitter, I haven’t blogged in an age. I get people to blog for me. I get people to do R&D on certain subjects, so I wouldn’t take credit for their work obviously. But yeah, @MrSiWhiteley, that’s me on Twitter. Generally talking about Azure-y things, and if anyone has any questions about how BI works in Azure, how data science fits in, how you do any of this stuff, what’s the point in a Data Lake, that’s the kind of question I get asked on a daily basis. So if anyone wants to know anything more about that, feel free to reach out!

KN: Thank you for this conversation today.


Useful links

Simon’s Twitter: @MrSiWhiteley
Simon’s Post activities on Adatis Blog
Simon on LinkedIn
We were talking about Azure Data Factory, Azure SQL Data Warehouse, SQL Server Integration Services (SSIS), Azure Databricks.
Simon is one of the leaders of the PASS SQL London User Group

About author

Kamil Nowinski

Blogger, speaker. Data Platform MVP, MCSE. Senior Data Engineer & data geek. Member of Data Community Poland, co-organizer of SQLDay, Happy husband & father.
