ASF 029: Tomaz Kastrun interview
Introduction
Tomaž Kastrun is a BI and DEV developer, data analyst and data scientist.
He has more than 15 years of experience in the field of databases, business warehouses and development, with a focus on T-SQL programming and query optimization. His focus is also data mining, statistics and research. He has been working with Microsoft SQL Server since version 2000.
He is a Microsoft MVP, Microsoft Certified Professional and Microsoft Trainer.
Tomaž is a blogger, author of many articles, speaker at various community and Microsoft events and an avid coffee drinker.
This talk took place after SQL Saturday 2019 in Ljubljana, Slovenia, on Saturday, 14 December 2019.
Interviewer: Kamil Nowinski.
Transcript
KN: So today we are sitting at the venue during the SQL Saturday conference in Slovenia with…
TK: Tomaž Kaštrun.
KN: It’s very hard to pronounce your name.
TK: Yeah, Slovenian names are hard to pronounce but your pronunciation is correct.
KN: Thanks! So where do you live?
TK: I live here in Ljubljana, where the venue for the SQL Saturday Slovenia is happening. So this is my hometown and welcome to my hometown!
KN: Thank you very much! I know that you are one of the organizers of this conference. Could you tell us a little bit more about the conference itself and what is your role as the organizer?
TK: So SQL Saturday Slovenia this year we have the 7th iteration, so we started in 2013, I joined in 2015 I think, so literally we are full and I was the last one to join, and we usually just have each their own task that we need to cover and that’s it. At the event, yeah, all the usual stuff, so accepting the people, accepting the attendees, taking care of the speakers, we give great attention to every detail, to all of the speakers because they come from abroad and they’re tired, stuff like that, so we order this, we do this, coming a step forward closer to them, and this really helps them a lot as well.
KN: Yeah, it’s quite a local conference but it’s quite huge and very popular. One of the most important during the year I think. You have speakers from around the world.
TK: We are definitely very proud of that. We are taking quite huge pride in that. And this is something, you know, if you go abroad and if you visit all the people, get in touch with them, all the speakers, they will definitely come and sort of return the favour, come to your place. So it became a tradition. Since I’m the last one to join, I think Mladen and Dejan, and Matija, they were the first to start this attention to details and bringing a lot of people from abroad, and now this year is just amazing. And so it was last year. So everything is really amazing and we try to improve to get better. Not bigger but better in terms of bringing different audience, bringing different speakers and choosing the right…
KN: Keep the quality of sessions.
TK: Yes, the variety of sessions.
KN: The size of the conference is not as important as the quality of the sessions.
TK: So as long as we have speakers happy and attendees happy, I think this is the best recipe.
KN: Great, and I think you are doing a great job here, hospitality is fantastic.
TK: This is what we really try to keep the level up.
KN: So what do you do for a living?
TK: So I’m a BI developer/data scientist and I basically do a lot of stuff with machine learning, data mining, statistics, statistical analysis, and a lot of stuff with SQL, with cloud computing, Azure, stuff like that.
KN: A great mix.
TK: Yeah, it’s a hybrid.
KN: So BI but probably these days with more modern BI? In the cloud and with all the new services, and I’ve also seen you speaking on topics like Machine Learning, for example, the R language, that kind of thing. So, about the R language, could you tell our audience how to start, how difficult the R language is to learn and when you can use it, in which cases?
TK: There is the competition called Advent of Code going on right now, so if you Google adventofcode.com, every day you get sort of like a task you need to solve in any kind of programming language. So this year I decided to do it in R, which turned out sometimes a little bit problematic, but it goes so, which means that R is a statistical programming language with more stress on statistics than in programming but yeah, I’ve been using R for quite a long time, as well as Python. Once you start using it, you get fluent in it, for those who want to start and learn Python or R, or both, I would definitely suggest using it and learning it by cases, examples and stuff like this.
KN: Both of them?
TK: Well, one of them, both of them, many of them, it doesn’t really matter. The best way to practice is to have some examples, samples and…
KN: Because R language is like you said, more for statistical point of view, so this way it’s more often used by mathematicians.
TK: Yes, statisticians, mathematicians. But you can also do programming. It’s not the size of C++, C# or Python, but also you can do some programming.
KN: OK, and if someone has not started using either R or Python, which language would be easier to learn or use?
TK: Really depends on the background so if you have more programming languages, you would definitely prefer Python over R. If you don’t have that much of programming background, you would prefer R over Python. But in reality, I think the modern data scientists are people dealing with statistics, dealing with machine learning, data mining, they ought or they should or they must use different and more languages. So R, Python, SAS, I don’t know, you name it. Julia, Knime, they all use several different languages. Because one language might give you the benefit for this algorithm, the other one for that algorithm, so it really just depends.
KN: It depends on your scenario or case. OK, let’s talk a little bit about machine learning. Could you explain a little bit what machine learning is?
TK: As the term suggests, you let the machine learn on something and then execute something. So to sort of give it a different name, we used to call it statistics, then we used to call it data mining, and now all of a sudden it’s called machine learning, which essentially is, you know, it’s not the same but there are huge similarities in terms of the same algorithms, so what it does it just takes, for instance, you take some data and based on this data, which we call training data, you train a model. A model is just sort of a representation of a function and this function basically gives you the next best value. We call it, let’s say, prediction. So this is essentially what it does. You leave some data set, let the machine learn by itself based on this, and then it produces some sort of result, which is a function, which is a model. And then, based on new data that basically comes in, you try to predict what the next value will be, according to the model that you’ve trained. Of course, there are other processes behind it as well. You need to test, you need to validate, stuff like that. But essentially this is what it is.
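A minimal Python sketch of the train-then-predict loop described above, for readers who prefer to see it in code. It is not from the interview: the numbers are made up, the function names `fit_line` and `predict` are hypothetical, and a straight-line fit stands in for whatever algorithm a real project would use.

```python
def fit_line(xs, ys):
    """Train: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the "model" is just these two numbers


def predict(model, x):
    """Predict: apply the trained function to a new value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data (here it happens to follow y = 2x + 1).
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

# Made-up held-out data, used only to check the trained model.
test_x = [5, 6]
test_y = [11, 13]

model = fit_line(train_x, train_y)

# Testing/validation step: how far off is the model on data it has not seen?
test_error = max(abs(predict(model, x) - y) for x, y in zip(test_x, test_y))
```

The split into training data and held-out test data mirrors the "you need to test, you need to validate" remark: the model is judged on values it did not learn from.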
KN: It already sounds a bit complex. For people who don’t know this.
TK: Yeah, but otherwise I’d say if you love it, you don’t find it that difficult. Some things are difficult of course. It’s hard to explain an unexplainable. The thing is you just need to sit down and just investigate and learn. You just need to have this continuum. Just don’t give up after five minutes if you don’t get the right result. And this is also something modern data scientists usually forget about. After five minutes: “ahh, the model doesn’t program, that’s crap”. You need tweaking, attention to details, stuff like that. This is essentially what machine learning is. And there is another perspective of that a lot of data scientists think “yeah, this is the sexiest job, yeah, I’ll do a lot of programming, a lot of testing”, where in reality it’s like 80% is just collecting the data…
KN: Exploring the data and then cleaning the data.
TK: Exactly, so this is the problem with it. The skewed perspective of this machine learning world.
KN: The data science role or position is quite popular right now. What do you think, is this the right direction, calling many of the roles “data science” or this should be called differently?
TK: I think the data engineer… So the roles have changed over the years and you would have like DBAs, developers, stuff like that, so nowadays you have all this data science, machine learning has been injected in all these aspects, which means that the modern DBA should also know at least some Python, at least some R, should know how to explore a data set using Power BI, should know how to use Azure Functions… All this mesh of different technologies, different languages, this is something that the technology is basically pushing us toward, that we need to explore.
KN: Yeah, exactly, it’s quite hard these days to understand. I mean not to understand, but to stay up to date with all this technology. Basically you must know maybe not all of them but much more than in the previous years.
TK: In terms of different technologies, sometimes you just need to say “OK, that’s it, I’m not going further because I want to investigate what I’ve learned and try to deepen the knowledge”, because if you’re just trying to catch all these new technologies, basically you run out of energy, you run out of talent and essentially at the end you just have some idea how the technology works but you don’t have any in-depth knowledge of what a particular algorithm does, what a particular language does.
KN: And for people who are trying to understand or learning about machine learning – how to understand the three types of machine learning algorithms like supervised, unsupervised, reinforcement?
TK: I always give some similarities to supervised/unsupervised, so imagine, since you have a daughter, imagine telling your daughter that “you should go clean your room” and you give her two buckets and there are dolls and cars on the floor, and you just tell her “clean the floor and clean your room”. And your daughter would be exploring those different toys and based on the similarities of the attributes of the different toys. So let’s say dolls have hair, legs and stuff like that whereas the cars have wheels and stuff like that. She would be exploring herself which would be rather unsupervised. In terms of the supervised learning, you would tell your daughter “put the dolls in bucket #1 and all the cars in bucket #2”, so this is the difference between supervised and unsupervised learning. We also call it directed or undirected learning because with unsupervised or undirected you are not giving any directions where the algorithm needs to go or the exploration of the data or the discovery of the data should go. We don’t know, at this moment we don’t know, let’s explore it. Whereas in supervised, directed you usually know where you want to go.
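The dolls-and-cars analogy can be sketched in a few lines of Python. This is an illustration only, not something shown in the interview: the toy attributes (number of wheels, number of legs) are invented, `nearest_label` and `two_buckets` are hypothetical names, and nearest-neighbour and a tiny two-cluster k-means are just simple stand-ins for supervised and unsupervised algorithms.

```python
from math import dist

# Each toy is described by two made-up attributes: (number of wheels, number of legs).
dolls = [(0, 2), (0, 2), (0, 2)]
cars = [(4, 0), (4, 0), (6, 0)]


def nearest_label(toy, labelled_toys):
    """Supervised: copy the label of the most similar toy we were told about."""
    _, label = min(labelled_toys, key=lambda pair: dist(toy, pair[0]))
    return label


def two_buckets(toys, rounds=10):
    """Unsupervised: split toys into two buckets by similarity alone (k-means, k=2)."""
    # Deterministic start: the first toy, and the toy farthest away from it.
    centres = [toys[0], max(toys, key=lambda t: dist(t, toys[0]))]
    buckets = []
    for _ in range(rounds):
        # Put each toy into the bucket whose centre is closest.
        buckets = [min((0, 1), key=lambda b: dist(toy, centres[b])) for toy in toys]
        # Move each centre to the average of the toys currently in its bucket.
        for b in (0, 1):
            members = [toy for toy, bucket in zip(toys, buckets) if bucket == b]
            if members:
                centres[b] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return buckets


# Supervised: we say up front which toys are dolls and which are cars.
labelled = [(toy, "doll") for toy in dolls] + [(toy, "car") for toy in cars]
new_toy_label = nearest_label((4, 1), labelled)

# Unsupervised: no labels at all, only the attributes of the toys on the floor.
bucket_of_each_toy = two_buckets(dolls + cars)
```

In the supervised call the labels "doll" and "car" are given in advance; in the unsupervised call the function only ever sees attributes, so the buckets come back as anonymous numbers 0 and 1 rather than names.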
KN: That’s a really nice explanation. For sure I will remember that now.
TK: And with the reinforcement learning I think it’s relatively old but people have kind of been scared of it, not sure, but if you go to ODSC conferences, they will tell you they’re using it for auto-piloting cars and stuff like that. So essentially the idea is, once you go in the right direction, you get rewarded, vs if you go in the wrong direction, you get penalized. So this is the essence of the algorithms.
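The reward-and-penalty idea can be sketched with a tabular Q-learning update on a tiny made-up corridor. Everything here is an assumption for illustration: the corridor, the reward values, the learning rate and the names `step`, `q` and `policy` are invented, and to keep the sketch deterministic every move is tried in turn instead of the trial-and-error exploration a real agent would use.

```python
GOAL = 4             # corridor positions 0..4, the goal is at the right end
ACTIONS = (-1, 1)    # step left or step right
ALPHA = 0.5          # learning rate (assumed value)
GAMMA = 0.9          # discount for future rewards (assumed value)


def step(state, action):
    """Environment: move along the corridor, +1 reward if closer to the goal, else -1."""
    new_state = max(0, min(GOAL, state + action))
    reward = 1 if new_state > state else -1
    return new_state, reward


# One learned value per (position, move) pair, all starting at zero.
q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}

for _ in range(50):
    for s in range(GOAL):
        for a in ACTIONS:
            new_state, reward = step(s, a)
            # Value of the best follow-up move; nothing follows once the goal is reached.
            future = 0.0 if new_state == GOAL else max(q[(new_state, b)] for b in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])

# The learned behaviour: in each position, pick the move with the highest value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

After training, `policy` maps every position to `1`, meaning "step right", which is the rewarded direction in this toy setup.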
KN: How to start with machine learning if someone is interested to get that data science role in the future?
TK: This question is kind of hard to answer because I get a lot of these questions and I have an easy answer because I studied it at the faculty, but the problem is if somebody new wants to start, I always say: explore some problem, go to Kaggle, download a dataset, go find the UCI repository, where they store open data sets, and just start exploring, and learn by doing. Of course, if you can take some classes at the university, that’s even better. There are also some webinars, online classes, books available. So yeah, these are great sources, but in reality, I would say that the best thing is always when you’re in an interaction with a person, with a professor, with somebody who has, you know, the knowledge, it’s much easier to get on board. So definitely, if you are learning it in a supervised environment, let’s put it in this perspective, then your chances for success should be higher than if you’re exploring it by yourself. I mean, you would succeed fast enough.
KN: And from a technology perspective, what kind of tools, what can help you? Machine Learning Studio, something else?
TK: So there is Azure Machine Learning Studio, since you mentioned it. You can use that, which is quite easy. It’s based on the Python SDK. You basically just throw in data and there is a set of algorithms that just basically does all the different iterations, goes through different algorithms and basically outputs the result for you. There are also some other tools like RStudio, Python IDEs. We can just explore that. If you’re in a corporate environment, there are also tools available for that as well, so IBM for instance, SAS which is famous, SAS Institute from the United States. There is Knime, which is a Swiss or German-based, not really sure, company. You get a lot of that. And now with the emergence of different technologies and different companies, we have even more platforms to do machine learning.
KN: And I think I should provide some links at the end of this conversation in the blog post.
KN: OK, so what do you think about the MVP these days?
TK: I’ll go on the other side and answer that the thing from my perspective is that people don’t respect the community work nowadays as they did twenty years ago.
KN: You mean people like the audience or people who have the MVP?
TK: In general, so let’s say IT people. Since they know that you can go to Stack Overflow, that you go to YouTube and stuff like that, but in reality, they don’t understand that in the background there is a community. Stack Overflow is a community, YouTube is a community, events like today’s SQL Saturday Slovenia, that’s a community event. And people forget about that. So they don’t respect that. And there’s the problem that a lot of developers, they just say “yeah, I’m just going to learn it”. “Yeah, who cares about this meetup, who cares about stuff like that”. No, no, no! That’s the wrong perspective. You should commit to that, try to learn something, try to share something, this is what basically builds the community. Since we have this plurality of books and audios and videos and stuff like that, nobody really cares, which is wrong because there are people behind it, the creators of all this material. They should basically be rewarded for that, because a lot of people do this for free. And from time to time it’s really hard. I do respect all the MVPs and all the work that they do, it’s really amazing, but on the other hand I kind of, you know, I would really hope and wish that people in the tech world really respected the community work. It’s basically our job to push this, to be sort of like evangelists, leaders.
KN: Show people that it’s really worth it, it might basically change your life, professional life.
TK: Yeah, so coming back to the MVP award, I think this is a great, great recognition and I’m proud of it. To be honest, I’m proud of it to be recognized as an MVP. And I know it’s hard work.
KN: Exactly. My next question is going to be “how often do you travel?”, across Europe or worldwide.
TK: I had a little bit of a pause and break in the last half a year or so, but usually I do 15–20 travels. So SQL Saturdays, different events, different conferences. I try also to introduce Microsoft technology to totally different communities that are related to IT or different conferences that are not necessarily Microsoft-oriented, so yeah, this is usually what I do. I present Microsoft with their technologies, solutions and stuff like that. So yeah, I’d say 15–20 plus blog posts, all the writings…
KN: Which also consumes a lot of time.
TK: Yes. And the tech world also forgets that a lot of community leaders, they do this out of their free time.
KN: And for free. Like SQL Saturday. All the speakers are doing it for free. Out of curiosity, how long does it take you to write one blog post?
TK: That really depends because sometimes you just get an idea and you want to put it out of your head, so it’s just like, I don’t know, 3 hours and sometimes it’s just trouble with it, like two weeks, three weeks and then you decide “OK, maybe I just should not get married to that idea”, so it really depends. But yeah, at least 3 hours. The only rule I really stick to is, you know, if you feel like doing it, do it. It should come from the heart.
KN: How do you prepare yourself for a speech/session?
TK: That’s a funny thing. Usually, I walk in my apartment, talking to myself. Then I envision all the material that I prepare and then I sort of do a dry test with the code and again talk to myself, describing the code and blah-blah-blah. So basically, I’m talking to myself, that’s how I prepare.
KN: OK, so what is first, you just look for a topic and the content, talk to yourself and prepare some slides and demos?
TK: I usually have too many ideas and too little time, so this is the problem. Once I decide on one, then I just start collecting the ideas, the material, testing, so at the end when I’m 70% finished or satisfied, I’ll just start to do little updates, incremental updates. And essentially, I’ve never done one session twice, to be honest. So every time I try to add a change or correct the code, change the slide, add this, remove the parts that don’t work, just stuff like that, so essentially, even if it’s the same topic or literally the same presentation, it’s not the same. At least 10–15% is changed.
KN: Ah, I thought that you create a completely new session. Because obviously you can change slightly because you have experience after your first presentation in front of an audience, but then there’s a lot of people if you travel across Europe, there’s a lot of people who haven’t heard your session.
TK: Yeah, to be honest, I don’t really… From that perspective OK, but I don’t really consider that a problem, so I just go through sitting at home and “OK, I’ll remove this, I’ll add this, I’ll remove that, I’ll skip this code, I’ll add this one”.
KN: So every time you tweak your session.
TK: Exactly. So it’s like it’s unfinished work to be honest.
KN: So if you do that kind of thing, do you change your slides at the last minute?
TK: No, this is something I’ve learned the hard way so I don’t do that. Again, ever.
KN: What kind of conference would you recommend to people?
TK: Depends on the topics that they are interested in, but in general I would say the conferences where you could speak with the presenters or with the companies are the best ones, because you basically get the knowledge from the field, the insights from the people, and in my personal opinion this is probably the best way of networking, getting to know people and stuff like that.
KN: Not only knowledge, but you can extend your network.
TK: Yes, and also again, speaking with a professional who has invested a lot of time preparing a particular slide, a particular presentation should also be considered.
KN: So what kind of hints would you give to young people if they want to start working in the IT market if they haven’t started working there yet?
TK: First of all you need to listen to the heart and if it doesn’t sound right, then just don’t do it. If it sounds right, then yeah, you are basically on the right path. And this is the only real advice that I usually give, also for instance to my daughter. Listen to your heart. If you’re doing it out of passion, then you’re doing it right. Because Steve Jobs once said: “once you’re at the end of the road or at the solution, then you realize how to connect the dots back to the start”. Once you are on this path, you don’t really know. You feel if you’re going in the right direction, you get hints, you get some semi results but only when you arrive at the end, you can connect the dots. So basically listen to your heart.
KN: You need to feel it, you need to love it, and that helps a lot. How did you start your journey?
TK: Oh, I don’t know. I remember my mum when I was probably like 6–7 years old. She introduced me to a course on programming with I think it was like Spectrum ZX and we were… Honestly, I don’t remember but I was like 7 years old, so this is how I got introduced into this big world of passion.
KN: So that was your first contact with a computer. And what was the first connect or what was the first job when you started?
TK: I think all the next jobs, whether it was permanent or it was just, you know, some summer job or whatever, had to do something with computers. Honestly, I don’t remember when the first one was, but I can say it all had some relation to computers.
KN: They all have something in common, computers and IT. OK and the last question: what about your work/life balance? You do all these things for the community…
TK: Don’t go there, come on…
KN: We all have the same, I would say, maybe not a problem but “challenge”.
TK: Yeah, that’s a nice perspective. It is a challenge balancing all the community work, all the things that basically drive you, and all of the work that you need to do plus personal, private life. So it is a balance. And if you are mastering it, then it’s easier. If you have problems, then it’s not that easy. So basically yeah. But I think once you realize it, it’s much easier that “OK, I have a passion for that”, that instead of sitting in front of the television, you code something or play with technology. But you know, I know you’ve been there, I’ve been there, we are all basically in the same situation, so it’s hard to balance both, yeah. And there is no formula to that.
KN: No formula, no recipe. OK, thank you very much Tomaž for this conversation.
TK: Thank you for having me and I wish you a pleasant stay in Slovenia and at the SQL Saturday Slovenia.
KN: Thank you very much.
TK: And thanks for being a speaker and coming here. It was really nice having you.
KN: Thank you for having me and for this great and fantastic hospitality here. I really love coming back here over and over again. And this year being with my daughter.
TK: Yeah, she’s beautiful. Thank you!
KN: Thanks!
Sylvia: Tomaz, thank you very much for your kind words!
Useful links
Tomaz’s profiles: Twitter | LinkedIn
Tomaz’s blog: Blog
Tomaz’s book: SQL Server 2017 Machine Learning Services with R: Data exploration, modelling, and advanced analytics
Related events: SQL Saturday Ljubljana