DP-203 Data Engineering on Microsoft Azure – Design and Develop Data Processing – Azure Databricks Part 3

  • June 23, 2023

7. Lab – Reading a CSV file

Now, in this chapter, we’ll see how to process our log CSV file. Working with files is one of the most important aspects of any service that we have seen. So, from the exam perspective, you should understand how to work with CSV files, with JSON-based files, and with Parquet files. That’s why I’ve covered all of these different types of files in each particular section, and we have to do the same over here as well. So, firstly, I’ll close all the cells that I have open. Then I can go to the menu option that is available here and click on Upload Data. So now I can upload my log CSV file onto Azure Databricks.

So, Azure Databricks actually has an underlying Databricks File System in place. If you want to work with files locally, you can upload your files and work with them. Yes, Azure Databricks can also connect to your Data Lake Gen2 storage accounts and to your normal Azure storage accounts, and you can also create mount points onto those storage accounts. But we’ll see that a little bit later on.

So here, I’ll just click on this so that I can browse for my file. I’ll choose my log CSV file and go on to Next. And here it’s showing how you can now access this file. So, whether you’re working in PySpark, in R, or in Scala, you can just copy this particular statement. So let me copy this. I’ll click on Done, I’ll remove everything in the cell, and let me place it here. So, we have our Databricks File System path, and here we have our log CSV file.

So there are some folders in between. We’ve already seen this statement before, wherein we can load a CSV file. The format we are mentioning is CSV, and we are using the Spark session to read our file. Here we can also show the contents of the data frame. So let me run this. We can also do a display of our data frame. Here we can see again that our column names are coming as a row in our data frame. We can change this by using the header option. So let me take this. It’s the same path, so I’ll copy these two statements to create a new data frame. Here I’m mentioning that the header is true. That means the first row has our column names. Let me run this. And we can now see our data frame being properly displayed. So in this chapter, I wanted to explain how you can read your CSV files. At the same time, we’ve also been introduced to the Databricks File System.

8. Databricks File System

I just want to quickly cover some aspects of the Databricks File System. Your workspace gets a Databricks File System. This is an abstraction layer on top of scalable object storage, so under the covers you are getting object storage which is scalable in nature. If you want to interact with that object storage, you have the Databricks File System. Here you can store your objects using directories and the normal file semantics. These files also persist if the cluster is terminated. So if you terminate your cluster and then recreate it, you still have access to those files. The default storage location is called the DBFS root.

Now there are some predefined root locations. We have /FileStore, which is used for the imported data files, the generated plots, and the uploaded libraries. You have /databricks-datasets, which is used for some sample public data sets. And you have /user/hive/warehouse, which holds the data and the metadata for non-external Hive tables. There are also some magic commands in place to look at the Databricks File System, so let’s go to the cell itself. This is the magic command, and ls lists all of the contents. I’ll just run the cell, and here you can see the path on the Databricks File System and the name of each entry. If you want to create a new directory, you can do that here and then list the contents again. So here we can see our new directory in place. I just wanted to give you some more ideas about the Databricks File System.
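The listing and directory creation can also be done from Python through `dbutils.fs`, which Databricks notebooks provide; the `%fs ls` and `%fs mkdirs` magic commands are shorthand for the same calls. Below is a small sketch under some assumptions: the helper name `create_and_list` is hypothetical, and `dbutils` is passed in explicitly, so the function is only useful inside a notebook (or with a stand-in object that mimics `dbutils`).

```python
def create_and_list(dbutils, parent, name):
    """Create the directory parent/name, then list what is under parent.

    Returns a list of (path, name) tuples, one per entry.
    """
    # Magic-command form of the next line: %fs mkdirs <parent>/<name>
    dbutils.fs.mkdirs(f"{parent}/{name}")
    # Magic-command form of the next line: %fs ls <parent>
    return [(entry.path, entry.name) for entry in dbutils.fs.ls(parent)]
```

In a notebook, a call such as `create_and_list(dbutils, "/FileStore", "newdirectory")` would create the directory and return the listing; the directory name here is only an example.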

9. Lab – The SQL Data Frame

Now, continuing our work with data frames, let’s again see some commands from the SQL-style API that is available on top of your data frames, on top of your RDDs. So again, I’m reading my file here. You may only want to see some of the columns in that particular data frame. So here, let me run this, and we can see our output in place. Remember that in our previous chapter we had created a data frame, df2, so we are reusing that same data frame. Here I am only selecting some of the columns. Now we can also create a data frame which will actually infer the schema. But let me do one thing first: let me print the schema of the data frame.

So here in terms of the schema, we can see the ID is a string and the time is also a string. But we want Spark to actually infer the schema based on the underlying data. So let me copy this, let me run the cell. Now we can see that the ID is coming up as an integer and the time is coming up as a timestamp. If you only want to show the rows based on a particular filter, this is like having the where condition in place. We can also use the display command, as I’ve shown here. So we can see here only the rows where the status is equal to Succeeded. And then finally, if you want to use the group by statement, that’s something that you can do as well. So here it’s grouped by the status. So again, there are different commands that are available to work with your data frames.

10. Visualizations

In this chapter, I just want to make a quick note about the visualizations that are available by default in the notebooks. So here I am displaying my data frame, and the entire data frame is coming in a tabular format. If I scroll down, I have the different visualizations available here. If I click on the bar chart, by default it’s stacking it up against the different IDs, and here I have the resource group and the resource type. You can expand the plot here by dragging this. If you go to the plot options, you can see that by default it is plotting against the ID, and the keys are the resource group and the resource type. Let’s say you want to stack it against the operation name. You can drag the operation name onto the keys, and here you can see all of the operation names. It will go ahead and display it again. So here we have a count based on the different operation names. So this is the default visualization that you get in the notebooks in Azure Databricks.

11. Lab – Few functions on dates

In this chapter, I just want to go through a few functions for working with dates. If I go back to our data frame and display it in the tabular format, I should be able to see the timestamps. So here we do have a column based on the time. Let me take this first set of statements. So what am I doing? Here I am selecting the time column. I’m using the year function to display the year part of the time, and the same goes for the month and for the day of the year. These are default functions that are available. To ensure that I can use these date-based functions, I’m using the import statement here.

And then I am selecting all of those different columns. So let’s run this. Here I can see the year, the month, and the day of the year. If you want to give more meaningful names, you can use an alias. We’ve seen this earlier on, to give meaningful names to the columns in the data frame. Let me run this. Here it’s now giving the different column names. And finally, if you want to convert the time into a date, you can use the to_date function. Let me run this. So here you can see all of the different dates in place. So in this chapter, I just wanted to go through some important functions for working with dates.
