Azure Data Lake Store PowerShell

Provision a Data Lake Store with files by using PowerShell

In a modern data warehouse environment a data lake store is often used for storage, especially in big environments. Storage in Azure is not expensive, so think of it as a staging environment, for example, into which you can load all of your data. In Azure SQL Data Warehouse you can select the data directly from your files in the data lake store by using external tables (PolyBase). There are multiple ways to load your data into a data lake store. With this blog I want to show you how it can be done by using PowerShell.

First we have to make a connection to Azure. In this example I use a simple interactive login at runtime. However, if you want to automate this, you have to change it to use a profile or context file where the credentials are stored.
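A minimal sketch of that login step, assuming the classic AzureRM modules (the cmdlet names differ if you use the newer Az module), with the subscription name as a placeholder:

# Interactive login: this opens a pop-up where you enter your Azure credentials.
Login-AzureRmAccount

# If your account has access to multiple subscriptions, select the right one.
# The subscription name below is a placeholder.
Select-AzureRmSubscription -SubscriptionName "MySubscription"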

With the login command, a pop-up will show where you can enter your credentials and log in. Next, I set some variables which I need for this script to work.
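As a sketch of that variable setup, assuming a small metadata table that holds the path (the server, database, table, and column names below are placeholders I made up for illustration):

# Query the metadata database for the local path of the superhero files.
# Server, database, and table names are assumptions for illustration.
$FilePathQuery = "SELECT FilePath FROM dbo.SourceFiles WHERE SourceName = 'Superhero'"
$LocalFilePath = (Invoke-Sqlcmd -ServerInstance "MySqlServer" -Database "MetadataDb" -Query $FilePathQuery).FilePath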

In my database I have stored the file path for my superhero files. This is the location where my local files are stored. In the code above I query my database so I can set the $LocalFilePath variable. So that covers the source; next up, the destination. I have to tell my script in which data lake store I want to store my files, and in which resource group the data lake store is located.
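The destination variables could look like the sketch below; the data lake store name, resource group, location, and import folder are again placeholders:

# Destination: the data lake store, its resource group, and the import folder per source.
$DataLakeStoreName = "myadlsstore"
$ResourceGroupName = "rg-datalakestore"
$Location          = "West Europe"
$ImportFolder      = "/Import/Superhero"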

Because I want an archive folder where I can archive the files that were uploaded, I have to set some variables for that as well.
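For example, a timestamped archive path with a breakdown by Year/Month/Day/Time could be built like this (the /Archive/Superhero root is a placeholder):

# Archive folder with a breakdown by Year/Month/Day/Time.
$Now           = Get-Date
$ArchiveFolder = "/Archive/Superhero/" + ("{0:yyyy}/{0:MM}/{0:dd}/{0:HHmmss}" -f $Now)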

And now we're set to go. The first step in this script is to check whether the data lake store exists or not. If it doesn't exist, I create one from the script with the name given in the variable.
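A sketch of that check-and-create step, assuming the AzureRM Data Lake Store cmdlets:

# Create the data lake store only if it does not exist yet.
if (-not (Test-AzureRmDataLakeStoreAccount -Name $DataLakeStoreName)) {
    New-AzureRmDataLakeStoreAccount -Name $DataLakeStoreName `
                                    -ResourceGroupName $ResourceGroupName `
                                    -Location $Location
}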

When the data lake store has been created, we can upload the files into it. Because there are multiple files in the folder, I have to loop through them. The files will be placed in an import folder per source. The same files will also be placed in an archive folder, where there is a breakdown by Year/Month/Day/Time.
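A sketch of that upload loop, writing every local file to both the import folder and the timestamped archive folder defined above:

# Upload every file in the source folder to the import and archive folders.
foreach ($File in Get-ChildItem -Path $LocalFilePath -File) {
    Import-AzureRmDataLakeStoreItem -AccountName $DataLakeStoreName `
                                    -Path $File.FullName `
                                    -Destination "$ImportFolder/$($File.Name)" -Force
    Import-AzureRmDataLakeStoreItem -AccountName $DataLakeStoreName `
                                    -Path $File.FullName `
                                    -Destination "$ArchiveFolder/$($File.Name)" -Force
}

Note that -Force overwrites a file in the import folder if it already exists, while the archive copy always lands in a new timestamped folder.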

And that’s it! The final result in the data lake store will be:

As you can see, I have run multiple tests on the same day. That’s why you can see all the timestamps as separate folders in the archive folder. If you are interested in the code, or if you have any questions/remarks/comments, feel free to contact me!
