Setting up Code Repository for Azure Data Factory v2

In this blog post I want to quickly go through one of the useful capabilities that Microsoft provided with version 2 of Azure Data Factory.

As a developer, I always want to use a code repository to keep all my changes, manage tasks and branches, share the code with the team and simply… keep it in a safe place. That has worked very well for many years for application code as well as database projects (SSDT). But how do you use a code repository with a tool that exposes its UI through the browser, as ADFv2 does? Copying exported ARM templates and committing them to a code repository by hand is hardly an efficient, repeatable and error-free process, after all.

Attach project to Git repository

We can attach a code repository to Azure Data Factory v2. You can do that for an existing factory or for a new one; let’s assume that you already have a Git repository. Microsoft currently supports Git in Azure DevOps (formerly VSTS) only, as GitHub repositories are expected to be enabled in the GA version of the factory.

You can start from two places:
1) The main dashboard (“Overview”) – click the “Set up Code Repository” button (first from the right)
2) Design mode – select the “Author” button and click “Data Factory” > “Set up Code Repository” (top-left corner)

ADFv2 – Overview dashboard

Set up Code Repository from Author section

When you choose “Azure DevOps Git” as the Repository Type, the list of accounts available to you will be filled in automatically:

Repository Settings – Azure DevOps Account step

Then, in the “Project Name” field, you will see all the projects inside the account you have just selected. In this case I’ve chosen “DataServices”:

Repository Settings – Project name step

Subsequently, you must select the Git repository name. In this step you can create a new repository or use an existing one; in this case I used an existing repository called “RepoTest1”. Having done that, point to the “Collaboration branch”, which is the branch of your Azure repository used for publishing. By default it is “master”, but you can change it if you want to publish resources from another branch.
In “Root folder” you can put the path that will be used to locate all the resources of your Azure Data Factory v2, i.e. pipelines, datasets, connections, etc. Leave it as is, or specify a path if the repository holds more components/parts of the project than just the factory.
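All of the above is done through the UI, but the same repository association can also be configured from code. Below is a minimal sketch using the Python management SDK (azure-mgmt-datafactory and azure-identity); treat it as an illustration of the settings described above rather than a drop-in script. The subscription, resource group, factory name and DevOps organisation are placeholders, while “DataServices”, “RepoTest1” and “master” mirror the values chosen in this walkthrough.

```python
# Hypothetical sketch: attach an Azure DevOps Git repo to an existing factory.
# Assumes the azure-mgmt-datafactory and azure-identity packages are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import FactoryRepoUpdate, FactoryVSTSConfiguration

subscription_id = "<subscription-id>"          # placeholder
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

repo_config = FactoryVSTSConfiguration(
    account_name="<devops-organisation>",      # Azure DevOps account (organisation)
    project_name="DataServices",               # project selected in "Project Name"
    repository_name="RepoTest1",               # Git repository name
    collaboration_branch="master",             # branch used for publishing
    root_folder="/",                           # where ADF keeps its JSON resources
)

client.factories.configure_factory_repo(
    "<factory-region>",                        # location of the factory, e.g. "northeurope"
    FactoryRepoUpdate(
        factory_resource_id=(
            f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
            "/providers/Microsoft.DataFactory/factories/<factory-name>"
        ),
        repo_configuration=repo_config,
    ),
)
```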

Import existing Data Factory resources to repository

That option allows you to make an initial commit of your current ADF resources into the selected branch of the repository. Select it when your repository has just been created or the target folder is empty. Otherwise, you will start with an empty project (synced to the repo), while your existing resources are left unassigned.

Leaving the option selected, you can choose which branch will be the target of the import:

  • Use Collaboration – the “Collaboration branch” will be used as the target (“master” in this case)
  • Create new – create a brand new branch
  • Use Existing – select the existing branch you want to use as the target

In my case, I will create a new branch called “InitAdfCode”:

Repository Settings – all complete

Once you click the “Save” button, the new branch is created and a single commit containing all the resources (as JSON files) is made. The following pictures show the newly created branch and the structure of folders.

Newly-created branch

Structure of folders. Familiar? Certainly.
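For reference, the layout ADF creates in the root folder typically looks roughly like the sketch below – one subfolder per resource type, with one JSON file per object. The exact set of folders depends on which resources your factory contains:

```
RepoTest1 (branch: InitAdfCode)
├── dataset/            # one JSON file per dataset
├── linkedService/      # connections (linked services)
├── pipeline/           # one JSON file per pipeline
└── trigger/            # triggers, if any are defined
```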

The last step is to pick a working branch. As I have just created a new one, I want to use it:

Once you confirm that, you will see that the UI is connected to the selected branch of the Git repository:

At any time you can change the working branch by selecting the required one from the list.

Making changes

Whenever you change any object in ADF v2, the Save button becomes enabled. Once you click Save, all the changes made so far are committed (pushed) to the related folder in the Git repository. The only exception to that rule is deleting objects – that change is pushed to the repo automatically:

Azure DevOps – history of changes pushed by ADF v2

Publish

We are able to publish an edited ADF, which generates a publish branch containing ARM templates that can be used for deployment purposes. Beforehand, you must merge your changes from the working branch into master (or whichever branch is set as the collaboration branch).
To achieve that, use the Create pull request [Alt+P] action (top-left corner), which does nothing but redirect you (in a new browser tab) to the project in Azure DevOps, onto the New Pull Request page. The default target is the “collaboration branch” (i.e. master), which is the only branch you can publish from. I’m not going to cover the deployment topic itself in this post.
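Deployment is out of scope here, but just to illustrate what those generated templates are for: the publish branch (adf_publish by default) contains files typically named ARMTemplateForFactory.json and ARMTemplateParametersForFactory.json, which can be deployed like any other ARM template. Below is a minimal, hedged sketch using the azure-mgmt-resource package; the resource group, subscription and deployment name are placeholders.

```python
# Hypothetical sketch: deploy the ARM templates ADF generated in the publish branch.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Files taken from the adf_publish branch of the repository.
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)
with open("ARMTemplateParametersForFactory.json") as f:
    parameters = json.load(f)["parameters"]

poller = client.deployments.begin_create_or_update(
    "<resource-group>",
    "adf-from-adf-publish",   # arbitrary deployment name
    Deployment(
        properties=DeploymentProperties(
            mode="Incremental",
            template=template,
            parameters=parameters,
        )
    ),
)
poller.wait()  # block until the deployment finishes
```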

Switch the assigned repo to a different one

You might have noticed that you can have only one Git repository assigned, which is completely fine (you can juggle between branches). But what if you make a mistake, or just change your mind, and need to re-assign the factory from one Git repo to another? You can still do that, but in this case there is only one place where the action is available: the Overview dashboard. Basically, the operation boils down to removing the Git configuration and assigning it again, so first you must:

Detach / remove Git (repo)

Removing the Git repo from Azure Data Factory is possible in only one place: the Overview dashboard. In the top-right corner you will see (or not – I didn’t the first time) a small Git icon:

Clicking the icon gives you access to information about the attached Git repository. At the bottom you’ll find the “Remove Git” button. Once that is done, you can repeat the assigning process from the beginning.

Thanks for reading!

21 Comments

  1. Sachin
    December 13, 12:52

    The list of Azure DevOps accounts is not getting listed for me. Can you tell me where the problem could be?

    • Kamil Nowinski
      December 13, 23:41

      Hi Sachin. I had this issue at the beginning on my private Azure account. Once I connected my Azure DevOps Services (formerly Team Services) to AAD (Azure Active Directory), the Azure DevOps account should be available on your list. Let me know whether that helped.

      • Paul
        January 02, 03:37

        Hi Sachin, so far my finding is that you can only use ‘Azure DevOps Git’ if your account is associated with a Visual Studio account (in general, your organization account has it). However, if the ‘Azure DevOps Git’ option doesn’t work for you, you can choose the ‘GitHub’ option and add your GitHub account and repository name, which works perfectly.

        And many thanks @Kamil for the nice post.

  2. Steve
    December 27, 12:51

    Hi Kamil

    Great article – thanks for publishing & sharing.

    On a separate note, I can’t figure out how to subscribe to SQL Player so that I get notifications of new articles. Is there a way to subscribe?

    Many thanks!

    Steve

    • Kamil Nowinski
      December 28, 22:15

      Hi Steve. Thank you very much for your kind opinion. The blog is fairly young and there is still no option to subscribe to new posts. Certainly, this will change pretty soon and I will inform you then.

  3. Marco
    April 16, 16:52

    Hi Kamil
    Thanks for your article. I have a question I haven’t been able to answer by googling. If I want to integrate Data Factory with Azure DevOps, I need to create a DevOps project first. When I try to do it, I see .NET, Node.js, PHP, Java, Static Website, Python, Ruby, Go, C and “bring your own code” options. What option should I go for when creating the project?

    Regards

    • Kamil Nowinski
      April 17, 17:41

      Hello Marco.
      You’re welcome. This article does not address the deployment of ADF with Azure DevOps.
      There is no dedicated template for ADF yet (I assume you got stuck there building a new pipeline), so you can use an empty one. When adding a task, choose ‘Azure Data Factory’ from the Marketplace or create a PowerShell script. I will prepare a post about it soon (hopefully).

  4. Maddy
    April 23, 15:45

    Hi,

    If I remove the Git repo setting, does it also remove the repository, or does it just remove the setup which we did?
    The warning below appears while using the Remove Git option, and it says it removes the associated Git repository:

    This will remove the GIT repository associated with your data factory. Make sure to Publish all the pending changes to data factory service before removing the GIT associated with your data factory to avoid losing any changes.

    Could you please elaborate on what will actually happen when we remove the Git setting?

    Regards,
    Maddy

    • Kamil Nowinski
      April 23, 20:07

      Hi Maddy,
      Thank you for visiting my blog.
      When you remove an associated repository, it only removes the configuration of that link. No delete action will be committed to the Git repository, so don’t worry, you will not lose the code. Therefore, you can set up that repository with the same or another ADF again.
      But I admit that the message is slightly confusing.
      It’s about all the changes that you might have made (in an open browser) but not saved (committed) yet.

  5. […] how to set up the code repository for newly-created or existing Data Factory in the post here: Setting up Code Repository for Azure Data Factory v2. I would recommend to set up a repo for ADF as soon as the new instance is created. Your life as a […]

  6. Sunita
    July 30, 15:34

    Thank you for the article. I’m facing an issue while trying to integrate my ADF with an Azure DevOps repo. Once I set up the code repository on the ADF UI, I see all the folders created in my Azure DevOps Git repo (pipelines, datasets, linked services etc.) but on my ADF UI, all my objects disappear (pipelines, datasets etc.). When I switch back to Data Factory mode, I’m able to view them. Any thoughts?

    • Kamil Nowinski
      July 30, 16:07

      Thanks for reading! Sunita, do you have any files in those folders created in the Git repo?
      If so, check the folder set in the ‘Root folder’ field. If there are no files at all, you need to import the existing code when setting up the repository.

  7. isura
    December 04, 10:39

    Hi Kamil,

    Thanks for the useful article.

    Would you recommend maintaining multiple projects in one repository? For example, having the ADF and SQL Server database projects in one Visual Studio solution?

    • Kamil Nowinski
      December 04, 16:01

      There is no clear recommendation here; both approaches are correct. If you have the ADF and an MSSQL database as one scope of the project, one repo would definitely work (I had that in a few projects). However, this might not be the case when the project (or a bigger programme) is split among several teams or smaller projects. It may also differ depending on the release frequency and how releases are managed. Sorry, I can’t give you a simple answer. I’d go with a single project if you have just started a new one, have few types of components and your team is highly concentrated on this project as a whole.

  8. Pravish
    April 09, 03:10

    Hi Kamil,
    Great article, and really helpful. I have a situation where we have different repositories and different projects, but these repositories are shared between the projects due to close-knit integration. In this case, what is the best method to use these repositories with ADF? Will it be one ADF per repository, or do you have any other suggestion? Many thanks for your help!

    • Kamil Nowinski
      April 10, 14:17

      Hi Pravish,
      Thank you. The organisation of repositories, projects and/or folders is really up to you. ADF (as well as Azure DevOps) is flexible here. You can have one or many Data Factories within the same project – each ADF in its own folder. Who knows, perhaps I will write a blog post about this topic.

  9. Trent
    August 19, 12:42

    Hi Kamil,
    I’m wondering if it’s possible to have a single ADF with multiple DevOps repositories. I wanted to have a repo per project, but a couple of my concerns are: it looks like one ADF can only be associated with one repo at a time, meaning my co-worker and I can’t use the same ADF while working on separate projects. It also looks like, when publishing, regardless of the repo, ADF publishes to adf_publish, so I’m not sure whether a publish from one repo would completely wipe the publish from another repo. It’s looking like it would be best to have a single repo, and maybe use folders to segregate project-specific pipelines, datasets, linked services, triggers, etc. I would love your thoughts, and those of anyone else who has experience with this.

    • Kamil Nowinski
      August 19, 17:40

      Hi Trent.
      It is not a problem. One DevOps project can have multiple code repositories. Each repo can have multiple ADF folders with its code. In that way, when you publish an ADF, the adf_publish (default) branch will contain one or more folders – one per ADF. So nothing will be wiped, don’t worry. Just give it a try with a test project.

  10. […] to the selected repository. If you are not sure how to achieve that – I described it here: Setting up Code Repository for Azure Data Factory v2. As a developer, you can work with your own branch and can switch ADF between multiple branches […]
