<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" > <channel> <title>Modern Data & AI</title> <atom:link href="https://www.moderndata.ai/feed/" rel="self" type="application/rss+xml" /> <link>https://www.moderndata.ai/</link> <description>A blog on Power BI & Azure Data Platform</description> <lastBuildDate>Mon, 13 Dec 2021 15:53:54 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod> hourly </sy:updatePeriod> <sy:updateFrequency> 1 </sy:updateFrequency> <generator>https://wordpress.org/?v=6.1.7</generator> <image> <url>https://www.moderndata.ai/wp-content/uploads/2021/11/blog-icon.ico</url> <title>Modern Data & AI</title> <link>https://www.moderndata.ai/</link> <width>32</width> <height>32</height> </image> <site xmlns="com-wordpress:feed-additions:1">162855274</site> <item> <title>How to prevent concurrent pipeline execution in Azure Data Factory or Azure Synapse Analytics (design #1)</title> <link>https://www.moderndata.ai/2021/12/how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1</link> <comments>https://www.moderndata.ai/2021/12/how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Mon, 13 Dec 2021 06:59:00 +0000</pubDate> <category><![CDATA[Data Platform]]></category> <category><![CDATA[Azure]]></category> <category><![CDATA[Azure Data Factory]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=962</guid> <description><![CDATA[<p>Recently, we had to be creative to design a lock system in Azure Data Factory to prevent concurrent pipeline execution. There are actually two different approaches to this challenge! This blog post will describe the first approach and is co-authored by Laura de Bruin. When and why do you want to create this ‘lock system’? […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/12/how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1/">How to prevent concurrent pipeline execution in Azure Data Factory or Azure Synapse Analytics (design #1)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p><strong>Recently, we had to be creative to design a lock system in Azure Data Factory to prevent concurrent pipeline execution. There are actually two different approaches to this challenge! This blog post will describe the first approach and is co-authored by <a href="https://nl.linkedin.com/in/laura-de-bruin-4874aa103" target="_blank" rel="noreferrer noopener">Laura de Bruin</a>.</strong></p> <h2>When and why do you want to create this ‘lock system’?</h2> <p>In many cases, it can lead to unwanted results if multiple pipeline runs execute in parallel. 
For example, when a pipeline has an hourly scheduled trigger like the screenshot below, and the previous run takes more than an hour to complete, the new instance will be triggered and nothing will prevent it from running in parallel with the already running instance!</p> <figure class="wp-block-image size-full is-resized"><img decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2021/12/image.png" alt="" class="wp-image-969" width="400" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/image.png 437w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-202x300.png 202w" sizes="(max-width: 437px) 100vw, 437px" /></figure> <p>For <strong>scheduled triggers</strong>, there is nothing out-of-the-box that can help you to prevent concurrent pipeline runs. For <strong>tumbling window triggers</strong> there is a <code>maxConcurrency</code> property, but keep in mind that this will create a queue/backlog of pipeline runs. It will not cancel any pipeline runs. It depends on your use case whether you really want that behavior. The <a href="https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger?tabs=data-factory%2Cazure-powershell#tumbling-window-trigger-type-properties" target="_blank" rel="noreferrer noopener">Docs </a>describe it like this:</p> <blockquote class="wp-block-quote"><p>The number of simultaneous trigger runs that are fired for windows that are ready. For example, to back fill hourly runs for yesterday results in 24 windows. If <strong>maxConcurrency</strong> = 10, trigger events are fired only for the first 10 windows (00:00-01:00 – 09:00-10:00). After the first 10 triggered pipeline runs are complete, trigger runs are fired for the next 10 windows (10:00-11:00 – 19:00-20:00). Continuing with this example of <strong>maxConcurrency</strong> = 10, if there are 10 windows ready, there are 10 total pipeline runs. If there’s only 1 window ready, there’s only 1 pipeline run.</p></blockquote> <h2>Introducing two locking system approaches</h2> <p>We designed two different locking system approaches. The <strong>first design</strong> checks the pipeline run history to verify whether any of the runs are still in progress. We will explain this design in detail in this blog post.</p> <figure class="wp-block-image size-full is-resized"><img fetchpriority="high" decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-2.png" alt="" class="wp-image-978" width="622" height="162" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-2.png 622w, https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-2-300x78.png 300w" sizes="(max-width: 622px) 100vw, 622px" /></figure> <p>The <strong>second design</strong> works with a global parameter that holds the ‘lock’. At the start of the pipeline, we verify the value of the lock and only continue if it is not locked (false). We then change the value to locked (true) and after all activities have run successfully we change the value back to not locked (false). If meanwhile any other pipeline starts, it will find the value in locked state (true) and stop execution on the spot. </p> <p><em><strong>Important note</strong>: only the first design will currently work in Azure Synapse Analytics pipelines, as global parameters are not available there!
Please help and vote for <a href="https://feedback.azure.com/d365community/idea/eaa47674-0442-ec11-a819-000d3ae2b5ca" target="_blank" rel="noreferrer noopener">this existing idea</a> to urge the development team to add global parameters in Azure Synapse Analytics.</em></p> <figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-design-2-global-parameter-1024x145.png" alt="" class="wp-image-981" width="1024" height="145" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-design-2-global-parameter-1024x145.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-design-2-global-parameter-300x43.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-design-2-global-parameter-768x109.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/12/ADF-ASA-Pipeline-Locking-System-A-design-2-global-parameter.png 1135w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p>This blog will continue now to describe the first design. An upcoming blog post will explain the second design in more detail.</p> <h3>Design #1: Create the ‘is_pipeline_running’ pipeline</h3> <p><em>Note: to copy the example pipeline below, we assume you have an Azure Key Vault available. If you want an alternative, you can also create variables in the ADF pipeline. But, having those values in the Key Vault makes it easier to deploy your solution to other environments.</em></p> <p>Prerequisites:</p> <ul><li>Make sure that the ADF (or Synapse) resource has read permissions on the secrets in the vault.</li><li>Make sure that the ADF (or Synapse) resource has read permissions on its own resource. 
This might sound strange, but without explicitly granting this IAM/RBAC the pipeline will not be able to call the ADF REST API.</li><li>If you are in Azure Data Factory: add a Global Parameter for the Key Vault URL <code>keyVaultUrl</code>.</li><li>If you are in Azure Synapse Analytics pipelines: you can’t use global parameters yet, so make sure you replace those in the expressions with a variable or ‘hard-code’ the URL.</li></ul> <p>Create a pipeline with the name ‘is_pipeline_running’, and add these parameters:</p> <figure class="wp-block-image size-full"><img decoding="async" width="614" height="232" src="https://www.moderndata.ai/wp-content/uploads/2021/12/image-2.png" alt="" class="wp-image-971" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/image-2.png 614w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-2-300x113.png 300w" sizes="(max-width: 614px) 100vw, 614px" /></figure> <p>With the following activities:</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="945" height="280" src="https://www.moderndata.ai/wp-content/uploads/2021/12/image-1.png" alt="" class="wp-image-970" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/image-1.png 945w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-1-300x89.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-1-768x228.png 768w" sizes="(max-width: 945px) 100vw, 945px" /></figure> <p></p> <ul type="1"><li>Web activity ‘getSubscriptionID’<ul><li>Create a secret in the key vault with your subscription ID value, with name <code>SubscriptionID</code>.</li><li>Get this secret value using a web activity, with this expression for the <em>url </em>property: <code>@concat(pipeline().globalParameters.keyVaultUrl,'secrets/SubscriptionID?api-version=7.0')</code>, and this value for the <em>resource </em>property: <code>https://vault.azure.net</code>.</li></ul></li><li>Web activity ‘getAdfResourceGroupName’<ul><li>Create a secret in the key vault with the name of the resource group containing the ADF resource, with name <code>adfResourceGroupName</code>.</li></ul><ul><li>Get this secret value using a web activity, with this expression for the <em>url </em>property: <code>@concat(pipeline().globalParameters.keyVaultUrl,'secrets/adfResourceGroupName?api-version=7.0')</code>, and this value for the <em>resource </em>property: <code>https://vault.azure.net</code>.</li></ul></li><li>Web activity ‘Get Pipeline Runs’<ul><li>URL: create the dynamic content to let Azure Data Factory call itself:<br><code>https://management.azure.com/subscriptions/@%7bactivity('getSubscriptionID').output.value%7d/resourceGroups/@%7bactivity('getAdfResourceGroupName').output.value%7d/providers/Microsoft.DataFactory/factories/@%7bpipeline().DataFactory%7d/queryPipelineRuns?api-version=2018-06-01</code><em>.</em></li><li>Body: Use this code to check whether the pipeline in question is running (status: in progress or queued) within the last QueryRunDays days (default: 1 day):</li><li><code>{ "lastUpdatedAfter": "@{adddays(utcnow(),int(pipeline().parameters.QueryRunDays))}", "lastUpdatedBefore": "@{utcnow()}", "filters": [ { "operand": "PipelineName", "operator": "Equals", "values": [ "@{pipeline().parameters.PipelineName}" ] }, { "operand": "Status", "operator": "In", "values": [ "InProgress", "Queued" ] } ] } </code><br>The settings of this activity are as follows:</li></ul></li></ul> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="442" height="429"
src="https://www.moderndata.ai/wp-content/uploads/2021/12/image-3.png" alt="" class="wp-image-973" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/image-3.png 442w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-3-300x291.png 300w" sizes="(max-width: 442px) 100vw, 442px" /></figure> <ul><li>Filter Running pipelines<br>The running pipelines must meet the following conditions:<ul><li>The runid is not the same as the runid of the current pipeline run.</li><li>The pipeline contains status ‘InProgress’ or ‘Queued’: <code>@and(not(equals(item().runId,pipeline().parameters.ThisRunId)),or(equals(item().status,'InProgress'),equals(item().status,'Queued')))</code></li></ul></li></ul> <ul><li>If condition<ul><li><code>True</code>: if the previous pipeline is still running, raise an error by creating a lookup activity with this query. <code>RAISERROR('@{concat('Provided pipeline name (',pipeline().parameters.PipelineName,') still has a run in progress or queued given the query range parameters set in the properties table.')}',16,1)</code>.</li><li><code>False</code>: if the previous pipeline is not running there is no action and the next activity of the pipeline could start, hence we do nothing.</li></ul></li></ul> <h3> Design #1: How to use the ‘is_pipeline_running’ pipeline</h3> <p>In the pipeline that needs the ‘lock system’, add an <em>Execute Pipeline</em> activity that calls the <code>is_pipeline_running </code>pipeline. Make sure it is the first activity of the pipeline.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="319" height="281" src="https://www.moderndata.ai/wp-content/uploads/2021/12/image-4.png" alt="" class="wp-image-974" srcset="https://www.moderndata.ai/wp-content/uploads/2021/12/image-4.png 319w, https://www.moderndata.ai/wp-content/uploads/2021/12/image-4-300x264.png 300w" sizes="(max-width: 319px) 100vw, 319px" /></figure> <p>In this activity, use the following expressions for the pipeline parameters:</p> <ul><li>PipelineName: <code>@pipeline().Pipeline</code></li><li>ThisRunID: <code>@pipeline().RunId</code></li><li>QueryRunDays: optional: this value is by default <code>-1</code> to look at the pipeline runs since the past 1 day.</li></ul> <h3> Design #1: Result</h3> <p>The new pipeline will query the pipeline run history (native ADF API call). When any previous pipeline run is still in progress it will raise an error that will surface in the parent pipeline. By default this will fail the parent pipeline as well, immediately stopping any further execution, and that is exactly the result we are aiming for! <em>If you want, you can catch the activity failure and handle it more graciously.</em></p> <p>Check out the full pipeline JSON below. 
And remember to look out for the next blog post that will describe the second approach!</p> <script src="https://gist.github.com/DaveRuijter/81c5a3fc1727c05f8072d553ba13701f.js"></script> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/12/how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1/">How to prevent concurrent pipeline execution in Azure Data Factory or Azure Synapse Analytics (design #1)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2021/12/how-to-prevent-concurrent-pipeline-execution-in-azure-data-factory-or-azure-synapse-analytics-design-1/feed/</wfw:commentRss> <slash:comments>8</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">962</post-id> </item> <item> <title>How to automatically backup your Azure Data Lake(house)</title> <link>https://www.moderndata.ai/2021/10/how-to-automatically-backup-your-azure-data-lakehouse/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-automatically-backup-your-azure-data-lakehouse</link> <comments>https://www.moderndata.ai/2021/10/how-to-automatically-backup-your-azure-data-lakehouse/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Fri, 22 Oct 2021 04:57:19 +0000</pubDate> <category><![CDATA[Data Platform]]></category> <category><![CDATA[Azure]]></category> <category><![CDATA[Azure Data Lake]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=905</guid> <description><![CDATA[<p>Out of the box, Azure Data Lake Storage Gen2 provides redundant storage. Therefore, the data in your Data Lake(house) is resilient to transient hardware failures within a datacenter through automated replicas. This ensures durability and high availability. In this blog post, I provide a backup strategy on how to further protect your data from accidental […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/10/how-to-automatically-backup-your-azure-data-lakehouse/">How to automatically backup your Azure Data Lake(house)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p>Out of the box, Azure Data Lake Storage Gen2 provides redundant storage. Therefore, the data in your Data Lake(house) is resilient to transient hardware failures within a datacenter through automated replicas. This ensures durability and high availability. In this blog post, I provide a backup strategy on how to further protect your data from accidental deletions, data corruption, or any other data failures. This strategy works for Data Lake as well as Data Lakehouse implementations. It uses native Azure services, no additional tools, software, or licenses are required.</p> <h2>How about the High Availability features?</h2> <p>The Data Lake uses Azure Storage, which stores multiple copies of the data so that it is protected from planned and unplanned events, including transient hardware failures, network or power outages, and massive natural disasters. This redundancy ensures that your storage account meets its availability and durability targets even in the face of such failures. 
<strong>This redundancy however does not mean the Data Lake is protected against data failures like corruption or accidental deletion.</strong> That is why we need to take additional measures, described in this Data Lake Backup Strategy.</p> <h2>Overview of measures</h2> <p>The Data Lake Backup Strategy uses a combination of techniques and features to get the best functionality and performance for an acceptable cost:</p> <ol><li><a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-overview">Soft delete for containers</a></li><li><a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview">Soft delete for blobs</a></li><li>Resource lock on the Storage Account</li><li><a href="https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#--query-an-older-snapshot-of-a-table-time-travel">Delta Lake time travel</a></li><li>Self-built automated backup process (copying a part of the Data Lake data to a secondary location)</li><li>Backup ‘vault’ to store subsets of the Data Lake indefinitely</li><li><a href="https://docs.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview">Lifecycle Management Policies</a></li><li><a href="https://docs.microsoft.com/en-us/azure/storage/common/manage-account-default-access-tier">Storage Account access tiers (hot/cool/archive)</a></li></ol> <h2>Additional info / limitations</h2> <ul><li>The strategy implements all measures on the production Data Lake. The other environments (dev/test/acc) will not implement the self-built automated process of copying (a part of) the Data Lake data to a secondary location, to lower costs.</li><li><a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview">Soft delete for blobs</a> in accounts that have the hierarchical namespace feature enabled is currently in public preview. The Data Lake has the hierarchical namespace feature enabled. To enroll in this public preview, please see the link to a form on <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview">the Soft delete for blobs page in the Microsoft Docs</a>. <em>Please see the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.</em></li><li>To increase the availability guarantees of the Data Lake, it is possible to have the secondary Storage Account for backups in a different region (e.g. North Europe instead of West Europe).</li><li>This strategy considers that all resources are secured with Azure Virtual Network and are connected via Private Endpoints, driving the solution design of this strategy.</li></ul> <h1>Detailed implementation description</h1> <h2>Soft delete for containers</h2> <p>The first aspect of this strategy will be to enable the <strong>soft delete for container</strong> feature of the Azure Storage Account of our Data Lake (if not already enabled). This will help as a ‘first layer of defense’ to protect the Data Lake. To learn how to enable container soft delete, see <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-container-enable" target="_blank" rel="noreferrer noopener">Enable and manage soft delete for containers</a>.</p> <p>More info from the Azure Docs: <em>Container soft delete protects your data from being accidentally deleted by maintaining the deleted data in the system for a specified period of time. 
During the retention period, you can restore a soft-deleted container and its contents to the container’s state at the time it was deleted. After the retention period has expired, the container and its contents are permanently deleted.</em></p> <p><em>When you enable container soft delete, you can specify a retention period for deleted containers that is between 1 and 365 days. The default retention period is 7 days. During the retention period, you can recover a deleted container by calling the <strong>Restore Container</strong> operation.</em></p> <p><em>When you restore a container, the container’s blobs and any blob versions and snapshots are also restored. However, you can only use container soft delete to restore blobs if the container itself was deleted. To restore a deleted blob when its parent container has not been deleted, you must use blob soft delete or blob versioning.</em></p> <p>Storage accounts with a hierarchical namespace enabled for use with Azure Data Lake Storage Gen2 are also supported.</p> <p>I recommend using a short retention period to better understand how the feature will affect your bill. The minimum recommended retention period by Microsoft is <strong>seven days</strong>, which seems good to me.</p> <p>Data in <strong>deleted </strong>containers is billed at the same rate as active data. I don’t expect it will go unnoticed when an entire container of the Data Lake is deleted <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f602.png" alt="😂" class="wp-smiley" style="height: 1em; max-height: 1em;" />, I even don’t expect it to happen at all, so I expect a zero or really low-cost impact of enabling this particular feature for the Data Lake Storage Account.</p> <h2>Soft delete for blobs</h2> <p>And as a companion to the soft delete for containers, we will also enable the soft delete for blobs feature of the Azure Storage Account of our Data Lake (if not already enabled). This will also help as a ‘first layer of defense’ to protect the Data Lake, <strong>especially for the data in the first ‘raw’ zone of the lake (05_store)</strong> as that data is not stored in Delta format and we can’t use the Delta table time travel feature for that data to recover earlier versions. </p> <p>I recommend using a short retention period to better understand how the feature will affect your bill. The minimum recommended retention period by Microsoft is <strong>seven days</strong>, which seems good to me.</p> <p>To learn how to enable blob soft delete, see <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-enable" target="_blank" rel="noreferrer noopener">Enable and manage soft delete for blobs</a>.</p> <p> More info from the Azure Docs: <em>Blob soft delete protects an individual blob, snapshot, or version from accidental deletes or overwrites by maintaining the deleted data in the system for a specified period of time. During the retention period, you can restore a soft-deleted object to its state at the time it was deleted. After the retention period has expired, the object is permanently deleted.</em></p> <p>All soft deleted data is billed at the same rate as active data. You will not be charged for data that is permanently deleted after the retention period elapses.</p> <p><strong>Soft delete for blobs in accounts that have the hierarchical namespace feature enabled (our Data Lake!) is currently in public preview</strong>. 
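</p> <p>If you manage the Storage Account through scripts or infrastructure-as-code, both soft delete settings can also be switched on programmatically. The snippet below is only a hedged sketch using the <code>azure-mgmt-storage</code> Python SDK: the subscription ID, resource group and account name are placeholders, and the exact model and parameter names can differ between SDK versions.</p> <pre class="wp-block-code"><code># Hedged sketch: enable blob and container soft delete (7-day retention) on a storage account.
# Subscription ID, resource group and account name are placeholders; assumes a recent azure-mgmt-storage.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import BlobServiceProperties, DeleteRetentionPolicy

client = StorageManagementClient(DefaultAzureCredential(), "00000000-0000-0000-0000-000000000000")
client.blob_services.set_service_properties(
    "my-resource-group",
    "mydatalakeaccount",
    BlobServiceProperties(
        delete_retention_policy=DeleteRetentionPolicy(enabled=True, days=7),
        container_delete_retention_policy=DeleteRetentionPolicy(enabled=True, days=7),
    ),
)</code></pre> <p>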
To enroll in this public preview, please see the link to a form on the <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview" target="_blank" rel="noreferrer noopener">Soft delete for blobs page in Microsoft Docs</a>.</p> <p><em>Please see the Supplemental Terms of Use for Microsoft Azure Previews for legal terms that apply to Azure features that are in beta, preview, or otherwise not yet released into general availability.</em></p> <p><strong>Important! This setting probably causes the storage costs to go up.</strong></p> <h2>Delta table time travel</h2> <p>Another ‘first layer of defense’ in our strategy is the time travel feature of the Delta tables in our Data Lake. Except for the first ‘raw’ zone, all zones use Delta tables. With this time travel feature, Delta automatically versions the data that you store, and you can access any historical version of that data within a configured retention period. Please check out the <a href="https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch#data-retention">Microsoft Docs</a> if you want to learn more about the time travel feature in Delta tables.</p> <p>I recommend using a short retention period to better understand how the feature will affect your bill. Similar to the blob soft delete retention period, seven days seems good to me. The default Delta time travel retention period is 7 days (because the default <code>delta.deletedFileRetentionDuration</code> value is 7 days).</p> <p><em>Note: Databricks recommends that you set a retention interval to be at least 7 days because old snapshots and uncommitted files can still be in use by concurrent readers or writers to the table. If <code>VACUUM</code> cleans up active files, concurrent readers can fail or, worse, tables can be corrupted when <code>VACUUM</code> deletes files that have not yet been committed. You must choose an interval that is longer than the longest running concurrent transaction and the longest period that any stream can lag behind the most recent update to the table.</em></p> <p>The command to access a different version of the data is very straightforward. For example, in SQL it’s like this:</p> <pre class="wp-block-code"><code>SELECT count(*) FROM my_table TIMESTAMP AS OF "2021-01-01 01:30:00.000"</code></pre> <h3>Run VACUUM</h3> <p>As general advice, it is smart to run the <code>VACUUM</code> command for Delta tables. Azure Databricks <em>does not</em> automatically trigger <code>VACUUM</code> operations. The <code>VACUUM</code> command recursively travels through the directories associated with the Delta table and removes data files that are no longer in the latest state of the transaction log for the table and are older than the retention threshold.</p> <p>To learn more about how to execute the <code>VACUUM</code> command, you can check out the <a href="https://docs.microsoft.com/en-us/azure/databricks/delta/delta-utility#delta-vacuum" target="_blank" rel="noreferrer noopener">Azure Docs</a>. An example Python script that updates all tables in the Databricks databases matching a specified filter is shown below.
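</p> <p>Before that full script, here is a minimal sketch of what a single-table time travel read plus <code>VACUUM</code> looks like in a Databricks notebook. The table name <code>my_table</code> is a placeholder and <code>spark</code> is the notebook’s built-in Spark session:</p> <pre class="wp-block-code"><code># Hedged sketch for a Databricks notebook; my_table is a placeholder table name.
# Read the table as it was at a given point in time (time travel).
df_last_week = spark.sql(
    'SELECT * FROM my_table TIMESTAMP AS OF "2021-01-01 01:30:00.000"'
)
print(df_last_week.count())

# Remove files no longer referenced by the table and older than the retention
# threshold (168 hours = the default 7 days).
spark.sql("VACUUM my_table RETAIN 168 HOURS")</code></pre> <p>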
<em>Note: this script also executes the <code>COMPUTE STATISTICS</code> command on each table.</em> And it has a commented line of code to show </p> <script src="https://gist.github.com/DaveRuijter/135b11542774894f7132d5d649b9d174.js"></script> <h2><span style="font-size: revert; color: initial;">Resource lock on the Storage Account</span></h2> <p>It is strongly recommended to lock all your storage accounts with an Azure Resource Manager lock to prevent accidental or malicious deletion of the entire resource. Check out the <a href="https://docs.microsoft.com/en-us/azure/storage/common/lock-account-resource?tabs=portal" target="_blank" rel="noreferrer noopener">Azure Docs</a> to learn how to do that.</p> <p>There are two types of Azure Resource Manager resource locks:</p> <ul><li>A <strong>CannotDelete</strong> lock prevents users from deleting a storage account, but permits reading and modifying its configuration.</li><li>A <strong>ReadOnly</strong> lock prevents users from deleting a storage account or modifying its configuration, but permits reading the configuration.</li></ul> <p>I recommend applying the <strong style="font-size: revert; color: initial;">CannotDelete</strong><span style="font-size: revert; color: initial;"> </span>lock, as the other type can cause problems with accessing the Storage Account because the List Keys action will be blocked.</p> <h2>Automated Backup Process – to copy data to a secondary account</h2> <p>This Data Lake Backup Strategy can’t rely on the soft delete and Delta time travel features alone. Those features are great as a ‘first line of defense’ but we want more recovery options.</p> <p>In the future, I expect Microsoft will have native support for features like Blob versioning and snapshots in Azure Data Lake Storage Gen 2 (Storage Account with hierarchical namespace enabled). But for now, those features are not available. </p> <p>Because of that, we currently need our own method of versioning the Data Lake. A ‘simple’ copy of the data stored in a separate location will do fine. We will keep multiple of those copies over time, and automatically remove the oldest copy after a certain period. An efficient method to copy data from one storage account to another – with both accounts securely connected via Azure Virtual Network and Private Endpoints – is to use the AzCopy command-line utility.</p> <p>AzCopy uses <strong>server-to-server</strong> APIs, so data is copied directly between storage servers. These copy operations don’t use the network bandwidth of your computer. Other methods – including the Azure Data Factory copy activity and Azure Databricks notebooks – copy data through the Azure Virtual Machine (or container) in the background and this will severely limit performance and increase cost. The client where you run AzCopy on must have network access to both the source and destination storage accounts, to enable these server-to-server APIs. But, running the AzCopy will not have a substantial impact on the client. We will use the Azure DevOps agent machine as our client.</p> <p>There are multiple ways to authenticate to Azure Storage with the AzCopy utility. 
We will append a SAS token to the URL of source and destination directories.</p> <pre class="wp-block-code"><code>./azcopy.exe copy $SrcFullPath $DstFullPath --block-blob-tier Cool --recursive --overwrite=ifsourcenewer --log-level=NONE --include-after $IncludeAfterDateTimeISOString</code></pre> <p>Check out <a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-copy" target="_blank" rel="noreferrer noopener">this page</a> for more info on the <code>azcopy copy </code>command.</p> <h3>Daily backup</h3> <p>For the Delta tables in our Data Lake, the time travel feature provides a way to restore older data versions, up to a week later. Hence, <strong>no need to have a daily backup for the Delta tables data</strong>. But the ‘raw’ zone in the Data Lake only has the ‘soft delete for blobs’ feature (because it is not stored in Delta tables). Hence, <strong>we need a daily backup of the ‘raw’ zone data</strong>.</p> <p>The AzCopy command has a parameter called <code>--include-after</code>. This will copy only those files modified on or after the given date/time. We will use that to have an <strong>incremental </strong>daily backup, storing only the modified/new files each day, since the day before. This will really help to reduce costs of the data in the backup Storage Account, as the ‘raw’ zone in the Data Lake potentially grows quite large (in terms of size and number of files).</p> <p>These daily backups are stored in a separate container in the backup Storage Account, called <code>daily</code>.</p> <h3>Weekly backup</h3> <p>We will also make a copy of the entire Data Lake, on a weekly basis. These weekly backups are stored in a separate container in the backup Storage Account, called <code>weekly</code>. </p> <h3>The Azure DevOps pipelines</h3> <p><strong>Before we dive into the details, I’d like to express a big thank you to Julian Kramer, a great friend at Macaw. He made the YAML pipelines and PowerShell script described here. All credits go to him! It was fun working on these, thanks Julian <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f64f.png" alt="🙏" class="wp-smiley" style="height: 1em; max-height: 1em;" />!</strong></p> <p>We will call the AzCopy utility from a PowerShell script, and that script will be called from an Azure DevOps pipeline job step.
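</p> <p>The real implementation is the PowerShell script linked further below; purely to illustrate its moving parts (the SAS-signed URLs, the <code>--include-after</code> window and the <code>Cool</code> tier), a hedged Python equivalent of the daily incremental call could look like this, with placeholder URLs:</p> <pre class="wp-block-code"><code># Hedged sketch of the daily incremental copy: same flags as the AzCopy command shown above,
# with --include-after set to 24 hours ago. The SAS-signed URLs are placeholders.
import subprocess
from datetime import datetime, timedelta, timezone

include_after = (datetime.now(timezone.utc) - timedelta(days=1)).isoformat()
src_full_path = "https://mydatalakeaccount.blob.core.windows.net/dls/05_store?SOURCE_SAS_TOKEN"
dst_full_path = "https://mybackupaccount.blob.core.windows.net/daily/05_store?DESTINATION_SAS_TOKEN"

subprocess.run(
    [
        "azcopy", "copy", src_full_path, dst_full_path,
        "--block-blob-tier", "Cool",
        "--recursive",
        "--overwrite=ifsourcenewer",
        "--log-level=NONE",
        f"--include-after={include_after}",
    ],
    check=True,
)</code></pre> <p>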
This is a convenient way to execute such a script on the Azure DevOps agent machine, and a convenient way to create and manage the schedule(s) of the script.</p> <p>The diagram below shows how the YAML and PowerShell files relate:</p> <figure class="wp-block-image size-full is-style-default"><img loading="lazy" decoding="async" width="881" height="821" src="https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1.png" alt="" class="wp-image-913" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1.png 881w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-300x280.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-768x716.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-370x345.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-570x531.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-770x718.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/10/Backup-Strategy-DevOps-pipelines-1-622x580.png 622w" sizes="(max-width: 881px) 100vw, 881px" /></figure> <p>It would simply take too much space to post all these files here entirely, so I will just refer to the Gists in my GitHub. These files are environment-specific anyways, but I hope they can serve as an example for you, to <strong>make your own versions</strong> of these YAML pipelines and PowerShell script.</p> <ul><li><a href="https://gist.github.com/DaveRuijter/a3b0ae3771cd8ce43e5aa87b5504bf69" target="_blank" rel="noreferrer noopener">pipeline-backup-daily.yml</a></li><li><a href="https://gist.github.com/DaveRuijter/82f2e9f92ffb88d008af1f1441d7c537" target="_blank" rel="noreferrer noopener">pipeline-backup-weekly.yml</a></li><li><a href="https://gist.github.com/DaveRuijter/25dbba148ae0212d942d602f92d247ad" target="_blank" rel="noreferrer noopener">stage-backup-dls.yml</a></li><li><a href="https://gist.github.com/DaveRuijter/6f90da476d737906afe7533d8a315d8e" target="_blank" rel="noreferrer noopener">job-backup-dls.yml</a></li><li><a href="https://gist.github.com/DaveRuijter/b530c84020950c729cb9e15efbfe04ce" target="_blank" rel="noreferrer noopener">backup-dls.ps1</a></li></ul> <h2>Data lifecycle management policies</h2> <p>Lifecycle management uses rules to automatically move blobs to cooler tiers or to delete them. It’s possible to create a rule in the Azure Portal, using a wizard or by editing the JSON code of the rule(s) directly. If you want to learn more about this feature, please check out <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview">Microsoft Docs</a>.</p> <h3>Backup storage account</h3> <p>The policy below is used on the secondary storage account that contains the backup copies of the Data Lake (copied there using the Automated Backup Process). This will apply retention of 60 days to the weekly backups, and retention of 30 days to the daily (incremental) backups.</p> <script src="https://gist.github.com/DaveRuijter/bb85ac7208d9057c69eae9ce7c10358d.js"></script> <h3>Data Lake storage account</h3> <p>The policy below is used on the primary storage account of the Data Lake. 
This will ensure new data in the <code>dls/05_store/_archive</code> folder of the lake is automatically assigned to the <code>cool</code> access tier.</p> <script src="https://gist.github.com/DaveRuijter/4b443128e3683e9a3cea37ceeea034d5.js"></script> <h2>Storage Account access tiers</h2> <p>Each Azure Storage account has a default access tier, either hot or cool. Data in the cool tier is significantly cheaper. For the primary Data Lake account, we configure the access tier to <strong>hot</strong>. For the secondary backup account, we configure the access tier to <strong>cool</strong>. Learn more in the <a href="https://docs.microsoft.com/en-us/azure/storage/common/manage-account-default-access-tier">Azure Docs</a>.</p> <h2>Backup vault</h2> <p>Besides the weekly/incremental backups, it is also wise to store certain data indefinitely. Remember, the weekly and incremental backups are not stored indefinitely (they will be automatically removed after the retention period, to lower storage costs).</p> <p>For example, when a data source system is being deprecated and is no longer accessible, the data ingested up until that moment would be a suitable candidate to store in the backup vault. You take a snapshot of the data and store it forever.</p> <p>For this purpose, the backup storage account has a root folder (container) called <code>vault</code> to facilitate the backups that need to be kept indefinitely.</p> <p>If data needs to be copied to the <code>vault</code> location, this can be done <strong>manually</strong>, either with the Microsoft Azure Storage Explorer tool or by using an Azure Data Factory copy activity.</p> <h2>Cost</h2> <p>This Data Lake Backup Strategy will incur additional costs in your Azure environment, no surprise there I hope <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60a.png" alt="😊" class="wp-smiley" style="height: 1em; max-height: 1em;" />. I encourage you to have a clear cost baseline before you implement this strategy. Know how large your data is.</p> <p>Also, be aware that size reductions (in terms of MB) and reductions in the number of files in your Data Lake(house) can really help to limit the (increase of) cost related to the implementation of this strategy. If you have 1TB in your primary Data Lake storage account, this can easily be 2TB, 10TB, or more in your secondary Backup storage account, depending on your retention period configurations, etc. The cost of 1TB might be well within your budget, but will 1+10TB still be in your budget?
Keep an eye out for any size or volume reduction that can be applied in your Data Lake(house), to reduce the cost impact of your backup strategy.</p> <p>The areas where (an increase of) costs are to be expected:</p> <ul><li>Having a <strong>secondary Storage Account for the backups</strong>.</li><li><strong>Enabling soft delete for blobs</strong> on your primary Storage Account.</li></ul> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/10/how-to-automatically-backup-your-azure-data-lakehouse/">How to automatically backup your Azure Data Lake(house)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2021/10/how-to-automatically-backup-your-azure-data-lakehouse/feed/</wfw:commentRss> <slash:comments>3</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">905</post-id> </item> <item> <title>Automatic semantic versioning (bonus: with release notes!)</title> <link>https://www.moderndata.ai/2021/10/automatic-semantic-versioning-in-azure-devops-with-release-notes/?utm_source=rss&utm_medium=rss&utm_campaign=automatic-semantic-versioning-in-azure-devops-with-release-notes</link> <comments>https://www.moderndata.ai/2021/10/automatic-semantic-versioning-in-azure-devops-with-release-notes/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Sun, 03 Oct 2021 20:52:00 +0000</pubDate> <category><![CDATA[Data Platform]]></category> <category><![CDATA[Azure DevOps]]></category> <category><![CDATA[DataOps]]></category> <category><![CDATA[Git]]></category> <category><![CDATA[SemVer]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=862</guid> <description><![CDATA[<p>I am a fan of using semantic versioning (a.k.a. SemVer) for data solutions, following the v1.0.0 pattern. It helps in the communication between team members and stakeholders, by limiting ambiguity and misunderstandings related to the version of your solution’s releases. With semantic versioning, the trick is to increment the version according to the changes you […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/10/automatic-semantic-versioning-in-azure-devops-with-release-notes/">Automatic semantic versioning (bonus: with release notes!)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p>I am a fan of using semantic versioning (a.k.a. <a href="http://semver.org" target="_blank" rel="noreferrer noopener">SemVer</a>) for data solutions, following the v1.0.0 pattern. It helps in the communication between team members and stakeholders, by limiting ambiguity and misunderstandings related to the version of your solution’s releases. With semantic versioning, the trick is to increment the version according to the changes you have made since the latest release. Manually keeping track of that is not an easy task, especially for small teams, without the capacity to have somebody dedicated to this administration task. I found a way to make this a lot easier, leaning on the Pull Request description! 
And as a bonus, we will create some nice release notes automatically <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <figure class="wp-block-image size-full is-resized"><img loading="lazy" decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2021/10/Semver.jpg" alt="" class="wp-image-892" width="239" height="116" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/Semver.jpg 478w, https://www.moderndata.ai/wp-content/uploads/2021/10/Semver-300x145.jpg 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/Semver-370x179.jpg 370w" sizes="(max-width: 239px) 100vw, 239px" /></figure> <h2>Using the Pull Request description</h2> <p>My implementation of automatic semantic versioning relies on the data engineer to select the correct type of change in the Pull Request description.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="158" height="129" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-3.png" alt="Type of update. Fix, Feature or Big feature." class="wp-image-868"/></figure> <p>These options of course relate to the Patch/Minor/Major parts of the SemVer pattern. I think these labels are easier to understand for data engineers because we usually talk about creating a bug fix or feature. <em>Feel free to tweak these labels to your liking, but make sure you update the pipeline code as well.</em></p> <h2>Required Azure DevOps extensions</h2> <p>This setup relies on the Azure DevOps extensions listed below. Please review them carefully, and <span style="text-decoration: underline;">only install them in your organization if they pass your internal standards/governance rules</span> regarding extensions.</p> <ol><li><a href="https://marketplace.visualstudio.com/items?itemName=gittools.gittools" target="_blank" rel="noreferrer noopener">GitTools</a> (we use the tool GitVersion that is in this library)</li><li><a href="https://marketplace.visualstudio.com/items?itemName=joachimdalen.pull-request-utils" target="_blank" rel="noreferrer noopener">Pull Request Utils</a></li><li><a href="https://marketplace.visualstudio.com/items?itemName=richardfennellBM.BM-VSTS-XplatGenerateReleaseNotes" target="_blank" rel="noreferrer noopener">Generate Release Notes (Crossplatform)</a></li><li><a href="https://marketplace.visualstudio.com/items?itemName=richardfennellBM.BM-VSTS-WIKIUpdater-Tasks" target="_blank" rel="noreferrer noopener">WIKI Updater Tasks</a></li></ol> <h2>Add the Pull Request template</h2> <p>To automatically serve these options for every Pull Request, I use a template (check out the Docs to learn more about <a href="https://docs.microsoft.com/en-us/azure/devops/repos/git/pull-request-templates?view=azure-devops" target="_blank" rel="noreferrer noopener">Pull Request templates</a>). This is the code for my standard Pull Request:</p> <script src="https://gist.github.com/DaveRuijter/e21a8458dcd3ccd76dc68644f980bb69.js"></script> <p>Add this in a <code>.md</code> file in your git repository in a folder called <code>.azuredevops</code>, and it will be automatically picked up by new Pull Requests (Azure DevOps will scan for Markdown files in that folder).</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="306" height="202" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-4.png" alt="Screenshot of the .azuredevops folder in the repo, with the pull request template in it."
class="wp-image-869" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-4.png 306w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-4-300x198.png 300w" sizes="(max-width: 306px) 100vw, 306px" /></figure> <h2>Add the Release notes template</h2> <p>The process for automatically adding a release note in the wiki needs a template file. Please see the Markdown file below. Yes, it looks very weird now, with all those mustache brackets, but no worries. All of those {{variable}} thingies will be automatically replaced.</p> <script src="https://gist.github.com/DaveRuijter/48bc085f2936aa4f9e7cbc0f65328102.js"></script> <p>Add this file in your git repository in a folder called <code>.azuredevops</code>, with the name <code>release-notes-template.md</code>. If you use another name or location, remember to change the values in the YAML pipeline as well (we will add that pipeline later). </p> <h2>Add the GitVersion configuration file</h2> <p>The process for semantic versioning (GitVersion tool) relies on a config file. Please see the file below.</p> <script src="https://gist.github.com/DaveRuijter/b130061802b307249888b9f4462a1189.js"></script> <p>Add this to your git repository in the <code>.azuredevops</code> folder, with the name <code>gitversion.yml</code>. If you use another name or location, remember to change the values in the YAML pipeline as well (we will add that pipeline later). </p> <p><strong>If you already have a release number for your project, you can set the initial seed value in the first line of the above file.</strong> Where it now says <code>next-version: 1.0</code>.</p> <h2>Required Azure DevOps permissions and settings</h2> <p>To be able to make this setup work correctly, we need a couple of permissions and settings.</p> <ul id="block-a9029d31-8a5b-40bb-bd88-8fd228a035a0"><li>Make sure that the Build Service user of your project has the <code>Contribute </code>and <code>Create tag</code> permissions on the git code repository that you work from:</li></ul> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="429" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-1024x429.png" alt="" class="wp-image-880" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-1024x429.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-300x126.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-768x322.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-1536x643.png 1536w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-370x155.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-570x239.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-770x322.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-1170x490.png 1170w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11-1385x580.png 1385w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-11.png 1942w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <ul id="block-a9029d31-8a5b-40bb-bd88-8fd228a035a0"><li>Make sure that the Build Service user has <code>Contribute</code> permissions on the git repo of your project wiki. 
You can manage the Wiki security by clicking on the triple-dot menu of the wiki:</li></ul> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="396" height="264" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-12.png" alt="" class="wp-image-881" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-12.png 396w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-12-300x200.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-12-370x247.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-12-270x180.png 270w" sizes="(max-width: 396px) 100vw, 396px" /></figure> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="927" height="722" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-14.png" alt="" class="wp-image-883" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-14.png 927w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-300x234.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-768x598.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-370x288.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-570x444.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-770x600.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-14-745x580.png 745w" sizes="(max-width: 927px) 100vw, 927px" /></figure> <ul id="block-a9029d31-8a5b-40bb-bd88-8fd228a035a0"><li>Disable the project setting called <code>Limit job authorization scope to referenced Azure DevOps repositories</code>. See the screenshot below for where to find that setting. If it is not editable, first go to the Organization settings and disable it there, then come back to the project settings and disable it here.</li></ul> <figure class="wp-block-image size-large is-resized is-style-default"><img loading="lazy" decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-1024x712.png" alt="" class="wp-image-879" width="1024" height="712" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-1024x712.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-300x209.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-768x534.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-370x257.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-570x396.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-770x535.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10-834x580.png 834w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-10.png 1056w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <h2>Add the pipeline for Pull Request administration</h2> <p>To pick up the selected type of update from the Pull Request description, I have created the Azure DevOps pipeline shown below (.yml). You will have to add that to your repo in a location and with a name of your preference.
I call it <code>pipeline-pullrequest-administration.yml</code>.</p> <script src="https://gist.github.com/DaveRuijter/252f629776dc8f380a10fdd26cf2a2db.js"></script> <p>This YAML pipeline performs the following tasks:</p> <ul><li>Install GitVersion to be used in the last task.</li><li>Retrieve the Pull Request description and store it in a variable to use it in the next task.</li><li>Add a git commit message based on what was selected in the PR description, marking the branch as either a <code>patch</code>, <code>minor </code>or <code>major </code>change. This will be picked up by A) the next task, and by B) in another DevOps YAML pipeline that is triggered right after the merge is completed.</li><li>Determine the correct semantic version (note, this is only the version of the feature branch, it does not have to be the same as the final version).</li></ul> <p>Now that we have this pipeline, let put it to use. First, add a new pipeline in your Azure DevOps project, pointing to the .yml file in your repo. I have called my pipeline <code>Pull Request administration</code>.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="555" height="229" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-5.png" alt="" class="wp-image-873" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-5.png 555w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-5-300x124.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-5-370x153.png 370w" sizes="(max-width: 555px) 100vw, 555px" /></figure> <p>Secondly, go to the policies page of your main branch:</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="255" height="468" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-6.png" alt="" class="wp-image-874" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-6.png 255w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-6-163x300.png 163w" sizes="(max-width: 255px) 100vw, 255px" /></figure> <p>Add a Build Validation, based on the pipeline you have just added.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="467" height="697" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-7.png" alt="" class="wp-image-875" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-7.png 467w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-7-201x300.png 201w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-7-370x552.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-7-389x580.png 389w" sizes="(max-width: 467px) 100vw, 467px" /></figure> <p>This will start the Pull Request administration pipeline every time we publish a Pull Request <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" />!</p> <h2>Add the pipeline for release administration</h2> <p>We also need a second Azure DevOps YAML pipeline for our release administration. Please see the script below, save it as a pipeline .yml file in your repo, in your preferred location. 
I use the name <code>pipeline-release-administration.yml</code>.</p> <script src="https://gist.github.com/DaveRuijter/c72cecb255cd52d73aeae0f67cd0ee1e.js"></script> <p>Please update the repo reference to the correct URL of your git repo of the wiki, not the wiki itself:</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="685" height="156" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-9.png" alt="" class="wp-image-877" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-9.png 685w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-9-300x68.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-9-370x84.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-9-570x130.png 570w" sizes="(max-width: 685px) 100vw, 685px" /></figure> <p>This YAML pipeline performs the following tasks:</p> <ul><li>Job CalculateVersion<ul><li>Install GitVersion to be used in the last task.</li><li>Determine the correct semantic version.</li><li>Update the Build.BuildNumber to use SemVer, as by default it uses FullSemVer (not my preferrence).</li><li>Add git tag for the calculated semantic version (e.g. v2.0.1).</li></ul></li><li>Job CreateReleaseNotes<ul><li>Generates a release notes file</li><li>Publishes the release notes in the project wiki</li></ul></li></ul> <p>Add a new pipeline in your Azure DevOps project, pointing to the .yml file in your repo that you have just uploaded (<code>pipeline-release-administration.yml</code>). I have called my pipeline <code>Release administration</code>. </p> <p>Great, this pipeline will now be automatically triggered for each commit performed on the main or master branch, because of the trigger section in the .yml file! <strong>In other words, after each Pull Request is completed, this pipeline will be started <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</strong></p> <h2>Bonus: Tags</h2> <p>After every Pull Request completes, a new Tag will be automatically added, pointing to the (git) version of your code.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="309" height="742" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-15.png" alt="" class="wp-image-885" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-15.png 309w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-15-125x300.png 125w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-15-242x580.png 242w" sizes="(max-width: 309px) 100vw, 309px" /></figure> <h2>Bonus: Release Notes</h2> <p>And, after each Pull Request completes, a note will be automatically prepended in the release notes page in the wiki for the new version. It will have a list of associated Pull Requests (usually just a single request) and the associated work items. And, a download link is added as well. 
<p>Add a new pipeline in your Azure DevOps project, pointing to the .yml file in your repo that you have just uploaded (<code>pipeline-release-administration.yml</code>). I have called my pipeline <code>Release administration</code>.</p> <p>Great, this pipeline will now be automatically triggered for each commit performed on the main or master branch, because of the trigger section in the .yml file! <strong>In other words, after each Pull Request is completed, this pipeline will be started <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f680.png" alt="🚀" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</strong></p> <h2>Bonus: Tags</h2> <p>After every Pull Request completes, a new Tag will be automatically added, pointing to the (git) version of your code.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="309" height="742" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-15.png" alt="" class="wp-image-885" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-15.png 309w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-15-125x300.png 125w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-15-242x580.png 242w" sizes="(max-width: 309px) 100vw, 309px" /></figure> <h2>Bonus: Release Notes</h2> <p>And, after each Pull Request completes, a note will be automatically prepended to the release notes page in the wiki for the new version. It will have a list of associated Pull Requests (usually just a single request) and the associated work items. A download link is added as well. I tried to mimic the standard GitHub release notes pages <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f609.png" alt="😉" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="784" height="991" src="https://www.moderndata.ai/wp-content/uploads/2021/10/image-16.png" alt="" class="wp-image-886" srcset="https://www.moderndata.ai/wp-content/uploads/2021/10/image-16.png 784w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-237x300.png 237w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-768x971.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-370x468.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-570x720.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-770x973.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/10/image-16-459x580.png 459w" sizes="(max-width: 784px) 100vw, 784px" /></figure> <h2>Limitations and considerations</h2> <ul><li>You <strong>can’t use the <code>Squash merge</code> type</strong> when completing the Pull Request to your main branch. We need the commit messages of the feature branch (with <code>+semver:</code>) to be transferred to the main branch.</li></ul> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/10/automatic-semantic-versioning-in-azure-devops-with-release-notes/">Automatic semantic versioning (bonus: with release notes!)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2021/10/automatic-semantic-versioning-in-azure-devops-with-release-notes/feed/</wfw:commentRss> <slash:comments>30</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">862</post-id> </item> <item> <title>How to colorize each Azure Databricks workspace differently 🎨</title> <link>https://www.moderndata.ai/2021/02/how-to-colorize-each-azure-databricks-workspace-differently/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-colorize-each-azure-databricks-workspace-differently</link> <comments>https://www.moderndata.ai/2021/02/how-to-colorize-each-azure-databricks-workspace-differently/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Thu, 18 Feb 2021 19:31:53 +0000</pubDate> <category><![CDATA[Generic]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=838</guid> <description><![CDATA[<p>If you’re working with multiple Azure Databricks workspaces at the same time it can be very hard to keep those workspaces apart. Personally I struggled to keep track of the dev / tst / acc / prd workspaces. Luckily, Henko had the answer: color them differently! Of course, it would help a lot if the […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/02/how-to-colorize-each-azure-databricks-workspace-differently/">How to colorize each Azure Databricks workspace differently 🎨</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p><strong>If you’re working with multiple Azure Databricks workspaces at the same time it can be very hard to keep those workspaces apart. Personally I struggled to keep track of the dev / tst / acc / prd workspaces.
Luckily, Henko had the answer: color them differently!</strong></p> <p>Of course, it would help a lot if the Azure Databricks workspace URL was not some random number that’s impossible to remember…</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="148" src="https://www.moderndata.ai/wp-content/uploads/2021/02/image-1024x148.png" alt="" class="wp-image-841" srcset="https://www.moderndata.ai/wp-content/uploads/2021/02/image-1024x148.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-300x43.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-768x111.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-370x53.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-570x82.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-770x111.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-600x87.png 600w, https://www.moderndata.ai/wp-content/uploads/2021/02/image.png 1149w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p> Or, when the name of the workspace would not be cut-off…</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="208" src="https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-1024x208.png" alt="" class="wp-image-843" srcset="https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-1024x208.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-300x61.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-768x156.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-370x75.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-570x116.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-770x157.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-1170x238.png 1170w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2-600x122.png 600w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-2.png 1503w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <h4>The solution!</h4> <p>While sharing this daily struggle with <a href="https://www.moderndata.ai/twitterhenko" target="_blank" rel="noreferrer noopener">Henko</a>, he came up with a creative solution: colorize the browser page based on its URL!</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="529" src="https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-1024x529.png" alt="" class="wp-image-844" srcset="https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-1024x529.png 1024w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-300x155.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-768x397.png 768w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-1536x794.png 1536w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-370x191.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-570x294.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-770x398.png 770w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-1170x604.png 1170w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-1123x580.png 1123w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3-600x310.png 600w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-3.png 1566w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure> <p><a 
href="https://chrome.google.com/webstore/detail/urlcolors/jjccpcminoppplpmcfghflolejbdkekm" target="_blank" rel="noreferrer noopener">This browser plugin called URLColors</a> automatically adds a colorized border around the URLs you specify. Works in both Edge and Chrome.</p> <blockquote class="wp-block-quote"><p>Each set of keyword options is on a new line with the format for each being:<br><strong>keyword, color, [flash], [timer], [border width], [opacity]</strong><br>Where keyword and color are required. If you type in ‘flash’ as the third word, then the box will blink at interval set by the optional parameter ‘timer’. You can also specify custom border width and opacity values that will override the default values above.</p><cite>from the plugin settings</cite></blockquote> <p>This is an example to colorize 4 different Azure Databricks environments:<br><code>https://adb-<number for PRD>.azuredatabricks.net/, red<br>https://adb-<number for ACC>.azuredatabricks.net/, purple<br>https://adb-<number for TST>.5.azuredatabricks.net/, blue<br>https://adb-<number for DEV>.azuredatabricks.net/, green</code></p> <p>And, there are two settings you can customize to your preference, for the default border width and opacity. I like a 6px border and 0.8 opacity:</p> <figure class="wp-block-image size-large is-style-default"><img loading="lazy" decoding="async" width="746" height="240" src="https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited.png" alt="" class="wp-image-846" srcset="https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited.png 746w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited-300x97.png 300w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited-370x119.png 370w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited-570x183.png 570w, https://www.moderndata.ai/wp-content/uploads/2021/02/image-4-edited-600x193.png 600w" sizes="(max-width: 746px) 100vw, 746px" /></figure> <p><strong>Don’t forget to share this tip with your colleagues!</strong></p> <hr class="wp-block-separator"/> <p>PS. 
Have you checked these other amazing browser extensions?</p> <ul><li><a href="https://microsoftedge.microsoft.com/addons/detail/microsoft-editor-spellin/hokifickgkhplphjiodbggjmoafhignh" target="_blank" rel="noreferrer noopener">Microsoft Editor: Spelling & Grammar Checker </a></li><li><a href="https://microsoftedge.microsoft.com/addons/detail/dark-reader/ifoakfbpdcdoeenechcleahebpibofpc" target="_blank" rel="noreferrer noopener">Dark Reader</a></li></ul> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2021/02/how-to-colorize-each-azure-databricks-workspace-differently/">How to colorize each Azure Databricks workspace differently 🎨</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2021/02/how-to-colorize-each-azure-databricks-workspace-differently/feed/</wfw:commentRss> <slash:comments>1</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">838</post-id> </item> <item> <title>How to remove a Service Principal from all Power BI workspaces in one go</title> <link>https://www.moderndata.ai/2020/09/how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go</link> <comments>https://www.moderndata.ai/2020/09/how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go/#respond</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Tue, 22 Sep 2020 18:04:57 +0000</pubDate> <category><![CDATA[Power BI]]></category> <category><![CDATA[PowerShell]]></category> <category><![CDATA[Service Principal]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=810</guid> <description><![CDATA[<p>In addition to adding a Service Principal to all Power BI workspaces, it is also useful to have a script for removing, right 😃? Check out the RemoveServicePrincipalFromPowerBIWorkspaces.ps1 gist on GitHub. It’s also framed below (same script). Script details This script will prompt for the (correct) ObjectId of the Service Principal (the one from “Enterprise […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go/">How to remove a Service Principal from all Power BI workspaces in one go</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p>In addition to <em>adding </em>a Service Principal to all Power BI workspaces, it is also useful to have a script for <strong>removing</strong>, right <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f603.png" alt="😃" class="wp-smiley" style="height: 1em; max-height: 1em;" />? </p> <p>Check out the <strong><a href="https://gist.github.com/DaveRuijter/37597d6f370db7a776f9af04585fed13" target="_blank" rel="noreferrer noopener">RemoveServicePrincipalFromPowerBIWorkspaces.ps1</a></strong> gist on GitHub.</p>
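<p>In essence, the removal is one loop over all workspaces in the tenant, calling the Power BI admin API for each of them. A minimal, illustrative sketch (it assumes the MicrosoftPowerBIMgmt module is installed and uses a placeholder ObjectId; the actual gist below adds the prompts, capacity filters and error handling):</p>
<pre class="wp-block-code"><code># Simplified sketch - not the full script from the gist.
# Prerequisite: Install-Module MicrosoftPowerBIMgmt
$spObjectId = "00000000-0000-0000-0000-000000000000"  # placeholder: ObjectId from "Enterprise applications"

# Sign in with a Power BI Service Administrator account.
Connect-PowerBIServiceAccount

# Remove the Service Principal from every workspace returned by the admin API.
Get-PowerBIWorkspace -Scope Organization -All | ForEach-Object {
    Invoke-PowerBIRestMethod -Method Delete -Url "admin/groups/$($_.Id)/users/$spObjectId"
}</code></pre>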
<p>It’s also embedded below (same script).</p> <h4>Script details</h4> <p>This script will prompt for the (correct) ObjectId of the Service Principal (the one from “Enterprise applications” in Azure Active Directory), and it will prompt for the credentials of a Power BI Service Administrator.</p> <p>Before you run the script, you can specify:</p> <ul><li>If you want the Service Principal to be removed from workspaces in shared or premium capacity or both.</li></ul> <h4>Important notes/disclaimers</h4> <p>Before running this script, make sure you read this first:</p> <ul><li>Note: this script only works with <a href="https://docs.microsoft.com/en-us/power-bi/collaborate-share/service-create-the-new-workspaces?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noreferrer noopener">v2 workspaces</a> (you can’t add a Service Principal to a v1 workspace).</li></ul> <script src="https://gist.github.com/DaveRuijter/37597d6f370db7a776f9af04585fed13.js"></script> <p></p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go/">How to remove a Service Principal from all Power BI workspaces in one go</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/09/how-to-remove-a-service-principal-from-all-power-bi-workspaces-in-one-go/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">810</post-id> </item> <item> <title>My backpack for the Microsoft Ignite 2020 virtual conference</title> <link>https://www.moderndata.ai/2020/09/my-backpack-for-the-microsoft-ignite-2020-virtual-conference/?utm_source=rss&utm_medium=rss&utm_campaign=my-backpack-for-the-microsoft-ignite-2020-virtual-conference</link> <comments>https://www.moderndata.ai/2020/09/my-backpack-for-the-microsoft-ignite-2020-virtual-conference/#respond</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Tue, 22 Sep 2020 14:17:09 +0000</pubDate> <category><![CDATA[Data Platform]]></category> <category><![CDATA[Power BI]]></category> <category><![CDATA[Ignite]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=818</guid> <description><![CDATA[<p>The Microsoft Ignite 2020 virtual conference is starting today! It is online and free to attend. If you have not registered, do it quickly! As there are over 800 sessions, it can be overwhelming to build your schedule and find the things interesting for you.
I want to emphasize the option to fill your virtual […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/my-backpack-for-the-microsoft-ignite-2020-virtual-conference/">My backpack for the Microsoft Ignite 2020 virtual conference</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <figure class="wp-block-image"><img loading="lazy" decoding="async" width="1347" height="250" src="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1.png" alt="" class="wp-image-822" srcset="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1.png 1347w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-300x56.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-1024x190.png 1024w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-768x143.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-370x69.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-570x106.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-770x143.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-1170x217.png 1170w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-My-backpack-for-the-Microsoft-Ignite-2020-virtual-conference-1-600x111.png 600w" sizes="(max-width: 1347px) 100vw, 1347px" /></figure> <p>The Microsoft Ignite 2020 virtual conference is starting today! It is online and free to attend. If you have not registered, <a href="https://msignite.eventcore.com/" target="_blank" rel="noopener noreferrer">do it quickly</a>! As there are over 800 sessions, it can be overwhelming to build your schedule and find the things interesting for you. I want to emphasize the option to <strong>fill your virtual backpack.</strong></p> <blockquote class="wp-block-quote"><p>Adding items to your backpack is the quickest way to store the sessions, speakers, partners, and attendee profiles you find interesting. Be sure to come back after the event and download your backpack to get refreshed links to presentations, recordings and more.</p></blockquote> <p>Below are the items I’ve added to my backpack so far, as these events/videos look interesting to me. If you are interested in data and analytics topics, this might help you to fill your backpack!</p> <p>I also want to challenge you to pick a couple of extra sessions that are out of your usual ‘zone’. An event like Ignite is the perfect opportunity to learn and discover!</p> <p>If you have questions for the speakers, find the associated ‘Ask the Experts’ session and get them answered there! </p> <p>There are also ‘Table Talks’, lasting 30 minutes that are conversational and can be large (up to 300 people!) and interactive. 
I’m one of the hosts of this exciting talk: <a href="https://myignite.microsoft.com/sessions/f55df338-fe4d-4994-a05b-64da8389b9cc" target="_blank" rel="noreferrer noopener">What are the possibilities of Data & Machine Learning?</a>, but I hear it is already fully booked.</p> <p>And don’t forget to complete at least one collection in the <a href="https://docs.microsoft.com/en-us/learn/certifications/microsoft-ignite-cloud-skills-challenge-2020-free-certification-exam?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noreferrer noopener">Microsoft Ignite Cloud Skills Challenge</a>, to get a free certification exam! #keeplearning</p> <h2>Live sessions you can join (multiple air times)</h2> <p>Most of the sessions below have a live event, and then a couple of events that replay the same (recorded) session in different time zones.</p> <hr class="wp-block-separator"/> <p>Tuesday, September 22 at 6:40 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/8941fc9b-abb8-42d6-9dc3-cbbd9b691854">Microsoft Power Platform: Fill the App Gap and Supercharge Organizational Agility</a></h4> <p>Join James Phillips to learn how the Power Platform gives you a comprehensive, “no cliffs” platform for modern application development that creates a step change in organizational agility.</p> <p><strong>Speakers: </strong>James Phillips, Charles Lamanna, Kim Manis, Julie Strauss</p> <hr class="wp-block-separator"/> <p>Tuesday, September 22 at 7:30 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/08249e43-c5d0-4140-ab0d-2782dc694a55">Invent with Purpose on Azure with Julia White and Friends</a></h4> <p>Microsoft Azure gives you the freedom to build, manage, and run cloud native and hybrid applications on a massive, global cloud using your favorite tools and frameworks. Join Julia White, Corporate Vice President, Microsoft Azure, Erin Chapple, Corporate Vice President, Microsoft Azure Compute, and Rohan Kumar, Corporate Vice President, Azure Data, as they demonstrate the latest innovations. Come see how Azure is redefining hybrid, empowering developers to be more productive than ever before, enabling limitless data and analytics, and democratizing AI.</p> <p><strong>Speakers: </strong>Julia White, Erin Chapple, Rohan Kumar</p> <hr class="wp-block-separator"/> <p>Tuesday, September 22 at 9:15 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/d0c6bcf9-3189-46c6-b70e-a831b3cd8a9f">What’s new in Azure Cognitive Services</a></h4> <p>Do you want to find anomalies in data to quickly identify and troubleshoot issues? Or analyze how many people are keeping their six feet of social distancing in your store? From the introduction of the new Metrics Advisor service to the public preview of our new spatial analysis feature in Computer Vision, this session will provide an overview of the new services and features in Cognitive Services.</p> <p><strong>Speakers: </strong>Cory Clarke, Seth Juarez</p> <hr class="wp-block-separator"/> <p>Tuesday, September 22 at 10:45 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/120e39ad-7ead-48c0-b3ed-bb268ff4fc47">Azure Databricks – What’s new!</a></h4> <p>In this fast-paced roadmap session, we will discuss the best of what is new for Azure Databricks and how it can accelerate your Apache Spark big data analytics and machine learning workloads. 
We will give you the inside peek at our roadmap and reserve lots of time for your feedback and questions.</p> <p><strong>Speakers: </strong>Kyle Weller</p> <hr class="wp-block-separator"/> <p>Wednesday, September 23 at 8:15 AM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/1c8f52b1-6beb-4dd9-8ebb-132662b7dab2">Building systems of insights for enterprise scale with Power BI and Azure</a></h4> <p>Bringing business intelligence to thousands of users in the enterprise requires scaling for the largest datasets in the organization without compromising performance or security. Join us to learn about how Power BI deeply integrates with Azure Synapse Analytics and how you can easily leverage your investments in Azure to bring insights to those who need it the most.</p> <hr class="wp-block-separator"/> <p>Wednesday, September 23 at 8:30 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/30d7b8c5-a6ff-4649-9e51-085cd2b3d4d0">Building real-time enterprise analytics solutions with Azure Synapse Analytics</a></h4> <p>In this demo rich session, we’ll showcase the latest Azure Synapse Analytics capabilities for developing end-to-end solutions for real-time analytics, data warehousing, and machine learning</p> <p><strong>Speakers: </strong>Saveen Reddy</p> <hr class="wp-block-separator"/> <p>Wednesday, September 23 at 9:15 PM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/b749ff04-036d-413f-8933-a5813a25345e">Ask the Expert: Building real-time enterprise analytics solutions with Azure Synapse Analytics</a></h4> <p>In this demo rich session we’ll showcase the latest Azure Synapse Analytics capabilities for developing end-to-end solutions for real-time analytics, data warehousing, and machine learning</p> <p><strong>Speakers: </strong>Saveen Reddy</p> <hr class="wp-block-separator"/> <p>Thursday, September 24 at 12:15 AM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/d40e4b0f-c832-422f-949a-f82309253704">The Data Behind Space Exploration with NASA</a></h4> <p>Data science is a field that influences all aspects of our scientific community, including space exploration. Inspired by NASA, we’ve developed new content for analyzing space rocks and determining the likelihood of a successful rocket launch into space. Join this session to explore this data and to perhaps develop your own space exploration study using Visual Studio Code and a bit of Python. This session recommends a basic understanding of Visual Studio Code and Python programming.</p> <p><strong>Speakers: </strong>Sarah Guthals</p> <hr class="wp-block-separator"/> <p>Thursday, September 24 at 4:15 AM CEST</p> <h4><a href="https://myignite.microsoft.com/sessions/ed379101-19c5-4944-83b6-84ccf42bafe6">Rap with Rohan</a></h4> <p>Join Rohan Kumar, Corporate Vice President of Azure Data Engineering, for his “ask me anything” session covering all things Data & AI.</p> <p><strong>Speakers: </strong>Rohan Kumar</p> <hr class="wp-block-separator"/> <h2>Pre-recorded sessions for on demand viewing</h2> <h4><a href="https://myignite.microsoft.com/sessions/919ad4b1-0841-427e-9978-22eb9ffa4d9b">Advancing Power BI Premium for the enterprise analytics market and beyond</a></h4> <p>As the landscape for enterprise analytics and BI continues to evolve, Power BI continues helping organizations discover insights hidden in their data. 
Join us in this session to learn about new capabilities coming to Power BI Premium aimed at providing additional flexibility, scale and performance.</p> <p><strong>Speakers: </strong>David Magar</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/30a265a5-29f0-4e91-85c6-f3f6f98b41c4">Connecting Power Apps, Automate, Virtual Agent, and BI with the Common Data Service</a></h4> <p>Data is central to transforming any business process, from action to automation to interaction to analysis. The Common Data Service provides a low code data platform which allows your data to permeate all of your apps, flows, agents, and dashboards – and in this session, we’ll show you how.</p> <p><strong>Speakers: </strong>Ryan Jones</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/96e559e7-2f66-485b-b91f-086a55428f0b">Enhance your BI solutions with AI</a></h4> <p>Leveraging augmented analytics in your BI solutions does not have to require statistical expertise or a data scientist. Join this session to take a look at the latest AI capabilities coming to Power BI and learn how you can get started with them in just a couple of clicks. We will show how you can easily transform your reports into rich storytelling, exploration and insights experiences.</p> <p><strong>Speakers: </strong>Justyna Lucznik</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/d784f75a-f081-468d-84b2-d500142c5998">Enterprise-grade ETL and data preparation using dataflows</a></h4> <p>Preparing data for business intelligence, app development, and process automation is a significant, labor-intensive and time-consuming challenge for business today. Dataflows provide enterprise-grade self-service data preparation capabilities, enabling business analysts and citizen developers to easily process and unify their data and store it in Azure-based data-lake storage and the Common Data Service. Join us to learn about dataflows, see demos of upcoming capabilities and product roadmap.</p> <p><strong>Speakers: </strong>Ben Sack, Miguel Llopis</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/5e60285e-200a-4ea7-8ae5-28b71df3179b">Gain insights into usage and performance of your Power BI reports to increase adoption and reduce costs</a></h4> <p>Whether you are a report creator, part of central BI development team or administrator, you are designing, building, delivering, and managing BI applications to support top business priorities. This session focuses on capabilities in Power BI that will help you gain insights into usage and performance of your Power BI reports.</p> <p><strong>Speakers: </strong>Adam Saxton</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/982ada7a-b541-4918-88e2-73f3053381e1">How Delta Lake with Azure Databricks can accelerate your big data workloads by 100x</a></h4> <p>Delta Lake empowers you to build reliable Data Lakes at scale. Come learn how you can leverage the new and advanced features of Delta Lake on Azure Databricks to easily transform your big data analytics and machine learning workloads. 
We will discuss Apache Spark optimizations like file compaction, Z-Order partitioning, schema evolution, unified batch and streaming, time travel, and how you can set expectations on data quality.</p> <p><strong>Speakers: </strong>Kyle Weller</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/4f09286d-523e-4d07-a211-589340de6515">Managing your ML lifecycle with Azure Databricks and Azure Machine Learning</a></h4> <p>Machine learning development has new complexities beyond software development. There are a myriad of tools and frameworks which make it hard to track experiments, reproduce results, and deploy machine learning models. Learn how you can accelerate, collaborate and manage your end-to-end machine learning lifecycle on Azure Databricks using MLflow and Azure ML to reliably build, share, and deploy machine learning applications using Azure Databricks.</p> <p><strong>Speakers: </strong>Premal Shah</p> <hr class="wp-block-separator"/> <h4><a href="https://myignite.microsoft.com/sessions/5dba4a7d-89d7-44ad-8cbf-b362d6bea0e9">Running cost effective big data workloads with Azure Synapse and Azure Data Lake Storage</a></h4> <p>Learn how you can migrate expensive open source big data workloads to Azure and leverage latest compute and storage innovations within Azure Synapse and HDInsight with Azure Data Lake Storage to develop a powerful and cost effective analytics solutions.</p> <p><strong>Speakers: </strong>James Baker, Michael Rys</p> <hr class="wp-block-separator"/> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/my-backpack-for-the-microsoft-ignite-2020-virtual-conference/">My backpack for the Microsoft Ignite 2020 virtual conference</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/09/my-backpack-for-the-microsoft-ignite-2020-virtual-conference/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">818</post-id> </item> <item> <title>How to add a Service Principal to all Power BI workspaces in one go</title> <link>https://www.moderndata.ai/2020/09/how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go/?utm_source=rss&utm_medium=rss&utm_campaign=how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go</link> <comments>https://www.moderndata.ai/2020/09/how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Mon, 21 Sep 2020 18:05:00 +0000</pubDate> <category><![CDATA[Power BI]]></category> <category><![CDATA[PowerShell]]></category> <category><![CDATA[Service Principal]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=796</guid> <description><![CDATA[<p>I’ve created a PowerShell script that will add a given Service Principal to all (configured) Power BI workspaces. 
This can be useful or even required for all kinds of scenarios, but I recently needed such a script for my BPAA solution as it needs to talk to the XMLA endpoint of all data models in […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go/">How to add a Service Principal to all Power BI workspaces in one go</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p>I’ve created a PowerShell script that will add a given Service Principal to all (configured) Power BI workspaces. This can be useful or even required for all kinds of scenarios, but I recently needed such a script for <a href="https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/" target="_blank" rel="noreferrer noopener">my BPAA solution</a>, as it needs to talk to the XMLA endpoint of all data models in the Power BI Service, and to be able to do that, the Service Principal needs to be a member of the workspaces.</p> <h4>Script details</h4> <p>Check out the <strong><a href="https://gist.github.com/DaveRuijter/51d4c8cfcb966b9124a52d45d997bbed" target="_blank" rel="noreferrer noopener">AddServicePrincipalToPowerBIWorkspaces.ps1</a></strong> gist on GitHub. It’s also embedded below (same script).</p> <p>This script will prompt for the (correct) ObjectId of the Service Principal (the one from “Enterprise applications” in Azure Active Directory), and it will prompt for the credentials of a Power BI Service Administrator.</p> <p>Before you run the script, you can specify:</p> <ul><li>If you want the Service Principal to be added to workspaces in shared or premium capacity or both.</li><li>The type of role the Service Principal will get in all the workspaces.</li><li>If you want to force-update the role in case the Service Principal is already a member.</li></ul>
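<p>Conceptually, the script is one loop over all workspaces that grants the principal a role on each of them. A minimal, illustrative sketch (it assumes the MicrosoftPowerBIMgmt module, uses placeholder values and hard-codes the Member role; the full gist further below adds the prompts, capacity filters and the force-update option):</p>
<pre class="wp-block-code"><code># Simplified sketch - not the full script from the gist.
# Prerequisite: Install-Module MicrosoftPowerBIMgmt
$spObjectId = "00000000-0000-0000-0000-000000000000"  # placeholder: ObjectId from "Enterprise applications"

# Sign in with a Power BI Service Administrator account.
Connect-PowerBIServiceAccount

# Add the Service Principal as a Member to every workspace in the tenant.
Get-PowerBIWorkspace -Scope Organization -All | ForEach-Object {
    Add-PowerBIWorkspaceUser -Scope Organization -Id $_.Id -AccessRight Member -PrincipalType App -Identifier $spObjectId
}</code></pre>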
<h4>Important notes/disclaimers</h4> <p>Before running this script, make sure you read these notes/disclaimers first:</p> <ul><li>The given Service Principal will have permissions to access the data models in the workspaces it is added to. Please be incredibly careful and handle the secret of the Service Principal with care. Consider storing the details of the Service Principal, including the value of the secret, in a private password manager or (Azure Key) vault.</li><li>Tip: consider removing the Service Principal directly after you are finished with the task that requires the Service Principal to be a member of the workspaces.<a href="https://gist.github.com/DaveRuijter/37597d6f370db7a776f9af04585fed13" target="_blank" rel="noreferrer noopener"> I have a script to remove a Service Principal from all Power BI workspaces.</a></li><li>Note: this script only works with <a href="https://docs.microsoft.com/en-us/power-bi/collaborate-share/service-create-the-new-workspaces?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noreferrer noopener">v2 workspaces</a> (you can’t add a Service Principal to a v1 workspace).</li></ul> <script src="https://gist.github.com/DaveRuijter/51d4c8cfcb966b9124a52d45d997bbed.js"></script> <p></p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go/">How to add a Service Principal to all Power BI workspaces in one go</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/09/how-to-add-a-service-principal-to-all-power-bi-workspaces-in-one-go/feed/</wfw:commentRss> <slash:comments>7</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">796</post-id> </item> <item> <title>Check the quality of all Power BI data models at once with Best Practice Analyzer Automation (BPAA)</title> <link>https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/?utm_source=rss&utm_medium=rss&utm_campaign=check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa</link> <comments>https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Sat, 12 Sep 2020 18:25:56 +0000</pubDate> <category><![CDATA[Power BI]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=761</guid> <description><![CDATA[<p>Ever since the XMLA endpoint became available for data models in the Power BI service, I’ve been thinking about a way to use the Best Practice Analyzer of Tabular Editor to check all the existing data models in the Service in a single batch. And now it’s ready! I have created an automated solution for […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/">Check the quality of all Power BI data models at once with Best Practice Analyzer Automation (BPAA)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[<p><strong>Ever since the XMLA endpoint became available for data models in the Power BI service, I’ve been thinking about a way to use the Best Practice Analyzer of Tabular Editor to check all the existing data models in the Service in a single batch. And now it’s ready!
I have created an automated solution for it, and I’m calling it BPAA (for Best Practice Analyzer Automation).</strong></p> <p><span style="font-size: 10pt;"><em><strong>tl;dr</strong><span class="su-tooltip" title="" data-close="no" data-behavior="hover" data-my="bottom center" data-at="top center" data-classes="su-qtip qtip-dark su-qtip-size-default" data-title="" data-hasqtip="0"><sup>ⓘ</sup></span></em></span><br /> <span style="font-size: 10pt;"><em>Use this <a href="https://github.com/DaveRuijter/BestPracticeAnalyzerAutomation" target="_blank" rel="noreferrer noopener" aria-label="undefined (opens in a new tab)">URL</a> to go to the repo in GitHub and download the BPAA.ps1 script. Before you run it, check the prerequisites below.</em></span></p> <p>The foundation of the solution is the amazing Tabular Editor. It has a built-in capability to check a model for best practices. Let me quote how Daniel Otykier – the creator of Tabular Editor – introduces this on <a href="https://www.moderndata.ai/tabulareditor" target="_blank" rel="nofollow noopener noreferrer">his website</a>:</p> <blockquote><p><em>A tool that lets you define global or model-specific rules using a simple expression language. At any time, you can check whether objects in your model satisfy the rules. For example, you can create rules to check if naming conventions are kept, if metadata properties are set up correctly, if columns containing numeric values are hidden, if visible objects are exposed in perspectives, etc. The sky is the limit.</em></p></blockquote> <p>Running this on a single model is helpful, but I quickly saw the potential of verifying how best practices are applied to all the models in the Power BI service! Okay, not all of them, just the ones in your tenant <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f604.png" alt="😄" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <h4>Best Practice Analyzer Automation (BPAA)</h4> <p>Whether or not best practices are applied in a data model can have a significant impact on model size, performance, usability, manageability, transferability, and even correctness of the insights given by the model. It’s not an exact science of course, and it remains subjective and depends on various other factors, but I think we can all agree we should try to stick to best practices wherever we can, right?</p> <p>It is essential to have a (better) understanding of data model characteristics to be able to:</p> <ul> <li>Identify models that can be improved (performance, size, usability, manageability, transferability, correctness).</li> <li>Identify creators that you can assist and educate, so they can create higher quality models.</li> <li>Improve utilization of dedicated capacity in Power BI Premium, as a decrease in model size or decrease in refresh duration has a direct positive effect on the availability of memory and cores in your dedicated capacity. <em>Of course, it is also important for models on shared capacity. Imagine the resources freed up by optimizing all those hundreds of thousands of models in shared capacity. Less visible and indirect, but this will benefit all of us.</em></li> </ul> <p>Instead of manually running the Best Practice Analyzer of Tabular Editor on each data model in the Power BI Service, I wanted to automate the entire process. Because that’s what we do, right, automating things <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f600.png" alt="😀" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p>
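<p>To give an idea of what a single automated check looks like under the hood: for each dataset, the solution essentially invokes Tabular Editor’s command line against the workspace’s XMLA endpoint, roughly like the sketch below (illustrative names and paths only; the actual BPAA.ps1 downloads the portable Tabular Editor, builds the connection string and loops over every workspace and dataset it finds):</p>
<pre class="wp-block-code"><code># Illustrative sketch of one Best Practice Analyzer run - not the full BPAA.ps1.
$tenantId = "your-tenant-id"       # placeholder
$clientId = "your-app-client-id"   # placeholder (Service Principal / App Registration)
$secret   = "your-app-secret"      # placeholder - keep this in a vault!

# XMLA endpoint of the workspace, authenticated as the Service Principal.
$connStr = "Provider=MSOLAP;Data Source=powerbi://api.powerbi.com/v1.0/myorg/My Workspace;User ID=app:$clientId@$tenantId;Password=$secret"

# -A runs the Best Practice Analyzer with the downloaded rules file, -TRX writes the results.
& .\TabularEditor.exe $connStr "My Dataset" -A ".\rules.json" -TRX ".\results\My Dataset.trx"</code></pre>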
<p>The solution I’ve created takes care of authentication, setup, configuration, parameterization, and aggregation of the results, leaving time for you to focus on the important things: analysis of where and why people are not following best practices!</p> <h4>Best Practice Rules</h4> <p>I’ve mentioned a couple of times now that there are ‘best practices’ for data models. They are captured on this <a href="https://www.moderndata.ai/TabularEditorBestPracticeRulesGitHub" target="_blank" rel="nofollow noopener noreferrer">Best Practice Rules repository</a> that serves as a public collection of rules that people use in Tabular Editor and that the community can contribute to. I simply automatically download the Power BI specific rules file and use it in my automation script.</p> <p>Some example rules:</p> <ul> <li>Disable auto date/time</li> <li>Remove unused measures and columns</li> <li>Avoid division (use DIVIDE function instead)</li> <li>Do not use floating point data types</li> <li>Hide foreign key columns</li> <li>Naming Convention</li> </ul> <h4>BPAA PowerShell script</h4> <p>I’ve created a single PowerShell script that does all the heavy lifting. It downloads the portable edition of Tabular Editor and the library of commands to talk to the Power BI Service. Then it loops over the workspaces and executes the Best Practice Analyzer on each dataset. It stores the test results locally as .trx files (a standard XML-based output also used by Microsoft MSTest, a command-line utility that runs unit tests). The metadata of the workspaces and datasets is also stored locally in JSON format.</p> <h4>Power BI report with a calculated score per data model</h4> <p>Of course, I’ve created a Power BI report on top of these files <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f60e.png" alt="😎" class="wp-smiley" style="height: 1em; max-height: 1em;" />. The start page provides an overview of the tests run, with a calculated score per data model and the aggregated average score.</p> <p>The score is based on an arbitrary deduction of points for each unit test that failed, using the severity from the best practice rules. This is executed in Power Query. Each data model starts with 10 points. High severity failures deduct 1.0 points each, medium severity failures deduct 0.2 points each, and low severity failures deduct 0.05 points each (these values are configurable with parameters). Each data model ends up with a score on this 0 – 10 scale. Then, in DAX, this is converted to be displayed on a 5-star scale because it looks better <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f929.png" alt="🤩" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <p>To be able to quickly check why a data model scored low, a tooltip page shows the unit tests that failed and some basic info. More details are available with a drillthrough action. Feel free to extend and customize the report to your preferences.
This is a screenshot of the first page, with the tooltip:</p> <p><img loading="lazy" decoding="async" class="alignnone wp-image-791 " src="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA.png" width="814" height="464" srcset="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA.png 1606w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-300x171.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-1024x583.png 1024w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-768x438.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-1536x875.png 1536w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-370x211.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-570x325.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-770x439.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-1170x667.png 1170w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-1018x580.png 1018w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Check-the-quality-of-all-Power-BI-data-models-at-once-with-Best-Practice-Analyzer-Automation-BPAA-600x342.png 600w" sizes="(max-width: 814px) 100vw, 814px" /></p> <h4>Prerequisites</h4> <p>To run my solution, you need to take care of a couple of things upfront:</p> <ol> <li>You need a Service Principal (e.g. App Registration) in Azure Active Directory. Nothing special, just add a secret to it and write down: <ul> <li>ClientId</li> <li>Secret</li> <li>TenantId</li> </ul> </li> <li>Download the latest version of <a href="https://www.moderndata.ai/latestBPAA" target="_blank" rel="nofollow noopener noreferrer">the BPAA solution from the repo in GitHub</a>.</li> <li>Unzip the BPAA solution to a directory of choice.</li> <li>Add the Service Principal of step 1 to all Power BI workspaces, and make sure it has the ‘member’ or ‘admin’ role. If you have more than a dozen workspaces you would want to script this! No worries, I’ve added a script for that in <a href="https://www.moderndata.ai/BPAA" target="_blank" rel="nofollow noopener noreferrer">the BPAA repo</a>. You can find it in the directory of step 3. Note: you need the correct ObjectId of the Service Principal! 
You can find it in the “Enterprise applications” screen in Azure Active Directory (not the App Registrations screen!).</li> <li>The solution only works on v2 workspaces. So, this might be an appropriate time to upgrade those workspaces?</li> <li>The solution only works on workspaces in Premium capacity (it needs the XMLA endpoint). You could spin up an Azure Power BI Embedded (A sku) for the duration of this exercise, and connect the workspaces to that capacity. When you’re done you connect them back to shared capacity and suspend the premium capacity.</li> </ol> <h4>Important notes/disclaimers</h4> <p>Before running this script, make sure you read these notes/disclaimers first:</p> <ul> <li>The script will activate all datasets in the Power BI premium capacity/capacities you have running. In case you have overcommitted the capacity (the size of all the uploaded datasets combined exceeds the memory of your capacity), this will result in a lot of dataset evictions. Especially for larger datasets, this might have a noticeable impact on performance of the dataset(s) or reports connected to that dataset (for example in a load time before a report is completely rendered, because the model was evicted and needs to be loaded into memory again). Please consider running the script on a date and time that will have the least impact on daily operations.</li> <li>The Service Principal will have permissions to access the data models of all Power BI workspaces in your tenant. Please be incredibly careful and handle the secret with care. Consider storing the details of the Service Principal, including the value of the secret in a private password manager or (Azure Key) vault.</li> <li>Tip: consider removing the Service Principal directly after you are finished with the BPAA script. <a href="https://gist.github.com/DaveRuijter/37597d6f370db7a776f9af04585fed13" target="_blank" rel="noreferrer noopener">I have a script to remove a Service Principal from all Power BI workspaces.</a></li> <li>Be very considerate in how you interpret, use, and share the results of the best practice analyzer script. The report that comes with the solution will contain a (arbitrary) score per data model owner and please note that this does not reflect the real quality of the data model. It’s just an indication! Not entirely following best practices does not make a model a bad model! Please use the results of this script to help grow the Power BI community in your organization in a positive way and to make people aware of these modelling best practices.</li> </ul> <h4>How to run the solution</h4> <p>Running the script is easy! Now go for it:</p> <ol> <li>Run the <strong>BPAA.ps1</strong> script that is part of the solution! There are a couple of ways to do this. Perhaps you can right-click on the BPAA.ps1 file and select ‘Run with PowerShell’? Or you start PowerShell from the Windows Start menu, and provide the path to the .ps1 file and press enter. 
Or, you run the script in Windows PowerShell ISE / Visual Studio Code.<br /> <img loading="lazy" decoding="async" class="alignnone wp-image-785 " src="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA.png" width="551" height="221" srcset="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA.png 885w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-300x120.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-768x308.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-370x148.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-570x229.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-770x309.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-600x241.png 600w" sizes="(max-width: 551px) 100vw, 551px" /></li> <li>Provide the properties of the Service Principal to the script.<br /> <img loading="lazy" decoding="async" class="alignnone wp-image-786 " src="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1.png" width="698" height="240" srcset="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1.png 1169w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-300x103.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-1024x352.png 1024w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-768x264.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-370x127.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-570x196.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-770x265.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-1-600x206.png 600w" sizes="(max-width: 698px) 100vw, 698px" /></li> <li>Then, after the script is finished, it will open the Power BI template report that is part of the BPAA solution. It will prompt you to accept the parameter values. 
Click Load.<br /> <img loading="lazy" decoding="async" class="alignnone wp-image-789 " src="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3.png" width="627" height="349" srcset="https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3.png 1049w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-300x167.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-1024x570.png 1024w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-768x428.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-370x206.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-570x317.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-770x429.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-1042x580.png 1042w, https://www.moderndata.ai/wp-content/uploads/2020/09/Pasted-into-Verify-all-Power-BI-data-models-with-Best-Practice-Analyzer-Automation-BPAA-3-600x334.png 600w" sizes="(max-width: 627px) 100vw, 627px" /></li> </ol> <p>After the data load, the report will give an overview of the test results!</p> <h4>Roadmap / to do</h4> <p>I’ve got a couple of items on my mind to add to the solution. Let me know what you’d like to see added in the comments below.</p> <ol> <li>Check if TE is already installed and available via program files</li> <li>Embed option to start/suspend A sku</li> <li>Embed option to move all non-premium workspaces to a given Premium capacity during script execution</li> <li>Embed logic to add the service principal in all workspaces in the Power BI Service (is now a separate script)</li> <li>Add support to specify a local BPA rules file (instead of Url)</li> </ol> <p>Want to contribute? 
Or report an issue?<br /> Please do it via GitHub: <a href="https://www.moderndata.ai/BPAA" target="_blank" rel="nofollow noopener noreferrer">https://www.moderndata.ai/BPAA</a>.</p> <h4>Links to more background information</h4> <ul> <li><a href="https://docs.microsoft.com/en-us/power-bi/developer/embedded/embedded-faq#what-object-id-is-the-service-principal-object-id?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noopener noreferrer">What object ID is the service principal object ID?</a></li> <li><a href="https://docs.microsoft.com/en-us/powershell/power-bi/overview?WT.mc_id=DP-MVP-5003585view=po" target="_blank" rel="noopener noreferrer">Microsoft Power BI Cmdlets for Windows PowerShell and PowerShell Core</a></li> <li><a href="https://docs.microsoft.com/en-us/power-bi/admin/service-premium-connect-tools?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noopener noreferrer">Dataset connectivity with the XMLA endpoint</a></li> <li><a href="https://docs.microsoft.com/en-us/power-bi/admin/troubleshoot-xmla-endpoint?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noopener noreferrer">Troubleshoot XMLA endpoint connectivity</a></li> <li><a href="https://docs.microsoft.com/en-us/power-bi/admin/service-premium-service-principal?WT.mc_id=DP-MVP-5003585" target="_blank" rel="noopener noreferrer">Automate Premium workspace and dataset tasks with service principals</a></li> </ul> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/">Check the quality of all Power BI data models at once with Best Practice Analyzer Automation (BPAA)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/09/check-the-quality-of-all-power-bi-data-models-at-once-with-best-practice-analyzer-automation-bpaa/feed/</wfw:commentRss> <slash:comments>6</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">761</post-id> </item> <item> <title>Working with Azure icons in draw.io (diagrams.net)</title> <link>https://www.moderndata.ai/2020/08/working-with-azure-icons-in-draw-io-diagrams-net/?utm_source=rss&utm_medium=rss&utm_campaign=working-with-azure-icons-in-draw-io-diagrams-net</link> <comments>https://www.moderndata.ai/2020/08/working-with-azure-icons-in-draw-io-diagrams-net/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Wed, 05 Aug 2020 18:21:00 +0000</pubDate> <category><![CDATA[Generic]]></category> <category><![CDATA[Architecture]]></category> <category><![CDATA[Azure]]></category> <category><![CDATA[Diagram]]></category> <category><![CDATA[diagrams.net]]></category> <category><![CDATA[draw.io]]></category> <category><![CDATA[GitHub]]></category> <category><![CDATA[Icons]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=693</guid> <description><![CDATA[<p>Every now and then I need to draw a diagram for a solution or platform architecture, and enjoy doing that! I usually spend more time on them then planned 🙄. There are lots of tools to create these diagrams, and lately I have been primarily using draw.io and I love it 🤩. 
Want to know […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/08/working-with-azure-icons-in-draw-io-diagrams-net/">Working with Azure icons in draw.io (diagrams.net)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p><strong>Every now and then I need to draw a diagram for a solution or platform architecture, and enjoy doing that! I usually spend more time on them than planned <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f644.png" alt="🙄" class="wp-smiley" style="height: 1em; max-height: 1em;" />. There are lots of tools to create these diagrams, and lately I have been primarily using draw.io and I love it <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f929.png" alt="🤩" class="wp-smiley" style="height: 1em; max-height: 1em;" />. Want to know how I make sure I have the latest Azure icons to work with? Read on!</strong></p> <p class="has-text-color has-background has-very-dark-gray-color has-very-light-gray-background-color"><strong>tl;dr</strong><br>Use this <a aria-label="undefined (opens in a new tab)" href="https://app.diagrams.net/?splash=0&libs=general&clibs=Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/1.%20Azure%20Icon%20Set;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/2.%20Azure%20Docs" target="_blank" rel="noreferrer noopener">URL</a> to directly open draw.io with the essential Azure icon libraries loaded and ready to use! Check <a aria-label="undefined (opens in a new tab)" href="https://github.com/DaveRuijter/diagrams.net" target="_blank" rel="noreferrer noopener">my GitHub repo</a> for all the libraries.</p> <p>Before I continue, did you know that the open source draw.io will be rebranded to diagrams.net? It’s because of the .io domain. You can find a link to more info on the ins-and-outs at the end of this blog post.</p> <h4>Online gem</h4> <p>It always takes more time to get a diagram correct than I want it to. I search way too long for icons. Sometimes I even end up creating them myself. There are some collections available online, but I can’t always see when they have been updated, and usually they are incomplete, so my search continues… </p> <p>But recently I stumbled upon an online <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f48e.png" alt="💎" class="wp-smiley" style="height: 1em; max-height: 1em;" />: the largest single collection of Azure icons available (or at least that I could find)! It combines multiple online sources into a single collection.
You can browse through it online and download individual icons or the whole shebang. And the best thing is it’s up-to-date and will hopefully continue to be so (you never know of course)! It’s created by <strong>Ben Coleman</strong>, and you can find the web interface here: <a href="https://code.benco.io/icon-collection/">https://code.benco.io/icon-collection/</a>. His repo with the icons and even the tools used to create and manage this collection is available on GitHub: <a aria-label="undefined (opens in a new tab)" href="https://github.com/benc-uk/icon-collection" target="_blank" rel="noreferrer noopener">https://github.com/benc-uk/icon-collection</a>.</p> <h4>How to use the icon collection in draw.io (diagrams.net)</h4> <p>I’ve made it easy for you: I’ve created a couple of custom libraries with all the icons! I’ve uploaded them to my GitHub, and you can use them in a couple of ways. </p> <p>The first and simplest option is to just start the diagrams.net app with a URL that includes the correct libraries as URL parameters:</p> <ul><li>Click <a href="https://app.diagrams.net/?splash=0&libs=general&clibs=Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/1.%20Azure%20Icon%20Set;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/2.%20Azure%20Docs">here</a> to open app.diagrams.net <strong>with the essential libraries</strong>.</li><li>Click <a aria-label="undefined (opens in a new tab)" href="https://app.diagrams.net/?splash=0&libs=general&clibs=Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/1.%20Azure%20Icon%20Set;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/2.%20Azure%20Docs;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/3.%20Azure%20Cloud%20Design%20Studio%20Set;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/4.%20Azure%20Patterns%20A-C;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/5.%20Azure%20Patterns%20D-L;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/6.%20Azure%20Patterns%20M-S;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/7.%20Azure%20Patterns%20T-Z;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/8.%20Azure%20%26%20Microsoft%20misc;Uhttps://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/9.%20Logos%20%26%20brands" target="_blank" rel="noreferrer noopener">here</a> to open app.diagrams.net <strong>with all 9 libraries</strong> <br><em>(this will take a minute to load in the app, give it some time!)</em>.</li></ul> <p>Alternatively, you can open draw.io (diagrams.net) and follow these steps to include the libraries manually.</p> <p>First, go to <a aria-label="undefined (opens in a new tab)" href="https://github.com/DaveRuijter/diagrams.net#links-to-the-raw-library-files-to-add-them-to-the-application-manually" target="_blank" rel="noreferrer noopener">my GitHub repo README</a>, check the list of URLs for the libraries, and copy the URL of the library you want to use:</p> <figure class="wp-block-image size-large is-resized img-border"><img decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1.png" alt="" class="wp-image-724" width="550" srcset="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1.png 1006w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-600x286.png 600w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-300x143.png 300w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-768x366.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-370x177.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-570x272.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_26_49-DaveRuijter_diagrams.net-and-19-more-pages-MVP-_-Thuis-Microsoft-Edge-1-770x367.png 770w" sizes="(max-width: 1006px) 100vw, 1006px" /></figure>
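<p>As an example, the raw URL of the first library (the same one used in the quick-start links above) looks like this; this is the kind of URL you will paste in the dialog shown in the next step:</p> <pre class="wp-block-code"><code>https://raw.githubusercontent.com/DaveRuijter/diagrams.net/master/1.%20Azure%20Icon%20Set</code></pre>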
<p class="has-text-color has-very-dark-gray-color">Then, click on the ‘File’ menu. Click on ‘Open Library from’. Click on ‘URL…’.</p> <figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--689x1024.png" alt="Navigate to 'open library from url' dialog" class="wp-image-719" width="400" srcset="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--689x1024.png 689w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--600x892.png 600w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--202x300.png 202w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--768x1142.png 768w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--370x550.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--570x847.png 570w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--770x1145.png 770w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53--390x580.png 390w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-16_15_53-.png 834w" sizes="(max-width: 689px) 100vw, 689px" /><figcaption>Navigate to ‘open library from url’ dialog.</figcaption></figure> <p>Then, paste in the URL of the library and click ‘Open’:</p> <figure class="wp-block-image size-large is-resized"><img loading="lazy" decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-15_52_24-Untitled-Diagram-diagrams.net_.png" alt="" class="wp-image-720" width="350" height="185"/><figcaption>Paste the URL and click Open.</figcaption></figure> <p>And, after a few seconds (depending on your bandwidth), the library is ready to use:</p> <figure class="wp-block-image size-large is-resized"><img decoding="async" src="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-17_00_44-.png" alt="" class="wp-image-738" width="400" srcset="https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-17_00_44-.png 517w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-17_00_44--195x300.png 195w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-17_00_44--370x570.png 370w, https://www.moderndata.ai/wp-content/uploads/2020/08/2020-08-05-17_00_44--376x580.png 376w" sizes="(max-width: 517px) 100vw, 517px" /><figcaption>(if you know how to display the library name correctly using this approach, please share
it in a reply)</figcaption></figure> <p>Repeat this for all the libraries that you want to include.</p> <h4>Other info</h4> <ul><li>An overview of all keyboard shortcuts for draw.io (diagrams.net): <a href="https://app.diagrams.net/shortcuts.svg">https://app.diagrams.net/shortcuts.svg</a></li><li>Did you know draw.io is <a aria-label="undefined (opens in a new tab)" href="https://www.diagrams.net/blog/move-diagrams-net" target="_blank" rel="noreferrer noopener">slowly being renamed to diagrams.net</a>?</li><li>Did you know you can even <a aria-label="undefined (opens in a new tab)" href="https://www.diagrams.net/blog/embed-diagrams-vscode" target="_blank" rel="noreferrer noopener">create draw.io diagrams in VS Code</a>?</li><li>Supported URL parameters for diagrams.net: <a href="https://desk.draw.io/support/solutions/articles/16000042546">https://desk.draw.io/support/solutions/articles/16000042546</a></li><li>Documentation on draw.io libraries: <a href="https://github.com/jgraph/drawio-libs">https://github.com/jgraph/drawio-libs</a></li></ul> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/08/working-with-azure-icons-in-draw-io-diagrams-net/">Working with Azure icons in draw.io (diagrams.net)</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/08/working-with-azure-icons-in-draw-io-diagrams-net/feed/</wfw:commentRss> <slash:comments>14</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">693</post-id> </item> <item> <title>Automatic TeslaCam And Sentry Mode Video Processing In Azure – Part 3: TeslaUSB Setup</title> <link>https://www.moderndata.ai/2020/02/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup/?utm_source=rss&utm_medium=rss&utm_campaign=automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup</link> <comments>https://www.moderndata.ai/2020/02/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup/#comments</comments> <dc:creator><![CDATA[Dave Ruijter]]></dc:creator> <pubDate>Sun, 02 Feb 2020 07:30:39 +0000</pubDate> <category><![CDATA[Data Platform]]></category> <category><![CDATA[Azure]]></category> <category><![CDATA[AzureStorage]]></category> <category><![CDATA[RaspberryPi]]></category> <category><![CDATA[Tesla]]></category> <category><![CDATA[TeslaCam]]></category> <category><![CDATA[TeslaUSB]]></category> <guid isPermaLink="false">https://www.moderndata.ai/?p=599</guid> <description><![CDATA[<p>This is part 3 of my series on ‘Automatic TeslaCam And Sentry Mode Video Processing In Azure’. Check out the overview post, or the 2nd post about how to configure the Azure resources. In this post, I describe how to configure a RaspberryPi to run the TeslaUSB project, to automatically upload your TeslaCam videos to […]</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/02/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup/">Automatic TeslaCam And Sentry Mode Video Processing In Azure – Part 3: TeslaUSB Setup</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></description> <content:encoded><![CDATA[ <p><strong>This is part 3 of my series on ‘Automatic TeslaCam And Sentry Mode Video Processing In Azure’. 
Check out <a href="https://www.moderndata.ai/2020/01/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-1/">the overview post</a>, or <a href="https://www.moderndata.ai/?p=596">the 2</a><a href="https://www.moderndata.ai/?p=596"><sup>nd</sup></a><a href="https://www.moderndata.ai/?p=596"> post</a> about how to configure the Azure resources. In this post, I describe how to configure a RaspberryPi to run the TeslaUSB project, to automatically upload your TeslaCam videos to Azure Storage! When in doubt, drop a question in the comments!</strong> </p> <p>So, you got yourself the RaspberryPi Zero W and other hardware? And you have the Azure resources ready? Grab a cup of coffee and put your geek mode on, as we are about to flash a Raspbian Linux image on your RaspberryPi Zero W, then SSH into it to set up an RClone configuration <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f913.png" alt="🤓" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <p>It’s not as difficult as it sounds.</p> <h4>From zero to hero</h4> <p>For the first steps of this ‘tutorial’, you can almost completely follow the ‘OneStepSetup’ of the TeslaUSB project: <a href="https://github.com/marcone/teslausb/blob/main-dev/doc/OneStepSetup.md">https://github.com/marcone/teslausb/blob/main-dev/doc/OneStepSetup.md</a>. The important part is, <strong>you must specify ‘none’ as the archive method </strong>in the config file, which will configure the pi as a wifi-accessible USB drive, but will not set up the archive scripts yet.</p> <p>Note: as we have not created the IFTTT webhook yet, we will add that later in the config file. (This way you will also experience how to access and update the TeslaUSB project config after it is up-and-running and ‘deployed’ to your Tesla<sup>1</sup>).</p> <p>After the pi is up-and-running in your Tesla (or still at your desk, that’s also fine), SSH into it. Use your favorite SSH client (maybe PuTTY). If you are running an up-to-date version of Windows 10, you can simply open a command prompt and use the ssh command:</p> <pre class="wp-block-code"><code>ssh pi@teslausb.local</code></pre> <h2>Configure RClone</h2> <p> Now, let’s set up and configure RClone on the pi. Execute the following commands:</p> <pre class="wp-block-code"><code>sudo -i
/root/bin/remountfs_rw
curl https://rclone.org/install.sh | sudo bash
rclone config</code></pre> <p>Now go through the wizard and specify your new remote called ‘teslausbarchive’, and pick ’22’ to choose Azure Blob Storage as the type. Specify the name and key of your Azure Storage Account, and skip the SAS URL and Emulator settings questions. Also, skip the advanced settings.</p> <p>Then check the configuration and confirm with ‘y’. Press ‘q’ to quit.</p>
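<p>For reference, the remote that the wizard writes to the RClone configuration file (since we are running as root, it should end up in /root/.config/rclone/rclone.conf) will look roughly like this, with placeholders for your storage account name and key:</p> <pre class="wp-block-code"><code>[teslausbarchive]
type = azureblob
account = yourstorageaccountname
key = yourstorageaccountkey</code></pre>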
<p>Execute the following command to edit the TeslaUSB project settings:</p> <pre class="wp-block-code"><code>nano /root/teslausb_setup_variables.conf</code></pre> <p>Change the ARCHIVE_SYSTEM line to rclone (from none):</p> <pre class="wp-block-code"><code>export ARCHIVE_SYSTEM=rclone</code></pre> <p>Add these two lines to the config file, but with your values!</p> <pre class="wp-block-code"><code>export RCLONE_DRIVE=teslausbarchive
export RCLONE_PATH=teslacam</code></pre> <p>I had to update the date on the pi to get the data to sync to Azure successfully (might not be necessary on your system):</p> <pre class="wp-block-code"><code>date -s "$(wget -qSO- --max-redirect=0 google.com 2>&1 | grep Date: | cut -d' ' -f5-8)Z"</code></pre> <p>Run the TeslaUSB setup by executing this line (choose yes when asked):</p> <pre class="wp-block-code"><code>/root/bin/setup-teslausb</code></pre> <h2>Go go go, take your Tesla on a test drive</h2> <p>Now, after one last reboot, the pi is ready! Put your pi in the Tesla if you have not done so already. And go drive around the block and save some TeslaCam footage (press the camera icon on the screen in your Tesla). After returning home, the footage should be uploaded automatically to your Azure Storage Account.</p> <h4>TeslaUSB tips</h4> <p>Some general tips regarding the TeslaUSB project:</p> <ol><li>Change the password of the pi user.</li><li>If the upload of files is very slow, consider putting an (extra) WiFi access point closer to your Tesla.</li><li>To power the pi, make sure you use a USB cable that transmits data (not only power). Make sure you put the cable in the data-enabled USB port.</li><li>If you use a USB hub in the Tesla, make sure the pi is connected to a USB port that is enabled for data (not only power).</li></ol> <h4>TeslaUSB troubleshooting</h4> <p>Check the logs on the pi:</p> <pre class="wp-block-code"><code>tail -n 50 /mutable/archiveloop.log
tail -n 50 /boot/teslausb-headless-setup.log</code></pre> <p>or (from <a href="https://github.com/marcone/teslausb/issues/35">https://github.com/marcone/teslausb/issues/35</a>):</p> <pre class="wp-block-code"><code>sudo -i
/root/bin/setup-teslausb selfupdate
/root/bin/setup-teslausb diagnose > /mutable/diagnostics.txt</code></pre> <h4>TeslaUSB reconfig</h4> <p>To change the configuration later:</p> <pre class="wp-block-code"><code>sudo -i
nano /root/teslausb_setup_variables.conf
/root/bin/setup-teslausb selfupdate</code></pre> <div style="height:28px" aria-hidden="true" class="wp-block-spacer"></div> <hr class="wp-block-separator"/> <p><sup>1</sup>The actual reason is that I have already settled on a specific order of blog posts, and the IFTTT blog post is now scheduled as blog post #5 in the series <img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f601.png" alt="😁" class="wp-smiley" style="height: 1em; max-height: 1em;" />.</p> <p>The post <a rel="nofollow" href="https://www.moderndata.ai/2020/02/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup/">Automatic TeslaCam And Sentry Mode Video Processing In Azure – Part 3: TeslaUSB Setup</a> appeared first on <a rel="nofollow" href="https://www.moderndata.ai">Modern Data & AI</a>.</p> ]]></content:encoded> <wfw:commentRss>https://www.moderndata.ai/2020/02/automatic-teslacam-and-sentry-mode-video-processing-in-azure-part-3-teslausb-setup/feed/</wfw:commentRss> <slash:comments>7</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">599</post-id> </item> </channel> </rss>