Comprehensive Azure Data Factory validation rules, activity nesting limitations, linked service requirements, and edge-case handling guidance
This skill inherits all available tools. When active, it can use any tool Claude has access to.
MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (\) in file paths, NOT forward slashes (/).
Examples:
❌ Wrong: D:/repos/project/file.tsx
✅ Correct: D:\repos\project\file.tsx
This applies to all file paths passed to the Edit and Write tools.
NEVER create new documentation files unless explicitly requested by the user.
Azure Data Factory has STRICT nesting rules for control flow activities. Violating these rules will cause pipeline failures or prevent pipeline creation.
Four control flow activities support nested activities: ForEach, If Condition, Switch, and Until.
| Parent Activity | Can Contain | Notes |
|---|---|---|
| ForEach | If Condition | ✅ Allowed |
| ForEach | Switch | ✅ Allowed |
| Until | If Condition | ✅ Allowed |
| Until | Switch | ✅ Allowed |

| Parent Activity | CANNOT Contain | Reason |
|---|---|---|
| If Condition | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Switch | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Until | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Another If | ❌ Cannot nest If within If |
| Switch | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| Switch | If Condition | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Until | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Another Switch | ❌ Cannot nest Switch within Switch |
| ForEach | Another ForEach | ❌ Single level only - use Execute Pipeline workaround |
| Until | Another Until | ❌ Single level only - use Execute Pipeline workaround |
| ForEach | Until | ❌ Single level only - use Execute Pipeline workaround |
| Until | ForEach | ❌ Single level only - use Execute Pipeline workaround |
Validation Activity:
The ONLY supported workaround for prohibited nesting combinations:
Instead of direct nesting, use the Execute Pipeline Activity to call a child pipeline:
{
"name": "ParentPipeline_WithIfCondition",
"activities": [
{
"name": "IfCondition_Parent",
"type": "IfCondition",
"typeProperties": {
"expression": "@equals(pipeline().parameters.ProcessData, 'true')",
"ifTrueActivities": [
{
"name": "ExecuteChildPipeline_WithForEach",
"type": "ExecutePipeline",
"typeProperties": {
"pipeline": {
"referenceName": "ChildPipeline_ForEachLoop",
"type": "PipelineReference"
},
"parameters": {
"ItemList": "@pipeline().parameters.Items"
}
}
}
]
}
}
]
}
Child Pipeline Structure:
{
"name": "ChildPipeline_ForEachLoop",
"parameters": {
"ItemList": {"type": "array"}
},
"activities": [
{
"name": "ForEach_InChildPipeline",
"type": "ForEach",
"typeProperties": {
"items": "@pipeline().parameters.ItemList",
"activities": [
// Your ForEach logic here
]
}
}
]
}
Why This Works: Execute Pipeline starts a separate run of the child pipeline, so nesting rules are evaluated per pipeline. The child pipeline's ForEach is a top-level activity in its own pipeline rather than an activity nested inside the parent's If Condition.
Key resource limits:

| Resource | Limit | Notes |
|---|---|---|
| Activities per pipeline | 80 | Includes inner activities for containers |
| Parameters per pipeline | 50 | - |
| ForEach concurrent iterations | 50 (maximum) | Set via batchCount property |
| ForEach items | 100,000 | - |
| Lookup activity rows | 5,000 | Maximum rows returned |
| Lookup activity size | 4 MB | Maximum size of returned data |
| Web activity timeout | 1 hour | Default timeout for Web activities |
| Copy activity timeout | 7 days | Maximum execution time |
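To stay within the Lookup limits above (5,000 rows, 4 MB), here is a hedged sketch of a Lookup activity that returns a bounded result set; the dataset name and query are hypothetical:

{
    "name": "Lookup_ActiveTables",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT TOP 5000 TableName FROM dbo.ControlTable WHERE IsActive = 1"
        },
        "dataset": {
            "referenceName": "DS_ControlTable", // hypothetical dataset
            "type": "DatasetReference"
        },
        "firstRowOnly": false // false returns the full (capped) result set instead of a single row
    }
}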
ForEach activity configuration:
{
"name": "ForEachActivity",
"type": "ForEach",
"typeProperties": {
"items": "@pipeline().parameters.ItemList",
"isSequential": false, // false = parallel execution
"batchCount": 50, // Max 50 concurrent iterations
"activities": [
// Nested activities
]
}
}
Critical Considerations:
- isSequential: true → executes one item at a time (slow but predictable)
- isSequential: false → executes up to batchCount items in parallel
- Maximum batchCount is 50 regardless of setting
- ❌ CANNOT use Set Variable inside ForEach with isSequential: false (parallel iterations overwrite the pipeline-scoped variable; see the sketch below)
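A hedged sketch of the Append Variable workaround; the activity, parameter, and variable names are hypothetical, and it assumes the pipeline declares a variable named Results of type Array:

{
    "name": "ForEach_CollectResults",
    "type": "ForEach",
    "typeProperties": {
        "items": "@pipeline().parameters.ItemList",
        "isSequential": false,
        "batchCount": 20,
        "activities": [
            {
                "name": "AppendToResults",
                "type": "AppendVariable", // safe alternative to Set Variable in parallel loops
                "typeProperties": {
                    "variableName": "Results", // must be an Array variable on the pipeline
                    "value": "@item().name"
                }
            }
        ]
    }
}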
✅ Use Append Variable with an array-type variable (as sketched above), or switch to sequential execution.

Azure Blob Storage linked service (account key authentication):
{
"type": "AzureBlobStorage",
"typeProperties": {
"connectionString": {
"type": "SecureString",
"value": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
}
}
}
⚠️ Limitations: account key authentication cannot connect to the secondary Blob service endpoint (see the troubleshooting table below).

SAS URI authentication:
{
"type": "AzureBlobStorage",
"typeProperties": {
"sasUri": {
"type": "SecureString",
"value": "https://<account>.blob.core.windows.net/<container>?<SAS-token>"
}
}
}
Critical Requirements:
- folderPath in datasets must be an absolute path from the container level (a dataset sketch appears after the troubleshooting table below)

Service principal authentication:
{
"type": "AzureBlobStorage",
"typeProperties": {
"serviceEndpoint": "https://<account>.blob.core.windows.net",
"accountKind": "StorageV2", // REQUIRED for service principal
"servicePrincipalId": "<client-id>",
"servicePrincipalCredential": {
"type": "SecureString",
"value": "<client-secret>"
},
"tenant": "<tenant-id>"
}
}
Critical Requirements:
- accountKind MUST be set (StorageV2, BlobStorage, or BlockBlobStorage)

Managed identity authentication:
{
"type": "AzureBlobStorage",
"typeProperties": {
"serviceEndpoint": "https://<account>.blob.core.windows.net",
"accountKind": "StorageV2" // REQUIRED for managed identity
},
"connectVia": {
"referenceName": "AutoResolveIntegrationRuntime",
"type": "IntegrationRuntimeReference"
}
}
Critical Requirements:
- accountKind MUST be specified (cannot be empty or "Storage")
- The data factory's managed identity needs an RBAC role on the storage account (e.g., Storage Blob Data Reader or Storage Blob Data Contributor)

Common issues and solutions:

| Issue | Cause | Solution |
|---|---|---|
| Data Flow fails with managed identity | accountKind empty or "Storage" | Set accountKind to StorageV2 |
| Secondary endpoint doesn't work | Using account key auth | Not supported - use different auth method |
| SAS token expired during run | Token expiry too short | Extend SAS token validity period |
| Cannot access $logs container | System container not visible in UI | Use direct path reference |
| Soft-deleted blobs inaccessible | Service principal/managed identity | Use account key or SAS instead |
| Private endpoint connection fails | Wrong endpoint for Data Flow | Ensure ADLS Gen2 private endpoint exists |
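Related to the folderPath requirement noted under SAS authentication above, a hedged sketch of a delimited-text dataset that references the Blob linked service; the dataset, linked service, container, folder, and file names are hypothetical:

{
    "name": "DS_SalesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage_LinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "raw",
                "folderPath": "sales/2025", // path specified from the container level
                "fileName": "sales.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}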
Azure SQL Database linked service (SQL authentication):
{
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "<server-name>.database.windows.net",
"database": "<database-name>",
"authenticationType": "SQL",
"userName": "<username>",
"password": {
"type": "SecureString",
"value": "<password>"
}
}
}
Best Practice: do not embed the password inline; store it in Azure Key Vault and reference it from the linked service, as sketched below.
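A hedged sketch of the Key Vault reference; the Key Vault linked service name and secret name are hypothetical:

{
    "type": "AzureSqlDatabase",
    "typeProperties": {
        "server": "<server-name>.database.windows.net",
        "database": "<database-name>",
        "authenticationType": "SQL",
        "userName": "<username>",
        "password": {
            "type": "AzureKeyVaultSecret",
            "store": {
                "referenceName": "AzureKeyVault_LinkedService", // hypothetical Key Vault linked service
                "type": "LinkedServiceReference"
            },
            "secretName": "sql-password"
        }
    }
}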
Service principal authentication:
{
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "<server-name>.database.windows.net",
"database": "<database-name>",
"authenticationType": "ServicePrincipal",
"servicePrincipalId": "<client-id>",
"servicePrincipalCredential": {
"type": "SecureString",
"value": "<client-secret>"
},
"tenant": "<tenant-id>"
}
}
Requirements:
- Create a contained database user for the service principal and grant it the required database roles (db_datareader, db_datawriter, etc.)

Managed identity authentication:
{
"type": "AzureSqlDatabase",
"typeProperties": {
"server": "<server-name>.database.windows.net",
"database": "<database-name>",
"authenticationType": "SystemAssignedManagedIdentity"
}
}
Requirements:
- Create a contained database user for the data factory's managed identity and grant it the required database roles (db_datareader, db_datawriter, etc.)

Recommended connection string settings:
Server=tcp:<server>.database.windows.net,1433;
Database=<database>;
Encrypt=mandatory; // Options: mandatory, optional, strict
TrustServerCertificate=false;
ConnectTimeout=30;
CommandTimeout=120;
Pooling=true;
ConnectRetryCount=3;
ConnectRetryInterval=10;
Critical Parameters:
- Encrypt: default is mandatory (recommended)
- Pooling: set to false if experiencing idle-connection issues
- ConnectRetryCount: recommended for transient fault handling
- ConnectRetryInterval: seconds between retries

Common issues and solutions:

| Issue | Cause | Solution |
|---|---|---|
| Serverless tier auto-paused | Pipeline doesn't wait for resume | Implement retry logic or keep-alive |
| Connection pool timeout | Idle connections closed | Add Pooling=false or configure retry |
| Firewall blocks connection | IP not whitelisted | Add Azure IR IPs or enable Azure services |
| Always Encrypted fails in Data Flow | Not supported for sink | Use service principal/managed identity in copy activity |
| Decimal precision loss | Copy supports up to 28 precision | Use string type for higher precision |
| Parallel copy not working | No partition configuration | Enable physical or dynamic range partitioning |
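For the transient failures listed above (for example, a paused serverless database), retries can be configured on the activity policy. A hedged sketch; the activity name and the source/sink types are illustrative only:

{
    "name": "Copy_FromSql",
    "type": "Copy",
    "policy": {
        "timeout": "0.02:00:00", // 2 hours
        "retry": 3, // retry up to 3 times on transient failures
        "retryIntervalInSeconds": 60,
        "secureInput": false,
        "secureOutput": false
    },
    "typeProperties": {
        "source": { "type": "AzureSqlSource" },
        "sink": { "type": "ParquetSink" }
    }
}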
Parallel copy with staging configuration:
{
"source": {
"type": "AzureSqlSource",
"partitionOption": "PhysicalPartitionsOfTable" // or "DynamicRange"
},
"parallelCopies": 8, // Recommended: (DIU or IR nodes) Γ (2 to 4)
"enableStaging": true,
"stagingSettings": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage",
"type": "LinkedServiceReference"
}
}
}
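A hedged sketch of the DynamicRange option described below; the partition column and bounds are hypothetical and should map to a numeric or date column with a known value range:

{
    "source": {
        "type": "AzureSqlSource",
        "partitionOption": "DynamicRange",
        "partitionSettings": {
            "partitionColumnName": "OrderId", // hypothetical partition column
            "partitionLowerBound": "1",
            "partitionUpperBound": "1000000"
        }
    },
    "parallelCopies": 8
}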
Partition Options:
- PhysicalPartitionsOfTable: uses SQL Server physical partitions
- DynamicRange: creates logical partitions based on column values
- None: no partitioning (default)

Staging Best Practices:
Mapping Data Flow transformation limitations:

| Transformation | Limitation |
|---|---|
| Lookup | Cache size limited by cluster memory |
| Join | Large joins may cause memory errors |
| Pivot | Maximum 10,000 unique values |
| Window | Requires partitioning for large datasets |
Pre-deployment checks:
- accountKind is set
- batchCount ≤ 50 if parallel execution

CRITICAL: Always run automated validation before committing or deploying ADF pipelines!
The adf-master plugin includes a comprehensive PowerShell validation script that checks for ALL the rules and limitations documented above.
Location: ${CLAUDE_PLUGIN_ROOT}/scripts/validate-adf-pipelines.ps1
Basic usage:
# From the root of your ADF repository
pwsh -File validate-adf-pipelines.ps1
With custom paths:
pwsh -File validate-adf-pipelines.ps1 `
-PipelinePath "path/to/pipeline" `
-DatasetPath "path/to/dataset"
With strict mode (additional warnings):
pwsh -File validate-adf-pipelines.ps1 -Strict
The automated validation script checks for issues that Microsoft's official @microsoft/azure-data-factory-utilities package does NOT validate:
- Activity nesting violations
- Resource limits
- Variable scope violations
- Dataset configuration issues
- Copy activity validations
GitHub Actions example:
- name: Validate ADF Pipelines
run: |
pwsh -File validate-adf-pipelines.ps1 -PipelinePath pipeline -DatasetPath dataset
shell: pwsh
Azure DevOps example:
- task: PowerShell@2
displayName: 'Validate ADF Pipelines'
inputs:
filePath: 'validate-adf-pipelines.ps1'
arguments: '-PipelinePath pipeline -DatasetPath dataset'
pwsh: true
Use the /adf-validate command to run the validation script with proper guidance:
/adf-validate
This command will:
When creating or modifying ADF pipelines:
- Verify linked service requirements (e.g., accountKind for managed identity)

Example Validation Response:
❌ INVALID PIPELINE STRUCTURE DETECTED:
Issue: ForEach activity contains another ForEach activity
Location: Pipeline "PL_DataProcessing" → ForEach "OuterLoop" → ForEach "InnerLoop"
This violates Azure Data Factory nesting rules:
- ForEach activities support only a SINGLE level of nesting
- You CANNOT nest ForEach within ForEach
✅ RECOMMENDED SOLUTION:
Use the Execute Pipeline pattern:
1. Create a child pipeline with the inner ForEach logic
2. Replace the inner ForEach with an Execute Pipeline activity
3. Pass required parameters to the child pipeline
Would you like me to generate the refactored pipeline structure?
Official Microsoft Learn Resources:
Last Updated: 2025-01-24 (Based on official Microsoft documentation)
This validation rules skill MUST be consulted before creating or modifying ANY Azure Data Factory pipeline to ensure compliance with platform limitations and best practices.