Tuesday, May 17, 2011

Retrieving SSIS Error Column Names

I recently coded a data load for a client and they requested that I include the column names on which the package errored.  The solution I came up with consisted of using the lineageids from the package’s XML.  I’ll take you through the code I used.
The first thing I did is load the document and namespace where package path is the location of the package.
doc.Load(Variables.PackagePath);

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("DTS", "www.microsoft.com/SqlServer/Dts");


Then I retrieved each node with a lineageid and name using an xpath query.

foreach (XmlNode childnode in doc.SelectNodes("//*[@lineageId != '' and @name != '']"))


I then pulled out the TaskName, ColumnName, and LineageID.  I included the TaskName because LineageIDs are not unique within a package.


XmlNode ExecutableNode = childnode.SelectSingleNode("ancestor::DTS:Executable[1]", nsmgr);
TaskName = ExecutableNode.SelectSingleNode("DTS:Property[@DTS:Name='ObjectName']", nsmgr).InnerText;
ColumnName = childnode.Attributes["name"].Value;
LineageID = Convert.ToInt32(childnode.Attributes["lineageId"].Value);


I then further deduped within the task by only inserting unique taskname, columnname, and lineageid pairs using a hashtable


DistinctColumnKey =LineageID+ColumnName+TaskName;
if (!DistinctColumn.ContainsKey(DistinctColumnKey))
{
DistinctColumn.Add(DistinctColumnKey,DBNull.Value);
ColumnNamesBuffer.AddRow();
ColumnNamesBuffer.LineageID = LineageID;
ColumnNamesBuffer.ColumnName = ColumnName;
ColumnNamesBuffer.TaskName = TaskName;
}


I usually insert the column names into a cache connection manager and then use throughout the package.  Here’s the entire script.


public override void CreateNewOutputRows()
{
Int32 LineageID;
String ColumnName;
String TaskName;
XmlDocument doc = new XmlDocument();
Hashtable DistinctColumn = new Hashtable();
String DistinctColumnKey;

doc.Load(Variables.PackagePath);

XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("DTS", "www.microsoft.com/SqlServer/Dts");

foreach (XmlNode childnode in doc.SelectNodes("//*[@lineageId != '' and @name != '']"))
{


XmlNode ExecutableNode = childnode.SelectSingleNode("ancestor::DTS:Executable[1]", nsmgr);
TaskName = ExecutableNode.SelectSingleNode("DTS:Property[@DTS:Name='ObjectName']", nsmgr).InnerText;
ColumnName = childnode.Attributes["name"].Value;
LineageID = Convert.ToInt32(childnode.Attributes["lineageId"].Value);
DistinctColumnKey =LineageID+ColumnName+TaskName;
if (!DistinctColumn.ContainsKey(DistinctColumnKey))
{
DistinctColumn.Add(DistinctColumnKey,DBNull.Value);
ColumnNamesBuffer.AddRow();
ColumnNamesBuffer.LineageID = LineageID;
ColumnNamesBuffer.ColumnName = ColumnName;
ColumnNamesBuffer.TaskName = TaskName;
}



}


}





3 comments:

  1. Hi, I think this is what I'm looking for, but too new to know! Is this a script task or script component? Whichever it is, where does it go? What is a cache connection manager? How/where is that set up? Once it is loaded with this data, how do I get it out? Thank you.

    ReplyDelete
  2. Hi Angelia ,
    Can you pls share the package example .

    Regards,
    Val

    ReplyDelete
  3. ColumnNamesBuffer doesnot contain in context...what is this error? plz tell

    ReplyDelete