
ICC Lists: Importing existing lists

October 6, 2013

It can be time-consuming to add value after value to a list in ICC Configuration Manager. The great news is that there is an easy way to import your lists from system CSV exports or spreadsheets. All we need to do is generate an XML file containing all the values in a format similar to the following and, hey presto, Bob’s your uncle!

<valueList>
  <value sortIndex="3">
    <![CDATA[value3]]>
    <![CDATA[This is the third item in the list.]]>
    <![CDATA[3]]>
  </value>
  <value sortIndex="1">
    <![CDATA[value0]]>
    <![CDATA[This is the first item in the list.]]>
    <![CDATA[0]]>
  </value>
  <value sortIndex="2">
    <![CDATA[value2]]>
    <![CDATA[This is the second item in the list.]]>
    <![CDATA[2]]>
  </value>
</valueList>
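If your values start life in a CSV export, a few lines of script can produce that file for you. Below is a minimal sketch in Python, assuming a hypothetical CSV layout of value, description and return value with a header row, and assuming each <value> holds the three CDATA sections in the same order as the example above. The elements that wrap each CDATA section are not visible in that example, so check the exact import format in the ICC Infocenter before relying on it.

```python
import csv
import io


def cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any "]]>" so the XML stays well-formed."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def rows_to_value_list(rows) -> str:
    """Build a <valueList> document from (value, description, return_value) rows.

    Assumption: each <value> holds three CDATA sections in the same order as the
    example in the post, and sortIndex simply follows the row order starting at 1.
    """
    lines = ["<valueList>"]
    for index, (value, description, return_value) in enumerate(rows, start=1):
        lines.append(f'  <value sortIndex="{index}">')
        for field in (value, description, return_value):
            lines.append("    " + cdata(field))
        lines.append("  </value>")
    lines.append("</valueList>")
    return "\n".join(lines)


def csv_to_value_list(csv_text: str) -> str:
    """Convert a hypothetical CSV export: header row, then value,description,return_value."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return rows_to_value_list(tuple(row[:3]) for row in reader if row)


if __name__ == "__main__":
    sample = (
        "value,description,return_value\n"
        "value0,This is the first item in the list.,0\n"
        "value2,This is the second item in the list.,2\n"
    )
    print(csv_to_value_list(sample))
```

Using the csv module rather than splitting on commas means quoted descriptions that contain commas survive the conversion intact.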

More details can be found in the IBM Content Collector v3 Infocenter.

Below is a short video to demonstrate the steps required.

The power of ICC Lists and Regular Expressions!

October 6, 2013

IBM Content Collector has had the Lists feature since its early versions. It lets a task route look up a value in a specified List and, on a match, return the corresponding result. This functionality can be applied at a decision point, where a rule can be run against the list, or it can be used to set a property of a new document.


More details can be found in the IBM Content Collector Infocenter.

To make this feature even more powerful, you can apply regular expressions, which means your lookup Lists can be dynamic. Rather than listing every possibility, you can substitute a single regular expression to achieve a more flexible lookup.
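As a rough mental model (not ICC's actual implementation, which the post doesn't describe), a List with regular expression entries behaves like an ordered set of pattern and return-value pairs. The Python sketch below uses hypothetical names, treats a plain value as a pattern that only matches itself, and assumes the first entry whose pattern matches the whole input wins.

```python
import re
from typing import Optional


class RegexList:
    """Hypothetical model of a lookup List whose keys may be regular expressions."""

    def __init__(self, entries):
        # Compile each pattern once up front; entries are (pattern, return_value) pairs.
        self._entries = [(re.compile(pattern), result) for pattern, result in entries]

    def lookup(self, text: str) -> Optional[str]:
        # Assumption: the first entry whose pattern matches the entire input wins.
        for pattern, result in self._entries:
            if pattern.fullmatch(text):
                return result
        return None


if __name__ == "__main__":
    categories = RegexList([
        (r"Invoice", "Finance"),        # a plain value behaves like a static entry
        (r"INV-\d{6}", "Finance"),      # one pattern stands in for many static entries
        (r"(?i)contract.*", "Legal"),   # case-insensitive "starts with" match
    ])
    print(categories.lookup("INV-004217"))             # Finance
    print(categories.lookup("Contract 2013 renewal"))  # Legal
    print(categories.lookup("Memo"))                   # None
```

Compiling the patterns once when the list is built, rather than on every lookup, is what keeps a list with thousands of regex entries cheap to query repeatedly.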

Below is an example of how to build regular expressions into your List lookup:

– Previously, we saw how to add a dynamic metadata lookup when setting a property value:


– Next, we turn our attention to the Lists section in ICC Configuration Manager:


– We can add a new entry as shown:


– To explain the regular expression: we are reading a string that is passed in, for example a folder path in a shared mailbox.

^ = signifies the start of the full string

$ = symbolizes the end of the full string

.* = any number and type of characters

\\ = an escaped, literal backslash, which in our case separates one folder from the next

\E = end of the fixed string (in Java-style regular expressions, \Q and \E enclose a run of literal text)

Our fixed string is “\Joint Mortgage Agreement\”.

The return value is “Joint”, which could be passed to a rule or used to set a property. The only limitation I have come across is that the value can’t be passed on from setting a property value to a rule: it is either one or the other.
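To try the entry outside ICC, here is a Python sketch of the same idea. The exact expression from the screenshot isn't reproduced in the text, so this reconstruction from the symbol list above is an assumption, as are the sample paths. Python's re module has no \Q...\E literal quoting, so re.escape does that job here.

```python
import re
from typing import Optional

# Fixed string from the post; the backslash on each side is a folder separator.
FIXED = "\\Joint Mortgage Agreement\\"

# ^ and $ anchor the full string, .* allows any folders before and after, and
# re.escape makes the fixed string literal (the role \Q...\E plays in Java).
JOINT_PATTERN = re.compile("^.*" + re.escape(FIXED) + ".*$")


def lookup_joint(path: str) -> Optional[str]:
    """Return "Joint" when the path contains the fixed folder name, else None."""
    return "Joint" if JOINT_PATTERN.match(path) else None


if __name__ == "__main__":
    print(lookup_joint(r"\\mailserver\Mortgages\Joint Mortgage Agreement\Smith"))  # Joint
    print(lookup_joint(r"\\mailserver\Mortgages\Sole Mortgage Agreement\Jones"))   # None
```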

– But what happens if different people have named the folders differently over time?

For example: Joint Mortgage Application, Joint Application, Joint Mortgage, Joint Applications or Joint Mortgages.

Never mind all the typos that can occur in older systems that don’t have mandatory fields or drop-downs set for Document Categories.

We can make the entry more generic to achieve a higher match rate by doing the following:


– By adding our .* we have catered for all the likely variations of the folder or category name. It is not foolproof, but it should catch the majority.
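The exact generic entry isn't reproduced in the text either, so the pattern below is a hypothetical one in the same spirit: it accepts any folder whose name begins with "Joint", and the sketch checks it against the variants listed above. As the post says, it isn't foolproof; a folder named "Jointly Owned" would match too.

```python
import re

# Hypothetical generic entry: a folder whose name starts with "Joint", followed by
# anything except another backslash (so the match stays within one folder name).
GENERIC_JOINT = re.compile(r"^.*\\Joint[^\\]*\\.*$")

VARIANTS = [
    "Joint Mortgage Agreement",
    "Joint Mortgage Application",
    "Joint Application",
    "Joint Mortgage",
    "Joint Applications",
    "Joint Mortgages",
]

if __name__ == "__main__":
    for name in VARIANTS:
        path = "\\\\mailserver\\Mortgages\\" + name + "\\Smith"
        print(f"{name}: {bool(GENERIC_JOINT.match(path))}")  # True for every variant
    # "Joint" must start the folder name, so this folder is not matched.
    print(bool(GENERIC_JOINT.match(r"\\mailserver\Mortgages\Sole and Joint\Smith")))  # False
```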

So the key takeaway is that Lists are already very powerful, and adding regular expressions to your list lookup values gives you even more flexibility, so don’t think of your list as purely static values (although it can be used for that).

To give an indication of the overhead, I have run up to 6 list lookups when setting document properties, with up to 5000 regex values in the lists, with little impact on performance.

Finally, I would like to thank Dan Small from IBM for all his help on this, and hopefully I won’t forget all my regular expressions in the next few weeks! 🙂

Making the switch from navigating to searching with help from ICC and Regex.

September 29, 2013

Many companies are still making the transition from shared drives (who has never had an S:\ or H:\ drive!?) to Document Management Systems (DMS) or full-blown Enterprise Content Management (ECM) systems. The reasons for making the switch range from overloaded hardware to new business demands, but a key point in many of these systems is how the end user uses the system.

Many shared network drives are a prime example of content chaos, with no naming or folder standardization and users left to create their own folders. However, some better thought-out network shares have a semi-structured folder system, perhaps with a base template folder structure that is copied and pasted for new projects, claims, matters or applications.


Whatever the structure, or lack of it, on a shared file system, users generally have to browse for the content they are after. What happens if you can’t find that vital document you worked on 3 months ago? You always have the option to search, but then you are presented with the dozens, if not hundreds, of hits that full-text search brings back. I think of this type of use case as discovery: users have to discover what they are looking for rather than being able to pinpoint it straight away. More on this topic in a previous post: Difference between search and discovery.

With this in mind, it is important for our end users to realize that any migration to a new DMS or ECM system demands a different way of working, hopefully a smarter and more efficient one. However, DMS and ECM systems are sometimes implemented badly and mimic the folder-browsing approach, which seems crazy given today’s content explosion. That said, I am sure there are still cases for old-style folder browsing, such as case management solutions with ad hoc document collections.

We have established the disadvantages of the source system and the benefits the new target system will bring, and determined that we have a semi-structured folder system that could be used to apply categorization and property values to our content in the new system. Up steps IBM Content Collector (ICC)!

I am no expert with ICC, but I love its modular design and the flexibility it provides for ingesting content from a variety of sources into a repository. You don’t need to be a programming genius to achieve some great results, but how do we determine index information based on folder names in document file paths? In short, we are looking for patterns in a string, and what better way than using Regular Expressions… groan, I hear you sigh! I was never a fan of Regular Expressions, mainly because they looked like hieroglyphics; however, after spending some time on a number of projects and getting into the weeds, I have changed my mind and realize how powerful they can be. That said, I will likely forget everything I have learnt in a couple of months.

Below is a screenshot of how to build Regular Expressions into your ICC Task Route. I haven’t detailed the Regular Expressions used, as that is a topic all on its own, but I will post again on typical expressions and how they can be combined with ICC Lists to provide some powerful lookups.
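While the expressions themselves are for another post, the general idea can be sketched in Python. The folder template, drive letter and property names below are all hypothetical; named groups capture each folder level so the results could be mapped onto properties in the target repository.

```python
import re
from typing import Optional

# Hypothetical template: S:\Claims\<year>\<claim number>\<document category>\<file name>
CLAIM_PATH = re.compile(
    r"^[A-Za-z]:\\Claims\\"
    r"(?P<year>\d{4})\\"
    r"(?P<claim_number>CLM-\d+)\\"
    r"(?P<category>[^\\]+)\\"
    r"(?P<file_name>[^\\]+)$"
)


def properties_from_path(path: str) -> Optional[dict]:
    """Map a file path onto hypothetical repository properties, or None if it doesn't fit."""
    match = CLAIM_PATH.match(path)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(properties_from_path(r"S:\Claims\2013\CLM-00123\Correspondence\letter.docx"))
    # {'year': '2013', 'claim_number': 'CLM-00123', 'category': 'Correspondence', 'file_name': 'letter.docx'}
```

Returning None for paths that don't fit the template makes it easy to route those documents to a review queue instead of indexing them with missing values.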


What HASH do you prefer?

September 29, 2013

De-duplication of files is a common function of ECM systems but how does it work?

You can have two files that have exactly the same content but different file names, yet systems are able to determine that these are duplicates and act appropriately. In many cases we don’t want the same content duplicated, as it doesn’t lend itself to effective storage management. In the email world, we can even use the compound model, which splits the email from its file attachments so that de-duplication can happen at both levels: on the email and on the file.

The technique used to make these comparisons relies on cryptographic hash algorithms, or ‘hashing’. There are two main families of hash algorithm:

1. MD5 – has been available for many years and hence is widespread in the industry today. It is frequently used for checking data integrity, similar to our de-duplication discussion. The one flaw MD5 has in today’s world is that it isn’t as secure as the more recent standards: it produces only a 128-bit digest, and weaknesses discovered in the algorithm allow collisions to be generated deliberately.

2. SHA – SHA-1 was designed by the National Security Agency (NSA) and, with its 160-bit digest, was more secure than MD5. It was later succeeded by the SHA-2 family and, more recently, by SHA-3.

The general guideline when it comes to hash keys is to use SHA-2, since it is more secure. This does apply to security-focused use cases such as saving a password, but in reality many systems focused on de-duplication still use the original MD5 hash algorithm.
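To make the comparison concrete, here is a small Python sketch using the standard hashlib module. It groups in-memory file contents by digest, so identical content under different file names ends up in the same group. The file names and contents are made up for illustration, and a real system would hash large files in chunks rather than all at once.

```python
import hashlib
from collections import defaultdict


def digest(data: bytes, algorithm: str = "md5") -> str:
    """Hex digest of data using a hashlib algorithm name such as "md5", "sha1" or "sha256"."""
    return hashlib.new(algorithm, data).hexdigest()


def find_duplicates(files: dict, algorithm: str = "md5") -> list:
    """Group file names whose content produces the same digest.

    files maps a file name to its content as bytes. Only the content is hashed,
    so identical content stored under different names is reported as a duplicate.
    """
    groups = defaultdict(list)
    for name, content in files.items():
        groups[digest(content, algorithm)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    files = {
        "report.docx": b"Q3 figures",
        "report_copy.docx": b"Q3 figures",
        "notes.txt": b"Something else",
    }
    print(find_duplicates(files))            # [['report.docx', 'report_copy.docx']]
    print(find_duplicates(files, "sha256"))  # same grouping with a stronger hash
    print(len(digest(b"", "md5")) * 4, len(digest(b"", "sha1")) * 4)  # 128 160
```

Swapping the algorithm name is all it takes to move from MD5 to SHA-2; for de-duplication the grouping logic stays the same, only the digest length and collision resistance change.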

ICC: Script Connector Implementation Guide

March 2, 2013

The script connector is a great addition to the already flexible ICC. The following guide, which has now been published, will help with any PoC or onsite implementation that demands an extra level of integration, especially outside the IBM ECM suite.

IBM Content Collector Script Connector Implementation Guide

One of the key features which I have already been looking at is the ability to call web services and query databases.

File System to Document Management System

January 11, 2013

It is common to be asked to demonstrate how a company can make the transition from a clunky file system to an enterprise-worthy Document Management System (DMS).

On a number of occasions I have seen clients use folder names to describe their contents. When taking existing data from the file system and loading it into an ECM repository, I have used IBM Content Collector (ICC). The file system connector in ICC allows regular expressions to be used to extract details of the files and their locations and then assign them to properties/attributes in the repository.

The following link has proved invaluable for this in extracting the folder names for a lengthy

http://pic.dhe.ibm.com/infocenter/email/v3r0m0/index.jsp?topic=%2Fcom.ibm.content.collector.doc%2Fexpression_editor%2Fr_afu_regular_expression_samples.htm

Here is an example to help clarify. Thanks to Dan Small for his help on this.
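A common variant of this is pulling out the folder name at a fixed depth in a long path. The sketch below does that in Python with a hypothetical example path; it is not taken from the IBM samples page linked above, and ICC's own expression syntax may differ.

```python
import re
from typing import Optional


def folder_at_depth(path: str, depth: int) -> Optional[str]:
    """Return the folder name at a given depth below the drive (0 = top-level folder).

    The pattern skips `depth` folders of the form "name\\" and captures the next
    one, which must itself be followed by a backslash (so a file name never counts).
    """
    skip = r"(?:[^\\]+\\){" + str(depth) + "}"
    pattern = re.compile(r"^[A-Za-z]:\\" + skip + r"([^\\]+)\\")
    match = pattern.match(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    path = r"S:\Projects\ACME Ltd\2012\Design\spec.docx"
    for depth in range(5):
        print(depth, folder_at_depth(path, depth))
    # 0 Projects / 1 ACME Ltd / 2 2012 / 3 Design / 4 None
```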


IBM Content Collector for File Systems enhancements

October 17, 2012

I have worked a lot with IBM Content Collector (ICC) over the past few years and love the product. However, one area that could be improved is the File System collector. It lacks a number of features that would allow it to compete with the other file system archiving solutions on the market. These features all revolve around Windows reparse points.

What are Windows reparse points?

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365503(v=vs.85).aspx

Now that we know what reparse points are, here is my top 3 list of missing features:

  1. The file system stub is a URL link and does not launch the original document in its native application
  2. The file name of the stub changes, breaking any shortcuts that pointed to the original file
  3. Finally, thumbnails are not supported, so once the file is archived the thumbnail is lost

These are just suggestions to enhance the product, although I believe the use case for ICC for File Systems isn’t true archiving but bulk loading of data into an ECM repository. IBM’s true file system archiving solution would be Tivoli HSM, which does support Windows reparse points.

IoD2011: ICC Product Update – Dana Morris

October 25, 2011

Looking forward to the roadmap details, but first some general snippets of interest:

New seamless retrieval discussed – see my previous post:

http://jamesjallen.me/2011/06/30/icc-transparent-retrieval/

SharePoint best practices – new ICC deployment approach for the SP front-end server.

Big focus on SP – improved stubs, integration with metadata.

Delete in P8 – clear up stubs in SP now

Maintain original dates and icon for filesystems – previously the modified dates changed.

ICC 2.2 FP1 support for:

> P8 5.0 with Content Search Services

> Support for SQL server 2008 R2

> Option to disable icon changes in LN email stubbing

Roadmap:

ICC 2.2 FP2: support for P8 5.1 with CSS, IE9, improved SP columns (e.g. calculated / multivalue columns), CM8 Text Indexer support for CSLD legacy itemtypes

2012 Q2 – release planned, Connectors to IBM Connections, Expand SP Support – seamless checkin, automatic stub deletion for emails & files, improved historical monitoring, improve performance for mailboxes

2013 2nd half – might be the next release

Questions:

Around Quickr and ST integration: for ST there is a partner solution (but there is an undocumented switch to write ST chats out to the local file system as text 😉). Quickr: no plans, as discussion suggested it would integrate into Connections! WOW 🙂

Single search interface – plans to merge EDM into Outlook and Notes local search.

P8 5.1 Content Search Services: IBM wants to own support for indexing because, when issues occurred in the past, they relied on Verity and could not influence the turnaround. CSS is built on the Apache Lucene text indexing engine (used in dozens of IBM products). Benefits: larger index sizes, and better monitoring, throughput and scaling.

A new email data storage model based on XML files (sounds a bit like CM8 NSE indexing)

2.2.0.2 benefits:

> Reduce storage costs

> Reduced load on FileNet P8 CE servers

> Improved indexing throughputs (1.25 – 1.5 times performance / multi lingual)

> Improved monitoring.

 

ICC Fix Pack 4 is out

August 8, 2011

ICC 2.1.1 FP4 is now available on Fix Central.

http://www-933.ibm.com/support/fixcentral/

Mixture of Domino and Exchange APAR fixes as well as some for the FileNet repository. Full list here:

https://www-304.ibm.com/support/docview.wss?uid=swg24030471

ICC & Transparent Retrieval

June 30, 2011

Transparent retrieval is a new feature delivered in ICC 2.2, which I had totally overlooked until now.
In short, this functionality means users can open an archived email stub in their mail file and, as the stub opens, the complete email is recalled from the archive repository and displayed in full on screen. To the end user there is no difference from opening a normal email! Pretty cool, and something I know a lot of clients ask for.

So where do I get this fantastic new option?

 

So is there a downside? Well, sort of. Similar to the offline repository functionality in the Outlook Extension, a small client program needs to be installed on Lotus Notes clients to enable this functionality. This shouldn’t be a problem for organisations that have well-established packaging and deployment procedures. In the long run, it is definitely a feature I would consider enabling.