Posted tagged ‘regex’

The power of ICC Lists and Regular Expressions!

October 6, 2013

IBM Content Collector has had the Lists feature since its early versions. From a task route, it lets you look up a value in a specified List and, on a match, return the corresponding result. This functionality can be applied at a decision point, where a rule can be run against the list, or it can be applied to a property of a new document.

Lists

More can be found here in the IBM Content Collector Infocenter.

To add more power to this feature, Lists can also hold regular expressions, meaning that your lookup Lists can be dynamic. Rather than listing every possibility, a single regular expression can be substituted to achieve a more flexible lookup.

Below is an example of how to build regular expressions into your List lookup:

– Previously, we saw how to add a dynamic metadata lookup when setting a property value:

dynamiclookup

– If we turn our attention to the Lists section in ICC Configuration Manager

lsits

– We can add a new entry as shown

joint

– To explain the regular expression we are reading a string that is being passed in – for example a folder structure in a shared mailbox

^ = signifies the start of the full string

$ = symbolizes the end of the full string

.* = any number and type of characters

\\ = represents a backslash or, in our case, a new folder

E = End of the fixed string

Our fixed string is “\Joint Mortgage Agreement\”

We can see the return value below as “Joint” – this could be passed to a rule or be used to set a property. The only limitation I have come across is that the value can’t be passed on from setting a property value to a rule – it is either one or the other.
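For readers without ICC to hand, the lookup can be approximated outside the product. The following is a minimal Python sketch, not ICC's implementation: the `LIST_ENTRIES` table, the `lookup` function and the sample mailbox paths are all made up for illustration, and Python's `re` module stands in for whatever regex engine ICC uses. The fixed string is escaped with `re.escape` so that its backslashes and spaces are treated literally.

```python
import re

# Hypothetical stand-in for an ICC List: each entry pairs a regular
# expression (the lookup value) with the result returned on a match.
FIXED = re.escape("\\Joint Mortgage Agreement\\")  # literal \Joint Mortgage Agreement\
LIST_ENTRIES = [
    (re.compile("^.*" + FIXED + ".*$"), "Joint"),
]


def lookup(value, entries=LIST_ENTRIES):
    """Return the result of the first entry whose pattern matches, else None."""
    for pattern, result in entries:
        if pattern.match(value):
            return result
    return None


# Illustrative folder paths from a shared mailbox (not taken from a real system).
joint_path = "\\Mailbox\\Applications\\Joint Mortgage Agreement\\2013"
other_path = "\\Mailbox\\Applications\\Single Mortgage\\2013"

print(lookup(joint_path))  # the entry matches, so its result is returned
print(lookup(other_path))  # no entry matches
```

The first path contains the fixed string, so the lookup returns “Joint”; the second matches nothing and returns `None`, which is roughly the situation where a rule or property would fall through to a default.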

– But what happens if different people have named the folders differently over time?

E.g. Joint Mortgage Application, Joint Application, Joint Mortgage, Joint Applications or Joint Mortgages

Never mind all the typos that can occur in older systems that don’t have mandatory fields or drop-downs set for Document Categories.

We can make the foregoing more generic to ensure a higher match rate by doing the following:

lists2

– By adding our .* we have catered for the likely variations of folder or category name. It is not foolproof, but it should catch the majority.
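To see how much a looser expression buys, the folder name variations listed above can be run through a quick check. This is a sketch in Python under my own assumptions: the pattern below is a guess at the kind of expression shown in the screenshot, not a copy of it, and the sample paths and the `matches` helper are invented.

```python
import re

# Assumed generic expression: any path containing a folder whose name
# starts with "Joint", whatever follows it in the folder name.
GENERIC = re.compile(r"^.*\\Joint.*\\.*$")

FOLDER_NAMES = [
    "Joint Mortgage Agreement",
    "Joint Mortgage Application",
    "Joint Application",
    "Joint Mortgage",
    "Joint Applications",
    "Joint Mortgages",
]


def matches(folder):
    """Build an illustrative mailbox path around the folder and test it."""
    path = "\\Mailbox\\" + folder + "\\letter.msg"
    return GENERIC.match(path) is not None


results = {name: matches(name) for name in FOLDER_NAMES}
for name, ok in results.items():
    print(name, "->", "Joint" if ok else "no match")

# The looseness cuts both ways: an unrelated folder that merely
# starts with "Joint" is matched as well.
print(matches("Jointly Held Savings"))
```

Every variation is caught, but so is the unrelated “Jointly Held Savings” folder, which is the trade-off to keep in mind when loosening an expression.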

So the key takeaway is that Lists are very powerful but by adding regular expressions into your list lookup values you can add even more flexibility – so don’t think of your list as purely static values (although they can be used for this).

To give an indication of the overhead, I have run up to 6 list lookups when setting document properties, with up to 5000 regex values in the lists, with little impact on performance.

Finally, I would like to thank Dan Small from IBM for all his help on this and hopefully I won’t forget all my regular expressions in the next few weeks! 🙂

Making the switch from navigating to searching with help from ICC and Regex.

September 29, 2013

Many companies are still making the transition from shared drives (who has never had an S:\ or H:\ drive!?) to a Document Management System (DMS) or a full-blown Enterprise Content Management (ECM) system. There are many reasons for making the switch, from overloaded hardware to new business demands, but a key point in many of these systems is how the end user uses the system.

Lots of shared network drives are a prime example of content chaos, with no naming or folder standardization and users left to create their own folders. However, some better thought-out network shares have a semi-structured foldering system, perhaps with a base template folder structure that is copied and pasted for new projects, claims, matters or applications.

filesystem

 

Whatever the structure, or lack of it, on a shared file system, it is generally a case of users having to browse for the content they are after. What happens if you can’t find that vital document you worked on 3 months ago? You always have the option to search, but then you are presented with the dozens if not hundreds of hits the full text search brings back. I think of this type of use case as discovery – users are having to discover what they are looking for rather than being able to pinpoint it straight away. More on this topic in a previous post: Difference between search and discovery.

With this in mind, it is important for our end users to realize that any migration to a new DMS or ECM system demands a different way of working – hopefully a smarter and more efficient one. Sometimes, though, DMS or ECM systems are implemented badly and mimic the folder browsing approach, which seems crazy in today’s world with the content explosion. Saying that, I am sure there are cases for old-style folder browsing, such as case management solutions that have ad hoc document collections.

We have established the disadvantages of the source system and the benefits our new target system will bring, and determined that we have a semi-structured foldering system which could be used to apply some categorization and property values to our content in the new system. Up steps IBM Content Collector (ICC)!

I am no expert with ICC but I love its modular design and the flexibility it provides for ingesting content from a variety of sources into a repository. You don’t need to be a programming genius to achieve some great results, but how do we determine index information based on folder names in document file paths? In short, we are looking for patterns in a string, and what better way than using Regular Expressions... groan, I hear you sigh! I was never a fan of Regular Expressions, mainly because they looked like hieroglyphics. However, after spending some time on a number of projects and getting into the weeds, I have changed my mind and realize how powerful they can be. Saying that, I will likely forget everything I have learnt in a couple of months.

Below is a screenshot of how to build Regular Expressions into your ICC Task Route. I haven’t detailed the Regular Expressions used as that is a topic all on its own but will post again on typical expressions and how they can be combined with ICC Lists to provide some powerful lookups.

regex

File System to Document Management System

January 11, 2013

It is common to be asked to demonstrate how a company can make the transition from a clunky file system to an enterprise-worthy Document Management System (DMS).

On a number of occasions I have seen clients use folder names to describe their contents. When taking existing data from the file system and loading it into an ECM repository, I have used IBM Content Collector (ICC). The file system connector in ICC allows regular expressions to be used to extract details of the files and their locations, and then assign these to properties/attributes in the repository.
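As a rough illustration of the idea (not the ICC expression editor itself), here is a Python sketch that pulls folder names out of a file path with named groups and returns them as property values. The share layout, the property names `client` and `category`, the sample path and the `extract_properties` helper are all hypothetical.

```python
import re

# Assumed layout: \\server\share\<client>\<category>\<file>
PATH_PATTERN = re.compile(
    r"^\\\\[^\\]+\\[^\\]+"      # \\server\share
    r"\\(?P<client>[^\\]+)"     # first folder  -> client
    r"\\(?P<category>[^\\]+)"   # second folder -> category
    r"\\(?P<filename>[^\\]+)$"  # file name
)


def extract_properties(path):
    """Return the named groups as a dict, or an empty dict if the path does not fit."""
    m = PATH_PATTERN.match(path)
    return m.groupdict() if m else {}


sample = r"\\fileserver\claims\Acme Ltd\Invoices\inv-001.pdf"
print(extract_properties(sample))
```

Each named group corresponds to one folder level, so a deeper or shallower path simply fails to match and yields no properties; a real task route would need a decision on what to do with such files.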

The following link has proved invaluable for this in extracting the folder names for a lengthy

http://pic.dhe.ibm.com/infocenter/email/v3r0m0/index.jsp?topic=%2Fcom.ibm.content.collector.doc%2Fexpression_editor%2Fr_afu_regular_expression_samples.htm

Here is an example to help clarify. Thanks to Dan Small for his help on this.

regex