Posts tagged ‘filesystems’

Making the switch from navigating to searching with help from ICC and Regex.

September 29, 2013

Many companies are still making the transition from shared drives (who hasn’t had an S:\ or H:\ drive?!) to a Document Management System (DMS) or a full-blown Enterprise Content Management (ECM) system. There are many reasons for making the switch, from overloaded hardware to new business demands, but a key consideration in many of these systems is how end users actually work with them.

Many shared network drives are a prime example of content chaos: no naming or folder standards, and users left to create their own folders. However, some better thought-out network shares have a semi-structured folder system, perhaps a base template folder structure that is copied and pasted for each new project, claim, matter or application.

Whatever the structure (or lack of it) on a shared file system, users generally have to browse for the content they are after. What happens if you can’t find that vital document you worked on three months ago? You can always search, but then you are presented with the dozens, if not hundreds, of hits a full-text search brings back. I think of this type of use case as discovery: users have to discover what they are looking for rather than being able to pinpoint it straight away. There is more on this topic in a previous post: Difference between search and discovery.

With this in mind, it is important for our end users to realize that any migration to a new DMS or ECM system demands a different way of working, hopefully a smarter and more efficient one. Sometimes, however, DMS or ECM systems are implemented badly and mimic the folder-browsing approach, which seems crazy given today’s content explosion. That said, I am sure there are still cases for old-style folder browsing, such as case management solutions with ad hoc document collections.

We have established the disadvantages of the source system and the benefits the new target system will bring, and determined that we have a semi-structured folder system that could be used to apply some categorization and property values to our content in the new system. Up steps IBM Content Collector (ICC)!

I am no expert with ICC, but I love its modular design and the flexibility it provides for ingesting content from a variety of sources into a repository. You don’t need to be a programming genius to achieve some great results, but how do we determine index information from the folder names in document file paths? In short, we are looking for patterns in a string, and what better way than Regular Expressions... groan, I hear you sigh! I was never a fan of Regular Expressions, mainly because they looked like hieroglyphics; however, after spending some time with them on a number of projects and getting into the weeds, I have changed my mind and realize how powerful they can be. That said, I will likely forget everything I have learnt in a couple of months.
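
To make the idea concrete, here is a minimal sketch in Python (not ICC configuration) of pulling index values out of a file path with a regular expression. The folder template it assumes, S:\Projects\<project id> <project name>\<document type>\<file>, and the names PATH_PATTERN and index_from_path are hypothetical; a real share will need its own pattern.

```python
import re
from typing import Dict, Optional

# Hypothetical folder template (an assumption, not a real share):
#   <drive>:\Projects\<project id> <project name>\<document type>\<file name>
PATH_PATTERN = re.compile(
    r"^[A-Za-z]:\\Projects\\"                                # drive letter and base folder
    r"(?P<project_id>P-\d+)\s+(?P<project_name>[^\\]+)\\"   # e.g. "P-1042 Riverside Depot"
    r"(?P<doc_type>[^\\]+)\\"                                # e.g. "Contracts"
    r"(?P<file_name>[^\\]+)$",                               # the document itself
    re.IGNORECASE,
)


def index_from_path(path: str) -> Optional[Dict[str, str]]:
    """Return index values captured from the path, or None if it doesn't fit the template."""
    match = PATH_PATTERN.match(path)
    if match is None:
        return None
    return match.groupdict()


print(index_from_path(r"S:\Projects\P-1042 Riverside Depot\Contracts\Signed lease.docx"))
```

Named groups keep the extracted values readable, and a path that doesn’t fit the template returns None rather than a half-filled set of properties, so it could be set aside for manual review.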

Below is a screenshot of how to build Regular Expressions into your ICC Task Route. I haven’t detailed the expressions used, as that is a topic in its own right, but I will post again on typical expressions and how they can be combined with ICC Lists to provide some powerful lookups.

[Screenshot: Regular Expressions in an ICC Task Route]
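
Pending that follow-up post, here is a rough Python sketch of the lookup idea: capture the folder that directly contains the file and map its (inconsistently named) value onto a controlled document type. The plain dictionary DOC_TYPE_LOOKUP only stands in for an ICC List; the folder names, type values and the document_type helper are hypothetical and are not ICC’s API.

```python
import re

# Hypothetical lookup list: folder names users actually created -> a controlled
# document-type value in the target repository. A dict stands in for an ICC List.
DOC_TYPE_LOOKUP = {
    "contracts": "Contract",
    "signed contracts": "Contract",
    "correspondence": "Correspondence",
    "letters": "Correspondence",
    "invoices": "Invoice",
}

# Captures the folder that directly contains the file, at any depth.
PARENT_FOLDER = re.compile(r"\\(?P<folder>[^\\]+)\\[^\\]+$")


def document_type(path: str, default: str = "Unclassified") -> str:
    """Map the file's parent folder name onto a controlled document type."""
    match = PARENT_FOLDER.search(path)
    if match is None:
        return default
    # Normalise case and surrounding whitespace before the lookup.
    folder = match.group("folder").strip().lower()
    return DOC_TYPE_LOOKUP.get(folder, default)


print(document_type(r"S:\Projects\P-1042 Riverside Depot\Letters\reply.docx"))
```

Normalising case and whitespace before the lookup means Letters, letters and LETTERS all land on the same value, and anything unmatched falls back to a default that is easy to report on.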

IBM Content Collector for File Systems enhancements

October 17, 2012

I have worked a lot with IBM Content Collector (ICC) over the past few years and love the product. However, one area that could be improved is the File System collector. It lacks a number of features it would need to compete with other file system archiving solutions on the market, and these features all revolve around Windows reparse points.

What are Windows reparse points?

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365503(v=vs.85).aspx
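
As a rough illustration of what the attribute looks like in practice, the Python sketch below walks a folder and lists entries flagged with FILE_ATTRIBUTE_REPARSE_POINT (0x400), for example to check whether archived files were left behind as reparse-point stubs or as ordinary files such as URL links. It assumes Python 3 running on the Windows file server; has_reparse_flag and find_reparse_points are hypothetical helper names, and on other platforms the attribute is simply absent, so nothing is reported.

```python
import os
import stat
from typing import Iterator


def has_reparse_flag(attributes: int) -> bool:
    """True if a Windows file-attribute bitmask has FILE_ATTRIBUTE_REPARSE_POINT (0x400) set."""
    return bool(attributes & stat.FILE_ATTRIBUTE_REPARSE_POINT)


def find_reparse_points(root: str) -> Iterator[str]:
    """Yield paths under root whose directory entry is flagged as a reparse point."""
    with os.scandir(root) as entries:
        for entry in entries:
            # follow_symlinks=False describes the entry itself, not whatever it points to.
            info = entry.stat(follow_symlinks=False)
            # st_file_attributes only exists on Windows; treat it as 0 elsewhere.
            attributes = getattr(info, "st_file_attributes", 0)
            if has_reparse_flag(attributes):
                yield entry.path
            elif entry.is_dir(follow_symlinks=False):
                # Recurse into ordinary folders only; reparse-point folders are reported above.
                yield from find_reparse_points(entry.path)
```

On Windows, the attributes come from the directory listing, so each file does not have to be opened; that seemed a sensible choice for stubs, since opening one can be enough to trigger a recall in some HSM products.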

Now that we know what reparse points are, here is my top 3 list of missing features:

  1. The file system stub is a URL link and does not open the original document in its native application
  2. The stub’s file name differs from the original, breaking any shortcuts that pointed to it
  3. Finally, thumbnails are not supported, so once a file is archived its thumbnail is lost

These are just suggestions to enhance the product, although I believe the use case for ICC for File Systems isn’t true archiving but bulk loading of data into an ECM repository. IBM’s true file system archiving solution would be Tivoli HSM, which does support Windows reparse points.

IoD2011: ICC Product Update – Dana Morris

October 25, 2011

Looking forward to the roadmap details, but first some general snippets of interest:

New seamless retrieval discussed – see my previous post:

http://jamesjallen.me/2011/06/30/icc-transparent-retrieval/

SharePoint best practices: a new ICC deployment approach for the SP front-end server.

Big focus on SP – improved stubs, integration with metadata.

Delete in P8: deleting in P8 now clears up the corresponding stubs in SP.

Original dates and icons are now maintained for file systems; previously the modified dates changed.

ICC 2.2 FP1 support for:

> P8 5.0 with Content Search Services

> Support for SQL server 2008 R2

> Option to disable icon changes in LN email stubbing

Roadmap:

ICC 2.2 FP2: support for P8 5.1 with CSS, IE9, improved SP columns (e.g. calculated and multivalue columns), and CM8 Text Indexer support for CSLD legacy item types

2012 Q2: release planned, with connectors to IBM Connections, expanded SP support (seamless check-in), automatic stub deletion for emails and files, improved historical monitoring and improved mailbox performance

2013 2nd half – might be the next release

Questions:

Quickr and ST (Sametime) integration: for ST there is a partner solution (but there is an undocumented switch to write ST chats out to the local file system as text 😉). Quickr: no plans, as discussion suggested it would be integrated into Connections! WOW 🙂

Single search interface – plans to merge EDM into Outlook and Notes local search.

P8 5.1 Content Search Services: IBM wants to own indexing support because, when issues occurred in the past, it relied on Verity and could not influence the turnaround. CSS is built on the Apache Lucene text indexing engine (used in dozens of IBM products). Benefits: larger index sizes and better monitoring, throughput and scaling.

A new email data storage model based on XML files (sounds a bit like CM8 NSE indexing).

2.2.0.2 benefits:

> Reduced storage costs

> Reduced load on FileNet P8 CE servers

> Improved indexing throughput (1.25 to 1.5 times the performance; multilingual)

> Improved monitoring.