Manipulating text data in Excel

One of the things that we frequently experience with our clients are users that are a little overwhelmed when they need to adjust text strings to get information either restructured, parsed differently or split up into different columns.  The classical example is when someone has downloaded a list of names from an HR database and the name is imported as “Last Name, First Name” and the user wants it “First Name Last Name”, sometimes in the same cell (sometimes it needs to be parsed into two columns).  This happens with all kinds of data that is downloaded from accounting systems, CRM systems, HR systems, etc.   As long as the data is delimited, such as with a coma, asterisk, blank space, colon, whatever, you can use Excel’s text functions to get the data manipulated the way you want it.

There are five different functions that you should familiarize yourself with–LEFT(), RIGHT(), FIND() LEN() and MID().  Here is how each of them work:

LEFT() – This has only two possible parameters: LEFT(source, # characters).  The source is the text cell to be parsed, and the # characters are the number of characters you want returned beginning from the left most character.  For example, at cell D10 you have the word MOUSE.  The following statement =LEFT(D10,4) would return MOUS, which is the left most four characters.

RIGHT() – This works exactly the same as the LEFT(), just beginning from the opposite side of the text string.  So RIGHT(D10,2), using the same example as above would return SE (the right-most two characters).

FIND() – Like the previous two functions, this function only requires two parameters: FIND(character(s) to be found, source string).  The first parameter needs to be enclosed with double quotes and represents the string to be found.  The second parameter is the cell address of the text string to be searched.  What FIND returns is the position number within our string of the first occurance of the character(s) being sought.  Though this function only requires two parameters, there is a third parameter that is optional, which we will discribe once we have gone through the basics.  So in from our example above, FIND(“O”,D10) would return the number 2.  The letter “O” in MOUSE, is the second character in our simple character string (or text string). 

LEN() – Using the LEN() function just returns the length of the string at that cell.  If we have “The quick brown fox jumped over the lazy dog” in cell B2, then =LEN(B2) will return 44 which is the number of characters in our text string. 

MID() – The MID function has three required parameters: MID(source,starting position, number of characters).  So this function will extract a number of characters from within a text string.  Using the same example, MID(D10,2,1) would return O, which is the second character in our MOUSE string.  We asked the function to start at the second character, which is “O”, with the “2″ at the second parameter, and asked for one character with the “1″ at the third parameter.  If we had MID(D10,2,2), the function would return OU, MID(D10,2,3) would return OUS and so on and so forth. 

So how do all these pieces play together you may wonder.  Well lets start with our original example, you have received a list from your IT department with all the employee names and it is formated with Last Name, First name and you want to parse the names into separate columns (maybe your are going to use the data in a mail merge application later and want the first name in the saluation or whatever the requirement).  So let’s assume that the Last Name, First Name data is in column D, beginning at row 10 and we want the last name in column E and first name column F.  So just call IT and ask them to run another query (just kidding).  You will use the comma as the delimiter.  Typically when names are provided this way, such as “Dean, Jimmy” there is also a blank space to contend with after the comma.  Here we go:

Step 1: Parse out the last name — Using the MID and FIND functions together, we just need to know where the comma is in our string and then subtract one from the number returned from FIND, and that is the number of characters needed in the last name.  The last name starts at the first character so we only have to provide the number “1″ as the correct start location.  With our example text at D10, we want the last name at E10, which would be:

=MID(D10,1,FIND(“,”,D10)-1)

Breaking that down, assuming our first text string is Dean, Jimmy at D10, we tell the MID function to start extracting at the first character, which is the “D” in Dean.  The FIND function is going to look for the first occurance of a comma in the string and return the numeric position in the string, which would be then number 5 (“D” is one, “e” is two, “a” is three, “n” is four and “,” is five).  You will note that we subtracted 1 from the FIND result with FIND(“,”,D10)-1 which is going to return one less than the string position where the comma is located or the number four.  Excel simplifies our MID function to:

MID(D10,1,4) = Dean

Voila, we now have the last name parsed into column E.  Now in order to get the first name into column F, we need to introduce one more function, which is LEN().  The LEN() requires only one parameter, which is the string address.  Our example string–Dean, Jimmy–is eleven characters long (do not forget the blank space after the comma); therefore LEN(D10) would return 10.  Now we have everything we need to parse out the first name.

At F10, we will add the following function =MID(D10,FIND(“,”,D10)+2,LEN(D10).  I know that looks ugly, but hang with me for a minute.  Again, the FIND(“,”,D10) is going to return 5–the position of the comma–and then we need to add 2 to that position to get to our starting point for the string extraction, which is character string number seven (“D” is one, “e” is two, “a” is three, “n” is four, ”,” is five, ” ” or blank space is six, and “J” is seven, which is where we want to start).  The LEN(10) is going to return the total length of our string, which is okay.  We do not have to have the precise number of characters to the end of our string.  We know that the total lenght will get us to the end of the string.  Excel simplifies our second function to:

MID(D10,7,11) = Jimmy

Now that we have the last name and first name parsed into separate columns, you just need to copy and paste the data down through all the rows and your entire list parsed.

A little more complex example:

Now that we are through the basics, you should be aware that you can nest these funtions together for more complex strings.  For example, when can receive a report from our ERP system that includes the assembled part number, a lot number, and the quantity produced.  In this example, the ERP system uses “*” to delimit the three pieces of information, which would look like 7053312-007*154789513*376 for material number 7053312-007, lot number 154789513, with a finish quantity of 376.  Since we are in FP&A, we are just interested in the part number of quantity completed (the Quality department can worry about the lot number).  Our string starts at D10 again and we want the material numbers starting at E10 and the quantity at F10.  In our example, the material number and lot numbers can vary in length, so we will need a little logic to parse our the quantity correctly.

The material number is just like when we parsed “Dean, Jimmy” since we can start at the first character, just our delimiter is now “*” not “,”.  Just use the mid and find functions together to parse out the material number as follows:

MID(D10,1,FIND(“*”,D10)-1) = 7053312-007

Getting to the quantity is a little more complicated.  The FIND function actually has an optional third parameter that have not yet talked about.  The optional third parameter tells the FIND function where to start looking for our FIND.  If it is omitted, then the FIND function begins looking at the first character in our string.  Since we are looking for the second “*”, which is where the quantity starts, our FIND needs to start just past the first “*” inorder to find the second “*” correctly.  In order to do that, we are going to use the third option and actually nest togther two find function.  The formula we will need at F10 for the quantity will be MID(D10,FIND(“*”,D10,FIND(“*”,D10)+1)+1,LEN(D10)), whew! 

The only part that should be new to you is the nested FIND, which is “FIND(“*”,D10,FIND(“*”,D10)+1)”.  Excel will execute the “FIND(“*”,D10)+1″ part first, which is the third parameter.  Remember our string is “7053312-007*154789513*376″ so the “FIND(“*”,D10)+1″ will return thirteen (“7″ is one, “0″ is two, “5″ is three, “3″ is four, “3″ is five, “1″ is six, “2″ is seven, “-” is eight, “0″ is nine, “0″ is ten, “7″ is eleven and the “*” is twelve, plus one is thirteen.  The thirteenth position in our string is where we want to start the search for the second “*”.  Our formula at F10 has now been simplified to MID(D10,FIND(“*”,D10,13)+1,LEN(D10)).  With the third parameter equal to thirteen, we start searching there for the second “*”, which returns twenty-two (“1″ is thirteen, “5″ is fourteen, etc. to the second “*” which is twenty-two, plus one is twenty-three.  The twenty-third position is where our quantity starts.  Again, by just supplying the total length of the string in LEN(D10) we do not have to be concerned with having the precise length of the substring required.  So the remainder simplies to:

MID(D10,23,25) = 376

Hope you found that helpful. If you have a minute give us some feedback about improving the text parsing explanation.

Resource Planning Solutions

www.rpscgi.com

info@rpscgi.com

Advertisement

2 Responses to “Manipulating text data in Excel”

  1. Leonard Says:

    Thanks for the useful information. I stumbled onto this and will pass the web address to other EXCEL users.

    • Chase Morrison Says:

      Thank you for the feedback. FYI — In about two weeks or so we are going to be releasing a Tip sheet for Excel users. It is going to contain a bunch of useful accounting & financial tips for users looking to bring more capability to their Excel workbooks. If you like we could send you an early addition. Just let us know.

      Regards,
      Chase Morrison
      info@rpscgi.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.