iMacros Add-on for Firefox


The iMacros add-on for Firefox can be used to automate data collection when you're running form searches on the web (as well as any other task you want to perform with the Firefox browser). A site like the New York State Department of State's Corporation and Business Entity Database only allows users to run a search for one business at a time. If you've got a few hundred businesses to look up, this is a real problem. Install iMacros, and then follow these steps.

1. The iMacros add-on should appear as a small icon with a yellow cog in the toolbar. If it doesn't, go to the menu button (the three bars on the right), select Customize . . . Add-ons, and drag it up. Click on the icon and the iMacros controls should appear on the left. Begin on the site you want to run searches on.

2. Click the 'Rec' tab and then press 'Record'. You'll see the URL referenced in the macro as it's recording.

3. In this example you set the STATUS TYPE to 'ALL', and then SEARCH TYPE to 'CONTAINS'. Enter the name of the business you're searching for and then copy it. Press 'SEARCH DATABASE'.

4. Go to File . . . Save Page As, paste in the business name, and then save the file.

5. Paste in the URL of the Department of State's search page and repeat steps 3 and 4 for a second business.

6. Stop the macro, and then click the Manage tab and choose Edit Macro.

7. You should now have a macro that looks like this:

VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Taco<SP>Bell
TAG POS=1 TYPE=DIV ATTR=ID:Content
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Taco<SP>Bellc
TAG POS=1 TYPE=DIV ATTR=ID:Content
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
TAB T=2
SAVEAS TYPE=CPL FOLDER=* FILE=+_{{!NOW:yyyymmdd_hhnnss}}
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Burger<SP>King
TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
TAB T=3
SAVEAS TYPE=CPL FOLDER=* FILE=+_{{!NOW:yyyymmdd_hhnnss}}

8. Now we just need to modify this macro so we can run searches for many more businesses. Copy the macro text into Excel, then copy and paste the portion for the second search (from the second URL GOTO line through the final SAVEAS line) once for each additional business. Re-copying the original together with its new copies each time doubles what's on the clipboard, so you can build up a long macro quickly.

9. Go to your list of business names and put seven blank lines between them. Paste the list into the second column in Excel so the names line up with the rows ending CONTENT=Burger<SP>King. Remove Burger<SP>King from those rows, and make sure that any spaces in your list are replaced with <SP>.

10. Find the lines which read:

SAVEAS TYPE=CPL FOLDER=* FILE=+_{{!NOW:yyyymmdd_hhnnss}}

. . . and replace them with:

SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}

Be sure that you have the specified subfolder created on your C drive.

11. Now move the tab references so they come after the SAVEAS TYPE line and before the URL GOTO line, and change each one to TAB T=1. [The NYS Department of State automatically puts the search results on the second tab.] In the finished macro below, each of these moved TAB T=1 lines is also preceded by a TAB CLOSE line.

12. Copy the code from Excel into a text editor such as Notepad or Notepad++, and remove the tabs so the data from Column B joins up with that from Column A.

13. Copy the code into the iMacros Editor and then Save & Close.

14. You should end up with a macro that looks like this:

VERSION BUILD=8940826 RECORDER=FX
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Taco<SP>Bell
TAG POS=1 TYPE=DIV ATTR=ID:Content
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Taco<SP>Bell
TAG POS=1 TYPE=DIV ATTR=ID:Content
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}
TAB CLOSE
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Burger<SP>King
TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}
TAB CLOSE
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Mcdonald's
TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}
TAB CLOSE
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Wendy's
TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}
TAB CLOSE
TAB T=1
URL GOTO=http://www.dos.ny.gov/corps/bus_entity_search.html
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%
TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS
TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT=Smashburger
TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:
TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*
SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}
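If you'd rather not do the copying and pasting in Excel, the same macro text could probably be generated with a short script. What follows is only a rough sketch in Python, not part of iMacros: the function names (imacros_escape, search_block, build_macro) are made up for illustration, the macro lines are copied from the listing above, and it assumes the Burger King style block works for every business, including the first.

```python
# Hypothetical helper script; every iMacros line is copied from the macro above.
SEARCH_URL = "http://www.dos.ny.gov/corps/bus_entity_search.html"
SAVE_LINE = r"SAVEAS TYPE=HTM FOLDER=C:\fastfood FILE=+{{!NOW:ddmmyyyyhhnnss}}"


def imacros_escape(name):
    """Replace spaces with <SP>, as the recorded CONTENT= values do."""
    return name.strip().replace(" ", "<SP>")


def search_block(name):
    """Return the macro lines for one search (the repeated block above)."""
    return [
        "TAB T=1",
        "URL GOTO=" + SEARCH_URL,
        "TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_name_type CONTENT=%%",
        "TAG POS=1 TYPE=SELECT FORM=NAME:form1 ATTR=ID:p_search_type CONTENT=%CONTAINS",
        "TAG POS=1 TYPE=INPUT:TEXT FORM=NAME:form1 ATTR=ID:p_entity_name CONTENT="
        + imacros_escape(name),
        "TAG POS=1 TYPE=P ATTR=TXT:1.<SP>Business<SP>Entity<SP>Name*:",
        "TAG POS=1 TYPE=INPUT:SUBMIT FORM=NAME:form1 ATTR=*",
        SAVE_LINE,
    ]


def build_macro(names):
    """Join one search block per business, with TAB CLOSE between blocks."""
    lines = ["VERSION BUILD=8940826 RECORDER=FX"]
    for index, name in enumerate(names):
        if index > 0:
            lines.append("TAB CLOSE")
        lines.extend(search_block(name))
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_macro(["Taco Bell", "Burger King", "Wendy's"]))
```

The printed text can then be pasted into the iMacros editor in place of steps 8 through 12.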

15. When the macro is run, it will save an .htm file of each search results page in C:\fastfood. Use Bulk Rename Utility to change the extensions to .txt.

16. Combine the text files using the command explained on this site. Hold CTRL + SHIFT and right-click in the fastfood subfolder to open a command prompt there, and then enter: for %f in (*.txt) do type "%f" >> output.txt
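As an alternative to the rename-and-combine steps, here is a minimal Python sketch that appends the saved pages into one file. The function name combine_pages and its parameters are my own, not from any tool mentioned here, and it assumes the saved pages can be read as UTF-8 text (undecodable bytes are replaced rather than raising an error). Because it reads the .htm files directly, the rename to .txt shouldn't be needed on this route.

```python
from pathlib import Path


def combine_pages(folder, output_name="output.txt", pattern="*.htm"):
    """Append every saved page in `folder` to one text file and return its path."""
    folder = Path(folder)
    out_path = folder / output_name
    with out_path.open("w", encoding="utf-8") as out:
        for page in sorted(folder.glob(pattern)):
            if page.resolve() == out_path.resolve():
                continue  # never read the output file back into itself
            out.write(page.read_text(encoding="utf-8", errors="replace"))
            out.write("\n")
    return out_path


# Example call, using the folder from the SAVEAS line of the macro:
# combine_pages(r"C:\fastfood")
```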

17. Now you've got a text file with the URLs for lots of business entities. You can extract these using Excel and then convert them to text files for further analysis, as shown in last night's tip. More on this tomorrow night . . .
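If Excel isn't handy, the links could also be pulled out with a few lines of Python. This is just a sketch using the standard library's html.parser module. The names LinkCollector and extract_links are hypothetical, and it assumes the URLs you want appear as href attributes on ordinary <a> tags in the saved pages.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in the given HTML text, in order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links
```

Feeding it the contents of output.txt should give a list of link targets, one per result, which can then be written out one per line.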