How to extract e-mail addresses from websites
25min
Scenario

From different websites, you will extract all e-mail addresses using a third-party web scraping API tool. The extracted information will be moved to spreadsheet software, in this case “Smartsheet”. Let's build this process!

In this scenario we have the following elements:
- Third-party web scraping API tool: https://app.scrapingbee.com/
- Third-party spreadsheet software: https://app.smartsheet.com/
- Postman
- Procesio: https://procesio.app/

ScrapingBee

Use a free account to obtain your API key.

Step 1. Go to https://app.scrapingbee.com/, log into your account, and make sure you can obtain your API key from your dashboard. This API key will be used later when creating a REST API credential in Procesio.

Smartsheet

Step 1. Go to https://app.smartsheet.com/ and log in. From the home screen, click Create.

Step 2. Select the sheet type Grid. Once selected, name the sheet and click OK.

Step 3. In Smartsheet, a sheet of type Grid will open. In this sheet we will keep only the first two columns (remove columns 3 to 6 by selecting them, right-clicking, and choosing Delete Column).

Step 4. Obtain your API key from Smartsheet by clicking Account > Personal Settings > API Access > Generate new access token. Once the API key is generated, keep it at hand for later use.

Postman

We will use Postman to obtain the column IDs of the Smartsheet sheet, the spreadsheet in which we will write the e-mail addresses extracted from the websites.

Step 1. Open Postman. Following the Smartsheet API documentation, add the following GET request:
- Endpoint: https://api.smartsheet.com/2.0/sheets/{sheetId}
- Request method: GET
- Body: `{"version": 1}`
- Body format: application/json
- Headers:

| Name | Direction | Type | Value |
| --- | --- | --- | --- |
| Authorization | In | String | The bearer token you kept at hand from Smartsheet step 4 (e.g. Bearer <your access token>) |

The sheet ID can be obtained from Smartsheet by navigating to Browse > Sheets > right
click on the sheet > Properties.

Step 2. From the response, obtain the column IDs of the two columns in which you will write the data: these are the "id" values of the two entries under "columns" in the response below. Keep these two IDs at hand, as we will use them in Procesio.

    {
      "id": ,
      "name": "procesio scrapping",
      "version": 85,
      "totalRowCount": 1,
      "accessLevel": "OWNER",
      "effectiveAttachmentOptions": [
        "LINK", "EGNYTE", "DROPBOX", "EVERNOTE",
        "FILE", "ONEDRIVE", "GOOGLE_DRIVE", "BOX_COM"
      ],
      "ganttEnabled": false,
      "dependenciesEnabled": false,
      "resourceManagementEnabled": false,
      "resourceManagementType": "NONE",
      "cellImageUploadEnabled": true,
      "userSettings": {
        "criticalPathEnabled": false,
        "displaySummaryTasks": true
      },
      "userPermissions": {
        "summaryPermissions": "ADMIN"
      },
      "hasSummaryFields": false,
      "permalink": "https://app.smartsheet.com/sheets/mw7mm3vxpmhjxcxc3pcchpmwh7xjgprvvrrvq5v1",
      "createdAt": "2021-12-08T09:50:32Z",
      "modifiedAt": "2021-12-17T09:54:35Z",
      "isMultiPicklistEnabled": true,
      "columns": [
        {
          "id": 8570585025406852,
          "version": 0,
          "index": 0,
          "title": "primary column",
          "type": "TEXT_NUMBER",
          "primary": true,
          "validation": false,
          "width": 150
        },
        {
          "id": 407810700797828,
          "version": 0,
          "index": 1,
          "title": "email",
          "type": "TEXT_NUMBER",
          "validation": false,
          "width": 220
        }
      ],
      "rows": [
        {
          "id": 7053594140272516,
          "rowNumber": 1,
          "expanded": true,
          "createdAt": "2021-12-17T09:54:35Z",
          "modifiedAt": "2022-01-06T16:57:07Z",
          "cells": [
            { "columnId": 8570585025406852 },
            { "columnId": 407810700797828 }
          ]
        }
      ]
    }

Procesio (create a process that extracts e-mail addresses)

There are multiple actions involved in extracting e-mails from a website. The complete process should look like this.

Step 1. From within Procesio, go to Credential Manager, press the Add new button, and use these values:
- Name: ScrapingBee
- Credential type: REST API Configuration

Step 2. Press the Next step button and use these values:
- URL: https://app.scrapingbee.com
- Method: GET
- Test endpoint: /api/v1
- Authentication method: API Key
- Authentication key: api_key
- Value: your API key obtained at
ScrapingBee step 1
- Header / Query Parameters: Query Parameters

Step 3. Press the Save button.

Step 4. From within Procesio, in Credential Manager, press the Add new button and use these values:
- Name: Smartsheet
- Credential type: REST API Configuration
- URL: https://api.smartsheet.com
- Method: GET
- Test endpoint: /2.0/sheets
- Authentication method: API Key
- Authentication key: Authorization
- Value: Bearer, followed by your API key obtained at Smartsheet step 4
- Header / Query Parameters: Header

Please note that, for the Smartsheet API, the Value field must always contain Bearer, followed by a space, before the actual value of the key.

Step 5. Add a ForEach action and name its node "foreach site". Configure the action (select the action) as follows:
- In the field For Each Item, type iterator (this automatically adds a variable of type String named iterator to the field).
- In the field In List, press Add variable > Create new variable and create urllist. Once the variable is created, select it for this field.

| Variable name | Variable type | List value | Default value | Set as |
| --- | --- | --- | --- | --- |
| urllist | String | yes | `["https://www.realitatea.net/contact","https://procesio.com"]` | input |

The urllist default value contains the list of websites through which we will iterate to extract e-mail addresses.
- In the field Action Timeout, type 600 (this implies that the process will time out after 600 seconds).

Step 6. Inside the ForEach frame previously named "foreach site", add a Call API action and, in the side panel, set the node name to ScrapingBee.

Step 7. In Select API configuration, select ScrapingBee.

Step 8. Press the Configure request button and add these values:
- Verb: GET
- Endpoint: /api/v1
- Query params: key url, value iterator

From Configure request, for the Status output field, add a variable named stsout of type Integer by clicking Add variable > Create new variable. Once the variable is created, select it for this field.

From Configure request, for the Body output field, add a variable named htmlobjectout of type Object, by
clicking Add variable > Create new variable. Once the variable is created, select it for this field.

| Variable name | Variable type | Single value | Set as |
| --- | --- | --- | --- |
| stsout | Integer | yes | output |
| htmlobjectout | Object | yes | output |

Please note that the Key field is case sensitive: type url (all lowercase), and in the Value field add the variable iterator using the + icon.

Step 9. Inside the ForEach frame previously named "foreach site", add a Map Process Data action. Configure the action (select the action) by:
- adding the variable htmlobjectout in the right input field;
- adding a new variable named htmlstring in the left input field, via Add variable > Create new variable. Once the variable is created, select it for this field.

| Variable name | Variable type | Single value | Set as |
| --- | --- | --- | --- |
| htmlstring | String | yes | output |

Step 10. Inside the ForEach frame previously named "foreach site", add a Regex Extract action (contained in Platform Actions under the Beta folder). Configure the action by:
- adding the variable htmlstring in the field String to extract from;
- adding the following expression in the field Regex expression: `([_\-0-9a-zA-Z.]+@([\-0-9a-zA-Z]+\.)+[a-zA-Z]{2,6})`
- adding a new variable named emaillist in the field List of results, via Add variable > Create new variable. Once the variable is created, select it for this field.

| Variable name | Variable type | List value | Set as |
| --- | --- | --- | --- |
| emaillist | String | yes | output |

Step 11. Inside the ForEach frame previously named "foreach site", add a Concatenate Lists action (contained in Platform Actions under the Beta folder). Configure the action by:
- adding a new variable named masteremaillist in the field First list, via Add variable > Create new variable. Once the variable is created, select it for this field.

| Variable name | Variable type | Default value | List value | Set as |
| --- | --- | --- | --- | --- |
| masteremaillist | String | `[]` | yes | output |

- adding the variable emaillist in the field Second list;
- adding the variable masteremaillist in the field Concatenated list.

Step 12. Outside the ForEach, on the canvas,
add a Deduplicate action (contained in Platform Actions under the Beta folder). Configure the action (select the action) by:
- adding the variable masteremaillist in the field List to process;
- adding `{"matchtype": "exact"}` in the field Duplicate identification JSON configuration;
- adding the variable masteremaillist in the field Deduplicated list.

Step 13. Add a second ForEach action on the canvas. Configure the action (select the action) as follows:
- In the field For Each Item, type iterator2 (this automatically adds a variable of type String named iterator2 to the field).
- In the field In List, press the Add variable button and add the variable masteremaillist.
- In the field Action Timeout, type 600 (this implies that the process will time out after 600 seconds).

Step 14. Inside the second ForEach frame, add an Add operation action. Configure the action (select the action) by:
- adding a new variable named increment in the field First number, via Add variable > Create new variable. Once the variable is created, select it for this field.

| Variable name | Variable type | Default value | Single value |
| --- | --- | --- | --- |
| increment | Integer | 0 | yes |

- adding the value 1 in the field Second number;
- adding the variable increment in the field Result.

Step 15. Inside the second ForEach frame, add a Call API action and, in the side panel, set the node name to Smartsheet.

Step 16. In Select API configuration, select Smartsheet.

Step 17. Press the Configure request button and add these values:
- Verb: POST
- Endpoint: /2.0/sheets/{your sheet id}/rows ({your sheet id} was obtained in Postman step 1)
- Body:

    [
      {
        "toTop": true,
        "cells": [
          { "columnId": your column id, "value": <%increment%> },
          { "columnId": your column id, "value": "<%iterator2%>" }
        ]
      }
    ]

Each "your column id" is one of the column IDs obtained in Postman step 2. (The screenshot below shows how the body of the Call API action looks in Procesio.) As observed, besides the column ID(s), the body contains two variables: increment and iterator2.

Additional hint: to insert a variable in the body of a Call API action, press the Insert key.

Step 18. Make sure that all your actions are connected in a sequence. Save the process, then validate and run it.

Step 19. In Smartsheet, check for the new e-mail address entries.
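
For readers who want to reason about the data flow outside Procesio, the core of the process (read the column IDs, extract addresses per page, concatenate, deduplicate, build one row body per address) can be sketched in plain Python. This is only an illustration under assumptions, not how Procesio implements its actions: the function names are hypothetical, the regular expression is an approximation of the one used in the Regex Extract step, the column IDs 111 and 222 are placeholders, and no HTTP calls are made.

```python
import json
import re

# Assumed pattern: local part, "@", dot-separated domain labels, 2-6 letter TLD.
EMAIL_RE = re.compile(r"[_.\-0-9a-zA-Z]+@(?:[\-0-9a-zA-Z]+\.)+[a-zA-Z]{2,6}")


def column_ids(sheet: dict) -> list:
    """Return the column IDs of a Get Sheet response, ordered by column index."""
    columns = sorted(sheet["columns"], key=lambda col: col["index"])
    return [col["id"] for col in columns]


def extract_emails(html: str) -> list:
    """Mimic the Regex Extract step: every e-mail-like match in one page."""
    return EMAIL_RE.findall(html)


def deduplicate(items: list) -> list:
    """Mimic an exact-match deduplication, keeping first occurrences in order."""
    return list(dict.fromkeys(items))


def build_rows(emails: list, number_col: int, email_col: int) -> list:
    """Build one row body per address, numbered from 1 like the increment variable."""
    return [
        {
            "toTop": True,
            "cells": [
                {"columnId": number_col, "value": n},
                {"columnId": email_col, "value": email},
            ],
        }
        for n, email in enumerate(emails, start=1)
    ]


# Made-up page bodies standing in for the HTML returned for each URL.
sample_pages = [
    "<a href='mailto:office@example.com'>office@example.com</a>",
    "<p>Write to sales@example.org or office@example.com</p>",
]

master = []
for page in sample_pages:          # first ForEach: extract, then concatenate
    master += extract_emails(page)

unique = deduplicate(master)       # Deduplicate step, outside the loop
print(json.dumps(build_rows(unique, 111, 222), indent=2))
```

Each dictionary produced by `build_rows` corresponds to the single row object sent in the body of one POST to the rows endpoint; in the Procesio process that request is issued once per iteration of the second ForEach.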