
    Using CURL and Screen Scraping To Track Topic Performance in NodeBB

    IT Discussion
    nodebb, mangolassi, curl, bash
    • scottalanmiller

      If you have ever worked with forum screen scraping, you know it can be a handy way to gather data. One use is to track the views on a thread that you have been watching, or its comment count. Using curl, the standard tool for this on Linux, and a few simple line-oriented text tools like grep and cut, we can pretty easily grab this information from a NodeBB site like MangoLassi.

      Here is an example command that handles the NodeBB redirects from the RESTful interface and trims the output to make it easy to return a single numerical value from the HTML that comes back.

      curl --location --referer ";auto" --netrc -s -D - http://mangolassi.it/topic/8000 | grep human-readable-number | grep -v topic | cut -d'>' -f2 | cut -d'<' -f1
      

      Because MangoLassi uses URL redirects, you cannot use a vanilla curl command for this, but fixing this is easy: the --location flag tells curl to follow them. Using grep to get to the line that we want is not great because the markup gives us no dedicated label for the views field, but screen scraping is a quick and dirty business anyway. So this works. A couple of cut commands trim the line down to what we are looking for.
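To see what the grep and cut stages are doing without hitting the site, here is a rough sketch that runs the same filters over a few sample lines. The sample markup is hypothetical, made up to resemble the kind of line the pipeline expects; the real NodeBB page may look different. The function name extract_views is also just an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical sample lines; the real NodeBB markup may differ.
sample_html='<span class="human-readable-number" title="12" component="topic/post-count">12</span>
<span class="human-readable-number" title="895">895</span>
<p>unrelated line</p>'

# Same filter stages as the curl pipeline, reading from stdin.
extract_views() {
  grep human-readable-number |  # keep only lines carrying a counter
    grep -v topic |             # drop counter lines that mention "topic"
    cut -d'>' -f2 |             # keep what follows the first '>'
    cut -d'<' -f1               # keep what precedes the next '<'
}

views=$(printf '%s\n' "$sample_html" | extract_views)
echo "$views"
```

If more than one line survives both grep stages, the function prints one number per line, so it is worth checking the output by eye the first time you point it at a real page.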

      The "8000" provided here is an example. Replace it with the number of the thread in which you are interested. The command returns just a number: the view count of that specific thread.
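If you check several threads, swapping the number by hand gets old. One possible way to wrap it, as a minimal sketch: the function names topic_url and topic_views are made up for illustration, and the curl flags and filters are simply copied from the command above.

```shell
#!/usr/bin/env bash
# Build the topic URL from a thread number; rejects empty or non-numeric input.
topic_url() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: topic_url <thread-number>" >&2; return 1 ;;
  esac
  printf 'http://mangolassi.it/topic/%s\n' "$1"
}

# Fetch the page and reduce it to the view count (same pipeline as above).
topic_views() {
  local url
  url=$(topic_url "$1") || return 1
  curl --location --referer ";auto" --netrc -s -D - "$url" \
    | grep human-readable-number | grep -v topic \
    | cut -d'>' -f2 | cut -d'<' -f1
}

# Example (requires network access):
#   topic_views 8000
```

Nothing runs against the site until you call topic_views yourself, so the file can be sourced safely from another script.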
