What do you search for?

Cultivate

A lot of shortlinks and redirects

A lot of my searches return shortlinks and redirects. Many results redirect because you’ve seemingly indexed old links, and sites have since moved domain or changed URL structures. Even more redirect from HTTP to HTTPS because Findx doesn’t seem to follow redirects and update its index accordingly. (Maybe the index has no concept of a canonical link?)

I’m also seeing a lot of shortlinks such as j.mp, tinyurl.com, t.co, and eepurl.com that never held any individual pages and have always just served as redirects.

This is a bad experience. The URL shown underneath the search result isn’t the actual URL of the destination page. It’s harder to verify which publication/website you’re about to visit.
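
To make the shortlink complaint concrete, here is a minimal sketch of how a crawler could flag redirect-only hosts before indexing them. It only assumes the four hosts named above. The names `SHORTENER_HOSTS` and `is_shortlink` are made up for illustration and say nothing about how Findx actually works.

```python
from urllib.parse import urlsplit

# Hosts named in this post; a real blocklist would be much longer (assumption).
SHORTENER_HOSTS = {"j.mp", "tinyurl.com", "t.co", "eepurl.com"}


def is_shortlink(url: str) -> bool:
    """Return True if the URL's host is a known redirect-only shortener."""
    # urlsplit lowercases the hostname; it is None when the URL has no scheme.
    host = urlsplit(url).hostname or ""
    # Treat "www.tinyurl.com" the same as "tinyurl.com".
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENER_HOSTS
```

A result whose URL passes this check could be dropped or replaced by its final destination, so the URL shown under the result matches the page the user lands on.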

Comments

  • We certainly do follow redirects and update the index based on that - and we do support canonical too.

    I found no results from t.co, around 3000 from j.mp, 2500 from tinyurl but quite a few from eepurl, which we had not blocked (thanks for reporting it).

    We are trying to keep our index clean by blocking sites like url shorteners, proxies etc. as I 100% agree that it gives a bad user experience. If you find others that we don't block yet, please do report them!

    Thanks.


    Best Regards,

    Brian Rasmusson
    Founder and CEO
    Privacore ApS

    Ahlgade 21, 1 - DK4300 Holbaek - Denmark
    br@privacore.com - https://www.privacore.com
    Office: +45 7199 3134 - Mobile: +45 3161 6263

    https://www.findx.com - keep searching, in private
  • There are also tons of duplicated links for www. and non-www. variants, as well as with and without a trailing forward slash. For example, example.com/article/2/ is listed, and immediately below it the next result is example.com/article/2 (no trailing slash).

    I’ve also seen tons of these: https://www.findx.com/search?q=FastMail

    The first result is for their canonical domain (.com), followed by an old domain (.fm) that their business hasn’t used for years. The old domain does a 301 redirect to the new domain.


    Here are two old domains that I’ve used for my own blog. The first redirected to the second, and later to the current domain, for more than 1.5 years, and both have redirected to the current domain (ctrl.blog) for the last 7 months using standard HTTP 301 Moved Permanently redirects. The oldest domain was probably in use before you started indexing the web (?). Notably, even Google includes these old domains in its index. Not sure what is up with that. Microsoft Bing, however, knows its stuff: it doesn’t include any of the old domains and has indexed everything under the new URL.
  • Here is another kind of example of duplicated content that redirects to the same location:

    https://www.findx.com/search?q="Recover mistyped traffic: Redirect .htm to .html

    The first two URLs point to the same page, but none of the results is the actual canonical link (as indicated by the final destination of the redirects, the rel="canonical" link, and the website’s sitemap).
  • Thank you very much for the examples.

    It is a "feature" of the original Gigablast code, on which we have built Findx. When it sees a 301 redirect, which is not a simple redirect like www to non-www, it stores the redirected-to page content under the old URL.

    We have discussed it internally before, and haven't found a good reason why it would have been implemented like that. It is something we are going to fix very soon.

    Regards,
    Brian
  • no1no1
    edited August 2017
    Gigablast’s implementation only makes sense for 302s, 303s, and 307s. 301s and 308s shouldn’t behave like that. Maybe their implementation can’t differentiate between temporary and permanent redirects?
  • no1no1
    edited December 2017
    My overall impression of Findx is that I see more redirects ranking high than actual end-destinations. Many websites have switched to HTTPS in the last year, and Findx sends users to their old HTTP pages first — which requires additional redirects and slows down page loading time. Somewhere around 25 % of all top one million (Alexa ranked) websites have switched to HTTPS just in the last 12 months, so it affects a huge number of websites. They all redirect to the new location with permanent redirect codes. Permanent redirects are supposed to be permanent.

    Plus, of course the issues with old domains and link structures as discussed earlier persist. But these are all just the same issue.
  • We're on it, fixing the 301 issue as I write this.

    If you still see a lot of URL shortener pages, please let me know as I thought we had blocked and cleaned up all the common ones by now.
  • https://www.findx.com/search?q=Gizmodo&type=web

    Lists "gizmodo.it" which redirects to "gizmodo.com", but "gizmodo.com" is nowhere to be seen in the results.
  • edited January 8
    Gizmodo issue fixed. The system we based Findx on had a pretty aggressive adult detection which we recently improved a lot. Some data was not updated after that, and Gizmodo was still (wrongly) detected as adult. Data has been refreshed, and Gizmodo.com pops up as first result when searching for Gizmodo now.

    (https redirect still to be updated)
  • Thanks for updating me.

    So results that are temporary or permanent redirects to adult content get around your adult content filter? That sounds like an exploitable bug.
  • No, it's two different things. Adult detection has been vastly improved, but not everything had been updated on production. It is now. The previous adult detection detected Gizmodo.com as adult, the new one does not.

    Remaining problem with Gizmodo is that we list the http version, which redirects to https. A fix for this is in place, but requires a massive update of our index, which will take some time.
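
The permanent-versus-temporary distinction discussed in this thread can be sketched roughly as follows. This is an illustration of the behaviour the commenters ask for, not Findx or Gigablast code. The function `index_url` and the `(status, location)` hop format are assumptions made for the example. Stopping at the first temporary redirect is also a design choice of this sketch, not something stated in the thread.

```python
from typing import Sequence, Tuple

# Redirect status codes as discussed in the thread.
PERMANENT = {301, 308}       # the old URL should be replaced in the index
TEMPORARY = {302, 303, 307}  # the old URL stays the one to index


def index_url(requested: str, hops: Sequence[Tuple[int, str]]) -> str:
    """Pick the URL a fetched page should be indexed under.

    `hops` is the redirect chain in the order it was followed, as
    (status code, Location target) pairs. This format is hypothetical.
    """
    current = requested
    for status, location in hops:
        if status in PERMANENT:
            # The resource has moved for good: index under the new URL.
            current = location
        else:
            # Temporary (or unknown) redirect: keep the URL reached so far.
            break
    return current
```

With this rule, an HTTP page that answers with a 301 to its HTTPS version is indexed under the HTTPS URL, while a page that answers with a 302 stays indexed under the URL that was requested.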